NLP for Books on Mexico ~1500s AD

I’ve been listening to a digital copy of Historia verdadera de la conquista de la Nueva España, Tomo I, by Bernal Díaz del Castillo (introduction and notes by Joaquín Ramírez Cabañas; Biblioteca Virtual Miguel de Cervantes), about the Spanish conquest of Mexico.

There are many references to dates, places visited or settled, and journeys taken, including why and how the adventurers named those places. I wanted the full picture of where these people were and when. That information is available in bits and pieces elsewhere, but I wanted it in one convenient place as I go through the book.

This question took me down the path of training a natural language processing library, spaCy, on the chapter-title text with manually added tags for people (“PERSON”) and places (“GPE”), so I could get a cleaner, more accurate list of people and places than the vanilla Spanish model produces. For annotating and formatting the training file, I relied on the notebook “NER Training Data Annotation.ipynb” in the dreji18/NER-Training-Spacy-3.0 repository on GitHub.
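For readers unfamiliar with this kind of annotation, here is a minimal sketch of the character-offset format commonly used for spaCy NER training data before it is converted to spaCy 3's binary format. The helper `annotate` and the sample title are hypothetical, made up for illustration; they are not taken from my notebook or from the book.

```python
def annotate(text, spans):
    """Build one (text, {"entities": [...]}) training example.

    `spans` is a list of (substring, label) pairs. Each substring is
    located by its first occurrence in `text` (a simplifying assumption).
    """
    entities = []
    for substring, label in spans:
        start = text.find(substring)
        if start == -1:
            raise ValueError(f"{substring!r} not found in {text!r}")
        # Entities are stored as (start_char, end_char, label).
        entities.append((start, start + len(substring), label))
    return (text, {"entities": sorted(entities)})


# Illustrative chapter-style title, not a quotation from the book.
title = "Cómo Hernando Cortés llegó a Cozumel"
example = annotate(title, [("Hernando Cortés", "PERSON"), ("Cozumel", "GPE")])
print(example)
```

Computing the offsets from the substrings, rather than typing them by hand, avoids off-by-one errors that would otherwise make the annotated spans misalign with the text.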

Yesterday I created a folium-exported map of the locations found per chapter. So far it is only minimally valuable because it is missing the context from the book; that’s the next step!

Going back to the false positives among the placenames in the output, I think it might be worth testing how similar the names as written are to current placenames, or just redoing the work from scratch for good measure. From there, it’s easy to add markers describing the significance of each place.
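One way the similarity test mentioned above could look, as a rough sketch using only Python's standard library (`difflib` and `unicodedata`). The function names, the 0.6 cutoff, the short list of modern names, and the variant spelling "Tascala" are all assumptions chosen for illustration, not output from my model.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase a name and strip accents so spellings compare fairly."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


def closest_modern_names(historical, modern_names, n=3, cutoff=0.6):
    """Return up to `n` modern names similar to `historical`, best first."""
    lookup = {normalize(m): m for m in modern_names}
    hits = difflib.get_close_matches(
        normalize(historical), list(lookup), n=n, cutoff=cutoff
    )
    return [lookup[h] for h in hits]


# Hypothetical reference list; a real one would come from a gazetteer.
MODERN = ["Tlaxcala", "Cholula", "Cempoala", "Veracruz"]
print(closest_modern_names("Tascala", MODERN))
```

An empty result is itself useful: it flags a name that is either a false positive from the NER step or a place that needs manual lookup.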

Because the placename spellings in the book differ from modern spellings, or from the names in common use, it takes some manual effort to verify and geocode these places well enough to map a route or an event.
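Once places are verified and geocoded, turning them into a route is mostly bookkeeping. The sketch below packs an ordered list of stops into a GeoJSON FeatureCollection (points plus one connecting line), which most web mapping tools can display. The function name `route_to_geojson`, the chapter numbers, and the coordinates are placeholders: the coordinates are only rough approximations, not verified values.

```python
import json


def route_to_geojson(stops):
    """Convert ordered stops into a GeoJSON FeatureCollection.

    Each stop is a dict with "name", "lat", "lon" and "chapter".
    GeoJSON stores positions as [longitude, latitude].
    """
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [s["lon"], s["lat"]]},
            "properties": {"name": s["name"], "chapter": s["chapter"]},
        }
        for s in stops
    ]
    # A LineString needs at least two positions.
    if len(stops) >= 2:
        features.append(
            {
                "type": "Feature",
                "geometry": {
                    "type": "LineString",
                    "coordinates": [[s["lon"], s["lat"]] for s in stops],
                },
                "properties": {"name": "route"},
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Placeholder data: approximate coordinates, made-up chapter numbers.
STOPS = [
    {"name": "Veracruz", "lat": 19.2, "lon": -96.1, "chapter": 1},
    {"name": "Tlaxcala", "lat": 19.3, "lon": -98.2, "chapter": 2},
]
print(json.dumps(route_to_geojson(STOPS), ensure_ascii=False, indent=2))
```

Keeping the chapter in each point's properties is one way to carry the book context along with the geometry, so a marker can later point back to the passage that mentions the place.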

To read the digital book text and see the visualization of the journey at once would be neat. I’m just not there yet, still working in Jupyter Notebooks to test this concept.

spaCy: https://course.spacy.io/en

This is a bit of a departure from the work of adding content to OHM, but I thought I would share the background of what I’m stuck on, in case it is interesting to anyone looking to add places or events that are mentioned in, and confirmed by, the existing literature.


@alexhisto - this is very cool - I often highlight placenames when I read and would love to have a way to extract placenames from any text. Having that might enable the creation of an atlas complementary to each book, or to confirm that these placenames are in OHM, etc.

Have you looked at the World Historical Gazetteer as a tool to help reconcile different spellings?
