Mapping Tip
Always look up and record the actual source of each start_date. This approach at least tells you which settlements have a start_date available, and it is primarily useful for quickly mapping settlements in largely unmapped regions such as South America and Africa.
Use the Wikidata Query Service to retrieve a list of settlements with inception dates and coordinates for any country. Then download the list as GeoJSON and import it into your favourite editor.
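If your download only gives you the standard SPARQL JSON results, a small script can turn them into GeoJSON. This is a minimal sketch, assuming a JSON download of a query like the one in this thread, with a `coordinates` column holding WKT literals such as `Point(-82.38 23.13)` (longitude first). The function names are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

# WDQS returns coordinates as WKT literals, e.g. "Point(-82.383 23.133)",
# with longitude first, which matches GeoJSON's [lon, lat] order.
_POINT_RE = re.compile(r"Point\(\s*([-+\d.eE]+)\s+([-+\d.eE]+)\s*\)")


def parse_wkt_point(wkt: str) -> Optional[List[float]]:
    """Return [lon, lat] for a plain WKT 'Point(lon lat)' literal, else None."""
    match = _POINT_RE.fullmatch(wkt.strip())
    if match is None:
        return None  # e.g. points on another globe, which carry a URI prefix
    return [float(match.group(1)), float(match.group(2))]


def bindings_to_geojson(results: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a SPARQL 1.1 JSON results document into a GeoJSON FeatureCollection."""
    features = []
    for row in results["results"]["bindings"]:
        coords = row.get("coordinates")
        if coords is None:
            continue  # coordinates are OPTIONAL in the query; skip rows without them
        point = parse_wkt_point(coords["value"])
        if point is None:
            continue
        # Every other column (name, start_date, wikidata, ...) becomes a property.
        properties = {var: b["value"] for var, b in row.items() if var != "coordinates"}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": point},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}
```

Usage: load the downloaded file with `json.load`, pass the result to `bindings_to_geojson`, and write the output with `json.dump`. Rows without coordinates, or with coordinates on another globe (which WDQS prefixes with the globe's entity URI), are skipped rather than guessed.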
Of course, you should review the inception dates before importing them, because Wikidata is not always correct.
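One thing worth checking during that review is date precision. In WDQS, a year-only inception is exposed as a full timestamp such as `1550-01-01T00:00:00Z`, so formatting it as YYYY-MM-DD suggests a precision the data does not have. This is a minimal sketch, assuming you extend the query to fetch the precision through `p:P571/psv:P571` with `wikibase:timeValue` and `wikibase:timePrecision` (Wikibase codes 9 = year, 10 = month, 11 = day). The function name is hypothetical.

```python
from typing import Optional

# Wikibase time precision codes (subset): 9 = year, 10 = month, 11 = day.
PRECISION_YEAR, PRECISION_MONTH, PRECISION_DAY = 9, 10, 11


def format_ohm_date(time_value: str, precision: int) -> Optional[str]:
    """Trim a WDQS timestamp such as '1550-01-01T00:00:00Z' to its real precision.

    Returns None for BCE dates and for precisions coarser than a year
    (decade, century, ...), which this sketch leaves for manual review.
    """
    if time_value.startswith("-"):
        return None
    date_part = time_value.lstrip("+").split("T", 1)[0]
    year, month, day = date_part.split("-")
    year = year.zfill(4)  # pad years before 1000, e.g. '950' -> '0950'
    if precision >= PRECISION_DAY:
        return f"{year}-{month}-{day}"
    if precision == PRECISION_MONTH:
        return f"{year}-{month}"
    if precision == PRECISION_YEAR:
        return year
    return None
```

Returning None instead of a padded guess keeps century- or decade-precision inceptions out of start_date until someone checks them by hand.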
As a bonus, you could create a chronology relation for each settlement and link it to Wikidata. That way, when you query Wikidata, you can filter out the items that have already been mapped.
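The same filtering can also be done after downloading the results. This is a minimal sketch, assuming each exported row is a dict with the `wikidata` column from the query in this thread, and that you have already collected the QIDs linked from OHM (for example from the `wikidata` tags of your chronology relations; how you gather them is left open). The function name is hypothetical.

```python
from typing import Dict, Iterable, List


def drop_already_mapped(rows: Iterable[Dict[str, str]],
                        mapped_qids: Iterable[str]) -> List[Dict[str, str]]:
    """Keep only rows whose 'wikidata' QID is not already linked from OHM."""
    # Normalise case and whitespace so "q123 " and "Q123" compare equal.
    mapped = {qid.strip().upper() for qid in mapped_qids}
    return [row for row in rows
            if row.get("wikidata", "").strip().upper() not in mapped]
```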
Happy mapping everyone!
I’ve already uploaded about 200 settlements in Central America. If no one objects to my approach, I’ll do Mexico and South America next.
I appreciate the general approach. In fact, I hope our main data consumers will soon be able to leverage Wikidata in a more automated fashion for secondary attributes about a feature (besides the primary feature type and date tags), to minimize the need for complex duplicate features in OHM.
As you mentioned, Wikidata can have data quality issues too. In the U.S. and some other countries, Wikidata is notorious for conflating a human settlement with its incorporated government, which leads to inaccuracies such as very late start dates, or end dates for places that still exist. A couple of years ago, @jeffmeyer tried importing places across parts of the U.S. from Wikidata but had to roll back some of it. This could be a particular problem in Mexico, where INEGI tracks municipality histories (which were probably imported to Wikipedia in some form) but not settlement histories. If you know of any problematic sources, you could filter the SPARQL query by reference.
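Filtering by reference needs the inception statement's references in the results. One way, assuming the query reads P571 through its statement node, is `?settlement p:P571 ?inception . ?inception ps:P571 ?raw_start_date . OPTIONAL { ?inception prov:wasDerivedFrom/pr:P248 ?reference . }`, where P248 is "stated in". The sketch below applies the filter in Python after download rather than inside SPARQL; the `reference` column, the function name, and the source QIDs in any blocklist you pass are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def filter_by_reference(rows: Iterable[Dict[str, str]],
                        distrusted: Set[str],
                        require_reference: bool = False) -> List[Dict[str, str]]:
    """Drop rows whose start_date is referenced to a distrusted source.

    Each row may carry a 'reference' value holding the entity URI of the
    source item (from pr:P248). Rows without one are kept unless
    require_reference is True.
    """
    kept = []
    for row in rows:
        reference = row.get("reference")
        if not reference:
            if not require_reference:
                kept.append(row)
            continue
        source_qid = reference.rsplit("/", 1)[-1]  # ".../entity/Q123" -> "Q123"
        if source_qid not in distrusted:
            kept.append(row)
    return kept
```

Because the OPTIONAL reference join yields one row per reference, an item backed by both a distrusted and a trusted source keeps its trusted row.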
Another thing to watch for is duplicating places that already exist in OHM. For example, Peru already has extensive place coverage. Manual review will go a long way toward avoiding duplication.
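Before manual review, a rough pre-check can flag likely duplicates. This is a minimal sketch, assuming you have a list of existing OHM places (name, lon, lat) from some export of the area; it flags an existing place when its accent- and case-insensitive name matches the candidate and it lies within a hypothetical 10 km threshold, using the haversine great-circle distance.

```python
import math
import unicodedata
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0


def normalize_name(name: str) -> str:
    """Strip accents and case so 'Cúcuta' and 'cucuta' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance between two lon/lat points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def likely_duplicates(candidate: Dict, existing: List[Dict], max_km: float = 10.0) -> List[Dict]:
    """Existing places with the same normalized name within max_km of the candidate."""
    name = normalize_name(candidate["name"])
    return [
        place for place in existing
        if normalize_name(place["name"]) == name
        and haversine_km(candidate["lon"], candidate["lat"], place["lon"], place["lat"]) <= max_km
    ]
```

This only catches same-name matches; renamed places or alternative spellings still need the manual review described above.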
Luckily, I had mapped most of the departments in those countries, so I roughly knew which dates were actually municipal dates. And yes, I review each city before I upload.
I added a column that retrieves each item's instance_of value, then filtered out the Colombian results that had "municipality" in that field. That took the list from 767 results down to 60.
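The same filter can be scripted. This is a minimal sketch, assuming every row has the `wikidata` column from the query in this thread plus an `instance_of` column holding P31 labels (for example via `OPTIONAL { ?settlement wdt:P31 ?instance . }` and a label for `?instance`). Since an item with several P31 values appears on several rows, the sketch drops the whole item if any of its rows matches. It matches English label text, so labels in other languages need a different keyword. The function name is hypothetical.

```python
from typing import Dict, Iterable, List


def drop_municipalities(rows: Iterable[Dict[str, str]],
                        keyword: str = "municipality") -> List[Dict[str, str]]:
    """Drop every item that has the keyword in instance_of on any of its rows."""
    rows = list(rows)
    needle = keyword.casefold()
    # First pass: collect the QIDs of items flagged on at least one row.
    flagged = {row["wikidata"] for row in rows
               if needle in row.get("instance_of", "").casefold()}
    # Second pass: keep only rows belonging to unflagged items.
    return [row for row in rows if row["wikidata"] not in flagged]
```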
Update: I’ve imported 900 settlements from Wikidata in Uruguay, Colombia, Panama, Costa Rica, Honduras, Nicaragua, El Salvador, Guatemala and Mexico. 512 of those were from Mexico.
I’m now using this query. It exports only the needed tags and uses pagination to handle larger result sets: if there are more than 300 results, increment the OFFSET by 300 for each subsequent page.
```sparql
SELECT ?settlement ?name ?start_date ?end_date ?coordinates ?source ?wikidata ?place
WHERE {
  ?settlement wdt:P31/wdt:P279* wd:Q486972 .
  ?settlement wdt:P17 wd:Q241 .  # set the country id here
  ?settlement wdt:P571 ?raw_start_date .
  OPTIONAL { ?settlement wdt:P576 ?raw_end_date . }
  OPTIONAL { ?settlement wdt:P625 ?coordinates . }
  BIND(?settlement AS ?source)
  BIND(STRAFTER(STR(?settlement), "http://www.wikidata.org/entity/") AS ?wikidata)
  BIND("city" AS ?place)
  BIND(
    IF(BOUND(?raw_start_date),
      CONCAT(
        STR(YEAR(?raw_start_date)),
        "-",
        IF(MONTH(?raw_start_date) < 10, CONCAT("0", STR(MONTH(?raw_start_date))), STR(MONTH(?raw_start_date))),
        "-",
        IF(DAY(?raw_start_date) < 10, CONCAT("0", STR(DAY(?raw_start_date))), STR(DAY(?raw_start_date)))
      ),
      ""
    ) AS ?start_date
  )
  BIND(
    IF(BOUND(?raw_end_date),
      CONCAT(
        STR(YEAR(?raw_end_date)),
        "-",
        IF(MONTH(?raw_end_date) < 10, CONCAT("0", STR(MONTH(?raw_end_date))), STR(MONTH(?raw_end_date))),
        "-",
        IF(DAY(?raw_end_date) < 10, CONCAT("0", STR(DAY(?raw_end_date))), STR(DAY(?raw_end_date)))
      ),
      ""
    ) AS ?end_date
  )
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
    ?settlement rdfs:label ?name .
  }
}
ORDER BY ?settlement
LIMIT 300
OFFSET 0  # increment by 300 (0, 300, 600, ...) if there are more than 300 results
```
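Incrementing the offset by hand can be automated. This is a minimal sketch, assuming the query above with its `LIMIT`/`OFFSET` lines removed, and a caller-supplied `run_query` function that sends one query to WDQS (for example an HTTP request asking for JSON results, with a descriptive User-Agent as Wikimedia asks) and returns its list of result rows. Both function names are hypothetical. The loop stops at the first page shorter than the page size.

```python
from typing import Callable, List, TypeVar

Row = TypeVar("Row")


def build_page_query(base_query: str, limit: int, offset: int) -> str:
    """Append LIMIT/OFFSET to a query that has an ORDER BY but no paging of its own."""
    return f"{base_query.rstrip()}\nLIMIT {limit}\nOFFSET {offset}"


def fetch_all_pages(base_query: str,
                    run_query: Callable[[str], List[Row]],
                    page_size: int = 300) -> List[Row]:
    """Fetch pages 0, page_size, 2*page_size, ... until a short page comes back."""
    rows: List[Row] = []
    offset = 0
    while True:
        page = run_query(build_page_query(base_query, page_size, offset))
        rows.extend(page)
        if len(page) < page_size:
            break  # last page reached
        offset += page_size
    return rows
```

Keeping the `ORDER BY ?settlement` matters here: without a fixed order, SPARQL does not guarantee that successive LIMIT/OFFSET pages line up, so rows could be skipped or repeated.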