Inaccurate countries in search results

I was confused for a moment when my search for “Sevilla” returned a result labeled “Sevilla, Roman Republic (BCE146-BCE121)” that, according to the site, has existed since 1248 CE. The BCE dates were nowhere to be found on the place=city node.

It turns out the extra date range comes from a Roman Republic boundary relation that’s completely unrelated to modern-day Sevilla. Part of the problem is that Nominatim needs to keep a consistent time period when qualifying a place name. But part of it is also that the surrounding Roman Republic relation has its own date range in name=*, which no software would know to strip out. We need to stop putting date ranges in these names.

While I agree they should be removed in principle, in practice if you’re editing a boundary relation you’re almost certainly going to be doing it in JOSM, and managing multiple iterations of a boundary without them is a nightmare. (Once JOSM supports appending them (or there’s a plugin) I’m on board and will remove any I find but until then I see them as a necessary evil.)

I’d also say that the search thinking Sevilla is in the Roman Republic is a far bigger problem than the Roman Republic having the dates in its name – being confused is the correct response here regardless of the visible date range. The date range being visible just makes it more obvious that it isn’t doing what it’s supposed to. It also doesn’t seem to be limited to cities etc – it often shows country 1, country 2 (e.g. it shows Kingdom of Prussia (1795-1802), Poland-Lithuania (1637-1657), both of which are admin_level=2) which is pretty useless, since that is always going to be incorrect. (For whatever reason its go-to for most of central Europe is also a few iterations of Nazi Germany, which is more than a little problematic.) For now I would favour not showing the second part at all (at least for administrative boundaries, I’m not sure about settlements), as displaying Bohemia, German Reich [1557-1742] is probably worse than Bohemia (1557-1742), German Reich (1941-1945) [1557-1742] (at least with the latter you know it’s broken, rather than it implying some kind of political claim).

Maybe these are known issues but there are other problems with Nominatim right now. (I know this isn’t really the place for this but I don’t know how to submit a bug report and don’t use GitHub.)

Some boundaries don’t seem to show up at all in the search for some reason. If you search for “Belgium” (or its other names) all that shows up is “Frankish Kingdom (486-496)”. If you search for “Sweden” all you get is “Kalmar Union (1397-1472)”. If you search for “Reichskreis” (Imperial Circle) most of them show up but not the “Österreichischer Reichskreis” (the Austrian Circle) or the “Niederrheinisch-Westfälischer Reichskreis” (the Lower-Rhenish-Westphalian Circle). Indeed, if you search for “Austria” or “Österreich” only “Austria (1920-1921)” shows up (so not only does it not show other iterations of the country “Austria”, it also doesn’t show for example “Upper Austria” or the “Duchy of Austria” or “Austria-Hungary”). “Germany”, similarly, only brings up “German Reich (1940-1941)”. (In fact most modern country names seem to do this, many with relations which don’t make sense, e.g. “France” → “Western Roman Empire (395-407)”.) For all those where only one iteration shows up clicking on that single result does not take you to the relation, but rather places a pointer on the map where it thinks the country is (I say “thinks” because “Western Roman Empire (395-407)” puts a pointer in the mid-Atlantic, also weird). I thought it might be that accented characters were somehow messing with it – Ö for the Austria(n Circle), ë for Belgium, ä for Westphalian – but it doesn’t seem to have an issue with other accented characters that I’ve tried – it finds Bohemia (“Čechy”) and the Saxon/Lower Saxon/Upper Saxon circles (“…S/sächsischer…”) fine – and Germany, Sweden etc don’t have any. All of them show up fine when using “query features”, however…

If you click on a boundary in the list the date will sometimes change, incorrectly. For example, if you query a boundary and choose an iteration of the German Reich, be that Nazi Germany, the Weimar Republic or the German Empire, the date is set to 1871-04-16. I do not know where it is getting this date from – they are members of two chronologies which both have start_dates of 1871-01-18 (“Germany” and “German Reich”), and the first boundary relation (in both) has a start_date of 1871-05-04. (The first date is the proclamation of the German Emperor, the second is the first constitution coming into force.) The only places I can think of are the label node for “Deutsches Reich”, which does have a start_date of 1871-04-16 (this is the ratification date of the constitution but should probably be changed) or it is somehow pulling the date of the adoption of the first flag listed on Wikidata (unlikely). On the other hand not all relations do this – Poland-Lithuania boundaries for example leave the date alone (they also have a label node).

There is another issue though that I’ve just found which ties in to the previous ones: it is often simply wrong! For example, if you search for Glogau or Głogów (the same town in Silesia/Poland under its German and Polish names) it shows up as Town Glogau, Poland-Lithuania (1637-1657) [1253 – 1945] (or ... Głogów, Poland-Lith... [1946 – ]). Glogau/Głogów was never within Poland-Lithuania! It was part of Silesia which was within the HRE or Prussia for the entire life of the PLC. My only guess is there’s something weird going on with a chronology relation or something, so Nominatim sees “Poland-Lithuania (1637-1657)” as representing “Poland” and sees that the Glogau/Głogów nodes are within modern Poland or the medieval kingdom of Poland or something and somehow conflates them. Whatever the case, it is not working as intended. In this case it cannot even be tied to label nodes as the PLC’s is only used by PLC relations.

(If it is relevant both the “Deutsches Reich” (“German Reich”) and “Rzeczpospolita Obojga Narodów” (“Polish-Lithuanian Commonwealth”) chronology relations are tagged with place=country; “Germany” and “Polska” (“Poland (full history)”) are not.)

Yes, the basic issue is that Nominatim doesn’t know anything about dates. It only knows that place X lies within country Y spatially, but it doesn’t know that X and Y have no temporal overlap. Moreover, Nominatim has a hard-coded performance optimization that spatially indexes every feature according to OpenStreetMap’s country boundaries and attempts to use OHM’s country boundaries for any details about those countries. This often fails, so that many results stand alone without belonging to any country.
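
To illustrate the missing piece, here is a minimal sketch of a temporal-overlap check. This is not Nominatim code and Nominatim has no such hook that I know of. The function names are hypothetical, and the tag values are illustrative, taken loosely from the labels quoted in this thread. It assumes start_date/end_date values shaped like YYYY, YYYY-MM or YYYY-MM-DD, with a leading minus for BCE years.

```python
from typing import Optional, Tuple

Date = Tuple[int, int, int]  # (year, month, day); BCE years are negative

# Sentinels for features with no start_date or no end_date (assumption).
OPEN_START: Date = (-10**9, 1, 1)
OPEN_END: Date = (10**9, 12, 31)


def parse_date(value: Optional[str], default: Date) -> Date:
    """Parse 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD'; a leading '-' marks a BCE year.

    Missing month or day falls back to 1, which is a simplification.
    """
    if not value:
        return default
    negative = value.startswith("-")
    parts = value.lstrip("-").split("-")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    return (-year if negative else year, month, day)


def lifespans_overlap(a: dict, b: dict) -> bool:
    """True if the [start_date, end_date] ranges of two tag dicts intersect."""
    a_start = parse_date(a.get("start_date"), OPEN_START)
    a_end = parse_date(a.get("end_date"), OPEN_END)
    b_start = parse_date(b.get("start_date"), OPEN_START)
    b_end = parse_date(b.get("end_date"), OPEN_END)
    return a_start <= b_end and b_start <= a_end


# Illustrative tags only, not copied from the database.
sevilla = {"start_date": "1248"}
roman_republic = {"start_date": "-0146", "end_date": "-0121"}
bohemia = {"start_date": "1557", "end_date": "1742"}
german_reich = {"start_date": "1941", "end_date": "1945"}

print(lifespans_overlap(sevilla, roman_republic))  # False
print(lifespans_overlap(bohemia, german_reich))    # False
```

Note that a check like this would only rule out pairings with no temporal overlap. It would not catch the Glogau case by itself, because that town and Poland-Lithuania (1637-1657) did coexist in time; the error there is spatial.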

Nominatim’s developer has indicated that fixing this behavior would require fundamentally changing how the geocoder works, so the suggestion is to disable “address” computation so that you can only search by name without qualifying the search by larger entities. This isn’t ideal either.

Indeed, the Nazi Germany boundary relation’s centroid lies just within OpenStreetMap’s Czechia boundary relation, a mere 20 kilometers from the present-day Czech–German–Polish tripoint. This causes Nominatim to equate it with Czechia. If you search for “Czechia”, you get Nazi Germany. If you search for something in present-day Czechia, the result will be associated with Nazi Germany as the country. Of course, other country boundaries also have a centroid within Czechia – such as Czechia – but I guess Nominatim is breaking the tie in favor of the largest such boundary by area.
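
As a toy model of that guess (and only a guess, this is not how Nominatim is actually written), the suspected tie-break could look like the sketch below. The Boundary class, the pick_country function and the area figures are all made up for illustration; the areas are in arbitrary units and only their relative order matters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Boundary:
    name: str
    centroid_in: str  # modern OSM country containing this boundary's centroid
    area: float       # arbitrary units, invented for this example


def pick_country(osm_country: str, candidates: List[Boundary]) -> Optional[Boundary]:
    """Among boundaries whose centroid falls in osm_country, pick the largest."""
    matches = [b for b in candidates if b.centroid_in == osm_country]
    return max(matches, key=lambda b: b.area, default=None)


candidates = [
    Boundary("Czechia", centroid_in="Czechia", area=1.0),
    Boundary("German Reich (1941-1945)", centroid_in="Czechia", area=8.0),
    Boundary("Kalmar Union (1397-1472)", centroid_in="Sweden", area=10.0),
]

winner = pick_country("Czechia", candidates)
print(winner.name)  # German Reich (1941-1945)
```

Nothing in this selection looks at dates, so under this model the larger historical boundary wins for any search, whatever period the searched place belongs to.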

I think this explains your example of the Polish–Lithuanian Commonwealth as well.

Needless to say, this is wrong and we need to fix it somehow. Disabling address computation will paper over the problem to some extent, but not if you’re searching for a country.

The Query Features function queries the Overpass API instead of Nominatim. The OverpassQL query isn’t limited to the date in the time slider. In a sense, this is consistent with the fact that it returns a feature even if the renderer doesn’t render it, but this functionality would be more useful if you had a choice of whether to limit the results to the selected date.

@Alphathon - thank you for taking the time to thoroughly describe what is definitely a big problem. We need to fix this and it may take a bit of time, but the good news (I guess!) is that this problem shows off how much data we are starting to accrue in the OHM database. But… if we cannot search well, that will be irrelevant. My guess right now is that this will be the top dev priority once we finish our localized labels efforts.