Importing every building in the Netherlands (BAG)

Yes, you read that right. I think it might be possible to import every currently existing building (and maybe even some former ones) in the Netherlands, including start_date. I will need some help with that though, as I’ve never done an import before, let alone one on such a scale.

First, as this is my first post on the forum, let me briefly introduce myself. Over the past 8 months, I’ve been adding things on OHM all around the world, but mostly in the Netherlands. My main focus has been my hometown of Zwijndrecht. I’ve read a lot about its history, which I knew very little about before. As a result, I feel I appreciate my town much more than I used to. Now, when I cycle through certain neighbourhoods or past certain buildings, I know all kinds of stories about them.
A nice side project has been the fortified city of Willemstad, which I visited for the first time in the process of mapping it. It was a peculiar experience to walk around the city, which felt both new and familiar at the same time because of the research I had done on it already.

When adding buildings, I’ve often relied on the BAG Viewer for the start_date. This database, maintained by the Dutch government, includes the year of construction of every currently standing building in the Netherlands.

Recently, I found out the BAG database is also available as a downloadable public domain dataset on nationaalgeoregister.nl. This means it could be imported into OHM, which also uses a public domain licence. I think this would be a huge improvement to mapping the history of the Netherlands. It would provide a base layer of sorts, on top of which historic things can be added. It also means mappers in the Netherlands could devote much of the time they now spend on mapping buildings to other things, such as highways and landcover.

It appears that even some no longer existing buildings are in the BAG database. I found out about this when I revisited buildings I mapped some time ago which have since been removed. See, for example, this one on BAG Viewer (on OHM here). I have not found a way to make these buildings show up on the map of the BAG Viewer though. And they also don’t appear to be included in the dataset on nationaalgeoregister.nl. Maybe we could contact the people at BAG about this.

I also think it could be beneficial to contact the Dutch OpenStreetMap community. They imported the BAG in 2013 and still use it to update the map with new buildings. Maybe we could make use of the plugin they’ve created for this. (See this wiki page for more information. Only in Dutch, so you might need a translation service to read it.)

But first I wanted to let the OHM community know about this. Maybe users with more experience with imports have some advice. I’m very excited about this idea, but I also have a lot of questions about how to approach such an import. Practical issues, like what file format to use and how to import such a large dataset without crashing JOSM. But I also wonder about other things, such as what we would need to do with existing buildings that are also in the dataset: delete them or connect their nodes to the buildings from BAG, so edit history isn’t lost? Another thing we’d need to decide is how to tag the imported buildings.

Anyway, I’d like to hear what you all think about this! I really think the import could be an enormous improvement to the map and make mapping in the Netherlands a lot easier.

First of all, welcome to OHM, and thanks for your efforts in Zwijndrecht! BAG sounds like an ideal source for an import. Substantial building coverage would provide a solid foundation for mappers to focus on points of interest and streets and also work backwards in time before this dataset’s vintage. Usually it’s a stroke of good luck if a municipal building dataset contains even partial coverage of start dates, let alone a whole country!

Yes, ideally we’d coordinate efforts between the two projects. Even though OSM’s license is somewhat incompatible with OHM’s, any improvements we make in OHM based on this dataset would be quite suitable for import into OSM. OSM’s Dutch community will also be able to advise us on any major known issues with the dataset that they’ve had to account for over the years.

We tend not to be as strict as OSM about import procedures, but you’ll still need to familiarize yourself with the tooling. This could be a good learning experience for you and hopefully a case study for the rest of us. The import documentation mentions some Python scripts that haven’t been changed for many years. I don’t know if they’re straightforward to run in a more modern environment. Maybe the mappers who incrementally update OSM are using different tools these days?

Typically, a building import splits up the overall dataset into more manageable chunks, each one in a different .osm file. For example, you could break it down by municipality or postal code. I assume this is one of the things the existing scripts do. If there are a lot of these chunks, we can give you access to create a project on the tasking manager to keep track of your progress or coordinate with other mappers.
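As a rough illustration of the chunking step, here is a minimal Python sketch that groups GeoJSON features by one property. The property name `municipality` and the function name `split_by_property` are assumptions made for this example; the existing import scripts may work quite differently.

```python
from collections import defaultdict


def split_by_property(features, key="municipality"):
    """Group GeoJSON features into one FeatureCollection per property value.

    `key` is a hypothetical property name; use whatever the dataset provides.
    """
    groups = defaultdict(list)
    for feature in features:
        # Features without the property end up in an "unknown" chunk.
        value = (feature.get("properties") or {}).get(key, "unknown")
        groups[value].append(feature)
    return {
        value: {"type": "FeatureCollection", "features": members}
        for value, members in groups.items()
    }


if __name__ == "__main__":
    sample = [
        {"type": "Feature", "properties": {"municipality": "Zwijndrecht"}, "geometry": None},
        {"type": "Feature", "properties": {"municipality": "Elsewhere"}, "geometry": None},
        {"type": "Feature", "properties": {"municipality": "Zwijndrecht"}, "geometry": None},
    ]
    for name, collection in split_by_property(sample).items():
        print(name, len(collection["features"]))
```

Each resulting collection would still need to be converted to an .osm file before it can be opened in JOSM; that step is not shown here.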

JOSM has a Replace Geometry function that’s ideal for preserving as much history as possible when updating a feature to an external source.

Kadaster has extensive documentation about BAG’s historical data model, so it’s something they’re tracking deliberately. Maybe historical features aren’t included in the NGR dataset because they aren’t required by the INSPIRE program. Still, the fact that they publish this subset of the data in the public domain suggests potential openness to sharing the full data. Would you be willing to inquire about it? I or the other OHM advisors would be more than happy to support you, but it would be great to have someone domestic lead the conversation.

Thanks for the helpful comments!

I will post a message on the Dutch part of the OSM forum soon, asking them for advice.

Using the tasking manager to keep track of the progress sounds very useful. Whether we split by municipality or postal code, there will certainly be a lot of chunks. The Replace Geometry function also sounds like a very useful tool; I hadn’t heard about it before.

Thank you for the link to the documentation on BAG’s historical data model. I’ll read the document and contact the Kadaster after that. I might send you a direct message if I need help with the technical details.

Update: it turns out the complete BAG dataset, including historical data, is also publicly available to download.
Dutch OSM user Sander_H has been very helpful by generating GeoJSON files for every placename in the Netherlands using this dataset. They can be found here. These files include both currently existing and removed buildings. For removed ones, the date on which they were marked as demolished in the BAG is set as end_date.

We could start importing using these GeoJSON files. However, they are not completely without flaws:

  1. There are some demolished buildings that have an unknown year of construction. These have all been tagged with start_date=9999. I’m not sure what’s best to do with these: leave them out of the import, as they can’t be correctly dated, or import them with an additional fixme tag? In other cases the year of construction seems to be an approximation. Other sources sometimes have a more specific year (e.g. 1853 instead of 1850).
  2. There are also some buildings that overlap each other. See this screenshot for an example: (the unselected building has existed from 1882 till today)

    This is probably the result of the building on the right being expanded at some point, so that it now contains the building on the left, which was therefore marked as demolished. (Satellite imagery for comparison.)
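For the first problem, a small Python sketch like the one below could separate the placeholder-dated buildings from the rest, so that either option (skipping them or importing them with a fixme) stays open. It assumes the GeoJSON features carry the year in a property called `start_date`; the function name and the fixme wording are made up for illustration.

```python
import copy

UNKNOWN_YEAR = "9999"  # placeholder used in the GeoJSON files for an unknown year


def triage_start_dates(features):
    """Split features into (dated, undated) lists.

    Undated copies lose the placeholder start_date and gain a fixme tag, so
    they can be imported for manual review or left out altogether.
    """
    dated, undated = [], []
    for feature in features:
        props = feature.get("properties") or {}
        # str() covers files that store the year as a number instead of text.
        if str(props.get("start_date")) == UNKNOWN_YEAR:
            flagged = copy.deepcopy(feature)  # leave the input untouched
            del flagged["properties"]["start_date"]
            flagged["properties"]["fixme"] = "start_date unknown in BAG"
            undated.append(flagged)
        else:
            dated.append(feature)
    return dated, undated
```

The original features are not modified, so the placeholder value remains available if it is needed for comparison later.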

It turns out things can get quite tricky with importing historical data :grimacing:
So, what would be the best approach right now? I personally think that first uploading just the currently existing buildings might end up being the most useful thing. After that we could add the historical data from BAG, making sure to manually check for things like the two problems mentioned above. But let me know if there are other ideas! (Maybe based on experience with previous similar imports.)

(By the way, I’ve also been thinking about how to approach the source tagging of this import. There will certainly be situations where we want to change one tag of an imported building while keeping the others at what is in BAG. An example of this could be when we find the correct start_date of a building that was originally set to 9999. Geometry and end_date would then still be sourced from BAG. So splitting into start_date:source, end_date:source and geometry:source would be useful, I think. But we can discuss that more extensively later; for now the things I mentioned above are most important.)
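To make the per-tag source idea concrete, here is a hedged Python sketch that works on a plain dictionary of tags. The helper names `add_bag_sources` and `correct_tag` are hypothetical, as is the example source value; only the tag keys come from the proposal above.

```python
BAG_SOURCED_KEYS = ("start_date", "end_date", "geometry")


def add_bag_sources(tags, source="BAG"):
    """Return a copy of tags with a <key>:source tag for each BAG-derived value."""
    result = dict(tags)
    for key in BAG_SOURCED_KEYS:
        # Geometry is not a tag itself, so it always gets a source;
        # the date sources are only added when the date tag is present.
        if key == "geometry" or key in tags:
            result[f"{key}:source"] = source
    return result


def correct_tag(tags, key, value, source):
    """Return a copy with one tag and its source replaced, leaving the rest alone."""
    result = dict(tags)
    result[key] = value
    result[f"{key}:source"] = source
    return result


if __name__ == "__main__":
    imported = add_bag_sources(
        {"building": "yes", "start_date": "9999", "end_date": "2015"}
    )
    # Later, a mapper finds the real construction year in another source.
    fixed = correct_tag(imported, "start_date", "1853", "hypothetical archive record")
    print(fixed)
```

After the correction, end_date:source and geometry:source still point at BAG, which is the behaviour described above.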

Wonderful, I’m glad the local community has been able to help you put this together!

If you have a last-modified timestamp, you can set an EDTF date indicating some uncertainty. For example, if the building was constructed in 1900 and the record was last modified in 2000, then tag start_date=1900 end_date=2000 end_date:edtf=1900/2000. This assumes there wouldn’t be much reason for BAG to modify the record after noting the building’s demolition. Otherwise, you could choose something more conservative as the end_date=*, such as any created timestamp.
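A minimal Python sketch of that tagging rule, assuming both values are available as plain years (the function name is invented for this example):

```python
def uncertain_end_date_tags(start_year, last_modified_year):
    """Build tags for a demolished building whose exact end date is unknown.

    Follows the example above: end_date is the last-modified year, and
    end_date:edtf spans from the construction year to that year.
    """
    if last_modified_year < start_year:
        raise ValueError("record was last modified before the construction year")
    return {
        "start_date": str(start_year),
        "end_date": str(last_modified_year),
        "end_date:edtf": f"{start_year}/{last_modified_year}",
    }


if __name__ == "__main__":
    print(uncertain_end_date_tags(1900, 2000))
```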

Approximate construction years are very common, even among dedicated historic building registers. Sometimes you encounter more detailed documentation from a preservationist who explains that the estimate is based on a particular lintel design that the architect preferred during a certain period of their career or that peaked in popularity in a certain year. Or the building is associated with someone who was known to live there by a certain year.

If the dataset explicitly says circa 1850, then you could tag start_date=1850 start_date:edtf=1850~. Otherwise, just tag start_date=1850 along with start_date:source=*, and then you can come back later and manually replace it with start_date=1853 and a different start_date:source=*.
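In Python, that rule could look roughly like the sketch below. The `circa` flag is an assumption about how an import script might represent an explicit "circa" in the dataset, and this version always records a source, which goes slightly beyond the example above.

```python
def start_date_tags(year, source, circa=False):
    """Build start_date tags, adding an approximate EDTF value for circa dates."""
    tags = {"start_date": str(year), "start_date:source": source}
    if circa:
        # A trailing "~" marks the year as approximate in EDTF.
        tags["start_date:edtf"] = f"{year}~"
    return tags


if __name__ == "__main__":
    print(start_date_tags(1850, "BAG", circa=True))
    print(start_date_tags(1853, "hypothetical other source"))
```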

Ideally, if a building is expanded, then we’d tag an end date on the older building and add them to a common chronology relation. But if the older building was actually demolished and a new one took its place, then we only need to tag an end date on the older building.

Do you know if expansion or demolition is usually the reason for an overlap, or if there could be other reasons? The JOSM validator (and other tools) can automatically detect overlapping buildings, so we can deal with them either before or after the import. But if ref:bag=* or some other field associates the two buildings with each other, then it would be nice to fix the overlap beforehand, so that other mappers don’t get confused when they start building upon this coverage.

In OSM, building imports are often divided into two phases, saving the buildings that overlap existing OSM data for a later, slower phase. This avoids upsetting people whose existing work would be affected. We could take the same approach here and add a third phase for historical buildings, but so far what you’ve described doesn’t sound so bad compared to other datasets I’ve seen, so it isn’t totally unreasonable to include historical buildings in the first phase. For example, if the record timestamps are too coarse to serve as good end dates, then you could import the demolished buildings first, then subsequently query for end_date:edtf=* end_date:source=BAG to find buildings to refine more manually.

In other words, we have options. I guess it depends how much work you want to do upfront versus doing later, potentially with the help of others who are motivated by the initial import. In past imports, I’ve found that having a detailed import documentation page is helpful, but even better is a community project page that lets people sign up to help with specific tasks and tracks the progress toward those tasks.