Importing the Newberry Library's Atlas of Historic County Boundaries

To be clear - I am not suggesting we import anything from OSM & specifically am stating that we will not import any county-related info from OSM. I only included it as a point of reference, in case anyone was hoping to use OSM as a standard for comparison.

Just to put it out there to begin with, most of the county lines in OSM are objectively terrible… the source dataset was only intended for 1:100,000 scale mapping (large scale topo maps) and includes many random meanders when viewed at the scale we work on. I’ve made a start at using the historical references in this dataset, and the BLM PLSS corners, to fix county lines in Alabama, and they are often a good hundred feet off.

The real issue is how “accurate” the lines in this dataset are, as opposed to how precise. They are clearly more precise than the TIGER lines… they show meanders that are missing from that data. If the actual accuracy is to a level better than what is added by the transformation, then it would seem worth doing the new upload… OTOH, if they are themselves offset by more than the 1-2 meters corrected by the transformation, then it would seem somewhat pointless.

It’s hard to really know the answer without comparing them to lines based on the PLSS corners (which are presumably the best data we can find). Comparing portions of the first county I updated with PLSS (Autauga County, Alabama), it seems like the Newberry lines vary from “right on” to about 30 feet off… so, drastically better than the TIGER data. At the same time, it does seem like the general “direction” of the coordinate transforms (northwest) would bring the lines closer to the ones I drew from PLSS.

I think it really comes down to a judgement call… do you feel like it’s worth the effort?


NW corner of Autauga County, OHM in iD.

Same location, PLSS-based lines in JOSM.

It seems to me that the difference is probably close to the same magnitude as the effect of the coordinate transform.
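
To put a number on a comparison like this, one option is to measure the offset between a Newberry vertex and the matching PLSS corner. This is only a rough sketch: it assumes both points are plain WGS84 latitude/longitude, uses a spherical (haversine) approximation, and the two coordinates are made up for illustration rather than taken from either dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation
FEET_PER_METER = 3.28084


def offset_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical coordinates, for illustration only.
newberry_vertex = (32.66000, -86.92000)
plss_corner = (32.66005, -86.92005)

d = offset_m(*newberry_vertex, *plss_corner)
print(f"offset: {d:.1f} m ({d * FEET_PER_METER:.0f} ft)")
```

If offsets measured this way mostly come out well above the 1-2 meters the transformation corrects, the transform is not the limiting factor.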

Plus, in most cases, changes to a boundary over time would be more interesting than the precise contour of that boundary at a point in time, at least until we’re able to build up OHM to an OSM-esque level of detail. @jeffmeyer, do you have a sense of the distance spanned by a typical county boundary change, or perhaps the average area gain/loss in a change that doesn’t involve splitting or merging counties?

I’ve gone through the history (from the sources in this) for three different Alabama counties so far (Autauga, Baldwin, and Barbour).

In the first two cases, even the “reshuffling” changes were fairly dramatic, either from one township/range line to another, or (in the case of Baldwin) moving between river branches in the Mobile River delta… these were moves on the scale of miles.

For Barbour County the same thing was generally true (one move was ~40 square miles), although there were several smaller changes “to accommodate a local landowner”.

That’s really not a good sample set, but I haven’t seen “nitpicky” changes that wouldn’t be apparent on even a large scale topo map. The smaller ones were:

  • “to include one hundred and seventy acres in section thirty-four, township thirteen, range twenty-nine, now part and parcel of the estate of Jesse Lee, of Barbour county, within the boundaries of Barbour county”
  • “to include sixty acres in section nine, and ninety acres in the southeast corner of section eight, in township thirteen, range twenty-eight, now part and parcel of the estate of Americus C. Mitchell, of Russell county, within the boundaries of Russell county”
  • to include “the south-east quarter of section eighteen, in township twelve, and range twenty-six, and sixteen acres on the south side of the northeast quarter of section eighteen, in township twelve, of range twenty-six, now part and parcel of the real estate of W. M. Russell of Barbour county, within the boundary of Barbour county”

These all moved the line (over some distance) about a half mile.
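
As a sanity check on the “half mile” figure, here is a back-of-the-envelope sketch. A PLSS section is one square mile (640 acres), so a quarter section is 160 acres and a half mile on a side. The code treats each transferred parcel as if it were a single square, which is my simplifying assumption and not how the statutes describe them.

```python
import math

SQ_FT_PER_ACRE = 43_560
FT_PER_MILE = 5_280


def square_side_miles(acres):
    """Side length, in miles, of a square with the given area in acres."""
    return math.sqrt(acres * SQ_FT_PER_ACRE) / FT_PER_MILE


# Acreages from the three statutes quoted above.
parcels = {
    "Jesse Lee": 170,
    "Americus C. Mitchell": 60 + 90,
    "W. M. Russell": 160 + 16,  # a quarter section plus sixteen acres
}

sides = {owner: square_side_miles(acres) for owner, acres in parcels.items()}
for owner, side in sides.items():
    print(f"{owner}: about {side:.2f} mi on a side")
```

All three land close to half a mile, which matches what I saw on the map.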

For comparison, my current record for how far a TIGER corner was just “wrong” was ~610 feet.

For county equivalents, the centroid points have novel primary tags such as place=parish and place=non-county area. I think these values may be too granular. If these are all county-equivalents, then they should be tagged place=county. We can use some other tag to capture the locally understood terminology in the local language.

Since place=* is a key that renderers and geocoders use to make generalizations about places, it needs to be machine-readable, even at the expense of some semantic precision. None of the conventional place=* values in OSM completely aligns with the literal word in English in any jurisdiction. This is by design, so that data consumers don’t have to reinvent the wheel for every jurisdiction.

“County” has various meanings across geographies and time periods, but I think the various meanings are close enough from a data consumer’s standpoint that aligning with OSM’s definition won’t be much of a problem in practice. No software will assume a place=county is the domain of a count, nor will it assume that a place=county is headed by an elected county judge.

By contrast, place=parish is problematic because it risks confusion with parish=*, for religious institutions and boundaries. This import facilitates mapping of various denominations’ diocesan boundaries, which may enable us to map parish boundaries in some places. Civil parishes are the ones that should be distinguished, because theirs is the more figurative application of the word “parish”, used in fewer parts of the world and only more recently.

I don’t think this issue is a blocker for the import, since I anticipate that the tile generator will soon generate these centroids for us automatically, allowing us to delete them in favor of the boundary relations’ tags. The boundary relations have the advantage that we can already differentiate between, say, administrative and religious_administration boundaries.

ACK!

I forgot to move the place tags to import:county_type tags used on the relations. I’ll take care of that shortly… my JOSM is busy right now… :slight_smile:

I’ll save my thoughts and rants about place for a separate thread. (shakes fist at sky)

Please keep this feedback coming, as I’m sure there will be plenty of other tidying up to do!

Congratulations! It looks like all the boundary relations have finally been imported (and reimported after a mixup between start_date and start_event). What’s left to do for this import? It would be really nice to have those chronology relations in place so we can remove the parenthetical from each boundary relation’s name and link each county’s Wikidata item to an OHM relation that represents the county overall.

I wanted to come back to this point, in case it’s been a source of hesitation:

I’ve created many chronology relations about things for which Wikidata lacks an item, let alone an “evolution of” item. But every extant county and most former counties already have items, so we should try to link them up where possible. It’s perfectly fine to link a “Springfield County” item to a chronology relation without creating a “territorial evolution of Springfield County” item, but you can do that too if you want.

OpenRefine makes it very easy to add statements to Wikidata’s county items that link to the corresponding chronology relations. All you need to do is come up with a CSV of relation IDs, names, and states, and maybe some dates, and then you can use the point-and-click interface to match the relations to items in bulk. If you’ve already performed this conflation and just need to go in the other direction, OpenRefine supports uploading the edits directly to Wikidata.
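
For what it’s worth, the CSV doesn’t need to be anything fancy. Here is a minimal sketch using only Python’s standard library. The relation IDs are placeholders, and the column names are just my guess at something convenient to reconcile against, not anything OpenRefine requires.

```python
import csv
import io

# Placeholder relation IDs, for illustration only; substitute the real
# chronology relation IDs from OHM.
chronologies = [
    {"relation_id": 1, "name": "Autauga County", "state": "Alabama"},
    {"relation_id": 2, "name": "Baldwin County", "state": "Alabama"},
    {"relation_id": 3, "name": "Barbour County", "state": "Alabama"},
]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["relation_id", "name", "state"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


csv_text = to_csv(chronologies)
print(csv_text)
```

Including the state column matters because so many county names repeat across states, so it gives the matching step something to disambiguate on.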

Correctly tagging the chronology relation and Wikidata item will enable users to craft federated queries about questions that are otherwise difficult to answer. Some queries are possible with the current modeling of wikidata tags on individual unrelated boundary relations, but it requires some gymnastics that are more difficult and less performant.

Ha! Have no fear… there’s plenty of work left to do - I was briefly distracted with some prep for meetings this week, but here’s what I see as remaining:

  • create chronology relations for counties
  • fix labels
  • publish OHM county chronology relation IDs in Wikidata

At this point, the counties will be done & I hope to finish over the next 2 weeks.

After that, I still need to:

  • Repeat this process for the state boundaries, which should be expedited by lessons learned in the county boundary import.
  • Create country boundaries for the United States over time, chronology relation, etc.
  • Then, remove the old, very long & unbroken segments used in the original import.

And, then… there are a few more US admin-level 2 & 4 type things related to the Civil War, but we’ll see.

Oh, I was under the impression that the state boundaries were already fully imported. I was about ready to start redrawing some state and county boundaries along Lake Erie that the Newberry dataset got wrong. Should I hold off until you’re done with reimporting these boundaries?

The state boundary import was originally completed in a highly curated fashion, by hand, without the individual edge segments. So, there are now overlapping ways - the old state boundaries (long) and the newer county boundaries (shorter) that need to be used to rebuild the state boundaries. So… yes, I’d wait, but I don’t expect it will be too too long.

Will you be reusing the existing state boundary relations or blowing them away in favor of new ones?

Plan is to blow away unmodified old segments and then examine those that have been modified to see how many, etc.

That’s fine, but the relations will remain, correct? There are so many links to the relation IDs in OHM documentation, Wikidata, and elsewhere that I’m wary of deleting and replacing them, even if we technically reserve the right to.

That’s correct - the goal is just to move the relation members from the old to the new with caveats for changed ways, which will need review.

This test chronology relation looks good, though the members aren’t sorted chronologically. Do you have a convenient method to sort the members in JOSM? I documented the requirement for sorted members prospectively, under the assumption that it would facilitate usage and make some queries more convenient, but we can relax it if it turns out to be impractical.

I think I’ll be able to chronologically order them programmatically when they’re created. I’ll test and confirm first.
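
Roughly, the ordering step could look like the sketch below. It assumes each member is available as a record with its start_date tag, that dates are CE and written as YYYY, YYYY-MM, or YYYY-MM-DD, and that the member structure and refs shown here are hypothetical stand-ins for whatever the import script actually holds.

```python
def date_key(start_date):
    """Turn 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a sortable (y, m, d) tuple.

    Missing month or day is treated as 01. BCE (negative) years are not handled.
    """
    parts = start_date.split("-")
    parts += ["01"] * (3 - len(parts))
    return tuple(int(p) for p in parts)


# Hypothetical members of one chronology relation, in arbitrary order.
members = [
    {"ref": 103, "start_date": "1866-12-05"},
    {"ref": 101, "start_date": "1818"},
    {"ref": 102, "start_date": "1832-12"},
]

ordered = sorted(members, key=lambda m: date_key(m["start_date"]))
print([m["ref"] for m in ordered])
```

Python’s sort is stable, so members that share a start date keep their original relative order.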

Thanks for uploading the chronology relations. They look good in general, but I noticed that each chronology for an extant county has an end_date of 2020-12-31 (example). Was this an intentional choice, perhaps a flag that someone should hand-verify whether the county has experienced other changes since 2020?

I’ve started patching up counties in Ohio along Lake Erie. There has been one indefinite boundary due to a vague statute but, other than that, nothing too bad so far. :crossed_fingers: Cuyahoga County has a start_event mentioning a boundary change that took place in the lake but wasn’t mapped, only because it happened to occur on the same day as another overland boundary change. The Newberry dataset helpfully annotated the boundary with “[not mapped]”. It felt good to be able to remove this annotation.

Looking around the country, there are 14 more boundaries with the same “[not mapped]” annotation. This would be an interesting exercise for anyone interested in extending the state of knowledge about the country’s historical boundaries.

My screwup - those are what the Newberry downloads used, but I meant to replace them with nothing before uploading. I’ll fix.

I think it’d be better to assume no changes and fix counties that have changed on a more ad-hoc basis than to use the date as a flag, as the number of counties with changes should be far smaller than the total number of county chronologies.
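
The cleanup itself should be simple. A minimal sketch, assuming the tags are handled as a plain dictionary and that 2020-12-31 only ever appears as the Newberry placeholder; the function name is mine, not part of any existing tool.

```python
PLACEHOLDER_END = "2020-12-31"  # end date used by the Newberry downloads


def strip_placeholder_end(tags):
    """Return a copy of the tags without end_date if it is the placeholder."""
    cleaned = dict(tags)
    if cleaned.get("end_date") == PLACEHOLDER_END:
        del cleaned["end_date"]
    return cleaned


# Hypothetical tag sets, for illustration only.
extant = {"name": "Springfield County", "end_date": "2020-12-31"}
former = {"name": "Old Springfield County", "end_date": "1868-02-04"}

print(strip_placeholder_end(extant))
print(strip_placeholder_end(former))
```

A county that genuinely ended on that exact date would lose its end_date too, so any such case would need a manual check.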

Could you kindly leave NJ out of the reconciliation process? I’ve already done the reconciliation there in my efforts to do NJ’s municipalities, but as with the county project it’s still a work in progress.
