USA - GNIS Import

The Geographic Names Information System (GNIS) of the United States Geological Survey (USGS) contains the coordinates of many former mines and schools, as well as other cultural features such as cemeteries and populated places.

Imports of this US public-domain dataset into OpenStreetMap have shown that many of the features are obsolete: historic, no-longer-current features such as closed schools, abandoned mines, and ghost towns. However, these historic features may interest OpenHistoricalMap users, help flesh out the database, and provide references for further research, georeferencing historic maps, and so on.

Again, the difficulty with the OSM import of GNIS was scrubbing obsolete features or otherwise tagging them as historic.

I propose a limited, feature-class-by-class import, pre-scrubbed against current features that are already well mapped in OSM.

Example:

GNIS schools: scrub against the Homeland Infrastructure Foundation-Level Data (HIFLD) datasets of public and private schools, colleges, and universities. Add an arbitrary start date of 1900-01-01 and an end date of 1960-01-01. Start with one state (e.g. Michigan) and proceed from there after QA/QC verification.
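A minimal sketch of the pre-scrub step, assuming both the GNIS schools and the HIFLD schools for one state have already been loaded into a hypothetical `Place` record (name plus WGS84 latitude/longitude). The name normalization, the 500 m match radius, and the `scrub` helper are illustrative choices, not anything defined by GNIS or HIFLD: a GNIS school is dropped only if a current school with the same normalized name lies within the radius.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Place:
    # Hypothetical record type; real GNIS/HIFLD rows carry many more fields.
    name: str
    lat: float
    lon: float


def normalize(name: str) -> str:
    """Lowercase, drop the '(historical)' suffix and punctuation, collapse spaces."""
    name = re.sub(r"\(historical\)", "", name, flags=re.IGNORECASE)
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(name.split())


def haversine_m(a: Place, b: Place) -> float:
    """Great-circle distance in metres on a spherical Earth (radius 6,371 km)."""
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def scrub(gnis: list[Place], current: list[Place], max_dist_m: float = 500.0) -> list[Place]:
    """Keep only GNIS features with no same-named current feature nearby."""
    kept = []
    for g in gnis:
        key = normalize(g.name)
        matched = any(
            normalize(c.name) == key and haversine_m(g, c) <= max_dist_m
            for c in current
        )
        if not matched:
            kept.append(g)
    return kept
```

The pairwise loop is O(n × m), which should be tolerable for one state at a time; a national run would want a spatial index instead.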

I definitely agree with breaking the problem down into chunks by geography and feature class. Maybe also by the source GNIS used, since a limited number of sources account for the vast majority of GNIS records.
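One way to form those chunks, sketched under the assumption that each GNIS record is a dict that already carries a state, a feature class, and a citation. The keys `STATE_ALPHA`, `FEATURE_CLASS`, and `CITATION` are assumed column names; the citation in particular would have to be joined in from the names file first. Batches are returned largest first so the high-volume sources can be reviewed early.

```python
from collections import defaultdict


def batch_records(records):
    """Group records by (state, feature class, source), largest batch first.

    The column names used here are assumptions about an already-joined table.
    """
    batches = defaultdict(list)
    for rec in records:
        key = (rec["STATE_ALPHA"], rec["FEATURE_CLASS"], rec["CITATION"])
        batches[key].append(rec)
    # sorted() is stable, so equal-sized batches keep their first-seen order.
    return dict(sorted(batches.items(), key=lambda kv: len(kv[1]), reverse=True))
```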

I think we should avoid tagging completely arbitrary start_date=* and end_date=* values if we can. An arbitrary date masquerading as a certain one is worse than no date at all, because there’s no good way to detect it. We obviously won’t be able to ascertain a precise start and end date for every feature during the import, but we can still narrow it down a little and tag the uncertainty explicitly.

Any date that isn’t completely certain should be paired with EDTF tags. This school was added to GNIS on July 11, 1979, so it existed on that date: it opened on or before it and closed on or after it. We can therefore set start_date=1979-07-11 start_date:edtf=../1979-07-11 end_date=1979-07-11 end_date:edtf=1979-07-11/.. If a mapper confirms that the school still exists at that location, they can simply delete the end_date=* and end_date:edtf=* tags. Alternatively, we could add end dates only to “(historical)” features and rely on mappers to spot and retag other features that no longer exist. (As it happens, the actual school opened in 1914, moved to its final location in 1923, and closed in 1962.)
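The tagging rule above can be sketched as a small function. It assumes the GNIS creation date has already been parsed into a `datetime.date`; `lifespan_tags` and `tags_for_name` are hypothetical helpers, and the `only_historical` switch models the alternative of adding end dates only to names carrying the GNIS “(historical)” suffix.

```python
from datetime import date


def lifespan_tags(date_created: date, add_end_date: bool = True) -> dict[str, str]:
    """Bound a feature's lifespan by the day it entered GNIS.

    The feature existed on date_created, so it started on or before that day
    (EDTF '../d') and, unless we skip end dates, ended on or after it ('d/..').
    """
    d = date_created.isoformat()
    tags = {"start_date": d, "start_date:edtf": f"../{d}"}
    if add_end_date:
        tags["end_date"] = d
        tags["end_date:edtf"] = f"{d}/.."
    return tags


def tags_for_name(name: str, date_created: date, only_historical: bool = False) -> dict[str, str]:
    """Optionally restrict end dates to names GNIS marks as '(historical)'."""
    add_end = (not only_historical) or name.endswith("(historical)")
    return lifespan_tags(date_created, add_end)
```

For the school above, `lifespan_tags(date(1979, 7, 11))` yields exactly the four tags listed in the post.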

For reference, here are some statistics I posted in Slack last year:


Most frequent citations for official names in GNIS (as of the August 2021 archive):

  • 733,534: U.S. Geological Survey. Geographic Names Phase I data compilation (1976-1981). 31-Dec-1981. Primarily from U.S. Geological Survey 1:24,000-scale topographic maps (or 1:25K, Puerto Rico 1:20K) and from U.S. Board on Geographic Names files. In some instances, from 1:62,500 scale or 1:250,000 scale maps.
  • 34,422: U.S. Geological Survey. Geographic Names Post Phase I Map Revisions. Various editions. 01-Jan-2000.
  • 19,753: US Geological Survey Water Resources Division, Groundwater Site Inventory Data Base. 9505
  • 15,999: U.S. Army Corps of Engineers. Dams and Reservoirs List, Washington, DC. 31-Dec-1981. A listing of impounded bodies of water and associated information.
  • 10,102: Census County/Townships, CDP’s and incorporated cities - Bureau of Census, Geography Division, coordinates are located at the centroid. If known, the year the data were compiled follows:

Generated using csvkit:

csvgrep -d '|' -c 'FEATURE_NAME_OFFICIAL' -m 'Y' AllNames_20210825.txt | csvstat --freq -c 'CITATION'
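For anyone without csvkit, a rough standard-library equivalent of that pipeline might look like the following. It assumes the same pipe-delimited file with a header row containing `FEATURE_NAME_OFFICIAL` and `CITATION`, mirrors `csvgrep -m 'Y'` as a substring match, and mirrors csvstat's default of reporting the five most frequent values. `official_citation_counts` is a hypothetical helper name.

```python
import csv
from collections import Counter


def official_citation_counts(lines, top=5):
    """Count CITATION values for rows whose FEATURE_NAME_OFFICIAL contains 'Y'.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines, delimiter="|")
    counts = Counter(
        row["CITATION"] for row in reader if "Y" in row["FEATURE_NAME_OFFICIAL"]
    )
    return counts.most_common(top)
```

Usage would be along the lines of `official_citation_counts(open("AllNames_20210825.txt", encoding="utf-8-sig", newline=""))`; the `utf-8-sig` codec strips a byte-order mark if the file has one and is harmless otherwise.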

Most frequently cited authors:

  • 1,184,030: U.S. Geological Survey
  • 80,957: U.S. Army Corps of Engineers
  • 59,859: Bureau of Census, Geography Division
  • 26,495: TechniGraphics, Inc.
  • 25,085: Internet site or email other than USGenWeb
  • 24,663: U.S. Board on Geographic Names
  • 23,349: US Geological Survey Water Resources Division
  • 20,520: State Department of Transportation
  • 19,762: Sanborn Fire Insurance Maps
  • 19,446: Telephone Directory
  • 18,463: USGS Mapping Center Field Reports
  • 14,270: Federal Aviation Administration
  • 12,624: U.S. Bureau of Mines Mineral Industry Locator System
  • 11,468: directoriesUSA
  • 11,256: U.S. Department of Agriculture, U.S. Forest Service
  • 10,742: Rennick, Robert M.
  • 10,673: American Business Directories
  • 10,627: Ramsay Place-Name Card Collection

Generated using:

grep -oE '^.FEATURE_ID.*CITATION|[0-9]+\|[^|]+\|Y?\|(U\.S\. [^,]+, )*(U\.S\.)?.+?\. ' AllNames_20210825.txt | csvgrep -d '|' -c 'FEATURE_NAME_OFFICIAL' -m 'Y' | csvstat --freq -c 'CITATION' --freq-count=20
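The grep step above truncates each citation to its leading author. A Python sketch of a similar heuristic, offered as an approximation rather than a reproduction of that regex: it assumes the author is the text before the first period that is followed by whitespace or the end of the field, while treating “U.S.” as part of a name. Personal initials (as in “Rennick, Robert M.”) will be cut short, and citations whose author is not at the start are counted whole. `extract_author` and `official_author_counts` are hypothetical helpers.

```python
import csv
import re
from collections import Counter

# Lazily consume "U.S." tokens or non-period characters up to the first
# period that is followed by whitespace or the end of the string.
AUTHOR_RE = re.compile(r"^((?:U\.S\.|[^.])+?)\.(?:\s|$)")


def extract_author(citation: str) -> str:
    """Return the leading author of a GNIS citation (heuristic)."""
    citation = citation.strip()
    m = AUTHOR_RE.match(citation)
    return m.group(1) if m else citation


def official_author_counts(lines, top=20):
    """Count leading authors for rows whose FEATURE_NAME_OFFICIAL contains 'Y'."""
    reader = csv.DictReader(lines, delimiter="|")
    counts = Counter(
        extract_author(row["CITATION"])
        for row in reader
        if "Y" in row["FEATURE_NAME_OFFICIAL"]
    )
    return counts.most_common(top)
```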

Most frequently cited websites:

grep -oE '^.FEATURE_ID.*CITATION|[0-9]+\|[^|]+\|Y?\|(U\.S\. [^,]+, )*(U\.S\.)?.+?\. ' AllNames_20210825.txt | csvgrep -d '|' -c 'FEATURE_NAME_OFFICIAL' -m 'Y' | csvgrep -c 'CITATION' -m 'www' | csvstat --freq -c 'CITATION' --freq-count=20
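The command above counts whole citations that contain “www”, so two citations of the same site with different wording are tallied separately. A possible variation, sketched here as an assumption rather than what produced the statistics, groups by the `www.` hostname instead. The hostname regex is deliberately loose, and each hostname is counted at most once per citation.

```python
import re
from collections import Counter

# "www." followed by host characters, ending on an alphanumeric so a
# sentence-final period is not captured as part of the hostname.
WWW_RE = re.compile(r"www\.[A-Za-z0-9.-]*[A-Za-z0-9]")


def website_counts(citations, top=20):
    """Count distinct lowercase www hostnames mentioned in citation strings."""
    counts = Counter()
    for citation in citations:
        counts.update({host.lower() for host in WWW_RE.findall(citation)})
    return counts.most_common(top)
```

Feeding it the `CITATION` values of official-name rows (filtered as in the earlier commands) would give per-site totals.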