Please don't forget to use the source=* tag (Case study!)

I’ve been coming across a few objects without source=* tags & I wanted to make a gentle reminder that mapping without tagging source=* can create problems for other mappers.

This tagging of [pretty much] every historical object with a source=* tag is one key difference between OHM & OSM, so it can be easy to forget or overlook, especially for OSMers. And… it can seem to slow you down… I mean, really, another tag? Yes, please!

Here’s a small example of some of the conflict it can create:
Calvary Cemetery, San Francisco

This type=chronology relation has 3 members, with overlapping and conflicting dates. But… this way and this other way have no source, which is a bummer.

A third way, which is clearly much larger than the other 2, has some source tags:

[Note: source:#:tiles=* is interesting, because maybe the source is great, but the warping stinks, which could explain differences in map tracing.]

So, here’s the rub: we have 2 ways without sources, and 1 way with a source, but that source might be wrong. So, I’d like to deconflict, but it can’t be done without the sources for the first 2 ways.

And, in the course of looking at the wikipedia=* tags, I have learned there’s a different source from the same year as the source of the third way & of course, these are different. But! At least there’s a starting point.


Are there any good resources to look at for preferred syntax or contents? I have seen a mix of how people use source=* and that has led to confusion on my part. I would definitely like to consume some resources to improve this, and I’m sure this may be helpful for others learning OHM.


Ok - you’ve made me realize that some (maybe just me?) OHM mappers have been iterating through trial and error to some practices that make sense to them without validating or checking in with the community.

So, before updating the clearly out-of-date wiki page for these tags, how about we have a discussion here?

Taginfo for source=* shows a lot of variation that could be standardized.

Here’s my proposal, with plenty of room for improvement - let’s hash it out. : )

  1. The main thing is to source your data. Any source tagging, regardless of format is better than no source tagging. Ideally, all of our source tagging would be standardized, but there are many forms of citing sources in the non-OHM data world.

  2. The sourcing should travel with the object. So, each object needs its own source information. This may seem onerous, but a commitment to this principle helps ensure that all downloaded data, regardless of filtering, is either properly attributed or could be attributed.

  3. Values in the keyspace [?] describe the key and not vice versa. E.g., start_date:source=* is the source for the start_date and source:3:date=* is the date (YYYY-MM-DD, of course) of source:3.

  4. More sources are better than fewer sources. To accommodate multiple sources, source tags can (should?) be enumerated. I believe there’s a hand-wavy rule in academia about getting 3 different primary sources for validation, but I’m not sure where that comes from or if maps always count as a primary source. Regardless, it’s helpful for tracking where object characteristics are sourced. So, the base key is something like source:[#]=* Where there’s no number, that’s fine, as others / later users could add sources with numbers and not overwrite those keys.

  5. Linking to sources is the most important way for online mappers to validate another mapper’s sourcing. So, the base source:[#]=* should point to a URL. This will speed others’ ability to answer the questions “why is that mapped that way?” and “which version of the source did the mapper use?” It will also help clarify who is hosting that resource in the case of identical sources from different libraries, which may be in different conditions. Whether these should be Internet Archive links, I don’t know. I’d prefer not, just for readability, but that may require users to click a subsequent link / visit IA if the primary link is dead.

  6. Human readable names are helpful. So, source:[#]:name=* should be text, ideally either the name of the source as labeled by the source or a name that sufficiently identifies the source and differentiates it from other similar maps. E.g., “1869 SF map” isn’t enough to tell whether that is the 1869 US Coast Survey SF Peninsula map or the 1869 Britton and Rey City & County of SF map or an altogether different 1869 map of San Francisco.

  7. Source attribution needs a better practice. I’m not sure what the right answer is here, but it would be best if we had some sort of way to track the exact attribution a source requests. That said, it might lead to an unnecessary amount of baggage on every object (yes, beyond what we’re already adding). So maybe a URL link to the attribution statement? It would be amazing to be able to auto-generate a list of sources for any bbox as well as for the entire database with some degree of accuracy that doesn’t depend on manual efforts. Maybe source:[#]:attribution=[url]?

  8. Georectification can create different results with the exact same source. Including a link to warped tiles or another warped source (e.g. IIIF) can help explain apparent discrepancies with an identified source. This link can also be used to enable other mappers to pick up where someone has left off and to start tracing items the original mapper may have left out. Hence, source:[#]:tiles=[tms x/y/z url]. Or, similarly, source:[#]:wmts=[wmts url].

  9. Other subkeys may be valuable and could be used, but might be less important. Examples include source:[#]:license=*, source:[#]:date=*, etc. I’m not sure if source:[#]:license=* is redundant to just license=*, so better thought is needed here.

  10. Source metadata should be included where relevant. Examples include information about markings on a map source, like a map key. source:[3]:ref=a) Ye Olde Malt Shoppe. Ideally, this is metadata about the source that might be informative to an OHM mapper, but that couldn’t normally be described by the object’s geometry. Another example might be source:[3]:id=[unique source id], to help directly tie the object to its source. This could be useful when importing datasets, especially if we wanted to subsequently report OHM-based modifications to the object to the data provider.

  11. Source metadata that’s not historically relevant or helpful should be discarded. Examples include things like “last modified” or “SHAPELEN” or “SHAPEAREA” or “created…”.
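
One way to see how the enumerated keys in the list above hang together is to group them by index. Here is a minimal Python sketch, assuming an object's tags arrive as a plain dict. The function name `group_sources`, the choice to file unnumbered keys under index 0, and the `value` label for the base key are all assumptions made for this illustration, not OHM conventions.

```python
import re
from collections import defaultdict

# Matches "source", "source:name", "source:2", "source:2:name", ...
_SOURCE_KEY = re.compile(r"^source(?::(\d+))?(?::(.+))?$")


def group_sources(tags):
    """Group source tags by their optional index.

    Unnumbered keys are collected under index 0 (an assumption of
    this sketch). Keys such as start_date:source are ignored because
    they do not start with "source".
    """
    grouped = defaultdict(dict)
    for key, value in tags.items():
        match = _SOURCE_KEY.match(key)
        if match is None:
            continue  # not a source or source:* key
        index = int(match.group(1)) if match.group(1) else 0
        subkey = match.group(2) or "value"  # base key: URL or title
        grouped[index][subkey] = value
    return dict(grouped)
```

Because unnumbered and numbered keys land in separate groups, a later mapper can add source:2=* without overwriting an existing source=*, which is the point of item 4.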

To summarize:

Encouraged:

tag                          value
source:[#]=*                 [url]
source:[#]:name=*            [text]
source:[#]:tiles=*           [tms x/y/z url]
source:[#]:attribution=*     [url or text, esp. for datasets]
source:[#]:id=*              [id text, esp. for datasets]

Optional:
Whatever makes sense to the mapper.
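
For anyone who generates tags with a script, the "Encouraged" keys could be assembled roughly like this. This is only a sketch: the function name `encouraged_source_tags` and its parameters are invented for illustration, and the URL and map name in the usage note are placeholders.

```python
def encouraged_source_tags(index, url, name=None, tiles=None,
                           attribution=None, source_id=None):
    """Build the 'encouraged' tags for one source as a dict.

    index=None yields unnumbered keys (source=*, source:name=*, ...).
    """
    base = "source" if index is None else f"source:{index}"
    tags = {base: url}
    optional = {
        "name": name,
        "tiles": tiles,
        "attribution": attribution,
        "id": source_id,
    }
    for subkey, value in optional.items():
        if value:  # skip anything the mapper did not supply
            tags[f"{base}:{subkey}"] = value
    return tags
```

For example, `encouraged_source_tags(2, "https://example.org/map", name="1869 US Coast Survey SF Peninsula map")` gives a source:2=* key and a source:2:name=* key and nothing else.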

I’m sure there are plenty of concepts, goals, or practices I’ve missed or misstated, so please help get us to a better place. : ) (and now, I’m off to go ‘fix’ a few things…)


Thanks for the detailed discussion kickoff @jeffmeyer ! This has certainly already helped me immensely.

With this syntax, would source:start_date=* be easily confused with start_date:source=*? If I’m following, the first would mean “the start date of the source” and the second “the source of start_date”. Wording the tags like this makes a ton of sense, and is a helpful visualizer for the concept.

Would this make any sense?

source: = Short name or Title ex) “USGS”
source::name = Full name or extended title ex) “USGS, Elmoro,CO 1892”
source::url = the original link.

Maybe this is too verbose. It could be better to just ask the minimum to be source: = url, which I would be perfectly okay with.

Could source::type =* be a useful optional tag?

Types could include:

  • map
  • photograph
  • newspaper
  • drawing?

Again, this could be too verbose.

This is the most helpful bit, and I’m perfectly okay if we ended up settling on this, but I’m also curious what other people think.


Ah, cracking open OSM’s old religious wars, I see. :wink: In OSM, the source:* format seems to have won out pretty decisively. I don’t know all the arguments on either side, but OHM has some considerations that probably never came up in OSM. If we want to depart from OSM, we’ll have to be very deliberate about it with improved editor support. We’ll also have to keep in mind these differences as we build tools to help mappers transfer content from OSM to OHM when it no longer suits OSM.

The *:source format makes a lot of sense if we’re going to tag each citation in structured format, isolating each piece of metadata into a separate tag. OSM never needed to do this, because the principle of on-the-ground verifiability normally limited sources to primary observation (untagged), tracing from imagery layers (automatically tagged on changesets), and imported online databases (which are detailed on import pages on the wiki). Rarely would someone need to cite, say, a newspaper article by its author, title, publication, date, and page.

So far, no editor supports the *:# indexed syntax. In OSM, this syntax only ever became entrenched as part of two tagging schemes, seamarks and street parking restrictions. (The latter scheme was eventually deprecated in favor of conditional tagging.) Other experiments with indices apparently flopped, like this proposal for boundary stone positioning or some of the ideas around encoding parts of highway destination signs. I’m curious if this was just because the proponents of these schemes never pushed them as far as they needed to, or if there was a flaw that prevented editors and data consumers from supporting the indices.

To the extent that mappers have been tagging both the title and URL to a source, there seem to be two approaches to tagging them side-by-side:

  • source contains the title, and source:url contains the URL.
  • source contains the URL, and source:name contains the title.

The former assumes that every source has a title, while the latter assumes that every source is reachable online at a permalink. Neither assumption holds in every situation, so it seems like we’d need to allow source to go untagged sometimes, or expect software to gracefully handle either a URL or a non-URL in source.
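
As a rough sketch of what "gracefully handle either a URL or a non-URL in source" might look like for a data consumer, here is some Python that accepts both approaches. The helper names `looks_like_url` and `title_and_url` are hypothetical, and treating only http(s) values as URLs is an assumption of this example.

```python
from urllib.parse import urlsplit


def looks_like_url(value):
    """Return True if value parses as an http(s) URL with a host."""
    try:
        parts = urlsplit(value.strip())
    except ValueError:  # e.g. a malformed bracketed host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def title_and_url(tags):
    """Return (title, url) from either tagging approach; None if absent."""
    source = tags.get("source")
    if source and looks_like_url(source):
        # Second approach: source holds the URL, source:name the title.
        return (tags.get("source:name"), source)
    # First approach (or source untagged): source holds the title.
    return (source or tags.get("source:name"), tags.get("source:url"))
```

Either half of the result can be None, which covers sources with no title and sources with no permalink.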

How does this principle square with OHM’s convention of elaborating on the sources on community project pages on the wiki? To the extent that mappers follow this convention, the project pages stand a better chance of staying up-to-date with URL changes and the like. But we’d need a consistent convention for referring to sources that are detailed elsewhere. I suppose some licenses also impose restrictions on how directly or indirectly we may incorporate a citation.

I’m reminded of the two common practices for citing sources on Wikidata. Some references (citations) are detailed in situ while others are represented by a simple stated in (P248) statement, sometimes with an extra pinpoint ID. In the long arc of time, every distinct work used as a source would have its own item there, but detailing the source’s metadata in situ is a reasonable first step. This would result in indirect citations if you only look at the raw XML, but as long as the user-facing website always resolves the citations, making them usable, then any charges of plagiarism can be avoided.

We could even outsource some of this metadata to Wikidata, relying on that project to supply the more tangential details about a source. As part of its WikiCite initiative, Wikipedia has been developing an approach in which it would store only the QID of a Wikidata item, plus pinpoint information like page numbers and excerpts. The main challenge is that we’d need to establish two-way communication with the Wikidata community so that items about sources don’t get deleted by accident. The Name Suggestion Index project has some experience managing this risk with the items it needs; overall, the partnership with Wikidata has been highly beneficial.


@Minh_Nguyen commented on this as well, but even with my small but ancient background in OSM, I’ve always thought of this as:

a:b:c:d:…, where b describes a, c describes b, d describes c, and so on.

Which is where date is the date of the source in source:date=*
Much as edtf describes the type of start_date in start_date:edtf=*.

One so far undiscussed benefit (if you read tag lists… :wink: ) of this approach is that when tags are listed alphabetically, all related tags are automatically listed in a browsable, easy to follow format, as shown here:

I think either putting the name in source:[x]:name=* and the url in source:[x]=* or essentially the reverse, just swapping :url for :name appropriately, is fine. The key is having a slot for both pieces of info. I think even having a secondary or officially long name is cool, too, if that’s what the mapper wants, but I think the better convention would be to have that live in a :short_name or :full_name or whatever subkey. And, my only goal with whatever the text name subkey is, is that mappers include at least enough to find one specific source, so that nobody is left with 2 sources that fit that name string.

It’s really up to the mapper, but - in general - I think we’d want to avoid duplicating information that might be otherwise easily divined from other tags. E.g., if a name included the word “map”, I don’t think we’d need a tag. Same for “picture” or “article”. That said, it would be very interesting to be able to query source data types easily. And… I think you’re onto some rich pools of validating data that have yet to be tapped in photos and newspapers. Don’t forget art! So… have at it!


Perish the thought! If I’d known that was what I was cracking open, I’d have chosen discretion as the better part of valor. I was, for the record, an OSM religious war draft dodger!

In this case, the source:* format means that something like source:start_date=* is the source of the start_date=* tag, correct?

This pulls a thread I failed to include in my prior post: the general source:a:b:c=* tagging is more about the geometry drawn on the map. The *:source tags are about the characteristics of what the geometry represents. I think. Is that correct?

This would be very good to know. I think the editors do support the syntax, but only as free-form text entries, not as any sort of supported validation, autocomplete, etc.

One catch is that the OHM inspector supports enumerated tags for at least a couple of UI-exposed items: images and additional information links, which were intended to help users enrich the left-nav description of what they’ve mapped. See here.

and here:


Totally agree.

I’d suggest that few of our conventions are rock-solid yet, or certainly rock-solid enough to use as definitive guidance. I do think community project pages are important, especially for describing process, goals, inspirations, etc. But, at the end of the day, I’m expecting people to download the entirety of the OHM database for information and not the wiki. Likewise, I believe there will always be more OHM projects than project pages, even as valuable as they are.

Indeed, and the closer we can put the citations to the actual data being cited, the better. We could even be a standard-bearer in this area. Or, at least try to be.

Totally agree that this should be our target - to have some sort of citation lookup system and I hope we can draft off of / act as a poster child for Wikicite. Also agree with your concerns about 2-way comms. That said, until then, I think we’re a little stuck… unless someone here has a clever idea? :pray:t3:


Yes, OSM uses source:name in the manner that you propose using name:source.

OSM rarely tags the source of geometry on the feature, since source and imagery_used tags on the changeset are considered to “travel with” OSM data. But there is considerable use of source:geometry and source:position for specialized purposes, such as positions of survey monuments.

This is true, though I don’t think we can expect any sort of comprehensiveness to source tagging until the editors do more to help mappers manage their sources.

source:wikidata without needing any other source:* tags?


Just to add my methodology to the conversation/confusion. When I first started OHM editing I was adding the full source citation to source=* including the url where available. Multiple sources were separated by a semicolon. Maybe I got this from a misinterpretation of the OHM wiki, or another user’s contributions, or just plain ignorance, I’m not sure.

I recently realised this was a lot of work, not sustainable, and difficult to read. I may have breached a character count occasionally as well without realising it.

It seems that a better way (or at least simpler) is to adopt an academic citation methodology where the in-text source is abbreviated with a unique ‘year-author/title’ reference in source=* (again separated by semicolons for multiples) that is fully expanded elsewhere with url, attribution, mapwarper id, and explanatory notes as necessary. The elsewhere I’m using is the community project page for the area I am working on and is linked via source:url=*, examples:
relation - Relation: C to 1878 (2749498) | OpenHistoricalMap
project - Open Historical Map/Projects/Newcastle - OpenStreetMap Wiki (note I have not been including licensing but will correct that)

This is something akin to the source:name=* with source:url=* recommendation

Sources that are not of broad use (eg a newspaper article) generally wouldn’t be referenced this way but via a full attribution using source=*, although I would adopt whatever method comes from this discussion.

Obvious flaws with this method are that it is a double-step process for a later user and relies on the reference list url/page being maintained and not corrupted in the future, although the abbreviated reference is still useful (if well written) as there are usually a limited number of sources that mappers for that area should also be familiar with. I’d argue that a project reference page is highly useful to document how an area has been mapped and possibly more useful than the fragmented ‘source=full attribution’ approach as the story behind the mapping can be documented or at least be more evident.

Also missing from this method is the tagging of the sources for the various aspects of a feature, e.g. the location may be from one source (a map) but the start_date comes from elsewhere. Dealing with start_date is easy with start_date:source=, but is there a convention for ‘location’ or geometry, as in geometry:source=? I am guilty of loading up fixme=* and notes=* with some of this stuff, or wikipedia=* if that is where some of the source data resides. Also, given the discussion of indexing, I’m now not sure that my use of semi-colons is appropriate.
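
If the discussion does settle on indexed keys, a semicolon-separated value could be converted mechanically. Here is a minimal Python sketch; the function name `enumerate_sources` is made up for this example, and the split is deliberately naive, so a URL or citation that itself contains a semicolon would be cut in two.

```python
def enumerate_sources(value, start=1):
    """Split a semicolon-separated source=* value into source:N=* tags.

    Naive on purpose: every semicolon is treated as a separator, and
    empty pieces (e.g. from a trailing semicolon) are dropped.
    """
    parts = [part.strip() for part in value.split(";")]
    return {
        f"source:{number}": part
        for number, part in enumerate((p for p in parts if p), start)
    }
```

The `start` parameter lets a later mapper continue numbering after sources that are already on the object.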


What styles do GIS journals use? I know author-date is expected in most fields, including history and apparently geography, while computing fields and Wikipedia are used to IEEE style. When there are multiple references, the object can then simply point to the reference list on the wiki.
Personally, I like the Chicago short-note style with short titles. It is more readable and recognizable to me, especially as footnotes or endnotes. I know the 255-char limit means a choice has to be made.
There is much variation even among author-date styles: author/cartographer (same?) vs. publisher, produced vs. (re)published year, and the presence of titles or numbering. Also, in running text the title is usually introduced before the bracketed author-date is added, so it can’t be said to be totally dropped.
(Disclaimer: Have only been helping with edits to refine existing work in OHM, not added any references myself yet. But the citation format is one of my concerns. )
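
On the 255-char limit: whichever citation style is chosen, a quick check like the following could flag values that would not fit. It is only a sketch; the name `overlong_tags` is hypothetical, and it counts Python string characters, which is an assumption about how the limit is measured.

```python
MAX_VALUE_LENGTH = 255  # the limit mentioned above, counted in characters


def overlong_tags(tags, limit=MAX_VALUE_LENGTH):
    """Return {key: length} for every tag value longer than limit."""
    return {
        key: len(value)
        for key, value in tags.items()
        if len(value) > limit
    }
```

A full Chicago-style citation plus a URL can easily exceed the limit, which is one argument for a short reference on the object and the long form elsewhere.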


The main problem I see with OSM’s source:*= prefix for attribute sources is that metadata keys such as source:url= are made an exception. However, to bridge the 2 projects for readability and possibly software interchangeability (despite OSM iD only showing a source= field now), I suggest fully moving to a stub object:source= when the source= applies to the entire object. This and, e.g., geometry:source= would refer to the data, similar to ::geom in Overpass. That would make the format more consistent and reliable: no guessing what source:name= means. If dropping a bare source= is accepted, then using object:source:name= or object:source:url= instead of source:name= and source:url= wouldn’t be far-fetched?


Most of the time I’ve mapped a feature without adding a source, it’s because the geometry comes from one of the built-in imagery layers. The changeset’s imagery_used tag records the layer(s) automatically. In the past, some OSM editors like Potlatch made it easy to tag the background layer as a source on the feature, but these tags usually got stale very fast and rarely carried any real meaning. Not only can other mappers tweak the geometry without touching the source, but the contents of the background layer itself are also ephemeral, subject to change at any time without a backup. I think this risk distinguishes these sources from the static sources used to tag attributes or map older geometries.

Even when I trace from background imagery, I make a best effort to tag the sources I used for attributes such as start_date and name. But sometimes I omit the dates because I’ve only been able to ascertain them by inference. For example, I’ve determined the start date of this on-ramp via the dates of the road alignment and overpass it connects. I probably should add a note to that effect, but I wouldn’t want to matter-of-factly source the date to a source that remains silent on the on-ramp. Putting words in someone’s mouth is almost as bad as plagiarizing them.

By the way, there’s a proposal to warn when citing certain sources too vaguely:

Ok - this has been a mind-expanding thread, and conversation appears to have tapered off, so I’d like to attempt to summarize and consolidate the discussion to see if we can turn it into an action plan for getting to some more finished documentation. Hopefully, I haven’t mischaracterized the discussion so far.

Geometry vs. Metadata sourcing:
Differentiating between these two categories of what we map in OHM was very helpful to me, especially in regard to how we tag them.

It seems all of the source tags fall into these 2 categories, and the original post implied:

  • Geometry tagging: use source: without a prefix, as if it’s actually geom:source:xxxx=* as described by @Kovoschiz here. These will most often refer to items shown on a source map.
  • Metadata tagging: append :source to the metadata prefix key. These source tags often refer to information not obviously derived from a source map.
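
To make the two categories above concrete, here is a small Python sketch that sorts keys into them. It follows the reading given in this summary (keys starting with source are about geometry, keys with a :source suffix are about an attribute); the function name `classify_source_key` and the return shape are assumptions for illustration only.

```python
def classify_source_key(key):
    """Classify a key as a geometry source, an attribute source, or neither.

    Returns ("geometry", None), ("attribute", "<attribute key>"),
    or (None, None) for keys that carry no source information.
    """
    parts = key.split(":")
    if parts[0] == "source":
        return ("geometry", None)  # source, source:2, source:2:name, ...
    if "source" in parts[1:]:
        index = parts.index("source")
        # Everything before ":source" is the attribute being sourced.
        return ("attribute", ":".join(parts[:index]))
    return (None, None)
```

Under this reading, start_date:source comes out as an attribute source for start_date, while source:2:name is part of the geometry sourcing.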

This leads to a few questions also identified in the thread - e.g. Should we:

  • Explicitly use a geometry prefix of some sort? @kovoschiz pointed this out. Should we have keys like geom:source, node:source, way:source, etc.? If so, which one? Or, should we just assume that without a prefix, “source” refers to the implied “geom”? Benefits of an explicit geometry key would include clarity and consistency across keys and less confusion about what the source reflected. Costs would be longer keys and some retooling (probably true of whatever we decide (for now… see below)).

Other source-related questions

Should we:

  • Require a minimum base geometry (implicit or explicit) source tag of some sort? @Minh_Nguyen pointed out that having a subkey like source:name might imply that the source key has a value. Or, could/should we support a “pick-one-of-three” or “one-of-two” choice among source=[name or url], source:url=*, and source:name=*?

  • If we have a minimum required source tag, should that be a short text citation? @Ashton747 proposed this system here and @AndrewS_OHM added to it with an added wrinkle of maybe including multiple sources separated by semicolons. Andrew also pointed out that most mappers for a particular area would know the maps that are available for a particular point in time. [my experience is: “I wish!”]

  • Store bibliographies on a community project wiki page and link to that using source:url? This was suggested by Andrew & Minh raised a similar point. Or support both that system and direct URLs to sources? e.g. an ohm_project=url key? Benefits are that community project pages provide a much better description of how the data was sourced and aggregated than a simple link. Cost is that not all mapping is big enough to justify creating a project page. And, community pages aren’t machine readable for gathering sources.

  • Encourage a particular citation formatting? @Kovoschiz brought this up and proposed an abbreviated format. If so, which one? Here’s the Chicago format for map citations, as hosted by the Library of Congress. Yowza.

  • Adopt an approach for inference-based sourcing, particularly for start_date and end_date tags with unclear sources, but some degree of confidence? @Minh_Nguyen talked about this and I’m guilty of very incorrectly citing start_date=arbitrary and end_date=arbitrary when those tags, while arbitrary, were also most likely bound by some other assumptions that my tagging didn’t reflect.

Static vs. Dynamic Sourcing
All of our current modes of tagging are fairly static, rigid (e.g. no rdb triplets) and unlinked (outside of an explicit URL…). Much more interesting alternatives like Wikicite, as mentioned by @Minh_Nguyen, are on the foreseeable horizon.

If you haven’t looked at Minh’s examples above (2nd para from bottom of post), which often cite a source and have a means of adding a “pinpoint” lookup within that source, please do so. It’s not too far of a leap to think of a source as a map and a pinpoint as a literal… point… on the source map. Very exciting and the way map citations are moving, especially with computer vision and machine learning.

My take is that backend and frontend tooling and user training still require too much work to go to a solution like this now, but maybe that perspective is too entrenched in the past and we should move forward now? Or, at least, maybe support both methods, so the future is available now for those who want to press forward?

Even if we don’t move forward now, we should definitely think of how our current sourcing might be migrated to a richer structure in the future and not back ourselves into a hole.

Moving Forward
I’ve set up some polls to get a sense of general consensus on these topics across our forum dwellers and turn the consensus into some “as of now” documentation to use as a reference while we continue to discuss next steps. Not sure if polls are the best method for resolving anything, but we can give them a shot, no?

The way I see it, source is just a more general statement of a source. It could refer to the geometry and often does, or it could refer to everything about the feature unless otherwise specified (by another *:source tag). Maybe using a single source tag for multiple characteristics of a feature is suboptimal, but the alternative can be very repetitive and possibly obscure the most relevant use of the source.

For example, when I sketched in this house, I gave some newspaper sources mainly to establish the house’s existence and dates. The geometry was an educated guess based on the negative space between buildings and some local knowledge – I used to pass by the house every day and only learned of its historical significance after it burned to the ground. If I were to tag it as source=Minh Nguyen or source=Inferred from surrounding buildings in Esri while crediting the newspaper articles only as name:source etc., I think that would be deeply misleading, as I’m no expert. Instead, crediting the newspapers in source would center them as the most important sources, and if I really want credit, then I can put it in geometry:source.

I think this approach simplifies your model somewhat: there’s only one kind of source key, but the unqualified one is an umbrella source for the feature, which can be further refined by more specific keys. There’s very strong precedent for this approach in OSM’s source:geometry and source:position keys (other than the reversed syntax).
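
The umbrella model can be sketched as a simple fallback lookup: use the specific *:source key if it exists, otherwise fall back to the unqualified source. This is only an illustration in Python; the function name `source_for` is hypothetical, and treating "geometry" as just another attribute is an assumption of the sketch.

```python
def source_for(tags, attribute):
    """Return the source that applies to one attribute of a feature.

    A specific "<attribute>:source" tag wins; otherwise the
    unqualified source tag acts as the umbrella. Returns None when
    neither is present.
    """
    specific = tags.get(f"{attribute}:source")
    if specific is not None:
        return specific
    return tags.get("source")
```

With the house example, `source_for(tags, "geometry")` would return the geometry:source value, while a lookup for start_date would fall back to the newspapers in source.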

Yes, Wikipedia has a similar standard for citing maps, even for citing OSM. There’s also a full complement of non-map citation templates, some of which have been ported over to the OSM Wiki. Over time, I fully expect that non-map citations will outnumber map citations in OHM, so we’ll need more robust tools for managing citations.

Some Wikipedians swear by the built-in Citoid tool or the third-party Zotero service to manage possible citations while conducting research. Nothing stops a mapper from also using Zotero (which is standard-issue in academia). You could use it to generate all the metadata tags and paste them en masse into iD’s raw tag editor.
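
However the metadata is obtained, pasting tags en masse mostly comes down to producing one line per tag. Here is a rough Python sketch, assuming the editor's text view accepts one key=value pair per line (worth verifying against the editor you use); the function name `to_tag_text` is made up, and nothing here talks to Zotero itself.

```python
def to_tag_text(tags):
    """Render a tag dict as sorted key=value lines for pasting.

    Newlines inside values are flattened to spaces so that each tag
    stays on its own line.
    """
    lines = []
    for key in sorted(tags):
        value = str(tags[key]).replace("\n", " ").strip()
        lines.append(f"{key}={value}")
    return "\n".join(lines)
```

Sorting the keys also keeps each source's subkeys next to each other, which makes the pasted block easier to review.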

Let’s just keep it simple for now. If the wiki has a bibliography page for common local citations, stick the URL of this page in source or whatnot. If you pass in |ref=harv to a wiki citation template such as {{cite journal}}, it’ll automatically produce a named anchor that you can include in the source tag to point to the individual citation. Then you can use source:id for the pinpoint. Here’s a live example.

Later, we could add some HTML microformats to these templates for machine readability or move the citations to data items for easier querying. This is all doable today with minimal effort, but I think we’d want to get a better feel for the requirements around citations before stubbing it out.


Thanks to all!! I’ve attempted to further distill this discussion into a Wiki page describing OHM source tagging. Please take a look and see if I’ve made any minor or major errors and let’s get it cleaned up. : )
