Ah, cracking open OSM’s old religious wars, I see. In OSM, the source:* format seems to have won out pretty decisively. I don’t know all the arguments on either side, but OHM has some considerations that probably never came up in OSM. If we want to depart from OSM, we’ll have to be very deliberate about it with improved editor support. We’ll also have to keep in mind these differences as we build tools to help mappers transfer content from OSM to OHM when it no longer suits OSM.
The *:source format makes a lot of sense if we’re going to tag each citation in structured format, isolating each piece of metadata into a separate tag. OSM never needed to do this, because the principle of on-the-ground verifiability normally limited sources to primary observation (untagged), tracing from imagery layers (automatically tagged on changesets), and imported online databases (which are detailed on import pages on the wiki). Rarely would someone need to cite, say, a newspaper article by its author, title, publication, date, and page.
So far, no editor supports the *:# indexed syntax. In OSM, this syntax only ever became entrenched as part of two tagging schemes, seamarks and street parking restrictions. (The latter scheme was eventually deprecated in favor of conditional tagging.) Other experiments with indices apparently flopped, like this proposal for boundary stone positioning or some of the ideas around encoding parts of highway destination signs. I’m curious if this was just because the proponents of these schemes never pushed them as far as they needed to, or if there was a flaw that prevented editors and data consumers from supporting the indices.
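To make the data consumer side of that question concrete, here is a rough Python sketch of what supporting indexed keys might involve. It assumes a purely hypothetical key layout in which the index comes right after source (source:1, source:1:url, source:2:name, and so on). Nothing in OSM or OHM settles on that layout, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical layout: "source:<index>" or "source:<index>:<field>".
# This is an assumption for illustration, not an established tagging scheme.
INDEXED_KEY = re.compile(r"^source:(\d+)(?::(.+))?$")

def group_indexed_sources(tags):
    """Collect indexed source tags into one dict per citation, ordered by index."""
    groups = defaultdict(dict)
    for key, value in tags.items():
        match = INDEXED_KEY.match(key)
        if not match:
            continue  # not an indexed source tag; leave it alone
        index = int(match.group(1))
        # A bare "source:1" has no field name, so file it under "source".
        field = match.group(2) or "source"
        groups[index][field] = value
    return [groups[i] for i in sorted(groups)]

example_tags = {
    "name": "Old Mill",
    "source:1:name": "County atlas",
    "source:1:url": "https://example.com/atlas",
    "source:2:name": "Newspaper notice",
}
citations = group_indexed_sources(example_tags)
```

Grouping is the easy part. The harder part for an editor is presumably keeping indices stable when a mapper adds, removes, or reorders citations, which this sketch does not attempt.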
To the extent that mappers have been tagging both the title and URL to a source, there seem to be two approaches to tagging them side-by-side:
source contains the title, and source:url contains the URL.
source contains the URL, and source:name contains the title.
The former assumes that every source has a title, while the latter assumes that every source is reachable online at a permalink. Neither assumption holds in every situation, so it seems like we’d need to allow source to go untagged sometimes, or expect software to gracefully handle either a URL or a non-URL in source.
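As a minimal sketch of what "gracefully handle either" could mean for a data consumer, the Python below reads a feature's tags under both conventions and returns a title and a URL, either of which may be missing. The function names are invented for illustration, and the rule that only http and https values count as URLs is my own assumption rather than anything OHM has decided.

```python
from urllib.parse import urlparse

def looks_like_url(value):
    """Treat a value as a URL only if it parses as http(s) with a host (an assumption)."""
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        return False  # malformed input, e.g. a broken bracketed host
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

def read_source(tags):
    """Return (title, url) from a tag dict, accepting either tagging convention."""
    title = tags.get("source:name")
    url = tags.get("source:url")
    source = tags.get("source")
    if source:
        if looks_like_url(source):
            # Second convention: source holds the URL.
            url = url or source.strip()
        else:
            # First convention: source holds the title.
            title = title or source.strip()
    return title, url

first = read_source({"source": "County atlas", "source:url": "https://example.com/atlas"})
second = read_source({"source": "https://example.com/atlas", "source:name": "County atlas"})
```

Both calls yield the same title and URL pair, and a feature with no source tags at all yields a pair of None values, so downstream code can treat the two conventions uniformly.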
How does this principle square with OHM’s convention of elaborating on the sources on community project pages on the wiki? To the extent that mappers follow this convention, the project pages stand a better chance of staying up-to-date with URL changes and the like. But we’d need a consistent convention for referring to sources that are detailed elsewhere. I suppose some licenses also impose restrictions on how directly or indirectly we may incorporate a citation.
I’m reminded of the two common practices for citing sources on Wikidata. Some references (citations) are detailed in situ while others are represented by a simple stated in (P248) statement, sometimes with an extra pinpoint ID. In the long arc of time, every distinct work used as a source would have its own item there, but detailing the source’s metadata in situ is a reasonable first step. This would result in indirect citations if you only look at the raw XML, but as long as the user-facing website always resolves the citations, making them usable, then any charges of plagiarism can be avoided.
We could even outsource some of this metadata to Wikidata, relying on that project to supply the more tangential details about a source. As part of its WikiCite initiative, Wikipedia has been developing an approach in which it would store only the QID of a Wikidata item, plus pinpoint information like page numbers and excerpts. The main challenge is that we’d need to establish two-way communication with the Wikidata community so that items about sources don’t get deleted by accident. The Name Suggestion Index project has some experience managing this risk with the items it needs; overall, the partnership with Wikidata has been highly beneficial.