On what date does end_date end?

Minh_Nguyen · May 2, 2024, 1:10am

Does end_date=2024-05-01 indicate that May 1st was the feature’s last full day in existence (“full”), or does it indicate that the feature went out of existence at some point during May 1st (“partial”)? Does end_date=2024 mean the feature could have already gone away this year (partial), or that it definitely won’t go away until sometime next year (full)? The end_date documentation is silent on the matter.

I’ve encountered many examples of either interpretation. Two of the largest boundary imports to date have set full end_date tags: the city limits of San José, California, and the Newberry Library Atlas of Historic County Boundaries. Each member of a chronology relation has an end_date that is one day earlier than the following member’s start_date. For example, Bullfrog County, Nevada, was abolished on May 3, 1989, according to the Newberry Library, but the boundary and its chronology relation have an end_date of May 2. Nothing in the database says May 3 explicitly, because the county was abolished without replacement.
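To make the difference concrete, here is a minimal Python sketch. It is not taken from any OpenHistoricalMap tool, and the helper names `end_date_for_event` and `chronology_gaps` are hypothetical. The first computes the end_date to tag for a feature that ceased to exist at some point on a given day; the second flags adjacent chronology members whose dates don’t line up under the chosen interpretation.

```python
from datetime import date, timedelta


def end_date_for_event(event_day: date, interpretation: str) -> date:
    """Tag value for a feature that went away at some point on event_day."""
    if interpretation == "full":
        # end_date names the last *full* day of existence.
        return event_day - timedelta(days=1)
    if interpretation == "partial":
        # end_date names the day during which the feature went away.
        return event_day
    raise ValueError(f"unknown interpretation: {interpretation!r}")


def chronology_gaps(members, interpretation: str):
    """Yield (index, end_date, next_start_date) for adjacent members that
    don't line up under the given interpretation.

    members: list of (start_date, end_date) tuples in chronological order.
    """
    # Under "full", the successor starts the day after end_date;
    # under "partial", it starts on end_date itself.
    offset = timedelta(days=1) if interpretation == "full" else timedelta(0)
    for i in range(len(members) - 1):
        end = members[i][1]
        next_start = members[i + 1][0]
        if end + offset != next_start:
            yield i, end, next_start


# Bullfrog County was abolished on May 3, 1989, per the Newberry Library.
print(end_date_for_event(date(1989, 5, 3), "full"))     # 1989-05-02
print(end_date_for_event(date(1989, 5, 3), "partial"))  # 1989-05-03
```

Running `chronology_gaps` over an existing relation with the "other" interpretation would list every member that needs retagging, which is roughly the kind of count a mass-retagging plan would start from.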

On the other hand, many individual features set partial end_date tags, particularly when they aren’t part of chronology relations. For example, the Bermuda Railway ceased operation at some point on May 1, 1931. The end_date value therefore matches the date given in the source. I think most laypeople would come up with this interpretation when mapping or using the map. It’s also consistent with how start_date is always interpreted partially, not as the first full day in existence.

In the time zone boundary project that I recently completed, most of the changes took place at a specific time of day (2 o’clock in the morning local time). Since start_date and end_date don’t accept times of day, I set start_date:edtf and end_date:edtf to a matching value that includes the time. This is the partial interpretation. I didn’t consider the full approach at the time, but I think it would’ve resulted in some weird tagging. The end_date:edtf of one boundary relation would have to come some fraction of a second before the next boundary relation’s start_date:edtf. Both tags would fall on the same day; however, the former relation’s end_date would specify the day that comes before the same relation’s end_date:edtf. In some cases, a boundary lasted merely an hour, so end_date would have to fall one day before start_date – both on a single element.
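A rough sketch of deriving the date-only end_date from a time-bearing end_date:edtf value under each interpretation. It assumes the EDTF value is a plain ISO 8601 date-time that Python’s `datetime.fromisoformat` can parse (no uncertainty markers or intervals); `date_only_end` is a hypothetical name, and the one-hour lifespan in the usage lines is made up for illustration.

```python
from datetime import date, datetime, timedelta


def date_only_end(edtf_end: str, interpretation: str) -> date:
    """Derive end_date from an end_date:edtf value that includes a time of day.

    Assumes a plain ISO 8601 date-time string, not general EDTF syntax.
    """
    instant = datetime.fromisoformat(edtf_end)
    if interpretation == "partial":
        # The feature went away at some point during this day.
        return instant.date()
    if interpretation == "full":
        # The last day the feature existed in its entirety is the day
        # before the changeover (even if the changeover was at 00:00).
        return instant.date() - timedelta(days=1)
    raise ValueError(f"unknown interpretation: {interpretation!r}")


# Hypothetical boundary that existed for only one hour.
start = datetime.fromisoformat("2007-03-11T01:00").date()  # start_date is always partial
end_full = date_only_end("2007-03-11T02:00", "full")
print(start, end_full, end_full < start)  # prints: 2007-03-11 2007-03-10 True
```

Under the full reading, the same element ends up with an end_date one day before its start_date, which is the oddity described above.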

If we prefer the full date interpretation, then much of the software stack will need to be modified. In iD, if you set the date filter to May 1, 2024, it includes features tagged end_date=2024-05-01 or 2024-05 or 2024 but excludes features tagged end_date=2024-05-02 or 2024-06 or 2025. The validator also allows two features to overlap spatially if they have matching end_date and start_date values. For its part, the Leaflet time slider plugin interprets every start_date or end_date value as falling at the stroke of noon on that day. Regardless, if you set the time slider to May 1, 2024, it shows features with start_date or end_date set to 2024-05-01 or 2024-05 or 2024.
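The validator behavior can be modeled as a lifespan-overlap test. The function below is a hypothetical Python illustration, not iD’s actual code; it only shows how the comparison would have to change. Under the partial reading, an end_date equal to the other feature’s start_date means the two features merely meet; under the full reading, they share a whole day.

```python
from datetime import date
from typing import Optional


def lifespans_conflict(a_start: Optional[date], a_end: Optional[date],
                       b_start: Optional[date], b_end: Optional[date],
                       interpretation: str) -> bool:
    """True if two features' lifespans overlap, so spatial overlap is suspect."""
    # Missing dates mean the lifespan is open-ended in that direction.
    a_start, b_start = a_start or date.min, b_start or date.min
    a_end, b_end = a_end or date.max, b_end or date.max
    if interpretation == "partial":
        # Ending on day D and starting on day D counts as a handoff, not overlap.
        return a_start < b_end and b_start < a_end
    if interpretation == "full":
        # end_date is the last full day, so sharing a day is an overlap.
        return a_start <= b_end and b_start <= a_end
    raise ValueError(f"unknown interpretation: {interpretation!r}")


old = (date(2000, 1, 1), date(2010, 5, 1))
new = (date(2010, 5, 1), None)
print(lifespans_conflict(*old, *new, "partial"))  # False
print(lifespans_conflict(*old, *new, "full"))     # True
```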

If we prefer the full date interpretation, at least 967 chronology relation members and 84 chronology relations would need to be mass-retagged for consistency, according to QLever. If we prefer the partial date interpretation, at least 13,451 chronology relation members and 830 chronology relations will need to be retagged, the vast majority of them from the San José and Newberry imports. Either way, we need to manually review any element that isn’t part of a chronology relation, because there’s no way to know what the mapper intended just by looking at the tags.

jeffmeyer · May 2, 2024, 3:37am

This is a question I’ve been carefully ignoring for a while.

I think the question relates to a conflict between being technically correct and what renders well in our current map animation.

Technically correct date attributes (e.g., having the end_date of one object be the same value as the start_date of whatever replaces it, or maybe 1 second earlier, in the case of an instantaneous cutover) currently render as both objects existing at the same time on the day of the cutover.

My take is that the tags should be technically correct and the site should accommodate whatever is required to manage around having two things that are sequential in existence showing up at the same time.

I believe that might involve some expansion of data types (date only to datetime, but I could be wrong there), development of a time-of-day-aware slider or, barring that, some sort of logic that does the per-day assignment automatically.
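One possible shape for that per-day logic, as a hedged Python sketch: treat each lifespan as a half-open interval of naive local date-times and, for a date-only slider position, draw only the features in effect at a single reference instant on that day. Using noon as the reference is an assumption, echoing the Leaflet plugin’s choice; `features_for_day` and the sample timestamps are hypothetical.

```python
from datetime import date, datetime, time


def features_for_day(features, day, reference=time(12, 0)):
    """Pick which features a date-only slider should draw on a given day.

    features: iterable of (name, start, end) with naive local datetimes;
    end is None for a feature that still exists. Lifespans are half-open
    [start, end), so a feature and its instantaneous successor never both
    contain the reference instant.
    """
    instant = datetime.combine(day, reference)
    return [name for name, start, end in features
            if start <= instant and (end is None or instant < end)]


# Hypothetical cutover at 02:00 local time.
boundaries = [
    ("old boundary", datetime(2007, 1, 1), datetime(2007, 3, 11, 2, 0)),
    ("new boundary", datetime(2007, 3, 11, 2, 0), None),
]
print(features_for_day(boundaries, date(2007, 3, 11)))  # ['new boundary']
```

The trade-off is that a feature lasting less than a day, such as a one-hour boundary, may never be drawn at all.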

Minh_Nguyen · May 2, 2024, 9:32am

This is even more noticeable when we only know the dates down to the year. It can look funky sometimes, but it doesn’t really bother me: it basically shows that the time slider has stopped midway through a transition from one feature to the other. If it doesn’t already, either the tile layer or the stylesheet could sort features by their start dates to ensure that the newer feature always draws over the older one, to avoid confusion.
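A minimal sketch of that ordering in Python, assuming features are dicts of tags whose start_date values are ISO 8601 strings with four-digit, non-negative years. Under that assumption, plain string sorting is chronological, and a coarser value such as `2020` sorts before `2020-06`; BCE dates would need a different key. `draw_order` is a hypothetical helper, and a real tile layer or stylesheet would express the same ordering in its own terms.

```python
def draw_order(features):
    """Return features sorted so the latest start_date is drawn last (on top).

    Assumes start_date values look like YYYY, YYYY-MM or YYYY-MM-DD with
    non-negative four-digit years, so string order matches date order.
    Features without a start_date sort first (drawn underneath).
    """
    return sorted(features, key=lambda tags: tags.get("start_date", ""))


# Hypothetical features from a construction-then-road sequence.
features = [
    {"name": "road", "start_date": "2020"},
    {"name": "construction", "start_date": "2018-03"},
]
print([f["name"] for f in draw_order(features)])  # ['construction', 'road']
```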

If we align with the “partial” interpretation of end_date, it becomes possible to express that the changeover happened at the stroke of midnight between two days. That does occur much more often in the case of boundaries than with less abstract features like roads or shops that would have a closing time. But currently there’s no way to distinguish a presumed midnight cutover from the absence of intraday information.

IanH · May 2, 2024, 11:03pm

I think we have to start with some basic assumption about the default times to use with a date tag. That would probably be 00:00 for the start_date and 24:00 for the end_date. This should carry through for incomplete dates, such as starting and ending years. Beyond that, we could come up with some simple calculations for handling fuzzy EDTF ranges in a consistent way. Done well, this would make it possible to represent movement over time.
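A hedged Python sketch of those defaults, including reduced-precision dates: a start_date maps to 00:00 on the first day it covers, and an end_date maps to 24:00 on the last day it covers. Python’s `datetime` can’t hold hour 24, so that instant is represented as 00:00 on the following day. The helper names are hypothetical, and only non-negative YYYY, YYYY-MM and YYYY-MM-DD values are handled (no EDTF qualifiers).

```python
from datetime import datetime, timedelta


def _covered_period(value):
    """Return (first instant, instant just after) for YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        (year,) = parts
        return datetime(year, 1, 1), datetime(year + 1, 1, 1)
    if len(parts) == 2:
        year, month = parts
        # After December, roll over into January of the next year.
        return datetime(year, month, 1), datetime(year + month // 12, month % 12 + 1, 1)
    year, month, day = parts
    first = datetime(year, month, day)
    return first, first + timedelta(days=1)


def start_instant(start_date):
    # Default start time: 00:00 on the first day the value covers.
    return _covered_period(start_date)[0]


def end_instant(end_date):
    # Default end time: 24:00 on the last day the value covers,
    # represented as 00:00 on the following day.
    return _covered_period(end_date)[1]


print(start_instant("1931"), end_instant("1931"))
```

This fixes the assumed instants, but on its own it doesn’t say whether a feature disappears no earlier or no later than `end_instant`; that still depends on which reading of end_date is adopted.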

Minh_Nguyen · May 2, 2024, 11:46pm

I think this assumption could still result in either interpretation I laid out, depending on whether you think the feature goes away no earlier than or no later than that assumed ending time.

jeffmeyer · May 6, 2024, 8:33pm

Can you explain more here about looking funky? I thought the site translated dates with only years so that there was a New Year’s Eve ball drop transition for year-only gaps.

Minh_Nguyen · May 6, 2024, 10:02pm

Yes, though sometimes this can result in inconsistencies when you’re viewing something in the middle of the year, like roads crisscrossing a construction zone because the construction ended in the same year the roads were completed.

Minh_Nguyen · May 30, 2024, 9:52pm

In 2022, the ISO date standard was amended to reintroduce 24:00:00 as a way to refer to the precise end of a day, the same instant as 00:00:00 on the following day (ISO 8601-1:2019/Amd 1:2022). So if someone wants to clarify that something ended at the stroke of midnight, but no later, they could say, for instance, end_date:edtf=1969-12-31T24:00. Whether that goes with end_date=1969-12-31 or end_date=1970-01-01 would depend on one’s interpretation of end_date.
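Python’s standard `datetime` type can’t store hour 24, so software consuming such a tag would need to normalize it. A minimal sketch that does this explicitly rather than relying on the parser, assuming values of the form YYYY-MM-DDThh:mm[:ss] without time zones or EDTF qualifiers; `parse_end_of_day` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def parse_end_of_day(value: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm[:ss], accepting the ISO 8601 end-of-day 24:00.

    Not a general EDTF parser: no time zones, uncertainty markers or intervals.
    """
    day, sep, clock = value.partition("T")
    if sep and clock in ("24:00", "24:00:00"):
        # 24:00 on a day is the same instant as 00:00 on the next day;
        # datetime cannot store hour 24, so roll forward explicitly.
        return datetime.fromisoformat(day) + timedelta(days=1)
    return datetime.fromisoformat(value)


print(parse_end_of_day("1969-12-31T24:00"))  # 1970-01-01 00:00:00
```

The normalized instant falls on 1970-01-01 even though the tag itself names 1969-12-31.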