[OHM] OHM - is the data model broken?

Thu Jul 16 13:12:18 UTC 2020

Dear Jeff

Great, I am happy to have started a discussion on this topic and there seems to be a need for formal guidelines or even technical change. I would be interested to get together and discuss this in a call. I agree that the data model is not "broken" but it would benefit from clarification or extension. At least from my experience, I was often confused since there was no "right way" to do things and multiple ways that each felt not quite correct.

Btw: I just found a case where geometry and meta-data clearly need to be separated and that is very hard (not impossible) to model with the current data model:
- https://openhistoricalmap.org/node/2088657540 is the statue of Edward Colston which was tossed in the harbor in Bristol on June 6 [1]
- https://openhistoricalmap.org/node/2088657539 was the new location of the statue for four days [2]
- Now the statue has been replaced (temporarily) with another one on the same geometry [3]

This is a case where we would need to re-use the same geometry (node 2088657540) for the 1895-2020 statue as well as for the new (2020-07-15 ..) statue, since they are at the same place.

1. https://www.bbc.com/news/uk-52954305
2. https://www.bbc.com/news/uk-england-bristol-53004748
3. https://www.bbc.co.uk/news/uk-england-bristol-53414463

Sent: Tuesday, 14 July 2020 at 16:58
From: "Jeff Meyer" <jeff at gwhat.org>
To: "Hannes Röst" <hannesroest at gmx.ch>
Cc: "Open Historical Map" <historic at openstreetmap.org>, "Thomas Schwotzer" <thomas.schwotzer at htw-berlin.de>
Subject: Re: [OHM] OHM - is the data model broken?

Hi Hannes (cc: Thomas Schwotzer, in case he'd like to add any observations & insight to this discussion!):

First, let me say that I don't believe the OHM data model is "broken" but that I do believe it may require quite a few workarounds and may not be up to the challenge of 100% of dataset needs. Second, Thomas has put his money where his mouth is by building a separate system - much respect to him for doing what he's done.

Third - many thanks to Hannes for taking the time to research and think through this topic. Hopefully, others will join in!

And, last major point - perhaps we should have a tech discussion / get together to go through these issues in real time? Maybe have some guest speakers / presentations? I'd be glad to help coordinate, especially if others present.

To the background & topics of this thread. : )

I think there was some resistance to adopting a new data model and associated stack when Thomas wrote to us years ago and we were reluctant to veer away from OSM too much, given our own traction issues and other dev needs. I also think we should be open to modification and extension where feasible and where needs dictate. 

That said, I think there's a *lot* of power in relations that could solve some of the examples that Hannes has done a great job of outlining. We could get a long way by combining relations, external sources of stable identifiers that we point at (rather than worrying about inherent OSM instabilities), and possibly some concept of preceded_by=* and followed_by=* along with Hannes' transition=* idea. Namespace modifications still scare me a bit, for a variety of reasons, including tooling and embedding information in labels, etc.

Bottom line, this is a benevolent mapocracy, and it feels like there are a bunch of us noodling on the same pasta, so let's get together to talk about it. Who's in?

- Jeff

p.s. I highly recommend looking at https://www.whosonfirst.org[https://www.whosonfirst.org/docs/contributing/] for some additional inspiration on this topic. The guys who put this together have pretty deep data structure and tagging backgrounds & are very OSM savvy, so there's a lot of relevant synthesis embedded in their implementation.

On Mon, Jul 13, 2020 at 10:20 AM Hannes Röst <hannesroest at gmx.ch> wrote:

Dear all

I was reading this thread https://lists.openstreetmap.org/pipermail/historic/2019-February/001186.html and the arguments made by Thomas, which make a lot of sense. First I would like to thank Thomas for his paper and for putting thought into this, and I hope he reads this and has some comments on my arguments (I am aware that other people have thought about these problems for much longer than me, which is why I tried to go back and read the old mails on the list). I agree that his data model makes a lot of sense and is sometimes necessary to accurately describe historic and geographic objects.

I was reading his paper and looked at how OHDM models geographic data, using a separation between geometries and meta-data. He correctly identifies that it will not always be possible to have a 1:1 relation between the geometry (nodes/ways) and meta-data and that they should therefore be separated. I also read https://wiki.openstreetmap.org/wiki/Open_Historical_Map/Tags (and section #Representation_of_change_in_historical_road_networks), which contains some of the ideas already. I realized that many of the issues that I described in my previous email are due to this 1:1 assumption. For example, let's take the case of https://en.wikipedia.org/wiki/London_Bridge_(disambiguation), which describes at least 5 different entities:

- a roman bridge
- one or more medieval bridges
- old London bridge Q56739974 (1209-1831)
- new London bridge Q56739652 (in London 1831 to 1968, then dismantled and brought to the US to be rebuilt stone by stone as "London Bridge (Lake Havasu City)", see Q1868889)
- current London bridge Q130206 (1973-)

So the issue is that there is a "concept" of a London bridge, namely a crossing at this particular location; then there are specific instances of geometries of wood, stone or metal that form a physical bridge; and then there is the physical continuity of one of these specific bridges being built at one place and then being dismantled and rebuilt somewhere else using the same physical stones. It is clear that we cannot model this relation with tags alone, but it is my belief that if we work on this example then we may end up with a pretty good model that can handle quite complicated spatio-temporal relationships of physical objects and geometries. A second example is a building that had different functions over time and may have been expanded at some point, with further geometry added or removed due to construction / demolition (for example a church or monastery).

Basically what we need is an n:m relationship between nodes and spatio-temporal concepts. A single building contains multiple nodes, but a single node may be part of multiple buildings over its lifetime, each of which has different attributes (tags) associated with it. On top of that, the data model by Thomas allows the n:m relation itself to have a start/end time, something that may not be very easy to do right now in OHM (see below). Currently it seems to me that there are two ways to approach this problem:

i) using date namespaces (see https://wiki.openstreetmap.org/wiki/Proposed_features/Date_namespace)
ii) using relations for true n:m mapping

For (ii), we can use relations which are *already* available in OSM/OHM. Currently there are a limited number of relation types (https://wiki.openstreetmap.org/wiki/Types_of_relation), and I think we could expand the list for OHM and introduce new types in order to implement the n:m relationship between geometry and concepts.

For relations to work properly, I suggest that we create a new relation type of a spatio-temporal concept / continuity "type=spatio_temporal" which would relate to an entity that is conceptually linking the individual geometries on the ground over time and space.
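The n:m relationship between geometry and concepts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Concept` class, the node ids and the tag values are hypothetical and not part of any OSM/OHM tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A spatio-temporal concept (e.g. one building) with its own tags."""
    name: str
    tags: dict
    node_ids: set = field(default_factory=set)


def concepts_by_node(concepts):
    """Invert concept -> nodes into node -> concept names (the n:m view)."""
    index = defaultdict(list)
    for concept in concepts:
        for node_id in sorted(concept.node_ids):
            index[node_id].append(concept.name)
    return dict(index)


# Hypothetical data: nodes 1-4 outline a church, nodes 3-6 a later hall
# that re-uses two of the church's corner nodes.
church = Concept("church", {"building": "church", "end_date": "1950"}, {1, 2, 3, 4})
hall = Concept("hall", {"building": "hall", "start_date": "1950"}, {3, 4, 5, 6})

index = concepts_by_node([church, hall])
print(index[3])  # node 3 belongs to both concepts
```

One concept owns many nodes and one node can belong to many concepts, while the tags live on the concept rather than on the geometry.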

Let's look at some examples:

Example 1: A church gets converted into a night club
Example 2: A church gets expanded with an additional wing, later burns down
Example 3: A bridge gets moved to a new geographic location
Example 4: A bridge gets replaced by a newer bridge without re-using any existing building material

We can solve (1) using either (i) or (ii). For example, we could use "building:1700-1950=church" and "amenity:1950-=nightclub", and this is the most economical solution: all other tags may be shared and there is only a single way for the whole building, so it takes the least amount of storage and is very intuitive for editors. We can also use (ii) and create two relations, one for the nightclub and one for the church, allowing clean use of "start_date" and "end_date" in each relation to make the history explicit and conform with tagging guidelines. This is also practical since the nodes and the way would *not* be duplicated and would be stored only once in the database. The temporal history is clear since it is explicit that the same building has stood there since 1700 and has been used for 2 purposes.
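To make approach (i) concrete, here is a minimal Python sketch that resolves date-namespaced keys for a given year. It assumes the key syntax used in the example above ("key:START-END" with four-digit years and an optional end year) and treats the end year as exclusive. Both are assumptions made for illustration; the `name` value is made up.

```python
import re

# Matches keys such as "building:1700-1950" or "amenity:1950-".
DATE_SUFFIX = re.compile(r"^(?P<key>.+):(?P<start>\d{4})-(?P<end>\d{4})?$")


def tags_in_year(tags, year):
    """Return the plain tags that apply to an object in the given year."""
    active = {}
    for raw_key, value in tags.items():
        match = DATE_SUFFIX.match(raw_key)
        if match is None:
            active[raw_key] = value  # undated tag: always applies
            continue
        start = int(match["start"])
        end = int(match["end"]) if match["end"] else None
        # Assumption: the end year is exclusive, so 1950 is already a nightclub.
        if start <= year and (end is None or year < end):
            active[match["key"]] = value
    return active


tags = {
    "name": "Example Hall",  # hypothetical shared tag
    "building:1700-1950": "church",
    "amenity:1950-": "nightclub",
}
print(tags_in_year(tags, 1800))
print(tags_in_year(tags, 1960))
```

Every consumer of the data would need this kind of key parsing, which is one way to read the tooling concern raised about namespaces.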

We can solve (2) using (i) by creating a way for the original church, tagging it appropriately (start_date/end_date), and then for the extension creating a new way that re-uses some of the old nodes and adds new nodes, again tagged appropriately (start_date/end_date). The temporal history should be mostly clear, since some of the nodes are visibly re-used and therefore part of the old building was used to create the new one. It is also economical since nodes are re-used and not duplicated in the database. Alternatively we could use two relations to achieve the same goal as above, basically leading to 2 ways for old/new and 2 relations for old/new. Both approaches lead to some duplication on the tag level (e.g. the name appears twice and is stored twice). But it does become a bit muddy here, since it's not the *same* way that is in either relation but two different ways with some shared nodes. So some information is lost with this approach (namely about the spatio-temporal continuity of an entity), and adding the two ways to a relation of "type=spatio_temporal" would make sense here, to be explicit instead of implicit and to avoid duplication (otherwise we have 2 ways with the same "name=" tag, leading to issues with search and updates). This would also make LOD easier, since it's more likely that external resources like Wikidata / Wikipedia would have information on the spatio-temporal concept and not on the geometry, which has changed over time. Otherwise, where would we add the wikidata tag: to way1, to way2, or to both?

We could solve (3) as well using a relation of "type=spatio_temporal" which contains the name of the object and way1 (former location) as a member and way2 (new location) with the move being implicit as way1 has end_date which is before start_date of way2.

We can currently model (4) by just creating two bridges that are at the same location, have their own wikidata tags etc, but that loses some information since they do not capture information about their relation to each other. Also here, a relation of "type=spatio_temporal" would help and we can add both bridges to the relation. We can add as many bridges as we like (eg all 5 London bridges) and some of these bridges may actually use the exact same way (same nodes) if they are built at the same location and some may not if they were built a few meters up/downstream. In this case tags that are building-specific would stay on each individual geometry while some tags such as the "name=" tag would be in the relation, assuming all the bridges had the same name.
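As a rough sketch of how a consumer might read such a "type=spatio_temporal" relation, the following Python orders the members in time and reports where one member ends before the next one starts at a different geometry. The `Member` class, the way ids and the node ids are hypothetical; only the years come from the London Bridge example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    """One way in a hypothetical type=spatio_temporal relation."""
    way_id: int
    node_ids: frozenset
    start_year: int
    end_year: Optional[int] = None  # None means "still exists"


def implied_moves(members):
    """Return (old_way, new_way) pairs where the geometry changed over time."""
    ordered = sorted(members, key=lambda m: m.start_year)
    moves = []
    for earlier, later in zip(ordered, ordered[1:]):
        ended_before = (
            earlier.end_year is not None and earlier.end_year <= later.start_year
        )
        if ended_before and earlier.node_ids != later.node_ids:
            moves.append((earlier.way_id, later.way_id))
    return moves


# Hypothetical ids: way 1 in London (1831-1968), way 2 in the US (1968-).
way1 = Member(1, frozenset({10, 11}), 1831, 1968)
way2 = Member(2, frozenset({20, 21}), 1968)
print(implied_moves([way2, way1]))
```

Geometry and dates alone cannot tell a physical move (3) apart from a replacement built a few meters away (4), so this only recovers that a transition happened, not what kind.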

Second, we could also use relations to indicate how the transition happened between two geometries, e.g. if a building burns down and only part of it remains, we could use a relation "type=temporal_transition" or "type=historical_event" or "type=event" which would have "event_type=fire" and "start_date=XX" and "end_date=XX" for the date of the *event*, allowing us to model multi-day events. Together with a tagging of way1 as "before" and way2 as "after", this could clearly indicate what happened to a particular building, and we could model events such as https://en.wikipedia.org/wiki/Great_Fire_of_London . The interesting part here is that the relation *itself* could then link to the Wikidata event Q164679 while Wikidata could link to OHM for people to get an idea of the extent of the fire and the affected buildings (look at the current Wikidata page: it only has "coordinate location" pointing to London, so more information is clearly necessary). To model a physical move, we could use "type=event" and "event=physical_move", using the "role" field with way1 as "before" and way2 as "after" (similar to how we use inner/outer for multipolygons). For new buildings/replacements/additions we could use "event=construction" in the relation, indicate how long construction took etc., and link to the corresponding Wikidata article, such as Q811095 for "Construction of the World Trade Center" (the original) and Q5164470 for "Construction of One World Trade Center" (the rebuilding after 9/11). This would be a *very* rich way to describe events, but isn't that what we are aiming for?
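For illustration, such an event relation could be written out in the usual OSM XML layout (a `relation` element with `member` and `tag` children) using Python's standard library. The "type=event" and "event=physical_move" tags and the "before"/"after" roles are the proposal from this mail, not an established tagging scheme, and the ids are placeholders.

```python
import xml.etree.ElementTree as ET


def event_relation(relation_id, tags, before_ways, after_ways):
    """Build an OSM-XML-style <relation> with before/after member roles."""
    relation = ET.Element("relation", id=str(relation_id))
    for role, way_ids in (("before", before_ways), ("after", after_ways)):
        for way_id in way_ids:
            ET.SubElement(relation, "member", type="way", ref=str(way_id), role=role)
    for key, value in tags.items():
        ET.SubElement(relation, "tag", k=key, v=value)
    return relation


# Placeholder ids: -1 for a not-yet-uploaded relation, ways 1 and 2 as in the text.
move = event_relation(
    -1,
    {"type": "event", "event": "physical_move", "start_date": "1968"},
    before_ways=[1],
    after_ways=[2],
)
print(ET.tostring(move, encoding="unicode"))
```

Since this only uses the existing member/role/tag structure, it needs no change to the database schema.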

Now one issue remains: the current n:m relationship in OHM does not allow the *relationship* itself to have a start/end time as described in the paper by Thomas. However, I am not sure how big of a problem that is. I have a really hard time coming up with an example where this would be necessary and the start/end time could not be stored in either the geometry or the relation (of course it's cleaner to store it in only one place, so it's nicer from a design point of view, but from a functional point of view I struggle to see a necessity). We could approximate this in three ways: (i) using parent/child relations where the child is the "intermediary" and only stores start/end times; (ii) modifying the relation_members table (see https://wiki.openstreetmap.org/wiki/File:OSM_DB_Schema_2016-12-13.svg ) by adding 2 columns (start_date / end_date), which would break most editors and all OSM-compatibility and seems like a bad idea; or (iii) hacking the "role" column by adding date ranges: for example, a bridge that was moved from way1 to way2 could have a single relation called "London Bridge" where one member is way1 in London with the role field equal to "1831-1968" and another is way2, the bridge in the US, with the role field equal to "1968-". Now I also think this is a bad idea, but it is possible :-) Given these options, if this case ever comes up then (i) is probably the easiest way to solve it.
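To show what option (iii) would demand of every data consumer, here is a small Python sketch that parses such date-range roles. The "START-END" role format is just the ad-hoc convention from the example above, assumed here to use four-digit years.

```python
import re

# Matches roles such as "1831-1968" or the open-ended "1968-".
ROLE_RANGE = re.compile(r"^(\d{4})-(\d{4})?$")


def parse_role_range(role):
    """Return (start_year, end_year or None), or None for an ordinary role."""
    match = ROLE_RANGE.match(role)
    if match is None:
        return None
    start, end = match.groups()
    return (int(start), int(end) if end else None)


print(parse_role_range("1831-1968"))
print(parse_role_range("1968-"))
print(parse_role_range("outer"))  # a normal role is not a date range
```

The role field can then no longer carry an ordinary role such as "before" or "outer" at the same time, which is one more reason to prefer option (i).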

I hope I have discussed some ideas on how to solve the problems I have run into during my own mapping. I have tried to use some previous ideas voiced here on the list, such as what Thomas proposed. I have tried very hard to come up with proposals that would work without changing the data structure of OHM (except the one idea of changing the relation_members table), which I think will be crucial for OHM since it would allow people to continue using all the resources/editors that OSM produces and would be backwards compatible. I think it is important not to deviate too much from the OSM code and database layout, if possible, for the project to succeed.

Let me know what you think

Hannes
 _______________________________________________
Historic mailing list
Historic at openstreetmap.org
https://lists.openstreetmap.org/listinfo/historic 
 --

Jeff Meyer
206-676-2347
osm: Open Historical Map (OHM)[http://wiki.openstreetmap.org/wiki/Open_Historical_Map] / my OSM user page[http://www.openstreetmap.org/user/jeffmeyer]

t: @OpenHistMap