[OHM] OHM - is the data model broken?
Jeff Meyer
jeff at gwhat.org
Tue Jul 14 20:58:38 UTC 2020
Hi Hannes (cc: Thomas Schwotzer, in case he'd like to add any observations
& insight to this discussion!):
First, let me say that I don't believe the OHM data model is "broken" but
that I do believe it may require quite a few workarounds and may not be up
to the challenge of 100% of dataset needs. Second, Thomas has put his money
where his mouth is by building a separate system - much respect to him for
doing what he's done.
Third - many thanks to Hannes for taking the time to research and think
through this topic. Hopefully, others will join in!
And, last major point - perhaps we should have a tech discussion / get
together to go through these issues in real time? Maybe have some guest
speakers / presentations? I'd be glad to help coordinate, especially if
others present.
To the background & topics of this thread. : )
I think there was some resistance to adopting a new data model and
associated stack when Thomas wrote to us years ago and we were reluctant to
veer away from OSM too much, given our own traction issues and other dev
needs. I also think we should be open to modification and extension where
feasible and where needs dictate.
That said, I think there's a *lot* of power in relations, which could
solve some of the examples that Hannes has done a great job of
outlining. Between relations, pointing at external sources of stable
identifiers (rather than worrying about inherent OSM instabilities),
possibly embedding some concept of preceded_by=* and followed_by=*, and
Hannes' transition=* idea, we have a lot to work with. Namespace
modifications still scare me a bit, for a variety of reasons, including
tooling and embedding information in labels, etc.
Bottom line, this is a benevolent mapocracy, and it feels like there are a
bunch of us noodling on the same pasta, so let's get together to talk about
it. Who's in?
- Jeff
p.s. I highly recommend looking at https://www.whosonfirst.org
<https://www.whosonfirst.org/docs/contributing/> for some additional
inspiration on this topic. The guys who put this together have pretty deep
data structure and tagging backgrounds & are very OSM savvy, so there's a
lot of relevant synthesis embedded in their implementation.
On Mon, Jul 13, 2020 at 10:20 AM Hannes Röst <hannesroest at gmx.ch> wrote:
> Dear all
>
> I was reading this thread
> https://lists.openstreetmap.org/pipermail/historic/2019-February/001186.html
> and the arguments made by Thomas which make a lot of sense. First I would
> like to thank Thomas for his paper and putting thought into this and I hope
> he reads this and has some comments on my arguments (I am aware that other
> people have thought about these problems for much longer than me, so that
> is why I tried to go back and read the old mails on the list). I agree that
> his data model makes a lot of sense and is sometimes necessary to
> accurately describe historic and geographic objects. I was reading his
> paper and looked at how OHDM models geographic data, using a separation
> between geometries and meta-data. He correctly identifies that it will not
> always be possible to have a 1:1 relation between the geometry (node/ways)
> and meta-data and they therefore should be separated. I also read
> https://wiki.openstreetmap.org/wiki/Open_Historical_Map/Tags (and
> section #Representation_of_change_in_historical_road_networks) which
> contains some of the ideas already. I realized that many of the issues that
> I described in my previous email are due to this 1:1 assumption. For
> example, let's take the case of
> https://en.wikipedia.org/wiki/London_Bridge_(disambiguation) which
> describes at least 5 different entities:
>
> - a roman bridge
> - one or more medieval bridges
> - old London bridge Q56739974 (1209-1831)
> - new London bridge Q56739652 (in London 1831 to 1968, then dismantled
> and brought to the US to be rebuilt stone by stone as "London Bridge
> (Lake Havasu City)", see Q1868889)
> - current London bridge Q130206 (1973-)
>
> so the issue is that there is a "concept" of a London bridge, namely a
> crossing at this particular location, then there are specific instances of
> geometries of wood, stone, metal to form a physical bridge and then there
> is the physical continuity of one of these specific bridges being built
> at one place and then being dismantled and rebuilt somewhere else using the
> same physical stones. It is clear that we cannot model this relation with
> tags alone, but it is my belief that if we work on this example then we may
> have a pretty good model that can model pretty complicated spatio-temporal
> relationships of physical objects and geometries. A second example is a
> building that over time had different functions and may have been expanded
> at some point in time with further geometry added or removed due to
> construction / demolition (for example a church or monastery).
>
> Basically what we need is a n:m relationship between nodes and
> spatio-temporal concepts. A single building contains multiple nodes but a
> single node may be part of multiple buildings over its lifetime which have
> different attributes (tags) associated with it. On top of that, the data
> model by Thomas allows the n:m relation itself to have start/end time,
> something that may not be very easy to do right now in OHM (see below).
> Currently to me it seems that there are two ways to approach this problem:
>
> i) using date namespaces (see
> https://wiki.openstreetmap.org/wiki/Proposed_features/Date_namespace)
> ii) using relations for true n:m mapping
>
> For (ii), we can use relations which are *already* available in OSM/OHM.
> Currently there are a limited number of relation types:
> https://wiki.openstreetmap.org/wiki/Types_of_relation and I think we
> could expand the list for OHM and introduce new types in order to
> implement the n:m relationship between geometry and concepts.
>
> For relations to work properly, I suggest that we create a new relation
> type of a spatio-temporal concept / continuity "type=spatio_temporal" which
> would relate to an entity that is conceptually linking the individual
> geometries on the ground over time and space.
>
> Let's look at some examples:
>
> Example 1: A church gets converted into a night club
> Example 2: A church gets expanded with an additional wing, later burns down
> Example 3: A bridge gets moved to a new geographic location
> Example 4: A bridge gets replaced by a newer bridge without re-using any
> existing building material
>
> We can solve (1) using either (i) or (ii), for example we could use
> "building:1700-1950=church" and "amenity:1950-=nightclub" and this is the
> most economical solution: all other tags may be shared and there is only a
> single way for the whole building, so it takes the least amount of storage
> and is very intuitive for editors. We can also use (ii) and create two
> relations, one for the nightclub and one for the church, allowing clean use
> of "start_date" and "end_date" in each relation to make the history
> explicit and conform with tagging guidelines. This is also practical since
> the nodes and the way would *not* be duplicated and only stored once in the
> database. The temporal history is clear since it is explicit that the same
> building stood there since 1700 and has been used for 2 purposes.
>
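A minimal sketch of how a consumer could resolve the date-namespaced tags
from Example 1 for a given year. The key syntax ("key:START-END", with an
open end as "key:START-") is taken from the example above and may differ
from the Date namespace proposal itself. The function name, the "name"
value and the half-open interval rule (START inclusive, END exclusive) are
assumptions made for illustration.

```python
import re

# Matches keys like "building:1700-1950" or "amenity:1950-" (assumed syntax).
_DATED_KEY = re.compile(r"^(?P<key>.+):(?P<start>\d{4})-(?P<end>\d{4})?$")


def tags_in_year(tags, year):
    """Return the plain tags that apply in `year` (hypothetical helper)."""
    result = {}
    for raw_key, value in tags.items():
        match = _DATED_KEY.match(raw_key)
        if match is None:
            # Undated tags are shared across the whole lifetime of the way.
            result[raw_key] = value
            continue
        start = int(match.group("start"))
        end = int(match.group("end")) if match.group("end") else None
        # Assumption: START is inclusive, END is exclusive.
        if start <= year and (end is None or year < end):
            result[match.group("key")] = value
    return result


# One way carries both uses; the name is a made-up placeholder.
church_way = {
    "name": "Example Church",
    "building:1700-1950": "church",
    "amenity:1950-": "nightclub",
}

print(tags_in_year(church_way, 1800))
print(tags_in_year(church_way, 1960))
```

The point of the sketch is that every tool reading the data would need a
step like this before it can treat the tags as ordinary OSM tags.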
> We can solve (2) using (i) by creating a way for the original church,
> tagging it appropriately (start_date/end_date) and then for the extension
> create a new way that re-uses some of the old nodes and adds new nodes,
> tagging it appropriately (start_date/end_date). The temporal history should
> be mostly clear since it is clear that some of the nodes are re-used and
> therefore part of the building was used to create a new building. It is
> also economical since nodes are re-used and not duplicated in the database.
> Alternatively we could use two relations to achieve the same goal as above,
> basically leading to 2 ways for old/new and 2 relations for old/new. Both
> approaches lead to some duplication on the tag level (eg the name appears
> twice and is stored twice). But it does become a bit muddy here, since it's
> not the *same* way that is in either relation but two different ways with
> some shared nodes. So some information is lost with this approach (namely
> about spatio-temporal continuity of an entity), so adding the two ways to a
> relation of "type=spatio_temporal" would make sense here to be explicit
> instead of implicit and to avoid duplication (otherwise we have 2 ways with
> the same "name=" tag, leading to issues with search and updates). This would
> also make LOD easier since it's more likely that external resources like
> Wikidata / Wikipedia would have information on the spatio-temporal concept
> and not on the geometry which has changed over time. Otherwise, where would
> we add the wikidata tag, to way1 or way2? or both?
>
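A small sketch of the node re-use described for Example 2, assuming ways
are represented as plain dicts with a list of node ids. All ids and dates
below are invented for illustration; only the idea that the extended way
shares nodes with the original comes from the text.

```python
def shared_nodes(way_a, way_b):
    """Return the node ids used by both ways (hypothetical helper)."""
    return set(way_a["nodes"]) & set(way_b["nodes"])


# Original church outline: a closed way (first node repeated at the end).
original_church = {
    "nodes": [1, 2, 3, 4, 1],
    "tags": {"building": "church", "start_date": "1700", "end_date": "1850"},
}

# Extended outline: nodes 5 and 6 form the new wing, the rest is re-used.
extended_church = {
    "nodes": [1, 2, 5, 6, 3, 4, 1],
    "tags": {"building": "church", "start_date": "1850"},
}

print(sorted(shared_nodes(original_church, extended_church)))
```

The shared nodes are the only hint that the two ways describe the same
building, which is why an explicit relation would carry the continuity
more clearly.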
> We could solve (3) as well using a relation of "type=spatio_temporal"
> which contains the name of the object and way1 (former location) as a
> member and way2 (new location) with the move being implicit as way1 has
> end_date which is before start_date of way2.
>
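A sketch of how the implicit move in Example 3 could be read back out of a
"type=spatio_temporal" relation. The classes, field names and way ids are
hypothetical. The years follow the London Bridge example earlier in this
mail. Dates are compared at year granularity, so an end_date equal to the
next start_date is treated as a hand-over (an assumption).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Way:
    way_id: int                 # hypothetical id
    start_date: int             # year
    end_date: Optional[int]     # None means "still exists"


@dataclass
class Relation:
    tags: dict
    members: list = field(default_factory=list)


def implied_moves(relation):
    """Return (from_id, to_id) pairs where one member ends before the next starts."""
    ordered = sorted(relation.members, key=lambda way: way.start_date)
    moves = []
    for earlier, later in zip(ordered, ordered[1:]):
        if earlier.end_date is not None and earlier.end_date <= later.start_date:
            moves.append((earlier.way_id, later.way_id))
    return moves


london_bridge = Relation(
    tags={"type": "spatio_temporal", "name": "London Bridge"},
    members=[
        Way(way_id=2, start_date=1968, end_date=None),   # rebuilt in the US
        Way(way_id=1, start_date=1831, end_date=1968),   # original site in London
    ],
)

print(implied_moves(london_bridge))
```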
> We can currently model (4) by just creating two bridges that are at the
> same location, have their own wikidata tags etc, but that loses some
> information since they do not capture information about their relation to
> each other. Also here, a relation of "type=spatio_temporal" would help and
> we can add both bridges to the relation. We can add as many bridges as we
> like (eg all 5 London bridges) and some of these bridges may actually use
> the exact same way (same nodes) if they are built at the same location and
> some may not if they were built a few meters up/downstream. In this case
> tags that are building-specific would stay on each individual geometry
> while some tags such as the "name=" tag would be in the relation, assuming
> all the bridges had the same name.
>
> Second, we could also use relations to indicate how the transition
> happened between two geometries, e.g. if a building burns down and only
> part of it remains, we could use a relation "type=temporal_transition" or
> "type=historical_event" or "type=event" which would have "event_type=fire"
> and "start_date=XX" and "end_date=XX" for the date of the *event*, allowing
> us to model multi-day events. Together with a tagging of way1 as "before"
> and way2 as "after" this could clearly indicate what happened to a
> particular building and we could model events such as
> https://en.wikipedia.org/wiki/Great_Fire_of_London . The interesting part
> here is that the relation *itself* could then link to the Wikidata event Q164679
> while Wikidata could link to OHM for people to get an idea of the extent of
> the fire and affected buildings. (look at the current Wikidata page, it
> only has "coordinate location" pointing to London, more information is
> clearly necessary). To model a physical move, we could use "type=event"
> and "event=physical_move", using the "role" field with way1 as "before" and
> way2 as "after" (similar to how we use inner/outer for multipolygons). For
> new buildings/replacements/additions we could use "event=construction" in
> the relation and indicate how long construction took etc and link to the
> corresponding Wikidata item, such as Q811095 for "Construction of the
> World Trade Center" (the original) and Q5164470 for "Construction of One
> World Trade Center" (the rebuilding after 9/11). This would be a *very*
> rich way to describe events, but isn't that what we are aiming for?
>
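A sketch of what such an event relation could look like in OSM XML, built
with Python's standard xml.etree.ElementTree. The "type=event",
"event=physical_move" tags and the "before"/"after" roles are the proposal
from the paragraph above, not an established tagging scheme. The function
name and the negative placeholder ids are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def event_relation(rel_id, event, before_ref, after_ref, extra_tags=None):
    """Build a <relation> element linking a 'before' and an 'after' way."""
    rel = ET.Element("relation", id=str(rel_id))
    ET.SubElement(rel, "member", type="way", ref=str(before_ref), role="before")
    ET.SubElement(rel, "member", type="way", ref=str(after_ref), role="after")
    tags = {"type": "event", "event": event}
    # Optional extras, e.g. start_date / end_date of the event or a wikidata id.
    tags.update(extra_tags or {})
    for key, value in tags.items():
        ET.SubElement(rel, "tag", k=key, v=value)
    return rel


# Placeholder ids: -101 is the way at the old site, -102 the way at the new site.
move = event_relation(-1, "physical_move", before_ref=-101, after_ref=-102)
print(ET.tostring(move, encoding="unicode"))
```

Because this only uses the existing member/role/tag structure, current
editors could load and save such a relation even without knowing the type.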
> Now one issue remains: the current n:m relationship in OHM does not allow
> the *relationship* itself to have a start/end time as described in the
> paper by Thomas. However, I am not sure how big of a problem that is. I
> have a really hard time coming up with an example where this would be
> necessary and the start/end time could not be stored in either the geometry
> or the relation (of course it's cleaner to store it only in one place, so
> it's nicer from a design point of view, but from a functional point of view
> I struggle to see a necessity). We could approximate this in three ways:
> (i) using parent/child relations where the child is the "intermediary" and
> only stores start/end times. We (ii) could modify the relation_members
> table (see
> https://wiki.openstreetmap.org/wiki/File:OSM_DB_Schema_2016-12-13.svg )
> adding 2 columns (start_date / end_date) which would break most editors and
> all OSM-compatibility and seems like a bad idea or (iii) hack the "role"
> column by adding date ranges: for example a bridge that was moved from way1
> to way2 could have a single relation called "London Bridge" and a member of
> the relation is way1 in London with the role field equal to "1831-1968"
> while way2 is the bridge in the US with the role field equal to "1968-".
> Now I also think this is a bad idea, but it is possible :-) Given these
> options, if this case ever comes up then probably (i) is the easiest way to
> solve this.
>
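For completeness, a sketch of what option (iii) would require from every
data consumer: parsing date ranges out of the "role" field. The role
values "1831-1968" and "1968-" are the ones from the paragraph above. The
function names, the member labels and the half-open interval rule are
assumptions for illustration.

```python
def parse_role_range(role):
    """Parse 'START-END' or 'START-' into (start, end); end is None if open."""
    start, sep, end = role.partition("-")
    if not sep or not start.isdigit() or (end and not end.isdigit()):
        raise ValueError(f"role is not a date range: {role!r}")
    return int(start), (int(end) if end else None)


def members_at(members, year):
    """Return the members whose role range contains `year` (END exclusive)."""
    active = []
    for ref, role in members:
        start, end = parse_role_range(role)
        if start <= year and (end is None or year < end):
            active.append(ref)
    return active


# Single "London Bridge" relation with date ranges stored in the roles.
london_bridge_members = [("way1", "1831-1968"), ("way2", "1968-")]

print(members_at(london_bridge_members, 1900))
print(members_at(london_bridge_members, 2000))
```

A role like "outer" or "before" would raise ValueError here, which shows
how this hack collides with the normal use of the role field.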
> I hope I have discussed some ideas on how to solve the problems I have run
> into during my own mapping. I have tried to use some previous ideas voiced
> here on the list, such as what Thomas proposed. I have tried very hard to
> come up with proposals that would work without changing the data structure
> of OHM (except the one idea of changing the relation_members table) which I
> think will be crucial for OHM since it would allow people to continue using
> all resources/editors that OSM produces and it will be backwards
> compatible. I think it is important not to deviate too much from OSM code
> and database layout if possible for the project to succeed.
>
> Let me know what you think
>
> Hannes
--
Jeff Meyer
206-676-2347
osm: Open Historical Map (OHM)
<http://wiki.openstreetmap.org/wiki/Open_Historical_Map> / my OSM user page
<http://www.openstreetmap.org/user/jeffmeyer>
t: @OpenHistMap