[osmosis-dev] Reading OSM History dumps

Sun Aug 22 18:44:54 BST 2010

On 22.08.2010 08:26, Brett Henderson wrote:
> Hi Peter,
>
> This all sounds very interesting and will no doubt have many uses that I
> can't anticipate.
>
> I can't give you much assistance but will try to answer any specific
> questions you have.  My wife is going to give birth sometime within the
> next month which means my priorities are about to change drastically ;-)
Oh, congratulations on this!

> You seem to have thought about most of the complexities of the problem
> already so you know what you're dealing with.
I think all of it is solvable with just enough logic :) I did a demo
implementation in PHP to see whether this is possible, and I think I know
the OSM data structure well enough to understand what it means.

But I don't know Osmosis and Java well enough to know how to implement
the simple multi-level arrays from PHP in a way that will work with
those really big files.

What I need is a store that can
  - store all versions of a Node*
  - access a specific version of a node
  - access all versions of a node
  - the newest version of a node that was created before Date X

*not only the Node's location but also its Meta-Info (Timestamp, User,
UserID), because you would want this as the Meta-Info of the
generated intermediate Way-Versions.
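The four requirements above can be pinned down as a small in-memory reference store. This is a minimal sketch, not Osmosis code: the names NodeHistoryStore, NodeVersion and latestBefore are hypothetical, timestamps are assumed to be epoch milliseconds, and the date query is read as "the newest version created before Date X", i.e. the version valid at that date. A HashMap of TreeMaps will not scale to a full history dump, but it fixes the semantics that a disk-backed store would have to reproduce.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory reference store for all versions of all nodes. */
public class NodeHistoryStore {

    /** One version of a node, including the meta-info (hypothetical layout). */
    public static final class NodeVersion {
        public final long id;
        public final int version;
        public final double lat;
        public final double lon;
        public final long timestamp; // assumed: epoch milliseconds
        public final String user;
        public final int userId;

        public NodeVersion(long id, int version, double lat, double lon,
                           long timestamp, String user, int userId) {
            this.id = id;
            this.version = version;
            this.lat = lat;
            this.lon = lon;
            this.timestamp = timestamp;
            this.user = user;
            this.userId = userId;
        }
    }

    // node id -> (version -> node version); the TreeMap keeps versions ordered
    private final Map<Long, TreeMap<Integer, NodeVersion>> nodes = new HashMap<>();

    /** Store one version of a node. */
    public void store(NodeVersion n) {
        nodes.computeIfAbsent(n.id, k -> new TreeMap<>()).put(n.version, n);
    }

    /** A specific version of a node, or null if it is not stored. */
    public NodeVersion get(long id, int version) {
        TreeMap<Integer, NodeVersion> versions = nodes.get(id);
        return versions == null ? null : versions.get(version);
    }

    /** All stored versions of a node, ordered by version number. */
    public Collection<NodeVersion> getAll(long id) {
        TreeMap<Integer, NodeVersion> versions = nodes.get(id);
        return versions == null ? Collections.<NodeVersion>emptyList() : versions.values();
    }

    /** The newest version whose timestamp is strictly before {@code date}, or null. */
    public NodeVersion latestBefore(long id, long date) {
        NodeVersion best = null;
        for (NodeVersion n : getAll(id)) {
            if (n.timestamp < date && (best == null || n.timestamp > best.timestamp)) {
                best = n;
            }
        }
        return best;
    }
}
```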

I looked into the three implementations of NodeLocationStore (especially
the InMemoryNodeLocationStore) and thought about how I could extend the
really simple fixed-size memory store so that it stores a complete
Node and indexes it by Id and Version at the same time.

Because there is no fixed number of versions per Node, I can't use a
simple offset = NodeID * NodeSize calculation. Instead I have to write the
nodes one after another as they come in and save their offsets in a list.
What I'm not sure about is how to build a list in Java that allows random
access both to the offsets of all versions of a node and to the offset of a
specific version.
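One way to get random access without a fixed record size is an append-only index kept in parallel primitive arrays. The sketch is hypothetical (NodeVersionIndex and its methods are not part of Osmosis) and assumes the input arrives sorted by node id, then version, as in a sorted history dump. Under that assumption all versions of a node occupy one contiguous range of the index, so a binary search for the node's first entry gives access to all of its versions, and a short scan inside that range finds a specific version.

```java
import java.util.Arrays;

/** Hypothetical append-only index: (node id, version) -> offset in a data file. */
public class NodeVersionIndex {
    private long[] ids = new long[16];
    private int[] versions = new int[16];
    private long[] offsets = new long[16];
    private int size = 0;

    /** Record where a node version was written; input must be sorted by id, then version. */
    public void add(long id, int version, long offset) {
        if (size > 0 && (id < ids[size - 1]
                || (id == ids[size - 1] && version <= versions[size - 1]))) {
            throw new IllegalArgumentException("input must be sorted by id, then version");
        }
        if (size == ids.length) { // grow all three arrays together
            int newCapacity = size * 2;
            ids = Arrays.copyOf(ids, newCapacity);
            versions = Arrays.copyOf(versions, newCapacity);
            offsets = Arrays.copyOf(offsets, newCapacity);
        }
        ids[size] = id;
        versions[size] = version;
        offsets[size] = offset;
        size++;
    }

    /** Index of the first entry whose id is >= the given id (binary search). */
    private int lowerBound(long id) {
        int lo = 0;
        int hi = size;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ids[mid] < id) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Offsets of all versions of a node, oldest first (empty if the node is unknown). */
    public long[] offsetsOf(long id) {
        return Arrays.copyOfRange(offsets, lowerBound(id), lowerBound(id + 1));
    }

    /** Offset of one specific version, or -1 if it is not stored. */
    public long offsetOf(long id, int version) {
        for (int i = lowerBound(id); i < size && ids[i] == id; i++) {
            if (versions[i] == version) {
                return offsets[i];
            }
        }
        return -1;
    }
}
```

Primitive arrays avoid the per-entry object overhead of a List<Long>; for a planet-sized history the arrays themselves would probably have to live in a memory-mapped or on-disk file, which this sketch does not attempt.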

I also found the IndexedObjectStore class in
org.openstreetmap.osmosis.core.store and thought about extending it to
track three indexes (NodeID, Version and Timestamp). Do you know whether
this would be workable?
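A common trick that may reduce the number of indexes is a composite key: if (NodeID, Version) is packed into one long that sorts by node id first and version second, a single sorted index answers both "specific version" (exact lookup) and "all versions" (range scan). The exact IndexedObjectStore API is not assumed here, so the sketch uses a java.util.TreeMap in its place; the class name and the bit split (positive node ids below 2^43, versions below 2^20) are assumptions. The timestamp would still need its own index, or a scan over the few versions returned by the range query.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical single index over (node id, version) packed into one long key. */
public class CompositeKeyIndex {
    private static final int VERSION_BITS = 20; // assumed: versions < 2^20
    private static final long VERSION_MASK = (1L << VERSION_BITS) - 1;
    private static final long MAX_NODE_ID = (1L << (63 - VERSION_BITS)) - 1; // keeps keys positive

    // composite key -> offset of the serialized node version in a data file
    private final TreeMap<Long, Long> index = new TreeMap<>();

    /** Pack (nodeId, version) so that keys sort by node id first, then by version. */
    public static long key(long nodeId, int version) {
        if (nodeId < 0 || nodeId > MAX_NODE_ID || version < 0 || version > VERSION_MASK) {
            throw new IllegalArgumentException("node id or version out of range");
        }
        return (nodeId << VERSION_BITS) | version;
    }

    public static long nodeIdOf(long key) {
        return key >>> VERSION_BITS;
    }

    public static int versionOf(long key) {
        return (int) (key & VERSION_MASK);
    }

    public void put(long nodeId, int version, long offset) {
        index.put(key(nodeId, version), offset);
    }

    /** Offset of one specific version, or null if it is not stored. */
    public Long offsetOf(long nodeId, int version) {
        return index.get(key(nodeId, version));
    }

    /** All versions of a node, ordered by version: one range scan over the same index. */
    public SortedMap<Long, Long> allVersions(long nodeId) {
        return index.subMap(key(nodeId, 0), key(nodeId + 1, 0));
    }
}
```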

> You mentioned the problem of obtaining test data.  I'd suggest using:
> http://planet.openstreetmap.org/history/
They are in .osc format, so I also need a task to convert from .osc to
history-.osm and back.
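A rough idea of the .osc to history-.osm direction, using only the JDK's StAX API: every node, way or relation inside a create or modify block is written with visible="true", and every one inside a delete block with visible="false", which is how deleted versions are marked in OSM history data. This is a hypothetical sketch (OscToHistory is not an Osmosis task) assuming osmChange 0.6 input; it does not sort the output by type, id and version, which a real history file would probably require, and it does not cover the reverse direction.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical streaming converter from osmChange to a history-style osm document. */
public class OscToHistory {

    private static boolean isAction(String name) {
        return name.equals("create") || name.equals("modify") || name.equals("delete");
    }

    private static boolean isPrimitive(String name) {
        return name.equals("node") || name.equals("way") || name.equals("relation");
    }

    public static void convert(Reader in, Writer out) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        String action = null; // the enclosing create/modify/delete block, if any
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getLocalName();
                if (name.equals("osmChange")) {
                    w.writeStartElement("osm"); // replace the root element
                    w.writeAttribute("version", "0.6");
                } else if (isAction(name)) {
                    action = name; // action blocks are dropped, only remembered
                } else {
                    // node/way/relation and their children (tag, nd, member) are copied
                    w.writeStartElement(name);
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        String attr = r.getAttributeLocalName(i);
                        if (!attr.equals("visible")) {
                            w.writeAttribute(attr, r.getAttributeValue(i));
                        }
                    }
                    if (isPrimitive(name)) {
                        w.writeAttribute("visible", "delete".equals(action) ? "false" : "true");
                    }
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (isAction(r.getLocalName())) {
                    action = null;
                } else {
                    w.writeEndElement(); // closes the copied element, or <osm> at the end
                }
            }
        }
        w.writeEndDocument();
        w.flush();
        r.close();
    }
}
```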

> That is a full history from day one of the project up until now.  It is
> already in the OSM change format that Osmosis understands.  Cutting
> bounding boxes out of full history data is a difficult (but not
> impossible)
In regard to the Node-Moved-In/-Out problem, yes. At the moment I'm
working with self-contained history files that contain all referenced
items from version 1 onward. When I start to convert .osc files into
history-.osm files I will have to deal with objects with incomplete
histories (when a node has been moved I only know its new position), so
I will need to feed in a second data source, such as an existing
database.
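Feeding in a second data source can be as simple as a chained lookup: ask the history built from the .osc files first, and fall back to the other source (for example, an existing database) only for versions the incomplete history lacks. NodeSource and FallbackNodeSource are hypothetical names, and a node version is reduced to a lat/lon pair to keep the sketch short.

```java
import java.util.Optional;

/** Hypothetical: anything that can return the position of a node version. */
interface NodeSource {
    Optional<double[]> lookup(long nodeId, int version);
}

/** Hypothetical: try the (possibly incomplete) history first, then a second source. */
public class FallbackNodeSource implements NodeSource {
    private final NodeSource primary;
    private final NodeSource fallback;

    public FallbackNodeSource(NodeSource primary, NodeSource fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    @Override
    public Optional<double[]> lookup(long nodeId, int version) {
        Optional<double[]> hit = primary.lookup(nodeId, version);
        return hit.isPresent() ? hit : fallback.lookup(nodeId, version);
    }
}
```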

> problem that you may have to solve in order to move
> forward.  In order to build way linestrings for all way versions and for
> all node versions impacting the way you will have to solve a similar
> problem to understanding how to cut bbox data so you may be able to kill
> a couple of birds with one stone.
I'm not really sure this will work, since right now I'm focusing only on
analyzing a complete dump, but it may bring us closer to that goal.
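Brett's suggestion of building linestrings for all way versions and all node versions impacting the way can be sketched as follows: a way version, valid from its own timestamp until the next way version, gets a new intermediate geometry each time one of its referenced nodes receives a new version inside that interval, and each geometry uses, for every node, the newest node version at or before that instant. All names are hypothetical, deleted nodes and meta-info are ignored, and nodes missing from the history are simply skipped, which is exactly where incomplete histories would need a second data source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: intermediate geometries of one way version. */
public class WayVersionExpander {

    /** Position of one node version (meta-info omitted for brevity). */
    public static final class NodePosition {
        public final double lat;
        public final double lon;

        public NodePosition(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Geometry of the way as of one instant. */
    public static final class Snapshot {
        public final long timestamp;
        public final List<double[]> coordinates;

        Snapshot(long timestamp, List<double[]> coordinates) {
            this.timestamp = timestamp;
            this.coordinates = coordinates;
        }
    }

    /**
     * @param nodeRefs node ids of the way version, in order
     * @param history  node id -> node versions keyed (and sorted) by timestamp
     * @param from     timestamp of this way version
     * @param to       timestamp of the next way version (Long.MAX_VALUE if none)
     */
    public static List<Snapshot> expand(List<Long> nodeRefs,
                                        Map<Long, TreeMap<Long, NodePosition>> history,
                                        long from, long to) {
        // Instants where the geometry may change: the way's own timestamp plus
        // every referenced node version created inside [from, to).
        TreeSet<Long> instants = new TreeSet<>();
        instants.add(from);
        for (long ref : nodeRefs) {
            TreeMap<Long, NodePosition> versions = history.get(ref);
            if (versions != null) {
                instants.addAll(versions.subMap(from, true, to, false).keySet());
            }
        }
        List<Snapshot> result = new ArrayList<>();
        for (long t : instants) {
            List<double[]> coords = new ArrayList<>();
            for (long ref : nodeRefs) {
                TreeMap<Long, NodePosition> versions = history.get(ref);
                // newest node version at or before t
                Map.Entry<Long, NodePosition> e = (versions == null) ? null : versions.floorEntry(t);
                if (e != null) { // unknown at time t (incomplete history): skipped here
                    coords.add(new double[] {e.getValue().lat, e.getValue().lon});
                }
            }
            result.add(new Snapshot(t, coords));
        }
        return result;
    }
}
```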

> One thing to note is that I'm currently changing the simple schema a bit
> to improve performance.
Yes, I've been following that, and I like the step towards hstore, as I
have already used it a lot with osm2pgsql.

Peter