[osmosis-dev] Reading OSM History dumps

Peter Körner osm-lists at mazdermind.de
Wed Aug 25 14:14:28 BST 2010


Hi all

After a little playing around I now have an idea of how I'm going to 
implement everything. I'll keep as close as possible to the regular 
simple schema and to the way the pgsql tasks work.

Just as with the optional linestring/bbox builder, the history import 
tasks will serve more than one schema. I'm leaving relations out again.

the regular simple schema
-> it's the basis of everything, but not capable of holding history data

+ history columns
-> create and populate an extra column in way_nodes to store
    the way version.
-> change the PKs of way_nodes to allow
    more than one version of an element
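
   Roughly, that could look like this in SQL (the constraint names and
   the exact column layout are only assumptions based on the simple
   schema, not the final DDL of the task):

     ALTER TABLE way_nodes ADD COLUMN version int NOT NULL DEFAULT 1;

     ALTER TABLE way_nodes DROP CONSTRAINT pk_way_nodes;
     ALTER TABLE way_nodes
         ADD CONSTRAINT pk_way_nodes
         PRIMARY KEY (way_id, version, sequence_id);

     -- nodes and ways presumably need their version in the PK as well,
     -- so that several versions of the same element can coexist:
     ALTER TABLE nodes DROP CONSTRAINT pk_nodes;
     ALTER TABLE nodes ADD CONSTRAINT pk_nodes PRIMARY KEY (id, version);
     ALTER TABLE ways  DROP CONSTRAINT pk_ways;
     ALTER TABLE ways  ADD CONSTRAINT pk_ways  PRIMARY KEY (id, version);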

+ way_nodes version builder
-> create and populate an extra column in way_nodes that holds the node
    version that corresponds to the way's timestamp
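
   A rough sketch of how that column could be filled (column names are
   assumptions; the task itself will use a NodeStore during the import
   rather than an UPDATE afterwards):

     ALTER TABLE way_nodes ADD COLUMN node_version int;

     -- for every way_nodes row, pick the highest node version whose
     -- timestamp is not later than the timestamp of that way version:
     UPDATE way_nodes wn
     SET node_version = (
         SELECT MAX(n.version)
         FROM nodes n, ways w
         WHERE w.id = wn.way_id
           AND w.version = wn.version
           AND n.id = wn.node_id
           AND n.tstamp <= w.tstamp
     );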

+ minor version builder
-> create and populate an extra column in ways and way_nodes to store
    the way's minor versions, which are generated by changes to the nodes
    of the way between version changes of the way itself.
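
   In terms of columns that's only this (an illustrative sketch, names
   are assumptions):

     ALTER TABLE ways      ADD COLUMN minor_version int NOT NULL DEFAULT 0;
     ALTER TABLE way_nodes ADD COLUMN minor_version int NOT NULL DEFAULT 0;

     -- populating them is not a single UPDATE: for every node edit that
     -- falls between two versions of a way, an additional ways/way_nodes
     -- row with an incremented minor_version has to be generated.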

+ from-to-timestamp builder
-> create and populate an extra column in the nodes and ways table that
    specifies the date until which a version of an item was "the current
    one". After that time, the next version of the same item was
    "current" (or the item was deleted). the tstamp field in contrast
    contains the starting date from which an item was "current".
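
   Something along these lines (I'm using "valid_to" only as a
   placeholder name here):

     ALTER TABLE nodes ADD COLUMN valid_to timestamp without time zone;
     ALTER TABLE ways  ADD COLUMN valid_to timestamp without time zone;

     -- each version is "current" from its own tstamp until the tstamp
     -- of the next version of the same element; NULL = still current:
     UPDATE ways w
     SET valid_to = (
         SELECT MIN(w2.tstamp)
         FROM ways w2
         WHERE w2.id = w.id
           AND w2.version > w.version
     );
     -- the same statement runs against the nodes table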

+ linestring / bbox builder
-> just the same as with the regular simple schema; works for all
    version and minor-version rows
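
   With PostGIS that could look roughly like this (assuming the nodes
   carry a geom point column and the node_version column from the
   way_nodes version builder above):

     SELECT AddGeometryColumn('ways', 'linestring', 4326, 'GEOMETRY', 2);
     SELECT AddGeometryColumn('ways', 'bbox', 4326, 'GEOMETRY', 2);

     -- build the linestring from the way's nodes in sequence order,
     -- using the node version that matches this way version:
     UPDATE ways w
     SET linestring = ST_MakeLine(ARRAY(
             SELECT n.geom
             FROM way_nodes wn
             JOIN nodes n ON n.id = wn.node_id
                         AND n.version = wn.node_version
             WHERE wn.way_id = w.id
               AND wn.version = w.version
             ORDER BY wn.sequence_id));

     UPDATE ways SET bbox = ST_Envelope(linestring);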

By the end of the week I'll get a pre-release snapshot out that can 
populate the history tables with version columns and changed PKs. The 
database created from this can be used to test Scott's SQL-only 
solution [1].

It will also contain a first implementation of the way_nodes version 
builder, but only with an example implementation of the NodeStore, 
which performs badly on bigger files.


Brett, the pgsql tasks currently write (in COPY mode) all data to temp 
files first. The process seems to be

PlanetFile -> NodeStoreTempFile -> CopyFormatTempFile -> PgsqlCopyImport

In osm2pgsql the COPY data is pushed to PostgreSQL via unix pipes (5 or 
6 COPY transactions running at the same time in different connections). 
This approach skips the CopyFormatTempFile stage. Is there any special 
reason this approach isn't used in the pgsnapshot package?
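
For reference, the streaming variant is essentially just keeping a COPY 
statement open per table/connection and pushing the rows over the 
socket as they are produced, instead of materialising a 
CopyFormatTempFile first (the column list here is only an example):

  COPY way_nodes (way_id, node_id, version, sequence_id) FROM STDIN;
  -- ...rows in COPY text format, streamed over the connection...
  -- terminated with a line containing only "\."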


Peter


[1] <http://lists.openstreetmap.org/pipermail/dev/2010-August/020308.html>
