[OSM-dev] Osmosis Plans

Tue Sep 25 05:31:26 BST 2007

Hi All,

Osmosis has reached a point where it is reasonably feature complete.  I 
still have plans to build an automated tool for generating short 
interval replications between databases but this may be some time off.

I'm curious what thoughts people have on its usefulness.

Because tasks can be plugged together in different ways it can be used 
for many things, but here are some potential use cases:
1. Planet dumping.
2. Planet difference generation.
3. Planet importing into an offline MySQL database.
4. Database snapshotting.
5. Changeset derivation from database.
6. Changeset application to an offline MySQL database.
7. Polygon extraction.
8. User activity reporting.
9. Replication to alternative schema.

For each of the above here are some thoughts:
1. Currently planet.rb is still being used.  Osmosis produces similar 
output and adds the user attribute to all entities.  Jon Burgess has 
written a new planet dumper in C that solves planet.rb's performance 
problems, which may remove any reason to move to osmosis as the dumping 
tool.  Is the user attribute considered useful, and if so, should it be 
added to the planet.c implementation?
2. Again, Jon Burgess has written an excellent C implementation of a 
planet difference generator that is somewhat faster than osmosis (mainly 
because osmosis parses data into full objects, incurring a large date 
parsing overhead, and then writes the data back out as strings).  Again, 
osmosis is not necessary here.
3. I'm not aware of any tool that can import a planet into a db as 
quickly as osmosis, so for the time being osmosis still appears to be 
useful here.
4. The snapshot task is not as fast as dumping current tables, but it has 
a number of advantages.  It can be used as a starting point for db 
changeset derivation (so it only has to be done once), it can be used 
to obtain a snapshot for any point in OSM history, and it avoids all 
referential integrity problems with the current planet creation 
process.  However, it is useless at the moment because of the old TIGER 
data in the history tables.
5. This is the most useful part of osmosis, but I'm not hearing a lot of 
interest in it at the moment.  I'd love to see an end use of OSM data 
hook into this capability (e.g. tiles@home or Mapnik), but I'm not sure 
if this is seen as useful.  Obviously the TIGER data needs to be cleaned 
up, but a real use of this capability would provide a reason to do so.
6. Depends on 5, and can be used for several things.  A. It allows a 
db to be brought up to the current state much more quickly than a new 
import.  B. It can be used to mark data as changed (e.g. to automate map 
re-render requests).
7. There are other scripts in the repository that already do polygon 
extraction.  Not sure if osmosis is seen as useful here.
8. This was a task I knocked up in half an hour to show a summary of all 
users' activity in an OSM file.  For example, combined with the polygon 
task above, it can be used to show all active users in Australia.  This 
capability can easily be extended to report on anything in the OSM 
file.  Again, I'm not sure how useful people find this feature; perhaps 
quickly hackable Perl scripts are more appropriate.
9. This is something I hoped osmosis would facilitate, but it is 
dependent on regular changeset derivation being established.  It seems 
to me that there are myriad uses for OSM data that don't fit well into 
the existing schema (e.g. routing, Mapnik rendering, etc.).  If osmosis 
can provide a regular feed into these alternative schemas, then it 
should allow them to provide services based on access to current data.
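To make the difference generation and changeset derivation ideas 
(items 2, 5 and 6) a bit more concrete, here is a minimal sketch of one 
way to derive a changeset from two snapshots: a merge over two streams 
sorted by entity type and id.  It is not the osmosis or planet-diff 
implementation.  Entities are reduced to (type, id, timestamp) tuples, 
and both the node/segment/way ordering and the use of a changed 
timestamp to detect a modification are assumptions for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Simplified entity: (type, id, timestamp). Real OSM entities also carry
# tags, coordinates, node/segment lists and the user attribute.
Entity = Tuple[str, int, str]

# Assumed sort order of entity types within a planet-style stream.
TYPE_ORDER = {"node": 0, "segment": 1, "way": 2}


def sort_key(entity: Entity) -> Tuple[int, int]:
    return (TYPE_ORDER[entity[0]], entity[1])


def derive_changes(old: Iterable[Entity],
                   new: Iterable[Entity]) -> Iterator[Tuple[str, Entity]]:
    """Merge two streams sorted by (type, id) and yield (action, entity).

    Only one entity from each stream is held at a time, so the inputs can
    be large iterators rather than in-memory lists.
    """
    old_it, new_it = iter(old), iter(new)
    o: Optional[Entity] = next(old_it, None)
    n: Optional[Entity] = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and sort_key(o) < sort_key(n)):
            # Present only in the old snapshot: it was deleted.
            yield ("delete", o)
            o = next(old_it, None)
        elif o is None or sort_key(n) < sort_key(o):
            # Present only in the new snapshot: it was created.
            yield ("create", n)
            n = next(new_it, None)
        else:
            # Same entity in both; a changed timestamp means it was modified.
            if n[2] != o[2]:
                yield ("modify", n)
            o = next(old_it, None)
            n = next(new_it, None)


old = [("node", 1, "2007-09-01T10:00:00Z"),
       ("node", 2, "2007-09-01T10:00:00Z"),
       ("way", 5, "2007-09-02T08:30:00Z")]
new = [("node", 1, "2007-09-20T12:00:00Z"),
       ("node", 3, "2007-09-21T09:15:00Z"),
       ("way", 5, "2007-09-02T08:30:00Z")]
for action, entity in derive_changes(old, new):
    print(action, entity)
# modify ('node', 1, '2007-09-20T12:00:00Z')
# delete ('node', 2, '2007-09-01T10:00:00Z')
# create ('node', 3, '2007-09-21T09:15:00Z')
```

The merge only works if both inputs are sorted the same way, which is 
one reason a snapshot taken once and then kept up to date is attractive 
as a starting point.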

I'm keen to hear people's thoughts.  I'm not sure what I should focus 
on.  I believe the replication features would be useful to help the 
project scale to a much larger size.

I do have a way of working around the TIGER problem.  If the public 
edit flag of the old TIGER user is enabled and a snapshot of the 
database is taken using osmosis, it will be simple to identify the 
problem entities.  From there I can produce a changeset to apply to the 
snapshot planet which will remove the offending entities.  Assuming 
there aren't too many other inconsistencies between the current and 
history tables, osmosis can then be used properly.  It can then be used 
to extract weekly or perhaps daily changesets, making it possible to 
greatly improve access to OSM data.  Thoughts?
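The "produce a changeset that removes the offending entities" step could 
look roughly like the following sketch.  It is not osmosis code: it 
assumes the problem entities can be picked out purely by their user 
attribute, and the user name "tiger_import", the function 
build_delete_changeset and the exact layout of the osmChange-style 
output are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed deletion order: entities that reference others go first, so
# ways are removed before segments, and segments before nodes.
DELETE_ORDER = {"way": 0, "segment": 1, "node": 2}


def build_delete_changeset(snapshot_xml: str, bad_user: str) -> str:
    """Return an osmChange-style document that deletes every entity in
    the snapshot whose user attribute equals bad_user (hypothetical)."""
    root = ET.fromstring(snapshot_xml)
    matches = [e for e in root if e.get("user") == bad_user]
    # list.sort() is stable, so file order is kept within each type.
    matches.sort(key=lambda e: DELETE_ORDER.get(e.tag, len(DELETE_ORDER)))
    change = ET.Element("osmChange")
    delete = ET.SubElement(change, "delete")
    delete.extend(matches)
    return ET.tostring(change, encoding="unicode")


snapshot = """<osm>
  <node id="1" lat="30.0" lon="-97.0" user="tiger_import"/>
  <node id="2" lat="51.5" lon="-0.1" user="someone"/>
  <segment id="7" from="1" to="2" user="tiger_import"/>
</osm>"""
print(build_delete_changeset(snapshot, "tiger_import"))
```

Note that this sketch happily deletes node 1 even though segment 7 
(also deleted) is the only thing shown referencing it; a real cleanup 
would also need to check that nothing outside the changeset still 
references a deleted entity.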

Cheers,
Brett