[OSM-dev] Osmosis Plans
brett at bretth.com
Tue Sep 25 05:31:26 BST 2007
Osmosis has reached a point where it is reasonably feature complete. I
still have plans to build an automated tool for generating short
interval replications between databases but this may be some time off.
I'm curious what thoughts people have on its usefulness.
It can be used for many things because tasks can be plugged together in
different ways, but here are some potential use cases:
1. Planet dumping.
2. Planet difference generation.
3. Planet importing into offline mysql database.
4. Database snapshotting.
5. Changeset derivation from database.
6. Changeset application to offline mysql database.
7. Polygon extraction.
8. User activity reporting.
9. Replication to alternative schema.
For each of the above here are some thoughts:
1. Currently planet.rb is still being used. Osmosis produces similar
output and adds the user attribute to all entities. Jon Burgess has
written a new planet dumper in C solving planet.rb performance problems
which may eliminate any reasons for moving to osmosis as the dumping
tool. Is the user attribute considered useful and if so should it be
added to the planet.c implementation?
2. Again, Jon Burgess has written an excellent C implementation of a
planet difference generator that is somewhat faster than osmosis (mainly
due to osmosis parsing data into full objects incurring a large date
parsing overhead and writing data back into string format). Again,
osmosis is not necessary here.
3. I'm not aware of any tools capable of importing a planet into a db as
quickly as osmosis so for the meantime it appears osmosis is still
4. The snapshot task is not as fast as dumping current tables but it has
a number of advantages. It can be used as a starting point for db
changeset derivation (therefore only has to be done once), can be used
to obtain a snapshot for any point in OSM history, and it avoids all
referential integrity problems with the current planet creation
process. However it is useless at the moment due to the old TIGER data
in the history tables.
5. This is the most useful part of osmosis, but I'm not hearing a lot of
interest in this at the moment. I'd love to see an end use of osm data
hook into this capability (eg. tiles at home or mapnik) but I'm not sure if
this is seen as useful. Obviously the TIGER data needs to be cleaned up
but a real use of this capability would provide reasons for doing so.
6. Dependent on 5, and can be used for several things. A. It allows a
db to be updated to the current state much quicker than a new import.
B. It can be used to mark data as changed (eg. automate map re-render
7. There are other scripts in the repository that already do polygon
extraction. Not sure if osmosis is seen as useful here.
8. This was a task I knocked up in half an hour to show a summary of all
user activity in an osm file. For example, combined with the above
polygon task it can be used to show all active users in Australia. This
capability can be easily extended to report on anything in the osm
file. Again, I'm not sure how useful people find this feature; perhaps
quickly hackable perl scripts are more appropriate.
9. This is something I hoped osmosis would facilitate but it is
dependent on regular changeset derivation being established. It seems
to me that there are a myriad of uses for osm data that don't fit well
into the existing schema (eg. routing, mapnik rendering, etc). If
osmosis can provide a regular feed into these alternative schemas then
it should allow them to provide services based on access to current data.
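To make items 5 and 6 a little more concrete, here is a minimal sketch of
changeset derivation and application. It is not osmosis code (osmosis is
written in Java); it is a toy Python illustration that assumes each
snapshot is a dictionary keyed by (type, id), and the entity fields and
function names are made up for the example.

```python
def derive_changes(old, new):
    """Compare two snapshots and return a list of (action, entity) pairs.

    Each snapshot is assumed to map (entity_type, entity_id) -> entity dict.
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):
        if key not in old:
            changes.append(("create", new[key]))
        elif key not in new:
            changes.append(("delete", old[key]))
        elif old[key] != new[key]:
            changes.append(("modify", new[key]))
    return changes


def apply_changes(snapshot, changes):
    """Return a copy of `snapshot` with the changes applied."""
    result = dict(snapshot)
    for action, entity in changes:
        key = (entity["type"], entity["id"])
        if action == "delete":
            result.pop(key, None)
        else:  # "create" and "modify" both store the new entity state
            result[key] = entity
    return result


# Hypothetical example data.
old = {
    ("node", 1): {"type": "node", "id": 1, "lat": 1.0, "lon": 2.0},
    ("node", 2): {"type": "node", "id": 2, "lat": 3.0, "lon": 4.0},
}
new = {
    ("node", 2): {"type": "node", "id": 2, "lat": 3.5, "lon": 4.0},
    ("node", 3): {"type": "node", "id": 3, "lat": 5.0, "lon": 6.0},
}
changes = derive_changes(old, new)
updated = apply_changes(old, changes)
```

Applying the derived changes to the old snapshot gives the new one, which
is why an offline db only needs one full import followed by changesets.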
I'm keen to hear people's thoughts. I'm not sure what I should focus
on. I believe the replication features would be useful to help the
project scale to a much larger size.
I do have a way of working around the TIGER problem. If the old TIGER
user public edit flag is enabled and a snapshot of the database is
performed using osmosis, it will be simple to identify the problem
entities. From there I can produce a changeset to apply to the snapshot
planet which will remove the offending entities. Assuming there aren't
too many other inconsistencies between current and history tables
osmosis can then be used properly. It can then be used to extract
weekly changesets or perhaps daily changesets making it possible to
greatly improve access to OSM data. Thoughts?
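As a rough sketch of the TIGER workaround described above: once the
snapshot exposes the user on each entity, the offending entities can be
picked out by user name and turned into a delete-only changeset. This is
a toy Python illustration, not osmosis code; the "tiger_import" user name,
the entity dictionaries and the function names are all hypothetical.

```python
TIGER_USER = "tiger_import"  # hypothetical name for the old TIGER user


def removal_changeset(snapshot, user):
    """Build delete changes for every entity in `snapshot` owned by `user`."""
    return [("delete", e) for e in snapshot.values() if e.get("user") == user]


def apply_changes(snapshot, changes):
    """Return a copy of `snapshot` with the delete changes applied."""
    result = dict(snapshot)
    for action, entity in changes:
        if action == "delete":
            result.pop((entity["type"], entity["id"]), None)
    return result


# Hypothetical snapshot keyed by (type, id).
snapshot = {
    ("node", 1): {"type": "node", "id": 1, "user": "tiger_import"},
    ("node", 2): {"type": "node", "id": 2, "user": "brett"},
    ("way", 7): {"type": "way", "id": 7, "user": "tiger_import"},
}
changes = removal_changeset(snapshot, TIGER_USER)
cleaned = apply_changes(snapshot, changes)
```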