[OSM-dev] OWL + OSM Activity Server
Paweł Paprota
ppawel at fastmail.fm
Thu Oct 18 11:07:10 BST 2012
Hi Matt/Ian,
I'd like to continue this thread as since yesterday I have been looking
into OWL code (specifically the C++ and the database parts) and I have
some thoughts.
One major difference in our approach is how we process OSC files.
Correct me if I'm wrong but OWL goes through an OSC file just like the
XML parser (libxml TextReader to be exact) does - top-down, element by
element. Every change is examined and saved to the database.
What I'm doing (see [1]) is I scan the whole OSC file to find all
distinct changesets and then process each changeset. So in effect I
split the OSC file into smaller OSC files that only contain changes for
one changeset.
I need to do it this way because of the nature of an activity - right
now I consider one changeset as one "mapping" activity. This means that
for example I need to feed the whole changeset at once to Changemonger
in order to get back nice changeset description.
It also means that I need to have the database in a state just before
given changeset is applied in order to find changeset's geometry and
allow Changemonger to query the database and get correct state as well.
Next is scalability. I see that in OWL there is a quite sophisticated
"tiling" algorithm that distributes chagnes between database tables.
This looks scalable to me :-) One question would be how do you get the
data out of this structure - I guess that bounding box can be
transformed into a list of tiles, then into a list of tables and then
you do one huge SQL UNION? I have not yet looked into the Rails part of
OWL to see how the data is used.
In any case, I'm interested in your opinion on the following:
1. Have a table of changesets (or corresponding activities - does not
matter) with a geometry column that represent changeset's geometry -
node coordinates, way linestrings, relations translated into geometry etc.
I guess at this point that would be about ~15M rows judging where
changeset id is nowadays.
2. Simply use PostGIS and the GIST index on the geometry column to query
the thing. Perhaps first do a simple bounding box vs bounding box query
to narrow down the suspects and then ST_Intersects.
So the question is - do you think this scales? Specifically, how long
would a typical query take? I plan to do some tests with artificial data
since I don't have the server resources to generate such table from real
data (OSC files "since forever" and a planet database).
I'm asking this question because this is how I currently do things. So
if the answer is "no" then it's clear that I need to go into OWL.
And finally a bunch of random questions:
1. How many changesets and changes do you currently have in your planet
OWL instance?
2. Perhaps you answered that already but just to clarify - do you
already have the code to respond to bounding box/tile queries (e.g.
"return all changes within X"?) on this new database structure with >4k
tables?
3. If yes to (2), how long does a query take?
[1]
https://github.com/ppawel/osm-activity-publishers/blob/master/changeset-publisher/process_osc.rb#L139
Paweł
More information about the dev
mailing list