[OSM-dev] Relations without members?

Sat Oct 13 03:12:53 BST 2007

On 10/12/07, Brett Henderson <brett at bretth.com> wrote:
> Karl Newman wrote:
> > On 10/12/07, Brett Henderson <brett at bretth.com> wrote:
> >
> >> As for your new requirements, could you define what you would like done
> >> with each entity type at the area edges (I say area because this applies
> >> to more general polygon extraction also).  We can then figure out a way
> >> of building a tool to support them.
> >>
> > Well, for polygons (closed ways), I was thinking about creating
> > synthetic nodes right at the border as Frederik suggested and joining
> > the polygon straight down/over to the continuation of the way where it
> > comes back inside the bounding box. Then when the sections are
> > recombined, they'll butt together. It may be possible to treat this
> > the same manner as I was thinking for ways (keep one stray node), but
> > it could be a real headache when dealing with nested polygons if the
> > nested polygons ALSO are split by the border. Maybe the synthetic
> > nodes at the border is the best way to treat this all around.
> >
> Okay, this sounds difficult.  Probably not something I better commit to
> helping you with, my TODO list is already pretty long :-)  It almost
> sounds like you need to write the data into a new schema optimised for
> this purpose that allows you to track which entities are synthetic and
> which are based on real osm entities.  But you'll be a better judge of
> that than me.
Possibly. I'm willing to write the changes, that's no problem. You may
be right about the need for a new schema, but if I end up making this
part of the Osmosis pipeline, it will be an endpoint only, so no need
to worry about passing it on to another process. That means the
internal representation can be whatever works for me. I'm thinking it
could just be a decorator or composite (or whatever turns out to be
the appropriate patterns) for the existing Node and Way (and WayNode?)
classes to contain the additional information to support routing.
>
> Are you planning to try to use diffs to keep this up to date or are you
> happy to run from a complete planet dump each time?  Managing
> application of diffs is likely to open up a whole new layer of complexity.
No plan to support diffs--just operate on a complete planet file (or
subsection). This project is going to be difficult enough.
> >> Perhaps one way of supporting problem ways is to keep all nodes where
> >> one or two nodes stray outside the box, but drop all other nodes in
> >> between the first and last stray node.  This would make the ways run off
> >> and back onto the map in the correct line.  Not sure how that affects
> >> any routing algorithms.
> >>
> > That would probably be fine, but would have to be split into two
> > separate ways (between the two stray nodes) by the routing algorithm.
> > For Garmin GPS maps, the stray nodes need to be marked as "external"
> > to link the routing with the external nodes in the continued way in
> > the adjacent section.
> >
> Yeah, okay.  Splitting ways also requires creation of synthetic ids
> which leads to the exact problem I was trying to avoid with nodes.  I
> just shifted the problem.  That's without considering polygons.  Again,
> this sounds hard ... but interesting :-)
Hard is okay. :-)

> >> As for supporting streaming, that is more difficult.  Classes such as
> >> SimpleObjectStore will help and are what I used for file-based merge
> >> sorting of planet sized files where I need to revisit objects many times
> >> during a sort.  Although SimpleObjectStore is very simple and is not
> >> indexed in any way, other classes such as ChunkedObjectStore go some way
> >> towards solving the problem.  A fully indexed object store with random
> >> lookups will require some more work though, it may even be worth
> >> thinking about a proper database like berkeley db for this although that
> >> is not necessary if you're prepared to get your hands dirty and roll
> >> your own.  Another problem you'll face when doing this is speed, my
> >> current object stores use java object serialisation which is very slow,
> >> I'd like to implement a custom serialiser and deserialiser for osm data
> >> types which should provide huge performance speedups.
> >>
> >>
> > Yes, it may be necessary to work with a database--the nature of this
> > project is really a random-access relational problem which may be
> > well-suited to a database. The concept of assigning types to ways and
> > setting up the routing between ways via shared nodes is not difficult,
> > but the problem is how to do it with massive data sets without pulling
> > everything into memory. It's easy to deal with the edge problem once
> > by manually dealing with the ways that cross the borders, but I
> > anticipate generating new GPS maps on a regular basis as the map data
> > is updated, so it needs to be something that can be automated.
> >
> If there's anything I can help with let me know.  While I'd love to see
> osmosis used as a base for this, it sounds like osmosis isn't going to
> offer you a huge amount of support as it stands.  If you do go down the
> path of writing your own custom file storage though feel free to use my
> code as a base and submit me patches for smarter/faster ways of doing
> it.  I'm still thinking about writing a custom serialiser/deserialiser
> for osmosis which would help with temporary storage of large amounts of
> data, it's just not right at the top of the list at the moment but my
> list changes daily depending on what people need.
Well, the main thing I like about Osmosis is the well-built pipeline
and support for different data sources. There's some great design
patterns in there.

What I had in mind initially is to consume all the nodes and stuff
them into a temporary storage file. Then I'd read all the ways and
match the tags against a set of rules. For each way that matches a
rule, mark each referenced node as required so it will be pulled in
when the nodes are read again. Also, for a way which is a routable
type, mark all of its nodes as such. If a node is marked as routable
at least twice, then it becomes a routing node. The ways we're
interested in (i.e., those which match a rule) will have to be read
again, too, to construct the output. What happens next is a little
fuzzy. At some point, it will need to have in memory the way and it's
GPS map output type, the latitudes and longitudes of each node, and,
for routable ways, which nodes are routing nodes. Then when relations
come into play, particularly turn restrictions, I'll have to know each
involved way and their routing nodes. Hmm... The more I think about
it, the more it seems that this is really better suited as a database
application. I don't know. I'll think about it some more and maybe
I'll come up with something more clever.

Karl