[OSM-dev] Update of TIGER ruby import code
Al Wold
alwold at gmail.com
Mon Jul 9 05:30:21 BST 2007
I found planetosm-excerpt-area.pl, but it doesn't work on the files from the
TIGER converter. The XML parsing in there is hand-rolled rather than done
with a real parser, and it expects the attributes to be ordered in a
specific way. It also assumes that ids will always be defined before they
are referenced in the file; I don't see that guaranteed anywhere in the spec.
Finally, it only includes segments/ways that are fully within the bounding
box.
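To illustrate the attribute-order point: a stream parser hands each
element's attributes over as a hash, so their order in the file stops
mattering. Below is a minimal sketch using REXML's stream listener. The
NodeCollector class, the sample ids and the bounding box values are made up
for illustration, and the node/id/lat/lon names are an assumption about the
converter's output rather than something taken from its source.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: collects ids of <node> elements inside a bounding box.
# Attributes arrive as a Hash, so their order in the file is irrelevant.
class NodeCollector
  include REXML::StreamListener
  attr_reader :ids

  def initialize(min_lat, min_lon, max_lat, max_lon)
    @lat = min_lat..max_lat
    @lon = min_lon..max_lon
    @ids = []
  end

  # Called by REXML for every opening tag; attrs maps attribute name to value.
  def tag_start(name, attrs)
    return unless name == 'node'
    lat = attrs['lat'].to_f
    lon = attrs['lon'].to_f
    @ids << attrs['id'].to_i if @lat.cover?(lat) && @lon.cover?(lon)
  end
end

# Sample input with the attributes deliberately in a different order per node.
xml = <<~XML
  <osm>
    <node id="-1" lat="33.45" lon="-112.07"/>
    <node lon="-112.08" id="-2" lat="33.46"/>
    <node lat="40.00" lon="-100.00" id="-3"/>
  </osm>
XML

collector = NodeCollector.new(33.0, -113.0, 34.0, -112.0)
REXML::Document.parse_stream(xml, collector)
p collector.ids
```

Since nothing is kept except the matching ids, memory use stays small no
matter how large the input file is.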
I started working on a Ruby rewrite that actually uses a real XML parser,
but it's looking like I will either need to build an in-memory tree (which
isn't feasible given the amount of data), or do about 5 passes over the
file. Here's the sequence I can imagine:
- Pass 1: collect nodes that are within the bounding box
- Pass 2: collect segments that have nodes from pass 1; add nodes that
weren't collected in pass 1 to a list
- Pass 3: collect ways that have segments from pass 2; add segments that
weren't collected in pass 2 to a list
- Pass 4: collect the segments listed in pass 3; add their nodes to the
node list
- Pass 5: collect nodes from list generated in earlier steps
Anyway, it gets really hairy. Anyone got any good ideas? Maybe if I wrote
a really memory-conscious way of storing the data, I could get it down to 2
passes.
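
The pass sequence above can be sketched roughly as follows. This is only an
illustration under stated assumptions: ELEMENTS is a hypothetical in-memory
stand-in for the file (a real run would re-stream the XML on every pass),
scan and excerpt are made-up helper names, and the data model (segments with
from/to node ids, ways listing segment ids) is inferred from the description
above. Only id sets are kept between passes, and the way is listed before
its segments on purpose, so nothing depends on definition order.

```ruby
require 'set'

# Hypothetical stand-in for the file contents.
ELEMENTS = [
  { type: :way,     id: 1,   segs: [10, 11] },
  { type: :segment, id: 10,  from: 100, to: 101 },
  { type: :segment, id: 11,  from: 101, to: 102 },
  { type: :segment, id: 12,  from: 103, to: 104 },
  { type: :node,    id: 100, lat: 0.5, lon: 0.5 },
  { type: :node,    id: 101, lat: 1.5, lon: 0.5 },
  { type: :node,    id: 102, lat: 2.5, lon: 0.5 },
  { type: :node,    id: 103, lat: 9.0, lon: 9.0 },
  { type: :node,    id: 104, lat: 9.5, lon: 9.0 }
].freeze

# One full pass over the "file", yielding only elements of the given type.
def scan(type)
  ELEMENTS.each { |e| yield e if e[:type] == type }
end

def excerpt(lat_range, lon_range)
  nodes = Set.new
  segs  = Set.new
  ways  = Set.new

  # Pass 1: nodes inside the bounding box.
  scan(:node) do |n|
    nodes << n[:id] if lat_range.cover?(n[:lat]) && lon_range.cover?(n[:lon])
  end
  in_box = nodes.dup

  # Pass 2: segments touching a pass-1 node; remember their other endpoints.
  scan(:segment) do |s|
    next unless in_box.include?(s[:from]) || in_box.include?(s[:to])
    segs << s[:id]
    nodes << s[:from] << s[:to]
  end

  # Pass 3: ways using a pass-2 segment; list segments not collected yet.
  pending = Set.new
  scan(:way) do |w|
    next unless w[:segs].any? { |id| segs.include?(id) }
    ways << w[:id]
    w[:segs].each { |id| pending << id unless segs.include?(id) }
  end

  # Pass 4: pick up the listed segments and add their nodes to the node list.
  scan(:segment) do |s|
    next unless pending.include?(s[:id])
    segs << s[:id]
    nodes << s[:from] << s[:to]
  end

  # Pass 5: fetch the full records for every node id gathered so far.
  node_records = []
  scan(:node) { |n| node_records << n if nodes.include?(n[:id]) }

  { nodes: node_records, segments: segs.to_a.sort, ways: ways.to_a.sort }
end

result = excerpt(0.0..1.0, 0.0..1.0)
p result[:nodes].map { |n| n[:id] }
p result[:segments]
p result[:ways]
```

In this toy data only node 100 lies inside the box, yet the way and both of
its segments end up in the excerpt, which is the behaviour the Perl script
lacks. How far the pass count could be reduced depends on how compactly the
id sets can be held in memory.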
-Al
On 7/6/07, Frederik Ramm <frederik at remote.org> wrote:
>
> Hi,
>
> Al Wold wrote:
> > Are you guys using JOSM to play with the data after it has been
> > converted? It seems like JOSM has a pretty rough time, even with small
> > counties. I'd be generally interested in the procedure you guys use to
> > do QA on the data, since I'm pretty new to this.
>
> I'm not doing any TIGER work myself but if you have osm files that are
> too large to work on them in JOSM, you could try using one of the
> excerpt tools, probably osm-excerpt-area.pl (in SVN under applications /
> utils somewhere), to cut a piece out of it to examine more closely in
> JOSM.
>
> It is well-known that JOSM needs some optimising in terms of memory
> usage, and this is being worked on; I don't know however if it'll
> complete before you've finished your TIGER work ;-)
>
> Bye
> Frederik
>
>