[OSM-dev] Update of TIGER ruby import code

Mon Jul 9 06:18:20 BST 2007

I'd be interested to hear how you go with this.  I wrote a bounding box 
task for my osmosis tool and ran into similar problems.

For that tool I cheated and modified ways to exclude segments that 
weren't within the box.  In other words, I didn't drop ways that strayed 
outside the box; I modified them to only use data in the box.  
It's ugly and not a good solution.
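
A minimal sketch of that clipping idea, in Ruby to match the rest of the 
thread.  This is not the osmosis code: the BBox struct, the clip_to_bbox 
name and the hash layouts (node id => [lat, lon], segment id => [from, to], 
way id => [segment ids]) are assumptions made up for illustration, and 
everything is held in memory, so it only suits small extracts.

```ruby
# Hypothetical bounding box type; the field names are illustrative only.
BBox = Struct.new(:min_lat, :min_lon, :max_lat, :max_lon) do
  def cover?(lat, lon)
    lat.between?(min_lat, max_lat) && lon.between?(min_lon, max_lon)
  end
end

# Clip rather than drop: ways that stray outside the box are kept, but
# only with the segments whose two end nodes both lie inside it.
def clip_to_bbox(nodes, segments, ways, bbox)
  kept_nodes = nodes.select { |_id, pos| bbox.cover?(*pos) }
  kept_segments = segments.select do |_id, ends|
    ends.all? { |node_id| kept_nodes.key?(node_id) }
  end

  kept_ways = {}
  ways.each do |id, segment_ids|
    inside = segment_ids.select { |s| kept_segments.key?(s) }
    # A way with no segment left inside the box disappears entirely.
    kept_ways[id] = inside unless inside.empty?
  end

  [kept_nodes, kept_segments, kept_ways]
end
```

The trade-off is the one described above: the output is self-consistent, 
but a clipped way no longer matches the original way.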

Al Wold wrote:
> I found planetosm-excerpt-area.pl, but it doesn't work on the files 
> from the TIGER converter.  The XML parsing in there is really fragile: 
> it expects the attributes to be ordered in a specific way.  It also 
> assumes that referenced IDs will always be defined before they are 
> referenced in the file.  I don't see that anywhere in the spec.  
> Finally, it only includes segments/ways that are fully within the 
> bounding box.
>
> I started working on a Ruby rewrite that actually uses a real XML 
> parser, but it's looking like I will either need to build an in-memory 
> tree (which is really not going to be very feasible given the amount 
> of data), or do about 5 passes.  Here's the sequence I can imagine:
>
> - Pass 1: collect nodes that are within the bounding box
> - Pass 2: collect segments that have nodes from pass 1; add nodes that 
> weren't collected in pass 1 to a list
> - Pass 3: collect ways that have segments from pass 2; add segments 
> that weren't collected in pass 2
> - Pass 4: collect the segments added in pass 3; add their nodes to the list
> - Pass 5: collect nodes from list generated in earlier steps
>
> Anyway, it gets really hairy.  Anyone got any good ideas?  Maybe if I 
> wrote a really memory-conscious way of storing the data, I could get 
> it down to 2 passes.
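
The pass sequence above can be sketched as follows, keeping only sets of 
IDs in memory between passes.  This is a rough illustration under stated 
assumptions, not a tested extractor: the extract_ids name, the in_box 
lambda and the element hashes (:type, :id, :lat, :lon, :from, :to, 
:segments) are hypothetical, and each `elements.each` stands in for one 
streaming read of the file.  The final pass, which would write out every 
element whose ID ended up in a set, is left out.

```ruby
require 'set'

# elements: anything that can be iterated repeatedly (one iteration = one
# read of the file).  in_box: a callable taking (lat, lon).
def extract_ids(elements, in_box)
  node_ids = Set.new
  seg_ids = Set.new
  way_ids = Set.new

  # Pass 1: nodes inside the bounding box.
  elements.each do |e|
    node_ids << e[:id] if e[:type] == :node && in_box.call(e[:lat], e[:lon])
  end

  # Pass 2: segments with at least one end node from pass 1.
  elements.each do |e|
    next unless e[:type] == :segment
    seg_ids << e[:id] if node_ids.include?(e[:from]) || node_ids.include?(e[:to])
  end

  # Pass 3: ways using any pass-2 segment.  Their remaining segments are
  # gathered separately so they cannot pull in further ways mid-pass.
  extra_segs = Set.new
  elements.each do |e|
    next unless e[:type] == :way
    next unless e[:segments].any? { |s| seg_ids.include?(s) }
    way_ids << e[:id]
    extra_segs.merge(e[:segments])
  end
  seg_ids.merge(extra_segs)

  # Pass 4: end nodes of every kept segment, which also covers the nodes
  # that pass 2 would have put on its list.
  elements.each do |e|
    next unless e[:type] == :segment && seg_ids.include?(e[:id])
    node_ids << e[:from] << e[:to]
  end

  { nodes: node_ids, segments: seg_ids, ways: way_ids }
end
```

Only integer IDs are held between passes, so memory grows with the size of 
the extract rather than with the size of the input file.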