[OSM-dev] Update of TIGER ruby import code

Mon Jul 9 09:18:28 BST 2007

Hi,

> I found planetosm-excerpt-area.pl, but it doesn't work on the files  
> from the TIGER converter.  The XML parsing in there is really  
> crude, and it expects the attributes to be ordered in a specific  
> way.  It also assumes that IDs will always be defined before they  
> are referenced in the file.  I don't see that anywhere in the spec.

You are right on both counts. Many of these tools are designed for  
efficiency when working with JOSM output and/or the planet file, not  
for full adherence to the XML spec. Using a proper XML parser will  
often increase the processing time by an order of magnitude. However,  
it *may* be worthwhile to spend time making the TIGER converter's  
output look like planet data (regarding the ordering of attributes  
and the grouping of nodes/segments/ways), because you might encounter  
similar assumptions in other tools written for OSM.
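
For illustration, regrouping a converter's output into planet order  
could look roughly like the Python sketch below. It assumes 0.4-style  
element names (node, segment, way) directly under the root element  
and a file small enough to load into memory; reorder_osm is a  
hypothetical helper, not part of any existing OSM tool, and it only  
handles the grouping, not attribute order.

```python
import xml.etree.ElementTree as ET

# Assumed planet-style order: all nodes, then segments, then ways.
ORDER = {"node": 0, "segment": 1, "way": 2}


def reorder_osm(xml_text):
    """Return the OSM XML with children grouped as node, segment, way.

    Hypothetical helper: loads the whole document into memory. Unknown
    element types go after the ways. sorted() is stable, so the original
    relative order inside each group is preserved.
    """
    root = ET.fromstring(xml_text)
    children = list(root)
    for child in children:
        root.remove(child)
    for child in sorted(children, key=lambda e: ORDER.get(e.tag, len(ORDER))):
        root.append(child)
    return ET.tostring(root, encoding="unicode")
```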

The node-segment-way ordering, specifically, makes many things  
easier, as you have already seen:

> - Pass 1: collect nodes that are within the bounding box
> - Pass 2: collect segments that have nodes from pass 1; add nodes  
> that weren't collected in pass 1 to a list
> - Pass 3: collect ways that have segments from pass 2; add segments  
> that weren't collected in pass 2
> - Pass 4: collect segments from pass 3; add nodes to list
> - Pass 5: collect nodes from list generated in earlier steps
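
The five passes quoted above can be sketched in Python roughly as  
follows. This is a simplified in-memory illustration, assuming each  
element has already been parsed into a dict with "type" and "id" plus  
"lat"/"lon", "from"/"to" or "segs" depending on the type; each loop  
stands for one complete read of the unordered file. All names here  
are hypothetical.

```python
def in_bbox(node, bbox):
    """bbox is (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= node["lat"] <= max_lat and min_lon <= node["lon"] <= max_lon


def extract_full_ways(elements, bbox):
    """Return (node ids, segment ids, way ids) for full ways touching bbox."""
    # Pass 1: nodes inside the bounding box.
    nodes = {e["id"] for e in elements if e["type"] == "node" and in_bbox(e, bbox)}
    # Pass 2: segments touching a pass-1 node; remember both endpoints.
    segments, wanted_nodes = set(), set()
    for e in elements:
        if e["type"] == "segment" and (e["from"] in nodes or e["to"] in nodes):
            segments.add(e["id"])
            wanted_nodes.update((e["from"], e["to"]))
    # Pass 3: ways using any pass-2 segment; remember all of their segments.
    ways, wanted_segments = set(), set()
    for e in elements:
        if e["type"] == "way" and any(s in segments for s in e["segs"]):
            ways.add(e["id"])
            wanted_segments.update(e["segs"])
    # Pass 4: collect the extra segments and note their endpoints.
    for e in elements:
        if e["type"] == "segment" and e["id"] in wanted_segments:
            segments.add(e["id"])
            wanted_nodes.update((e["from"], e["to"]))
    # Pass 5: collect every node referenced so far.
    for e in elements:
        if e["type"] == "node" and e["id"] in wanted_nodes:
            nodes.add(e["id"])
    return nodes, segments, ways
```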

These passes arise directly from your wish to work on unordered files  
*and* return full ways. If you're willing to accept that ways are  
clipped at the bounding box (at the first segment outside it), and if  
your file is ordered, you can do it in one pass. JOSM can work with  
incomplete ways to a degree (although this ability may be removed in  
the future, as the new API doesn't require it anymore).
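
A single-pass variant for an ordered file might look like the sketch  
below, using the same hypothetical dict representation as above. One  
design choice to flag: since all nodes come before the segments, a  
node outside the box has already gone by when a segment references  
it, so this sketch keeps only segments whose endpoints are both  
inside and clips ways there. Keeping the first segment outside as  
well would mean also buffering the nodes outside the box.

```python
def in_bbox(node, bbox):
    """bbox is (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= node["lat"] <= max_lat and min_lon <= node["lon"] <= max_lon


def clip_one_pass(elements, bbox):
    """Yield the elements that survive clipping, reading the input once.

    Assumes planet-style ordering (all nodes, then segments, then ways),
    so every reference points back to an element that was already seen.
    """
    kept_nodes, kept_segments = set(), set()
    for e in elements:
        if e["type"] == "node":
            if in_bbox(e, bbox):
                kept_nodes.add(e["id"])
                yield e
        elif e["type"] == "segment":
            # Both endpoints are known already because nodes came first.
            if e["from"] in kept_nodes and e["to"] in kept_nodes:
                kept_segments.add(e["id"])
                yield e
        elif e["type"] == "way":
            segs = [s for s in e["segs"] if s in kept_segments]
            if segs:
                # Clipped way: only its segments that lie inside the box.
                yield {**e, "segs": segs}
```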

It really depends on what you want to do with the data. I'd say  
sensible QA would be quite possible with such (clipped) data. In a  
way, clipping at the bounding box has its advantages: with the new  
API, we're seeing the problem that people get full ways extending far  
beyond their bounding box, and nothing indicates that the nodes and  
segments making up these ways may also be used by other segments and  
ways (which, lying fully outside the box, haven't been downloaded).  
You would have the same problem if you were to extract full ways:  
you'd have to somehow add a visual indication of the bounding box  
saying "anything outside this box is likely incomplete".
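
If you do extract full ways, one rough way to find the ones that  
deserve such a marker is to flag every way with at least one node  
outside the box. The sketch below assumes a complete extract held in  
plain dicts (node id to (lat, lon), segment id to (from, to), way id  
to a list of segment ids); ways_leaving_bbox is a hypothetical name.

```python
def in_bbox(lat, lon, bbox):
    """bbox is (min_lat, min_lon, max_lat, max_lon)."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def ways_leaving_bbox(nodes, segments, ways, bbox):
    """Return ids of ways that have at least one node outside bbox.

    Such ways may share nodes/segments with data that was not downloaded,
    so an editor could mark them as "likely incomplete".
    """
    outside = {nid for nid, (lat, lon) in nodes.items() if not in_bbox(lat, lon, bbox)}
    flagged = set()
    for wid, segs in ways.items():
        for sid in segs:
            a, b = segments[sid]
            if a in outside or b in outside:
                flagged.add(wid)
                break
    return flagged
```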

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'