[OSM-dev] Update of TIGER ruby import code
Frederik Ramm
frederik at remote.org
Mon Jul 9 09:18:28 BST 2007
Hi,
> I found planetosm-excerpt-area.pl, but it doesn't work on the files
> from the TIGER converter. The XML parsing in there is really
> ghetto, and it expects the attributes to be ordered in a specific
> way. It also makes the assumption that IDs that are referenced
> will always be defined before they are referenced in the file. I
> don't see that anywhere in the spec.
You are right on both counts. Many of these tools are designed for
efficiency when working with JOSM output and/or the planet file, not
for full XML compliance. Using a proper XML parser will often
increase the processing time by an order of magnitude. However, it
*may* be worthwhile to spend time making the TIGER converter's output
look like planet data (attribute ordering, and grouping of
nodes/segments/ways), because you might encounter similar assumptions
in other tools written for OSM.
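A minimal sketch of such a post-processing step, in Python rather than the converter's own code. It assumes OSM 0.4-style elements (`node` with `id`/`lat`/`lon`, `segment` with `id`/`from`/`to`, `way` with `<seg id="..."/>` children); the function name `reorder_osm` and the exact attribute order are hypothetical, so adjust them to whatever the downstream tool actually expects. It loads the whole file into memory, which should be fine for per-county TIGER output but not for a planet file.

```python
import xml.etree.ElementTree as ET

# Assumed attribute order, loosely modelled on planet-file output; adjust
# to whatever a regex-based downstream tool expects.
ATTR_ORDER = {
    "node": ["id", "lat", "lon"],
    "segment": ["id", "from", "to"],
    "way": ["id"],
}
TYPE_ORDER = ["node", "segment", "way"]


def reorder_osm(in_path, out_path):
    """Rewrite an OSM 0.4-style file so all nodes come first, then
    segments, then ways, each with its attributes in a fixed order."""
    root = ET.parse(in_path).getroot()
    new_root = ET.Element(root.tag, dict(root.attrib))
    for kind in TYPE_ORDER:
        for elem in root.findall(kind):
            order = ATTR_ORDER.get(kind, [])
            # Known attributes first, in the fixed order; any others afterwards.
            attrs = {k: elem.attrib[k] for k in order if k in elem.attrib}
            attrs.update({k: v for k, v in elem.attrib.items() if k not in attrs})
            new_elem = ET.SubElement(new_root, kind, attrs)
            new_elem.extend(list(elem))  # keep <tag>/<seg> children unchanged
    # Python 3.8+ serializes attributes in insertion order.
    ET.ElementTree(new_root).write(out_path, encoding="UTF-8", xml_declaration=True)
```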
The node-segment-way ordering, specifically, makes a lot of things
easier as you have already seen:
> - Pass 1: collect nodes that are within the bounding box
> - Pass 2: collect segments that have nodes from pass 1; add nodes
> that weren't collected in pass 1 to a list
> - Pass 3: collect ways that have segments from pass 2; add segments
> that weren't collected in pass 2
> - Pass 4: collect segments from pass 3; add nodes to list
> - Pass 5: collect nodes from list generated in earlier steps
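For illustration, the five passes quoted above might look like this with a streaming parser (`xml.etree.ElementTree.iterparse`), re-reading the file once per pass. The OSM 0.4 element layout and the helper names `_elements` and `extract_full_ways` are assumptions; this is a sketch, not the code from planetosm-excerpt-area.pl. Nodes inside the box are recorded in pass 1; the nodes "added to a list" are simply computed after pass 4.

```python
import xml.etree.ElementTree as ET


def _elements(path, tag):
    """Stream top-level elements of one type, whatever order the file uses."""
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == tag:
            yield elem
        if elem.tag in ("node", "segment", "way"):
            elem.clear()  # free memory once the element has been handled


def extract_full_ways(path, bbox):
    """Five passes over an unordered OSM 0.4-style file; returns
    (nodes, segments, ways) with every selected way complete."""
    min_lat, min_lon, max_lat, max_lon = bbox
    nodes, segs, ways = {}, {}, {}
    # Pass 1: nodes inside the bounding box.
    for n in _elements(path, "node"):
        lat, lon = float(n.get("lat")), float(n.get("lon"))
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            nodes[n.get("id")] = (lat, lon)
    # Pass 2: segments with at least one endpoint from pass 1.
    for s in _elements(path, "segment"):
        ends = (s.get("from"), s.get("to"))
        if ends[0] in nodes or ends[1] in nodes:
            segs[s.get("id")] = ends
    # Pass 3: ways that use any segment from pass 2.
    for w in _elements(path, "way"):
        seg_ids = [c.get("id") for c in w.findall("seg")]
        if any(sid in segs for sid in seg_ids):
            ways[w.get("id")] = seg_ids
    # Pass 4: segments those ways need but pass 2 did not collect.
    missing_segs = {sid for ids in ways.values() for sid in ids} - set(segs)
    for s in _elements(path, "segment"):
        if s.get("id") in missing_segs:
            segs[s.get("id")] = (s.get("from"), s.get("to"))
    # Pass 5: endpoint nodes that lie outside the box.
    missing_nodes = {nid for ends in segs.values() for nid in ends} - set(nodes)
    for n in _elements(path, "node"):
        if n.get("id") in missing_nodes:
            nodes[n.get("id")] = (float(n.get("lat")), float(n.get("lon")))
    return nodes, segs, ways
```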
These arise directly from your desire to work on unordered files
*and* return full ways. If you're willing to accept ways being clipped
at the (first segment outside the) bounding box, and your file is
ordered, you can do it in one pass. JOSM can work with
incomplete ways, to a degree (although this ability may be removed in
the future as the new API doesn't require it any more).
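A sketch of the one-pass variant, assuming the file really is ordered nodes, then segments, then ways, in the same assumed OSM 0.4 layout. The clipping rule here (keep a segment only if both endpoints are in the box, and keep only those segments in each way) is one possible choice, and `extract_clipped` is a hypothetical name. On an unordered file it would silently drop data, because a segment could be seen before its nodes.

```python
import xml.etree.ElementTree as ET


def extract_clipped(path, bbox):
    """Single pass over a file ordered nodes -> segments -> ways.
    Ways are clipped: only their segments with both ends in the box survive."""
    min_lat, min_lon, max_lat, max_lon = bbox
    nodes, segs, ways = {}, {}, {}
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "node":
            lat, lon = float(elem.get("lat")), float(elem.get("lon"))
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                nodes[elem.get("id")] = (lat, lon)
        elif elem.tag == "segment":
            # Thanks to the ordering, every node has already been seen here.
            ends = (elem.get("from"), elem.get("to"))
            if ends[0] in nodes and ends[1] in nodes:
                segs[elem.get("id")] = ends
        elif elem.tag == "way":
            kept = [c.get("id") for c in elem.findall("seg") if c.get("id") in segs]
            if kept:
                ways[elem.get("id")] = kept
        else:
            continue  # <seg>, <tag>, root: leave for the enclosing element
        elem.clear()
    return nodes, segs, ways
```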
It really depends on what you want to do with the data. I'd say
sensible QA would be quite possible with such (clipped) data. In a
way, clipping at the bounding box has its advantages: with the new
API, we're seeing the problem that people get full ways extending far
beyond their bounding box, with nothing to indicate that the nodes and
segments making up these ways may also be used by other segments and
ways (which, lying fully outside the box, haven't been
downloaded). You would have the same problem if you were to extract
full ways: you'd have to somehow add a visual indication of the
bounding box saying "anything outside this box is likely incomplete".
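If you do extract full ways, one cheap way to drive such an indication is to flag the ways that reach outside the box, since their outlying nodes and segments may be shared with data that was not downloaded. This sketch assumes the data is already held as plain dicts (node ID to `(lat, lon)`, segment ID to `(from, to)`, way ID to a list of segment IDs); `ways_leaving_box` is a hypothetical helper.

```python
def ways_leaving_box(nodes, segs, ways, bbox):
    """Return sorted IDs of ways with at least one node outside bbox.
    Anything beyond the box may be referenced by objects not downloaded."""
    min_lat, min_lon, max_lat, max_lon = bbox

    def outside(nid):
        lat, lon = nodes[nid]
        return not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon)

    return sorted(
        wid for wid, seg_ids in ways.items()
        if any(outside(nid) for sid in seg_ids for nid in segs[sid])
    )
```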
Bye
Frederik
--
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'