[OSM-dev] Update of TIGER ruby import code

Mon Jul 9 23:47:20 BST 2007

So, does it seem like it would be a good idea to clamp down the spec and say
that any entities that are referenced by id must be defined in the file
before they are referenced?  That would help to avoid problems in the
future, and I think it would be a lot easier for generators to order the
data than it would be to try to implement something like my 5-pass
algorithm.
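
To make the proposed rule concrete, here is a minimal sketch of a checker that reports every reference to an id that has not been defined yet. It is written in Python purely for illustration, and it assumes the layout I believe the current files use: <node> elements with an id, <segment> elements with from/to attributes, and <way> elements containing <seg id="..."/> children. The function name find_forward_references is made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def find_forward_references(osm_xml):
    """Return (element, id, missing_ref) for each reference to an id
    that has not been defined earlier in the file.

    Assumed layout (an assumption, not taken from any spec):
    <node id=...>, <segment id=... from=... to=...>,
    <way id=...><seg id=.../></way>.
    """
    nodes, segments = set(), set()
    problems = []
    source = io.BytesIO(osm_xml.encode("utf-8"))
    # iterparse yields "end" events by default, so a <way> is only
    # handled once all of its <seg> children have been read.
    for _, elem in ET.iterparse(source):
        if elem.tag == "node":
            nodes.add(elem.get("id"))
        elif elem.tag == "segment":
            for ref in (elem.get("from"), elem.get("to")):
                if ref not in nodes:
                    problems.append(("segment", elem.get("id"), ref))
            segments.add(elem.get("id"))
        elif elem.tag == "way":
            for seg in elem.findall("seg"):
                if seg.get("id") not in segments:
                    problems.append(("way", elem.get("id"), seg.get("id")))
    return problems
```

An empty result means the file obeys the "define before reference" rule; anything else lists the offending references in file order.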

I'd like to avoid writing another version of this script that takes
shortcuts and breaks as soon as you throw a variation at it, but it does seem
like it will take quite a bit of work to make it handle unordered files :).  I
think for now I will just have it generate a warning when it encounters an
undefined id, so it will work on ordered files and will at least let the user
know when it hits a file it cannot handle correctly.
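
Roughly, the single-pass version with warnings could look like the sketch below. This is Python for illustration only, not the actual script. It assumes the same node / segment (from, to) / way (nested seg) layout as the current files, and it makes one simplifying choice of my own: a segment is kept only if both of its nodes are inside the box. The name extract_ordered is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def extract_ordered(osm_xml, min_lat, min_lon, max_lat, max_lon):
    """One pass over a nodes-then-segments-then-ways file.

    Returns (kept, warnings): kept maps "nodes", "segments" and "ways"
    to sets of ids; warnings lists references to ids not seen so far.
    """
    seen_nodes, seen_segments = set(), set()
    kept = {"nodes": set(), "segments": set(), "ways": set()}
    warnings = []
    source = io.BytesIO(osm_xml.encode("utf-8"))
    for _, elem in ET.iterparse(source):  # "end" events only
        eid = elem.get("id")
        if elem.tag == "node":
            seen_nodes.add(eid)
            lat, lon = float(elem.get("lat")), float(elem.get("lon"))
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                kept["nodes"].add(eid)
        elif elem.tag == "segment":
            seen_segments.add(eid)
            ends = (elem.get("from"), elem.get("to"))
            for ref in ends:
                if ref not in seen_nodes:
                    warnings.append(
                        f"segment {eid} references undefined node {ref}")
            # Simplification: keep only segments fully inside the box.
            if all(ref in kept["nodes"] for ref in ends):
                kept["segments"].add(eid)
        elif elem.tag == "way":
            refs = [seg.get("id") for seg in elem.findall("seg")]
            for ref in refs:
                if ref not in seen_segments:
                    warnings.append(
                        f"way {eid} references undefined segment {ref}")
            # A way is kept (clipped) if any of its segments was kept.
            if any(ref in kept["segments"] for ref in refs):
                kept["ways"].add(eid)
    return kept, warnings
```

On an ordered file the warnings list stays empty. On an unordered one the result may be incomplete, but the user at least gets told which ids were the problem.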

On the subject of ways that go outside of the box, I think it makes the most
sense to have a command line option.  If you divide a large file into
sections, it might be nice not to have to merge ways from adjacent boxes, so
the option would make it do another pass and collect the missing
segments of any ways that are in the box.
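
As a sketch of how that option might hang together (Python for illustration; the flag name --complete-ways and both function names are invented here, not taken from the script): the flag switches on an extra step that works out which segments the kept ways still lack, and a further pass over the file would then fetch those segments and their nodes.

```python
import argparse


def build_parser():
    """Command line parser with a hypothetical --complete-ways switch."""
    parser = argparse.ArgumentParser(
        description="Extract a bounding box from an OSM file")
    parser.add_argument(
        "--complete-ways", action="store_true",
        help="do another pass to collect segments of kept ways "
             "that lie outside the box")
    return parser


def missing_segments(ways, kept_ways, kept_segments):
    """Return ids of segments used by kept ways but not collected yet.

    ways maps each way id to the list of segment ids it uses.
    """
    missing = set()
    for way_id in kept_ways:
        missing.update(
            seg for seg in ways[way_id] if seg not in kept_segments)
    return missing
```

Without the flag the output stays clipped at the box; with it, the ids returned by missing_segments are what the additional pass has to look for.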

-Al

On 7/9/07, Frederik Ramm <frederik at remote.org> wrote:
>
> Hi,
>
> > I found planetosm-excerpt-area.pl, but it doesn't work on the files
> > from the TIGER converter.  The XML parsing in there is really
> > ghetto, and it expects the attributes to be ordered in a specific
> > way.  It also makes the assumption that id's that are referenced
> > will always be defined before they are referenced in the file.  I
> > don't see that anywhere in the spec.
>
> You are right on both counts. Many of these tools are designed for
> efficiency in working with JOSM output and/or the planet file, not
> for full adherence to XML. Using a proper XML parser will often
> increase the processing time by an order of magnitude. However, it
> *may* be worthwhile to spend time making the TIGER converter's output
> look like planet data (re. ordering of attributes, and grouping of
> nodes/segments/ways) because you might encounter similar assumptions
> in other tools written for OSM.
>
> The node-segment-way ordering, specifically, makes a lot of things
> easier as you have already seen:
>
> > - Pass 1: collect nodes that are within the bounding box
> > - Pass 2: collect segments that have nodes from pass 1; add nodes
> > that weren't collected in pass 1 to a list
> > - Pass 3: collect ways that have segments from pass 2; add segments
> > that weren't collected in pass 2
> > - Pass 4: collect segments from pass 3; add nodes to list
> > - Pass 5: collect nodes from list generated in earlier steps
>
> These arise directly from your desire to work on un-ordered files
> *and* return full ways. If you're willing to accept the clipping of
> ways at the (first segment outside the) bounding box and if your file
> were ordered, you can do it in one pass. JOSM can work with
> incomplete ways, to a degree (although this ability may be removed in
> the future as the new API doesn't require it any more).
>
> Really depends on what you want to do with the data. I'd say a
> sensible QA would be perfectly possible with such (clipped) data. In a
> way, clipping at the bounding box has its advantages - with the new
> API, we're seeing the problem that people get full ways extending far
> beyond their bounding box, and nothing to indicate that the nodes and
> segments making up these ways are perhaps used by other segments and
> ways as well (but those, lying fully outside the box, haven't been
> downloaded). You would have the same problem if you were to extract
> full ways - you'd have to somehow add a visual indication of the
> bounding box saying "anything outside this box is likely incomplete".
>
> Bye
> Frederik
>
> --
> Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'
>
>
>