[OSM-dev] Update of TIGER ruby import code

Mon Jul 9 07:00:23 BST 2007

How about this -

The TIGER->OSM Ruby script could be rigged without a tremendous amount of
trouble to return N subsets of M/N size, instead of one big file. The
problem with that is how to refer to the same border node in two different
subsets before either node has been checked in.
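
One possible way around the border-node problem, as a rough sketch only: give every node a placeholder key derived from its coordinates, so the same border node gets the same key in every subset, and resolve keys to real ids at upload time. The names node_key, subsets and NodeRegistry are hypothetical, as is the 6-decimal precision; none of this is in the current script.

```ruby
# Hypothetical placeholder key: identical coordinates give an identical key,
# no matter which subset the node shows up in. The precision is an assumption.
def node_key(lat, lon, precision = 6)
  format("%.#{precision}f,%.#{precision}f", lat, lon)
end

# Split a list of records into at most n slices of roughly M/N items each.
def subsets(items, n)
  size = [(items.length / n.to_f).ceil, 1].max
  items.each_slice(size).to_a
end

# Maps placeholder keys to real ids. The block is called only the first time
# a key is seen, so a border node shared by two subsets is created once.
class NodeRegistry
  def initialize(&create)
    @ids = {}
    @create = create
  end

  def id_for(lat, lon)
    @ids[node_key(lat, lon)] ||= @create.call(lat, lon)
  end
end
```

For this to work across separate uploads, the registry would have to be saved between runs (a flat file of key/id pairs would probably do).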

I think I might support a Ruby command-line utility for the postprocessing
and uploading of XML files produced by the TIGER->OSM script. Such a script
could handle the coalescing logic that we currently have in the core
TIGER->OSM script, et cetera.
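
The same utility might also be a home for the excerpt step Al describes in the quoted message below. Here is a rough sketch of his pass sequence, assuming the node/segment/way model. The Struct shapes, the excerpt method and the bounding box ordering are made up for illustration. Each `each` loop stands in for one streaming pass over the file, and only sets of ids are kept in memory.

```ruby
require 'set'

# Hypothetical record shapes standing in for parsed <node>, <segment>, <way>.
Node    = Struct.new(:id, :lat, :lon)
Segment = Struct.new(:id, :from, :to)
Way     = Struct.new(:id, :segment_ids)

# bbox is [min_lat, min_lon, max_lat, max_lon] (an assumed ordering).
def excerpt(nodes, segments, ways, bbox)
  min_lat, min_lon, max_lat, max_lon = bbox

  # Pass 1: ids of nodes inside the bounding box.
  node_ids = Set.new
  nodes.each do |n|
    inside = n.lat.between?(min_lat, max_lat) && n.lon.between?(min_lon, max_lon)
    node_ids << n.id if inside
  end

  # Pass 2: segments that touch at least one of those nodes.
  seg_ids = Set.new
  segments.each do |s|
    seg_ids << s.id if node_ids.include?(s.from) || node_ids.include?(s.to)
  end

  # Pass 3: ways that use any such segment; remember all of their segments.
  way_ids = Set.new
  extra_segs = Set.new
  ways.each do |w|
    next unless w.segment_ids.any? { |id| seg_ids.include?(id) }
    way_ids << w.id
    extra_segs.merge(w.segment_ids)
  end
  seg_ids.merge(extra_segs)

  # Pass 4: endpoints of every kept segment, including ones outside the box.
  segments.each do |s|
    node_ids << s.from << s.to if seg_ids.include?(s.id)
  end

  # Pass 5 would re-read the file and write out the matching elements;
  # this sketch just returns the id sets.
  { nodes: node_ids, segments: seg_ids, ways: way_ids }
end
```

Unlike the Perl script, this keeps ways that are only partly inside the box. The cost is that three sets of ids have to fit in memory, which is still much less than a full tree.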

-B

On 7/8/07, Al Wold <alwold at gmail.com> wrote:
>
> I found planetosm-excerpt-area.pl, but it doesn't work on the files from
> the TIGER converter.  The XML parsing in there is really ghetto, and it
> expects the attributes to be ordered in a specific way.  It also makes the
> assumption that IDs that are referenced will always be defined before they
> are referenced in the file.  I don't see that anywhere in the spec.
> Finally, it only includes segments/ways that are fully within the bounding
> box.
>
> I started working on a Ruby rewrite that actually uses a real XML parser,
> but it's looking like I will either need to build an in-memory tree (which
> is really not going to be very feasible given the amount of data), or do
> about 5 passes.  Here's the sequence I can imagine:
>
> - Pass 1: collect nodes that are within the bounding box
> - Pass 2: collect segments that have nodes from pass 1; add nodes that
> weren't collected in pass 1 to a list
> - Pass 3: collect ways that have segments from pass 2; add segments that
> weren't collected in pass 2
> - Pass 4: collect segments from pass 3; add nodes to list
> - Pass 5: collect nodes from list generated in earlier steps
>
> Anyway, it gets really hairy.  Anyone got any good ideas?  Maybe if I
> wrote a really memory-conscious way of storing the data, I could get it down
> to 2 passes.
>
> -Al
>
> On 7/6/07, Frederik Ramm <frederik at remote.org> wrote:
> >
> > Hi,
> >
> > Al Wold wrote:
> > > Are you guys using JOSM to play with the data after it has been
> > > converted?  It seems like JOSM has a pretty rough time, even with
> > > small counties.  I'd be generally interested in the procedure you
> > > guys use to do QA on the data, since I'm pretty new to this.
> >
> > I'm not doing any TIGER work myself but if you have osm files that are
> > too large to work on them in JOSM, you could try using one of the
> > excerpt tools, probably osm-excerpt-area.pl (in SVN under applications /
> > utils somewhere), to cut a piece out of it to examine more closely in
> > JOSM.
> >
> > It is well-known that JOSM needs some optimising in terms of memory
> > usage, and this is being worked on; I don't know however if it'll
> > complete before you've finished your TIGER work ;-)
> >
> > Bye
> > Frederik
> >
> >
>
> _______________________________________________
> dev mailing list
> dev at openstreetmap.org
> http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev
>
>