How about this: the TIGER->OSM Ruby script could, without a tremendous amount of trouble, be rigged to output N subsets of size M/N instead of one big file. The problem is how to refer to the same border node from two different subsets before that node has been checked in (and therefore has no server-assigned ID yet).
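<br><br>One possible way around the border-node problem, sketched in Python purely for illustration and under stated assumptions: give every node a placeholder ID that is unique across all N subsets (negative numbers here), upload the subsets one after another, and keep a single map from placeholder ID to server-assigned ID for the whole run. A node shared by two subsets is then created once, and the later subset simply looks up its real ID. The names `split_into_subsets`, `PlaceholderUploader` and `create_node` are hypothetical, ways are simplified to plain lists of node IDs, and the fake server stands in for a real upload call.

```python
from itertools import count


def split_into_subsets(ways, n):
    """Split ways (lists of placeholder node IDs) into at most n chunks.

    Returns (chunk_of_ways, node_ids_used_by_chunk) pairs. A node used by
    ways in two chunks (a border node) shows up in both node lists.
    """
    if not ways:
        return []
    size = -(-len(ways) // n)  # ceiling division
    subsets = []
    for start in range(0, len(ways), size):
        chunk = ways[start:start + size]
        node_ids = sorted({nid for way in chunk for nid in way})
        subsets.append((chunk, node_ids))
    return subsets


class PlaceholderUploader:
    """Uploads subsets in sequence, sharing one placeholder -> server ID map."""

    def __init__(self, create_node):
        self.create_node = create_node  # hypothetical upload call returning a new ID
        self.id_map = {}                # placeholder ID -> server-assigned ID

    def upload(self, chunk, node_ids):
        for placeholder in node_ids:
            if placeholder not in self.id_map:  # border nodes are created only once
                self.id_map[placeholder] = self.create_node(placeholder)
        # Rewrite each way to reference real IDs before it would be sent.
        return [[self.id_map[nid] for nid in way] for way in chunk]


# Usage with a fake server that hands out sequential IDs.
created = []
server_ids = count(1000)


def fake_create_node(placeholder):
    created.append(placeholder)
    return next(server_ids)


ways = [[-1, -2, -3], [-3, -4], [-4, -5]]
uploader = PlaceholderUploader(fake_create_node)
uploaded = [uploader.upload(chunk, nodes) for chunk, nodes in split_into_subsets(ways, 2)]
print(created)  # → [-4, -3, -2, -1, -5]  (border node -4 is created once)
```

The trade-off is that the subsets must be uploaded in sequence (or at least share the map), and the map has to survive until the last subset that uses a border node is done; an interrupted run would need the map saved to disk.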

I think I would support a Ruby command-line utility for post-processing and uploading the XML files produced by the TIGER->OSM script. Such a tool could take over the coalescing logic that currently lives in the core TIGER->OSM script, among other things.

<br><br>-B<br><br><div><span class="gmail_quote">On 7/8/07, <b class="gmail_sendername">Al Wold</b> <<a href="mailto:alwold@gmail.com">alwold@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

I found planetosm-excerpt-area.pl, but it doesn't work on the files from the TIGER converter. Its XML parsing is really crude, and it expects the attributes to appear in a specific order. It also assumes that every referenced ID is defined earlier in the file than the reference to it, and I don't see that guarantee anywhere in the spec. Finally, it only includes segments/ways that lie entirely within the bounding box.

<br><br>I started working on a Ruby rewrite that uses a real XML parser, but it looks like I will either need to build an in-memory tree (not really feasible given the amount of data) or make about five passes over the file. Here's the sequence I have in mind:

<br><br>- Pass 1: collect nodes that are within the bounding box<br>- Pass 2: collect segments that have nodes from pass 1; add their nodes that weren't collected in pass 1 to a list<br>- Pass 3: collect ways that have segments from pass 2; add their segments that weren't collected in pass 2 to a list<br>- Pass 4: collect the segments listed in pass 3; add their nodes to the node list<br>- Pass 5: collect the nodes on the list built in the earlier passes<br><br>Anyway, it gets really hairy. Anyone got any good ideas? Maybe if I came up with a really memory-conscious way of storing the data, I could get it down to two passes.
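<br><br>The passes above can plausibly be collapsed into two, sketched here in Python rather than Ruby purely for illustration. It assumes the 0.4-era layout in which segments carry from/to node IDs and ways list their segments as seg child elements; the function names are hypothetical. Pass 1 keeps only integer IDs and references (much smaller than a full tree, though still one entry per segment and way), the closure that passes 2 to 5 compute is resolved in memory, and pass 2 re-reads the file and keeps only the selected elements. Attributes are read by name, and references are resolved only after the whole file has been seen, so neither attribute order nor definition order matters, and ways that only partly overlap the box are kept whole.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def first_pass(path, bbox):
    """Pass 1: record only integer IDs and references, never a full tree."""
    min_lat, min_lon, max_lat, max_lon = bbox
    inside = set()   # node IDs within the bounding box
    segments = {}    # segment ID -> (from node ID, to node ID)
    ways = {}        # way ID -> list of segment IDs
    for _, elem in ET.iterparse(path):
        if elem.tag == "node":
            lat, lon = float(elem.get("lat")), float(elem.get("lon"))
            if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
                inside.add(int(elem.get("id")))
        elif elem.tag == "segment":
            segments[int(elem.get("id"))] = (int(elem.get("from")), int(elem.get("to")))
        elif elem.tag == "way":
            ways[int(elem.get("id"))] = [int(s.get("id")) for s in elem.findall("seg")]
        else:
            continue
        # Free attributes and children; the empty shell stays attached to the
        # root, which a production version would also prune.
        elem.clear()
    return inside, segments, ways


def select(inside, segments, ways):
    """Resolve the references in memory, so file order no longer matters."""
    segs = {sid for sid, (a, b) in segments.items() if a in inside or b in inside}
    way_ids = {wid for wid, sids in ways.items() if any(s in segs for s in sids)}
    for wid in way_ids:  # keep whole ways, not just the part inside the box
        segs.update(s for s in ways[wid] if s in segments)
    nodes = set(inside)
    for sid in segs:
        nodes.update(segments[sid])
    return nodes, segs, way_ids


def second_pass(path, nodes, segs, way_ids):
    """Pass 2: re-read the file and serialize only the selected elements."""
    wanted = {"node": nodes, "segment": segs, "way": way_ids}
    kept = []
    for _, elem in ET.iterparse(path):
        if elem.tag in wanted:
            if int(elem.get("id")) in wanted[elem.tag]:
                kept.append(ET.tostring(elem, encoding="unicode").strip())
            elem.clear()
    return kept


# Usage: segment 10 and way 100 appear before the nodes they reference.
SAMPLE = """<osm version="0.4">
  <segment id="10" from="1" to="2"/>
  <way id="100"><seg id="10"/><seg id="11"/></way>
  <node id="1" lat="33.40" lon="-111.90"/>
  <node id="2" lat="33.60" lon="-111.90"/>
  <node id="3" lat="34.50" lon="-111.90"/>
  <node id="4" lat="35.00" lon="-111.90"/>
  <segment id="11" from="2" to="3"/>
  <segment id="12" from="3" to="4"/>
</osm>"""

with tempfile.NamedTemporaryFile("w", suffix=".osm", delete=False) as f:
    f.write(SAMPLE)
    path = f.name
try:
    nodes, segs, way_ids = select(*first_pass(path, (33.0, -112.5, 33.5, -111.5)))
    kept = second_pass(path, nodes, segs, way_ids)
finally:
    os.remove(path)
print(sorted(nodes), sorted(segs), sorted(way_ids))  # → [1, 2, 3] [10, 11] [100]
```

Only node 1 is inside the box, yet way 100 is kept whole, which pulls in segment 11 and nodes 2 and 3; segment 12 and node 4 are dropped.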

<br><span class="sg"><br>-Al</span><div><span class="e" id="q_113a93f720a41250_2"><br><br><div><span class="gmail_quote">On 7/6/07, <b class="gmail_sendername">Frederik Ramm</b> <<a href="mailto:frederik@remote.org" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)">

frederik@remote.org</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Hi,<br><br>Al Wold wrote:<br>> Are you guys using JOSM to play with the data after it has been<br>> converted?  It seems like JOSM has a pretty rough time, even with small<br>> counties.  I'd be generally interested in the procedure you guys use to

<br>> do QA on the data, since I'm pretty new to this.<br><br>I'm not doing any TIGER work myself, but if you have OSM files that are<br>too large to work with in JOSM, you could try using one of the<br>excerpt tools, probably 

osm-excerpt-area.pl (in SVN under applications /<br>utils somewhere), to cut out a piece to examine more closely in JOSM.<br><br>It is well known that JOSM needs some optimising in terms of memory<br>usage, and this is being worked on; however, I don't know whether it'll

<br>complete before you've finished your TIGER work ;-)<br><br>Bye<br>Frederik<br><br></blockquote></div><br>

</span></div><br>_______________________________________________<br>dev mailing list<br><a onclick="return top.js.OpenExtLink(window,event,this)" href="mailto:dev@openstreetmap.org">dev@openstreetmap.org</a><br><a onclick="return top.js.OpenExtLink(window,event,this)" href="http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev" target="_blank">

http://lists.openstreetmap.org/cgi-bin/mailman/listinfo/dev</a><br><br></blockquote></div><br>