[OSM-dev] Update of TIGER ruby import code

Al Wold alwold at gmail.com
Fri Jul 6 22:14:51 BST 2007


Are you guys using JOSM to play with the data after it has been converted?
It seems like JOSM has a pretty rough time, even with small counties.  I'd
be generally interested in the procedure you guys use to do QA on the data,
since I'm pretty new to this.

Also, I live in Maricopa County, which is supposedly about #3 in the US, and
I only have 1 gig of memory, and it's definitely not enough.  I played with
having the script stop after a certain number of RT1 lines so it wouldn't go
out of control, and that's no good because of roads being split into
multiple records.  Does it seem like it would be useful to be able to
specify a bounding box so that you can do more reasonably sized sections?  I
was going to try to implement it and see how it works.
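
To make the bounding box idea concrete, here is a rough sketch of the kind of
filter I have in mind.  It is not taken from the import code: the Segment and
BBox structs and the segments_in_bbox method are names I made up for
illustration, and it assumes each record has already been parsed into a list
of [lat, lon] points.

```ruby
# Hypothetical record: one TIGER-style line with an id and its [lat, lon] points.
Segment = Struct.new(:tlid, :points)

# A bounding box given as min/max latitude and longitude (hypothetical type).
BBox = Struct.new(:min_lat, :min_lon, :max_lat, :max_lon) do
  # True if the given point lies inside the box (edges included).
  def cover_point?(lat, lon)
    lat.between?(min_lat, max_lat) && lon.between?(min_lon, max_lon)
  end
end

# Keep every segment that has at least one point inside the box, so a
# road record is kept whole instead of being cut off at the box edge.
def segments_in_bbox(segments, bbox)
  segments.select do |seg|
    seg.points.any? { |lat, lon| bbox.cover_point?(lat, lon) }
  end
end

segments = [
  Segment.new(1, [[33.40, -112.10], [33.45, -112.05]]), # fully inside
  Segment.new(2, [[33.49, -112.01], [33.55, -111.95]]), # crosses the edge
  Segment.new(3, [[34.00, -111.00], [34.10, -110.90]])  # fully outside
]
bbox = BBox.new(33.0, -112.5, 33.5, -112.0)
kept = segments_in_bbox(segments, bbox)
puts kept.map(&:tlid).inspect
```

Selecting whole records by "any point inside" avoids the problem I hit when
stopping after a fixed number of RT1 lines, although a record that crosses the
box without having a vertex inside it would still be missed by this simple
test.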

The biggest issue I've noticed in my (limited) tests is that the address
ranges seem to be discarded, which I imagine is because of how ways are
merged.  I read the thread on addressing, and it seems like we all need a
standard for addressing that lets the TIGER import move forward and also
allows addressing tags in other areas of the world.

It looks like the conversion works pretty well, though.  Good work guys; we
should keep hacking on this and eventually get some data imported.  With the
TIGER data in there, I think it will really make the US more appealing to
work on.

-Al

On 6/21/07, Dave Hansen <dave at sr71.net> wrote:
>
> First of all, all of the actual TIGER parsing code here came from
> Brandon Martin-Anderson.  All I did was spit out some OSM objects from
> the data he produced.  He did all of the hard work.
>
> This code takes a set of TIGER shapefiles, and turns them into an
> OSM .xml file.  That file can then be opened in JOSM and reviewed before
> uploading.  If anyone just wants to look at their county, I'll be happy
> to run this for you and send you the .osm file.
>
> You can get the code here:
>
>         http://sr71.net/~dave/osm/tiger/
>
> But, don't go uploading anything produced with this just yet.  We need
> to make sure it's actually producing good, sane data.  Feel free to go
> run it on a county that you know well, and closely examine the output.
> Report anything that looks strange or incorrect.
>
> (BTW, If anyone wants to take this mail and stick it in the wiki, I'd
> really appreciate it.)
>
> Changes from the last time I posted:
> - added lots of nifty percentage progress meters so you can see where
>    your precious CPU is going
> - greatly reduced memory usage by freeing up TIGER data as we convert
> - added --no-coalesce option to tiger_to_osm.rb to turn off all way
>    coalescing
> - dave_model.rb has a max_angle variable.  As two ways are about
>    to be joined together, if their intersection has an angle greater
>    than this variable, we won't join them
> - nodes from ways of different types (eg. highway, power) are never
>    shared
>
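
For anyone reading along, the max_angle check described in the list above
could look roughly like the sketch below.  This is only my guess at the idea,
not the code in dave_model.rb: the method names, the 45 degree threshold, and
the convention that 0 means "continues straight on" are all assumptions, and
points are treated as flat [x, y] pairs.

```ruby
# Heading of the segment from point a to point b, in degrees, treating the
# coordinates as flat x/y (good enough for a sketch, not for real geodesy).
def heading(a, b)
  Math.atan2(b[1] - a[1], b[0] - a[0]) * 180.0 / Math::PI
end

# Turn angle in degrees at the node where way_a ends and way_b begins.
# 0 means way_b continues straight on; 180 means it doubles back.
def turn_angle(way_a, way_b)
  diff = (heading(way_b[0], way_b[1]) - heading(way_a[-2], way_a[-1])).abs
  diff > 180.0 ? 360.0 - diff : diff
end

# Hypothetical threshold; the real value of max_angle is not given in the mail.
MAX_ANGLE = 45.0

# Two ways are candidates for joining only if they share an end node and
# the turn at that node is no sharper than max_angle.
def joinable?(way_a, way_b, max_angle = MAX_ANGLE)
  way_a.last == way_b.first && turn_angle(way_a, way_b) <= max_angle
end

straight = [[0.0, 0.0], [1.0, 0.0]]
onward   = [[1.0, 0.0], [2.0, 0.1]] # slight bend
corner   = [[1.0, 0.0], [1.0, 1.0]] # right-angle turn

puts joinable?(straight, onward)
puts joinable?(straight, corner)
```

With a threshold like this, a road that bends gently stays one way, while a
sharp corner at an intersection keeps the two ways separate.
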
> Run it like this:
>
>         sh tiger-zip-to-osm.sh zips/IL/TGR17019.ZIP
>
> If you use that shell script, it does proper locking, and you can run
> multiple instances of the converter on a machine without duplicating
> work (it's SMP safe, effectively).
>
> I feel like it is pretty darn slow.  I guess that's mostly Ruby's fault,
> but I'd appreciate any tips that people have for tuning Ruby
> performance.  You'll need quite a bit of RAM to run this for any
> large-size counties.  At *LEAST* a gig, probably two.  I put a machine
> out of memory pretty badly that had 8GB in it because I ran several
> instances of this at once, and it didn't have any swap.
>
> You can get TIGER zipfiles from here:
>
> http://www2.census.gov/geo/tiger/tiger2006se/OR/
>
> To find your county, look here:
>
> http://www.census.gov/geo/www/fips/fips65/data/national.txt
>
> Champaign County, IL has an entry like this in that file:
>
> IL,17,019,Champaign,H1
>
> You can find its zip file here:
>
> http://www2.census.gov/geo/tiger/tiger2006se/IL/TGR17019.ZIP
>
> You could also do something like this if you wanted to get a bunch of
> counties with the name "Middle" in them:
>
> wget -O - http://www.census.gov/geo/www/fips/fips65/data/national.txt\
>         | grep Middle | perl -pe 's#,#/TGR#' | perl -pe 's#,##' \
>         | perl -pe 's#,.*#.ZIP#' \
>         | awk '{print "http://www2.census.gov/geo/tiger/tiger2006se/" $1}' \
>         | xargs wget
>
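
Since the converter itself is Ruby, the same FIPS-line-to-URL rewrite can be
sketched in Ruby as well.  This is just an illustration of what the perl/awk
pipeline above does: tiger_zip_url and urls_matching are names I made up, and
it assumes every line of national.txt has the same comma-separated layout as
the Champaign example.

```ruby
# Base URL taken from the mail above.
BASE = "http://www2.census.gov/geo/tiger/tiger2006se/"

# Turn a line such as "IL,17,019,Champaign,H1" into the zip URL for that
# county: state abbreviation, then TGR + state FIPS + county FIPS + .ZIP.
def tiger_zip_url(line)
  state, state_fips, county_fips = line.strip.split(",")
  "#{BASE}#{state}/TGR#{state_fips}#{county_fips}.ZIP"
end

# Keep only the lines containing the given text (like `grep Middle`)
# and convert each of them to a download URL.
def urls_matching(lines, text)
  lines.select { |l| l.include?(text) }.map { |l| tiger_zip_url(l) }
end

sample = ["IL,17,019,Champaign,H1"]
puts urls_matching(sample, "Champaign")
```

For the Champaign line this produces the same zip URL that is given earlier in
the mail; downloading the files would still be a separate step.
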
>
> -- Dave