[OSM-dev] Update of TIGER ruby import code

Fri Jun 22 05:42:25 BST 2007

First of all, all of the actual TIGER parsing code here came from
Brandon Martin-Anderson.  All I did was spit out some OSM objects from
the data he produced.  He did all of the hard work.

This code takes a set of TIGER shapefiles, and turns them into an
OSM .xml file.  That file can then be opened in JOSM and reviewed before
uploading.  If anyone just wants to look at their county, I'll be happy
to run this for them and send them the .osm file.

You can get the code here:

	http://sr71.net/~dave/osm/tiger/

But, don't go uploading anything produced with this just yet.  We need
to make sure it's actually producing good, sane data.  Feel free to go
run it on a county that you know well, and closely examine the output.
Report anything that looks strange or incorrect.  

(BTW, if anyone wants to take this mail and stick it in the wiki, I'd
really appreciate it.)

Changes from the last time I posted:
 - added lots of nifty percentage progress meters so you can see where  
   your precious CPU is going
 - greatly reduced memory usage by freeing up TIGER data as we convert
 - added --no-coalesce option to tiger_to_osm.rb to turn off all way
   coalescing
 - dave_model.rb has a max_angle variable.  When two ways are about
   to be joined together, we skip the join if the angle where they
   meet is greater than max_angle
 - nodes from ways of different types (e.g. highway, power) are never
   shared

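To give a feel for the max_angle check, here is a rough sketch of that
kind of test.  This is not the code from dave_model.rb: the method
names, the 45 degree default, and the flat [x, y] point format are all
assumptions made up for illustration.

```ruby
# Hypothetical sketch of a max_angle style check.  Names and the
# default value are assumptions, not taken from dave_model.rb.
MAX_ANGLE = 45.0 # degrees

# Heading of the segment from point a to point b, in degrees.
# Points are [x, y] pairs; this treats coordinates as planar.
def heading(a, b)
  Math.atan2(b[1] - a[1], b[0] - a[0]) * 180.0 / Math::PI
end

# How much the direction of travel changes when going from the last
# segment of way_a into the first segment of way_b.
# 0 means straight ahead, 180 means a complete reversal.
def turn_angle(way_a, way_b)
  diff = (heading(way_b[0], way_b[1]) - heading(way_a[-2], way_a[-1])).abs % 360.0
  diff > 180.0 ? 360.0 - diff : diff
end

# Two ways are candidates for joining only if they share an end node
# and the turn between them is no sharper than max_angle.
def joinable?(way_a, way_b, max_angle = MAX_ANGLE)
  way_a.last == way_b.first && turn_angle(way_a, way_b) <= max_angle
end
```

With these assumptions, a way that continues straight on would be
joined, while one that turns off at a right angle would be left as a
separate way.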
Run it like this:

	sh tiger-zip-to-osm.sh zips/IL/TGR17019.ZIP

If you use that shell script, it does proper locking, and you can run
multiple instances of the converter on a machine without duplicating
work (it's SMP safe, effectively).
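For anyone curious how several instances can avoid stepping on each
other, one common trick is to claim each zip file with an atomic
mkdir.  The sketch below only illustrates that idea in Ruby; it is an
assumption on my part and not necessarily how tiger-zip-to-osm.sh does
its locking.  The ".lock" suffix and the method names are made up.

```ruby
# Hypothetical sketch: claim a zip file by creating "<zip>.lock".
# Dir.mkdir either creates the directory or raises Errno::EEXIST,
# so only one process can win the claim for a given file.
def claim(zip_path)
  Dir.mkdir("#{zip_path}.lock")
  true
rescue Errno::EEXIST
  false
end

# Walk the list of zip files and hand each unclaimed one to the block.
# The claim is deliberately left in place afterwards so that other
# instances skip work that is already done or in progress.
def convert_all(zip_paths)
  zip_paths.each do |zip|
    next unless claim(zip)
    yield zip # the actual conversion would run here
  end
end
```

Each instance can then be pointed at the same list of zip files, and
each file gets picked up by only one of them.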

I feel like it is pretty darn slow.  I guess that's mostly ruby's fault,
but I'd appreciate any tips that people have for tuning ruby
performance.  You'll need quite a bit of RAM to run this for any
large-size counties.  At *LEAST* a gig, probably two.  I ran a machine
with 8GB in it pretty badly out of memory because I ran several
instances of this at once, and it didn't have any swap.

You can get TIGER zipfiles from here (this is the OR directory; swap in
your own state's abbreviation):

http://www2.census.gov/geo/tiger/tiger2006se/OR/

To find your county, look here:

http://www.census.gov/geo/www/fips/fips65/data/national.txt

Champaign County, IL has an entry like this in that file:

IL,17,019,Champaign,H1

You can find its zip file here:

http://www2.census.gov/geo/tiger/tiger2006se/IL/TGR17019.ZIP
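
In other words, the zip name is just "TGR" plus the state and county
FIPS codes, under a directory named for the state.  Here is a small
ruby sketch of that mapping.  The method name is made up, and it
assumes every line in national.txt has the same comma-separated layout
as the Champaign example.

```ruby
# Base URL as given in this mail.
TIGER_BASE = "http://www2.census.gov/geo/tiger/tiger2006se"

# Turn a line like "IL,17,019,Champaign,H1" into the URL of that
# county's TIGER zip file.  Only the first three fields are used:
# state abbreviation, state FIPS code, county FIPS code.
def tiger_zip_url(fips_line)
  state, state_fips, county_fips = fips_line.strip.split(",")
  "#{TIGER_BASE}/#{state}/TGR#{state_fips}#{county_fips}.ZIP"
end

puts tiger_zip_url("IL,17,019,Champaign,H1")
# prints http://www2.census.gov/geo/tiger/tiger2006se/IL/TGR17019.ZIP
```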

You could also do something like this if you wanted to get a bunch of
counties with the name "Middle" in them:

wget -O - http://www.census.gov/geo/www/fips/fips65/data/national.txt\
	| grep Middle | perl -pe 's#,#/TGR#' | perl -pe 's#,##' \
	| perl -pe 's#,.*#.ZIP#' \
	| awk '{print "http://www2.census.gov/geo/tiger/tiger2006se/" $1}'	\
	| xargs wget

-- Dave