[OSM-dev] TIGER import.rb

James Marca jmarca at translab.its.uci.edu
Wed Jun 13 17:52:14 BST 2007

On Wed, Jun 13, 2007 at 08:25:50AM -0700, Dave Hansen scribeth thusly:
> I've hacked up the tiger import code a bit.  Instead of having it upload

Good work.

> to osm directly, I decided to have it produce .osm files that JOSM could
> open.  I bet this might be a good approach in the future for other
> people, too.  It allows you to make sure that the map "looks right", and
> to run things like the JOSM validator plugin on it. 

That is a good idea.  I was thinking the same thing myself,
specifically because the TIGER files are known to be imperfect.

> At this point, the only validator warnings that it produces are for
> untagged and unnamed ways.  I need to verify that the TIGER db even
> _has_ names for these.
> I've also created a couple of ruby classes to make handling the node,
> segment, and way classes a bit easier.  This is my first coding in ruby,
> ever, so please be gentle. :)
> The node creation code will detect close nodes and merge them.  This
> uses raw lat/lon and pretends they're actual distance units for now.
> I'm sure this can be easily fixed up to use real distances.
> This will create ordered ways, and will coalesce all adjacent ways with
> the same name into a single way.  It will flip segments and ways as

I would suggest somehow flagging names that are similar and might be
the same, so that they can be checked further.
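
To make that concrete, here is a rough sketch of one way to flag
near-duplicate names.  Everything in it is my own assumption, not part
of the import script: the helper names, the tiny suffix table, and the
edit-distance threshold of 2 are all hypothetical and would need
tuning against real TIGER data.

```ruby
# Hypothetical helpers for flagging street names that may be the same.
# The suffix table is deliberately tiny; a real one would be longer.
SUFFIXES = { "street" => "st", "avenue" => "ave",
             "road" => "rd", "drive" => "dr" }.freeze

# Lower-case, strip punctuation, and collapse common suffixes.
def normalize_name(name)
  words = name.downcase.gsub(/[^a-z0-9 ]/, "").split
  words.map { |w| SUFFIXES.fetch(w, w) }.join(" ")
end

# Plain dynamic-programming edit distance (Levenshtein).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    cur = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      cur << [prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = cur
  end
  prev.last
end

# Return pairs of distinct original names whose normalized forms are
# within max_distance edits of each other (an assumed threshold).
def similar_name_pairs(names, max_distance = 2)
  normalized = names.uniq.map { |n| [n, normalize_name(n)] }
  close = normalized.combination(2).select do |(_, a), (_, b)|
    levenshtein(a, b) <= max_distance
  end
  close.map { |(n1, _), (n2, _)| [n1, n2] }
end
```

The pairs it returns are only candidates for a human to look at; a
short name like "A St" versus "B St" would also be flagged, so I would
not merge anything automatically on this basis.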

> necessary to make them fit.  I had some performance problems with this,
> but I feel like it's running at a workable speed now.
> One last thing...  Do we really need the mysql database?  After
> importing a single zip code, the database only looks to me to be ~2MB.
> I also don't see any SELECT statements which look too horribly complex.
> Any chance we could just build the DB's contents in-memory?  We'd be
> left with scripts that take TIGER .zip files and produce .osms.  We
> could post those individually somewhere and have people familiar with
> the area go over them before actually uploading them.  

I'd vote to keep the database.  It provides a nice place for storage
of parsed TIGER files, letting you modularize the code, and you could
also use it to do the geometry work (merging nearby end points, etc).
Although I use PostgreSQL/PostGIS, I know MySQL has support for such
simple spatial commands.
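
If the merging stays in Ruby for now, the "real distances" fix could
look roughly like the sketch below.  It uses the haversine formula on
a spherical Earth; the function names, the 5 metre threshold, and the
naive O(n^2) search are my assumptions, not what the import script
does.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in metres

# Great-circle distance in metres between two lat/lon points (degrees).
def haversine_m(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180.0
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  # Clamp to guard against rounding slightly above 1.0.
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt([a, 1.0].min))
end

# Map every point to the first previously kept point within
# threshold_m metres, or keep it as a new point.  Naive O(n^2).
def merge_close_points(points, threshold_m = 5.0)
  kept = []
  points.map do |lat, lon|
    match = kept.find do |klat, klon|
      haversine_m(lat, lon, klat, klon) <= threshold_m
    end
    kept << [lat, lon] unless match
    match || [lat, lon]
  end
end
```

The reason raw lat/lon is a poor stand-in is that a degree of
longitude shrinks with the cosine of the latitude, so a fixed
threshold in degrees is wider north-south than east-west away from the
equator.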

I'm curious: if you import one zip code at a time, how does that handle
roads that cross a zip code boundary?  Would storage in a database help
catch those boundary cases?
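
As a sketch of the kind of check I have in mind (whether it runs in
SQL or in Ruby), the code below groups ways by name and end point and
reports groups that span more than one zip code.  The WayRecord
struct and its fields are hypothetical, and it assumes that shared
end points carry exactly identical coordinates in both zip codes,
which I have not verified against TIGER.

```ruby
# Hypothetical record: a named way, its two end points as [lat, lon],
# and the zip code file it was parsed from.
WayRecord = Struct.new(:zip, :name, :first_pt, :last_pt)

# Return groups of ways that share a name and an end point but come
# from different zip codes; these are candidates for joining.
def cross_boundary_candidates(ways)
  by_endpoint = Hash.new { |h, k| h[k] = [] }
  ways.each do |w|
    [w.first_pt, w.last_pt].uniq.each do |pt|
      by_endpoint[[w.name, pt]] << w
    end
  end
  by_endpoint.values.select { |group| group.map(&:zip).uniq.length > 1 }
end
```

If the coordinates turn out not to match exactly across files, the
exact hash lookup would have to give way to a distance-based match.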

