[OSM-dev] TIGER import.rb

Dave Hansen dave at sr71.net
Wed Jun 13 20:55:19 BST 2007


On Wed, 2007-06-13 at 09:52 -0700, James Marca wrote:
> On Wed, Jun 13, 2007 at 08:25:50AM -0700, Dave Hansen scribeth thusly:
> > I've hacked up the tiger import code a bit.  Instead of having it upload
> 
> Good work.
> 
> > to osm directly, I decided to have it produce .osm files that JOSM could
> > open.  I bet this might be a good approach in the future for other
> > people, too.  It allows you to make sure that the map "looks right", and
> > to run things like the JOSM validator plugin on it. 
> 
> That is a good idea.  I was thinking the same thing myself,
> specifically because the TIGER files are known to be imperfect.

Imperfect, but an incredible timesaver.  It will be great to combine
TIGER data with even just trackfiles that people collect, even if they
don't take notes when collecting the trackfiles.  TIGER can be off by
quite a bit in some places, but it's usually easy to tell from its shape
which street a track follows, and then just move the TIGER data in JOSM
to fit.

> > At this point, the only validator warnings that it produces are for
> > untagged and unnamed ways.  I need to verify that the TIGER db even
> > _has_ names for these.
> > 
> > I've also created a couple of ruby classes to make handling the node,
> > segment, and way classes a bit easier.  This is my first coding in ruby,
> > ever, so please be gentle. :)
> > 
> > The node creation code will detect close nodes and merge them.  This
> > uses raw lat/lon and pretends they're actual distance units for now.
> > I'm sure this can be easily fixed up to use real distances.
> > 
> > This will create ordered ways, and will coalesce all adjacent ways with
> > the same name into a single way.  It will flip segments and ways as
> 
> I would suggest somehow flagging for further checking names that are
> similar and might be the same.  

This would be a *GREAT* validator plugin addition.  Not just names, but
similar tag values in adjacent ways.  I've made mistakes before where
one way of a street is highway=primary and another is highway=secondary.
Not always an error, but still a good warning.
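
To make the idea concrete, here is a rough sketch of that check in Ruby.
It is not a JOSM validator plugin, just the comparison logic.  The way
hashes (:id, :nodes, :tags) and both method names are made up for
illustration, and "adjacent" is assumed to mean "shares an end node".

```ruby
# Hypothetical way representation: a Hash with :id, :nodes (ordered node
# ids) and :tags (String => String).

# True if the two ways touch at one of their end nodes.
def shares_endpoint?(a, b)
  ends_a = [a[:nodes].first, a[:nodes].last]
  ends_b = [b[:nodes].first, b[:nodes].last]
  !(ends_a & ends_b).empty?
end

# Returns [id_a, id_b, key, value_a, value_b] for each pair of touching
# ways that have the same name but disagree on one of the checked keys.
def tag_mismatch_warnings(ways, keys = ['highway'])
  warnings = []
  ways.combination(2) do |a, b|
    next unless shares_endpoint?(a, b)
    next unless a[:tags]['name'] && a[:tags]['name'] == b[:tags]['name']
    keys.each do |key|
      va = a[:tags][key]
      vb = b[:tags][key]
      warnings << [a[:id], b[:id], key, va, vb] if va && vb && va != vb
    end
  end
  warnings
end

ways = [
  { id: 1, nodes: [10, 11, 12], tags: { 'name' => 'Main St', 'highway' => 'primary' } },
  { id: 2, nodes: [12, 13],     tags: { 'name' => 'Main St', 'highway' => 'secondary' } },
  { id: 3, nodes: [13, 14],     tags: { 'name' => 'Main St', 'highway' => 'secondary' } }
]

tag_mismatch_warnings(ways).each do |id_a, id_b, key, va, vb|
  puts "ways #{id_a} and #{id_b}: #{key}=#{va} vs #{key}=#{vb}"
end
```

Comparing every pair is quadratic, which should be fine for one zip code
but would want an index of end nodes for anything bigger.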

That reminds me.  TIGER likes to tag on-ramps as 'name=Ramp'.  I need
to convert those to highway=motorway_link instead.
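
Something like the following might do it.  This is only a sketch: it
assumes tags are held in a plain Hash and that the name is exactly
'Ramp', and retag_ramp is a made-up helper, not something in import.rb.

```ruby
# Tags are assumed to be a plain Hash of String => String.
# Returns a new Hash; the input is left untouched.
def retag_ramp(tags)
  return tags unless tags['name'] == 'Ramp'
  fixed = tags.reject { |key, _value| key == 'name' } # drop the bogus name
  fixed['highway'] = 'motorway_link'                  # mark it as a link road
  fixed
end

p retag_ramp({ 'name' => 'Ramp', 'highway' => 'residential' })
p retag_ramp({ 'name' => 'Elm St', 'highway' => 'residential' })
```

Ways with any other name pass through unchanged.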

> > necessary to make them fit.  I had some performance problems with this,
> > but I feel like it's running at a workable speed now.
> > 
> > One last thing...  Do we really need the mysql database?  After
> > importing a single zip code, the database only looks to me to be ~2MB.
> > I also don't see any SELECT statements which look too horribly complex.
> > Any chance we could just build the DB's contents in-memory?  We'd be
> > left with scripts that take TIGER .zip files and produce .osms.  We
> > could post those individually somewhere and have people familiar with
> > the area go over them before actually uploading them.  
> 
> I'd vote to keep the database.  It provides a nice place for storage
> of parsed TIGER files, letting you modularize the code, and you could
> also use it to do the geometry work (merging nearby end points, etc).
> Although I use PostgreSQL/PostGIS, I know MySQL has support for such
> simple spatial commands.
> 
> I'm curious, if you import one zipcode at a time, how does that handle
> roads that cross the zipcode?  Would storage in a database help catch
> those boundary cases?

I don't know enough about how the TIGER data is structured to tell
you. :)  From what I do understand, the DB is a pretty direct dump of
the TIGER data.  All of the work with merging ways and nodes happens
_after_ things come out of the DB.  So, it really doesn't help with
those kinds of boundary problems.

One nice thing would be to break things up even more than by zipcode,
probably by fixed size lat/lon boxes.  The TIGER file for my area is
pretty large, and it is a bit cumbersome to deal with, if only because
JOSM slows down with that much data.
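
A sketch of how the splitting could work, assuming nodes are Hashes with
:id, :lat and :lon.  The box size and the method names here are
arbitrary choices for illustration.

```ruby
BOX_SIZE = 0.05 # degrees; an arbitrary illustrative value

# Returns the [lat_index, lon_index] of the box containing the point.
# floor (rather than truncation) keeps negative longitudes consistent.
def box_for(lat, lon, size = BOX_SIZE)
  [(lat / size).floor, (lon / size).floor]
end

# Groups nodes (hypothetical Hashes with :id, :lat, :lon) by box.
def split_into_boxes(nodes, size = BOX_SIZE)
  nodes.group_by { |n| box_for(n[:lat], n[:lon], size) }
end

nodes = [
  { id: 1, lat: 45.51, lon: -122.68 },
  { id: 2, lat: 45.52, lon: -122.67 },
  { id: 3, lat: 45.61, lon: -122.68 }
]

split_into_boxes(nodes).each do |box, members|
  puts "box #{box.inspect}: nodes #{members.map { |n| n[:id] }.join(', ')}"
end
```

This only bins nodes.  A way whose nodes land in different boxes would
still have to be assigned to one box or cut at the edge, which is the
same boundary problem as with zipcodes.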

Honestly, I wouldn't mind too much having to go through these afterwards
and stitch the edges of the zipcodes back together.  The validator plugin
could also help here quite a bit.
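
Finding the places that need stitching could probably be scripted too.
Here is a minimal sketch that pairs up nodes from two files lying within
a few meters of each other, using the haversine formula for real
distances instead of raw lat/lon.  The node Hashes, the 5 meter
threshold and the method names are assumptions for illustration.

```ruby
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius in meters

# Great-circle distance in meters between two lat/lon points (degrees).
def haversine_m(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a))
end

# Returns [node_a, node_b] pairs that lie within max_m meters of each
# other.  Brute force: every node in one file against every node in the
# other.
def stitch_candidates(nodes_a, nodes_b, max_m = 5.0)
  nodes_a.product(nodes_b).select do |a, b|
    haversine_m(a[:lat], a[:lon], b[:lat], b[:lon]) <= max_m
  end
end

file_a = [{ id: 1, lat: 45.5, lon: -122.6 }]
file_b = [{ id: 7, lat: 45.50001, lon: -122.6 },
          { id: 8, lat: 45.51,    lon: -122.6 }]

stitch_candidates(file_a, file_b).each do |a, b|
  puts "node #{a[:id]} and node #{b[:id]} look like the same point"
end
```

The candidates would still need a human to look at them in JOSM before
merging anything.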

-- Dave




