[OSM-dev] TIGER import.rb
jmarca at translab.its.uci.edu
Wed Jun 13 17:52:14 BST 2007
On Wed, Jun 13, 2007 at 08:25:50AM -0700, Dave Hansen scribeth thusly:
> I've hacked up the tiger import code a bit. Instead of having it upload
> to osm directly, I decided to have it produce .osm files that JOSM could
> open. I bet this might be a good approach in the future for other
> people, too. It allows you to make sure that the map "looks right", and
> to run things like the JOSM validator plugin on it.
That is a good idea. I was thinking the same thing myself,
specifically because the TIGER files are known to be imperfect.
> At this point, the only validator warnings that it produces are for
> untagged and unnamed ways. I need to verify that the TIGER db even
> _has_ names for these.
> I've also created a couple of ruby classes to make handling the node,
> segment, and way classes a bit easier. This is my first coding in ruby,
> ever, so please be gentle. :)
> The node creation code will detect close nodes and merge them. This
> uses raw lat/lon and pretends they're actual distance units for now.
> I'm sure this can be easily fixed up to use real distances.
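On using real distances for the close-node check: a minimal sketch of
the haversine formula in Ruby, in case it is useful. The method names,
the [lat, lon] array layout, and the 1 m default tolerance are my own
assumptions and not anything taken from import.rb.

```ruby
# Great-circle distance between two lat/lon points, in metres.
# Uses a spherical Earth, which is an approximation.
EARTH_RADIUS_M = 6_371_000.0 # mean Earth radius

def to_rad(deg)
  deg * Math::PI / 180.0
end

def haversine_m(lat1, lon1, lat2, lon2)
  dlat = to_rad(lat2 - lat1)
  dlon = to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad(lat1)) * Math.cos(to_rad(lat2)) * Math.sin(dlon / 2)**2
  # Clamp to 1.0 to guard against rounding error before asin.
  2 * EARTH_RADIUS_M * Math.asin([1.0, Math.sqrt(a)].min)
end

# Hypothetical merge test: nodes are [lat, lon] arrays, and two nodes
# count as "the same" when they are within tolerance_m metres.
def close_enough?(a, b, tolerance_m = 1.0)
  haversine_m(a[0], a[1], b[0], b[1]) <= tolerance_m
end
```

The point of doing it this way is that a degree of longitude shrinks
with latitude, so a fixed lat/lon threshold means a different ground
distance in Alaska than in Florida.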
> This will create ordered ways, and will coalesce all adjacent ways with
> the same name into a single way. It will flip segments and ways as
> necessary to make them fit. I had some performance problems with this,
> but I feel like it's running at a workable speed now.
I would suggest also flagging names that are similar and might refer to
the same road, so that they can be checked further.
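On flagging similar names: one rough way to do it is to normalize each
name and then compare with an edit distance. This is only a sketch. The
abbreviation table, the method names, and the threshold of 2 are all
made up for illustration, and the table is far from complete.

```ruby
# Hypothetical abbreviation table. Note "st" can also mean "saint",
# which this naive expansion gets wrong.
ABBREVIATIONS = {
  "st" => "street", "ave" => "avenue", "rd" => "road",
  "blvd" => "boulevard", "dr" => "drive", "ln" => "lane"
}.freeze

# Lowercase, drop punctuation, and expand known abbreviations.
def normalize_name(name)
  name.downcase.gsub(/[^a-z0-9\s]/, "").split
      .map { |w| ABBREVIATIONS.fetch(w, w) }.join(" ")
end

# Plain Levenshtein edit distance, keeping only two rows of the table.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Return pairs of distinct raw names whose normalized forms are within
# max_distance edits. These are candidates for a human to look at,
# not names that are safe to merge automatically.
def suspicious_pairs(names, max_distance = 2)
  names.uniq.combination(2).select do |a, b|
    levenshtein(normalize_name(a), normalize_name(b)) <= max_distance
  end
end
```

Comparing every pair is quadratic in the number of names, so it would
probably only be practical per import chunk rather than over everything
at once.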
> One last thing... Do we really need the mysql database? After
> importing a single zip code, the database only looks to me to be ~2MB.
> I also don't see any SELECT statements which look too horribly complex.
> Any chance we could just build the DB's contents in-memory? We'd be
> left with scripts that take TIGER .zip files and produce .osms. We
> could post those individually somewhere and have people familiar with
> the area go over them before actually uploading them.
I'd vote to keep the database. It provides a nice place for storage
of parsed TIGER files, letting you modularize the code, and you could
also use it to do the geometry work (merging nearby end points, etc).
Although I use PostgreSQL/PostGIS, I know MySQL has support for such
simple spatial commands.
I'm curious: if you import one zipcode at a time, how does that handle
roads that cross a zipcode boundary? Would storage in a database help
catch those boundary cases?
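To make the boundary question concrete, here is a sketch of the kind of
check I have in mind, whether it runs against a database or in memory.
It assumes each way is a hash with a :chunk key (whatever unit was
imported) and a :nodes list of [lat, lon] pairs. That layout and the
method name are hypothetical, not what import.rb uses.

```ruby
# Find endpoint coordinates that are shared by ways from more than one
# import chunk. Those are places where a road may have been split at a
# chunk boundary and might need joining.
def boundary_candidates(ways, precision = 6)
  seen = Hash.new { |h, k| h[k] = [] }
  ways.each do |way|
    # Only the first and last node of a way can join another way.
    [way[:nodes].first, way[:nodes].last].uniq.each do |lat, lon|
      seen[[lat.round(precision), lon.round(precision)]] << way
    end
  end
  # Keep only coordinates touched by at least two different chunks.
  seen.select { |_, ws| ws.map { |w| w[:chunk] }.uniq.size > 1 }
end
```

Rounding to a fixed number of decimal places is a crude stand-in for a
proper distance test, and it only works if every chunk's data is
available in one place at the same time, which is part of why I lean
towards keeping the database.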