[OSM-dev] Update of TIGER ruby import code

Sat Jul 7 00:29:59 BST 2007

On Fri, 2007-07-06 at 14:14 -0700, Al Wold wrote:
> Are you guys using JOSM to play with the data after it has been
> converted?  It seems like JOSM has a pretty rough time, even with
> small counties.  I'd be generally interested in the procedure you guys
> use to do QA on the data, since I'm pretty new to this. 

JOSM has a pretty hard time.  gosmore seems to handle things a bit
better, though.

Somebody might want to spend some time on scripts that can split up the
xml files better.  That would be really helpful.
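
A rough sketch of what such a splitter could start from, in Python rather than Ruby. It is not part of the import code: the function names and the 0.1-degree grid are assumptions made for illustration. It streams the file instead of loading it whole, and only buckets <node> elements; segments and ways would need a second pass to follow their node references.

```python
import io
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


def tile_of(lat, lon, size_deg):
    """Return the (row, col) grid cell that contains a point."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def split_nodes_by_tile(source, size_deg=0.1):
    """Stream an OSM XML file and group serialized <node> elements by grid cell.

    `source` is a filename or a binary file object. The grid size is an
    arbitrary choice for this sketch, not a recommendation.
    """
    tiles = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "node":
            key = tile_of(float(elem.get("lat")), float(elem.get("lon")), size_deg)
            tiles[key].append(ET.tostring(elem, encoding="unicode").strip())
            # Release the node's attributes and children once it is serialized.
            elem.clear()
    return tiles


# Made-up sample data, only to show the call.
sample = io.BytesIO(
    b"<osm>"
    b'<node id="1" lat="35.37" lon="-119.02"/>'
    b'<node id="2" lat="35.38" lon="-119.01"/>'
    b'<node id="3" lat="35.61" lon="-118.45"/>'
    b"</osm>"
)
for tile, nodes in sorted(split_nodes_by_tile(sample).items()):
    print(tile, len(nodes))
```

Each bucket could then be written out as its own, much smaller file for JOSM to open.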

> Also, I live in Maricopa County, which is supposedly about #3 in the
> US, and I only have 1 gig of memory, and it's definitely not enough. 

Ugh.  Ruby is an *AWFUL* language for this.  It is horribly, horribly
piggish with its memory, and it is really hard to figure out the scope
of the objects.  Its garbage collector also seems to like to touch lots
of memory, keeping truly unused memory from being swapped out.  I've
really, really considered rewriting the converter into a more sane
language.

I converted Kern County, CA.  It is a 24MB TIGER zip file.  The bzip'd
xml that comes out is ~8MB and the uncompressed xml is ~100MB.  The RAM
required on a 64-bit machine is a little over 3GB (real RAM, not
including swap).  This is awful.

> I played with having the script stop after a certain number of RT1
> lines so it wouldn't go out of control, and that's no good because of
> roads being split into multiple records.  Does it seem like it would
> be useful to be able to specify a bounding box so that you can do more
> reasonable sized sections?  I was going to try to implement it and see
> how it works. 

We could do this, but you'll have to stitch the bounding boxes back
together at some point.  
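
To make the stitching concern concrete, here is a minimal sketch in Python, not taken from the converter. It assumes each RT1 line has already been parsed into a hypothetical (record_id, start, end) tuple with (lon, lat) endpoints, and it ignores any intermediate shape points. Records with only one endpoint inside the box are kept and also flagged, since those are the ones that have to be matched up with a neighbouring box later.

```python
def in_bbox(point, bbox):
    """True if a (lon, lat) point lies inside or on the edge of bbox."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def clip_records(records, bbox):
    """Select the records that touch bbox and flag those crossing its edge.

    records: iterable of (record_id, start, end) tuples (a made-up layout
             for this sketch), with start and end as (lon, lat).
    bbox:    (min_lon, min_lat, max_lon, max_lat).

    Returns (kept, crossing): kept holds every record with at least one
    endpoint inside the box; crossing holds the ids of kept records with
    exactly one endpoint inside, which need stitching to a neighbour.
    """
    kept, crossing = [], []
    for rec in records:
        rec_id, start, end = rec
        inside = (in_bbox(start, bbox), in_bbox(end, bbox))
        if any(inside):
            kept.append(rec)
            if not all(inside):
                crossing.append(rec_id)
    return kept, crossing
```

Two limits of this sketch: a record whose endpoints both fall outside the box is dropped even if the line between them passes through it, and a record whose endpoints both sit exactly on a shared edge shows up in both neighbouring boxes without being flagged.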

> It seems like the biggest issue I've noticed in my (limited) tests is
> that the address ranges seem to be discarded, which I imagine is
> because of the way merging works.  I read the thread on addressing and it
> seems like we all need a standard to do addressing which allows the
> TIGER import to move forward as well as allows addressing tags in
> other areas of the world. 

We do need the addressing, but there's no real way to store it in OSM
right now.  I've left enough of the TIGER data around that we can add it
later, though.  

-- Dave