[OSM-dev] Mass imports (TIGER and AND)

Brandon Martin-Anderson badhill at gmail.com
Tue Aug 28 06:08:19 BST 2007


Hey,

Dave, I'd just like to say thanks for spending so much time on this. Since
the topic was raised at Wherecamp and work subsequently began, you have been
by far the greatest contributor to this project. You're awesome.

Regarding Ruby on Rails: there's a huge amount of overhead involved in
catching a request and routing it to the appropriate controller. I'm going
to guess this accounts for the lion's share of the work you're seeing the
'ruby code' do. What would really improve performance is allowing a single
POST request to carry a payload of many OSM elements, ideally millions, all
at once. This would eliminate the per-element round trip through the Rails
controller routing framework and leave only the round trip between the
controller and the database, which by my estimation ain't too bad at all. It
wouldn't even be that hard to implement, though you'd have to upload each
chunk within the confines of a transaction. And as the OSM database doesn't
support transactions, you'd just have to fake them using some sort of
high-level transaction ID.
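
To make the "fake transaction" idea concrete, here is a minimal sketch in
Python (chosen only for illustration; it is not the rails port's code). It
assumes every uploaded row is tagged with a batch ID, that readers ignore
rows whose batch has not been marked committed, and that a failed batch is
undone by deleting its rows. The names FakeTransactionStore, upload_batch,
visible_elements and validate, and the validation rule itself, are all
hypothetical.

```python
import itertools


def validate(element):
    """Hypothetical sanity check; the real rules would live in the API."""
    if element.get("type") not in {"node", "way"}:
        raise ValueError(f"unknown element type: {element!r}")


class FakeTransactionStore:
    """In-memory stand-in for a database without native transactions."""

    def __init__(self):
        self.rows = []          # (batch_id, element) pairs
        self.committed = set()  # batch IDs whose upload finished cleanly
        self._ids = itertools.count(1)

    def upload_batch(self, elements):
        """Insert all elements under one batch ID; undo them all on failure."""
        batch_id = next(self._ids)
        try:
            for element in elements:
                validate(element)
                self.rows.append((batch_id, element))
        except ValueError:
            # Manual rollback: drop every row tagged with this batch ID.
            self.rows = [row for row in self.rows if row[0] != batch_id]
            raise
        # Manual commit: only now do the rows become visible to readers.
        self.committed.add(batch_id)
        return batch_id

    def visible_elements(self):
        """Readers see only rows that belong to committed batches."""
        return [elem for bid, elem in self.rows if bid in self.committed]
```

A whole chunk would then go in with one call, for example
store.upload_batch(list_of_elements), and a bad element anywhere in the
chunk leaves the visible data unchanged.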

That's my proposal.

-B

On 8/28/07, Dave Hansen <dave at sr71.net> wrote:
>
> I've been very painfully uploading the TIGER-generated data through
> JOSM.  At the rate I'm going it will probably take 5 or 10 years to
> upload the entire US.  Literally.  I'm uploading one or two counties a
> day, and there are 3,234 counties in the country.
>
> So, I installed the rails port on my laptop, and sicced JOSM on it.  The
> uploads are maybe twice as fast as they are to the main OSM server.  So,
> the round-trip-time actually isn't that _huge_ of a performance
> bottleneck.
>
> The thing that *IS* a bottleneck on my laptop is the ruby code.  It is
> responsible for 90% of the CPU time, and the CPUs are maxed out.  mysql,
> on the other hand, is responsible for ~3% of total cpu time.  Even with
> my piddly notebook hard drive, the I/O wait time is under 1%.
>
> People have been saying that we should write the import code in ruby to
> run on the server and use the existing rails code.  If the ruby code
> itself is the bottleneck and not the round-trip time or the disk, is
> doing the import through the ruby code going to even help?
>
> -- Dave