[OSM-dev] Mass imports (TIGER and AND)

Dave Hansen dave at sr71.net
Tue Aug 28 20:01:25 BST 2007


On Tue, 2007-08-28 at 08:29 +0100, Tom Hughes wrote:
> In message <1188275447.28903.15.camel at localhost>
>         Dave Hansen <dave at sr71.net> wrote:
> > The thing that *IS* on my laptop is the ruby code.  It is responsible
> > for 90% of the CPU time, and the CPUs are maxed out.  mysql, on the
> > other hand, is responsible for ~3% of total cpu time.  Even with my
> > piddly notebook hard drive, the I/O wait time is under 1%.
> 
> That's quite impressive, because the CPUs on our web servers never
> get anywhere near maxing out, and between them they are processing 
> anything up to about a dozen requests each at any one time.

Yeah, that's very interesting.  What is the actual bottleneck?  Do you
see actual I/O wait time on the server?

Here's ruby-prof output for a minute or so of the completely cpu-bound
server:

http://www.sr71.net/~dave/osm/ruby-prof.server.gz

> > People have been saying that we should write the import code in ruby to
> > run on the server and use the existing rails code.  If the ruby code
> > itself is the bottleneck and not the round-trip time or the disk, is
> > doing the import through the ruby code going to even help?
> 
> As somebody else has pointed out, it is only the object model that you
> would need to use so all overhead of parsing the requests would be
> avoided.
> 
> I think the problem with my scheme is going to be keeping the amount
> of history required to map the negative IDs in the change file to the
> allocated positive IDs as things are added. That will use up a lot of
> memory in ruby.

That shouldn't be a problem at all, in practice.  Who says we have to
store it in memory? ;)  I can always just create a temporary directory
and put all of the translations in there.

$ cat /tmpdir/node.-56123234
123456
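
The scheme above can be sketched as a small filesystem-backed translation
table, one file per placeholder ID as in the `node.-56123234` example.
This is only an illustration of the idea; the class and method names
(`IdMap`, `store`, `lookup`) are hypothetical, not anything in the OSM
codebase.

```ruby
require 'tmpdir'
require 'fileutils'

# Filesystem-backed map from negative placeholder IDs in a change file
# to the positive IDs allocated by the server, keyed by object type
# ("node", "way", ...).  Keeps memory use flat no matter how large the
# import is.
class IdMap
  def initialize
    @dir = Dir.mktmpdir('osm-idmap')
  end

  # Record that the placeholder ID was allocated real_id.
  def store(type, placeholder_id, real_id)
    File.write(File.join(@dir, "#{type}.#{placeholder_id}"), real_id.to_s)
  end

  # Look up the allocated ID for a placeholder; nil if not yet seen.
  def lookup(type, placeholder_id)
    path = File.join(@dir, "#{type}.#{placeholder_id}")
    File.exist?(path) ? File.read(path).to_i : nil
  end

  # Remove the temporary directory when the import is done.
  def cleanup
    FileUtils.remove_entry(@dir)
  end
end
```

Lookups cost a stat and a small read, which the page cache will absorb
for anything recently written, so the translation history never has to
live in the Ruby heap.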

-- Dave
