[OSM-dev] Tiling the Planet and Missing Nodes

Stefan de Konink stefan at konink.de
Sat Mar 28 15:23:39 GMT 2009


Ben Supnik wrote:
> Hi Stefan,
> 
> Stefan de Konink wrote:
>> How do you handle the LargeFile allocation? I recently tried mmapping on 
>> an obscure setup (32-bit userspace, valgrind, 64-bit kernel), and that 
>> was prone to failure.
> 
> It's a (very crude) "windowing" system - a smaller chunk of the file is 
> read in, scanned, and scooted down.  If a single XML node were larger 
> than the window, it would fail.  But...even in 32 bits you can make a 
> very large window - I think we are unlikely to have a single tag be > 1 
> GB, for example. :-)

Yup :) Sounds familiar ;) But if you have a 64-bit machine... you want 
to use it ;)
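
The windowing scheme Ben describes could look roughly like the sketch 
below. It is not his code: the name `scan_windowed`, the newline record 
terminator, and the `ValueError` raised for an oversized record are all 
assumptions made for illustration.

```python
def scan_windowed(stream, window_size, terminator=b"\n"):
    """Yield terminator-delimited records using one fixed-size window.

    `stream` is any binary file object with readinto(). A record that
    does not fit in the window raises ValueError (hypothetical choice).
    """
    window = bytearray(window_size)
    filled = 0  # number of valid bytes currently in the window
    while True:
        # Read into the free space behind the unconsumed tail.
        n = stream.readinto(memoryview(window)[filled:])
        if not n:
            break
        filled += n
        start = 0
        while True:
            end = window.find(terminator, start, filled)
            if end == -1:
                break
            end += len(terminator)
            yield bytes(window[start:end])
            start = end
        if start == 0 and filled == window_size:
            raise ValueError("record larger than window")
        # "Scoot down": move the unfinished tail to the front.
        window[:filled - start] = window[start:filled]
        filled -= start
    if filled:
        # Trailing data without a terminator.
        yield bytes(window[:filled])
```

Only one window of the file is held in memory at a time, so this works 
without mmap, at the cost of failing on any record larger than the window.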

> In the long term I want to switch from my hacky crude lousy XML parser 
> to expat...expat runs through XML at about half the speed mine does, but 
> then it is a real correct robust XML parser whereas I am simply grepping 
> blindly for strings (and wouldn't handle legal small changes in char 
> spacing well).

Did you see my osmsucker/osmparser code?


> Run time was about 18 hours last time I ran it...the biggest limit is 
> the number of open file descriptors - only 1024 on my Mac.  Currently I 
> do multiple passes over the input file to avoid open/closing output 
> files...the locality of output was really poor when I examined the problem.

I cannot compare it directly with your splitter: what my code basically 
produces is CSV from the entire file, since that makes it far easier to 
process (or insert into a database). But I think that on the planet it is 
much faster than your 18 hours in mmap mode (thus on uncompressed input).
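
For reference, the multi-pass approach Ben mentions (re-reading the input 
so that no more than a fixed number of output files is open at once) can 
be sketched as follows. The names `split_in_passes`, `key_of` and 
`max_open`, and the one-text-file-per-key layout, are hypothetical and 
not taken from either program.

```python
import os

def split_in_passes(records, key_of, out_dir, max_open):
    """Write each record to out_dir/<key>.txt with at most max_open files open.

    `records` is a callable returning a fresh iterator over the input,
    because the input is re-read once per pass. Returns the number of
    output passes (the initial key-discovery pass is not counted).
    """
    # Discovery pass: find out which output files are needed.
    keys = sorted({key_of(r) for r in records()})
    passes = 0
    for i in range(0, len(keys), max_open):
        batch = keys[i:i + max_open]
        handles = {k: open(os.path.join(out_dir, f"{k}.txt"), "w")
                   for k in batch}
        try:
            for r in records():
                h = handles.get(key_of(r))
                if h is not None:  # record belongs to this pass's batch
                    h.write(r + "\n")
        finally:
            for h in handles.values():
                h.close()
        passes += 1
    return passes
```

The trade-off is one full read of the input per batch of outputs, in 
exchange for never opening and closing output files repeatedly.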


> The program uses gzip on both input/output to let someone work entirely 
> in terms of compressed XML.  It could easily be converted to bz2.

Since gzip cannot run in parallel, this will not scale. It is 
even faster to decompress first :(
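
To make the compressed-in, compressed-out idea concrete, here is a 
minimal sketch using Python's standard `gzip` module. The function name 
`filter_gzip_lines` and the line-based `keep` predicate are assumptions 
for illustration. The whole stream is decoded and re-encoded serially in 
one process, which is the scaling limit mentioned above.

```python
import gzip
import io

def filter_gzip_lines(compressed, keep):
    """Decompress gzip bytes, keep the lines accepted by `keep`, recompress."""
    out = io.BytesIO()
    # newline="" disables newline translation in both directions.
    with gzip.open(io.BytesIO(compressed), "rt",
                   encoding="utf-8", newline="") as src, \
         gzip.open(out, "wt", encoding="utf-8", newline="") as dst:
        for line in src:
            if keep(line):
                dst.write(line)
    # Closing the gzip writer does not close the underlying BytesIO.
    return out.getvalue()
```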


>> Welcome to the fantastic world of OSM :D I thought I had fixed all those 
>> instances two days ago :) Cool that you found another one.
> 
> At least - it only finds one and dies right now. :-)

I think the 26-03 or 27-03 data should have all the fixes.

>> The idea with that situation is for ways:
>> - Download the way; and just reupload it, then it will be solved in 
>> the next run (because the way download hides invisible nodes)
> 
> Okay - so simply omitting the node would be a reasonable thing to do. 
> Thanks!!
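
Omitting the missing nodes on the consumer side could be as simple as 
the sketch below. The name `drop_missing_refs` and the dict-of-lists 
representation of ways are hypothetical.

```python
def drop_missing_refs(ways, known_nodes):
    """Return a copy of `ways` with node refs not in `known_nodes` removed.

    `ways` maps a way id to its ordered list of node ids; `known_nodes`
    is the set of node ids that actually appear in the data.
    """
    return {
        way_id: [ref for ref in refs if ref in known_nodes]
        for way_id, refs in ways.items()
    }
```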

Keep up the good work :)


Stefan



