[OSM-dev] Tiling the Planet and Missing Nodes
Stefan de Konink
stefan at konink.de
Sat Mar 28 15:23:39 GMT 2009
Ben Supnik wrote:
> Hi Stefan,
>
> Stefan de Konink wrote:
>> How do you handle the LargeFile allocation? Because I tried mmapping on
>> some obscure setup (32-bit userspace, valgrind, 64-bit kernel) lately,
>> and that was prone to failure.
>
> It's a (very crude) "windowing" system - a smaller chunk of the file is
> read in, scanned, and scooted down. If a single XML node were larger
> than the window, it would fail. But...even in 32 bits you can make a
> very large window - I think we are unlikely to have a single tag be > 1
> GB, for example. :-)
Yup :) Sounds familiar ;) But if you have a 64-bit machine... you want
to use it ;)
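
For readers following along, the windowing Ben describes (read a chunk, scan it, scoot the unscanned tail down, read more) could be sketched roughly as below. This is only an illustration in Python, not Ben's actual code; `scan_windowed`, its parameters and the demo input are made-up names, and it assumes the thing being searched for is smaller than the window.

```python
import io

def scan_windowed(stream, needle, window_size=1 << 20):
    """Yield absolute byte offsets of `needle` in `stream` using a fixed-size window."""
    if not needle or len(needle) >= window_size:
        raise ValueError("needle must be non-empty and smaller than the window")
    keep = len(needle) - 1   # bytes that may hold the start of a split match
    buf = b""
    base = 0                 # absolute offset of buf[0] in the stream
    while True:
        chunk = stream.read(window_size - len(buf))
        if not chunk:
            break
        buf += chunk
        pos = buf.find(needle)
        while pos != -1:
            yield base + pos
            pos = buf.find(needle, pos + 1)
        # "Scoot down": keep only the tail that could begin a match
        # straddling the boundary between this read and the next one.
        tail = buf[-keep:] if keep else b""
        base += len(buf) - len(tail)
        buf = tail

if __name__ == "__main__":
    demo = io.BytesIO(b"xx<node a/>yy<node b/>zz<node c/>")
    print(list(scan_windowed(demo, b"<node", window_size=8)))
```

On a 64-bit machine the same scan can run over a single `mmap` of the whole file instead, which is the point made above.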
> In the long term I want to switch from my hacky crude lousy XML parser
> to expat...expat runs through XML at about half the speed mine does, but
> then it is a real correct robust XML parser whereas I am simply grepping
> blindly for strings (and wouldn't handle legal small changes in char
> spacing well).
Did you see my osmsucker/osmparser code?
> Run time was about 18 hours last time I ran it...the biggest limit is
> the number of open file descriptors - only 1024 on my Mac. Currently I
> do multiple passes over the input file to avoid open/closing output
> files...the locality of output was really poor when I examined the problem.
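
As an aside, the multi-pass idea quoted above (re-read the input several times so that no output file is ever closed and reopened, while staying under the descriptor limit) might look something like the sketch below. It is an assumption-laden illustration, not the splitter's real code: `split_in_passes`, `read_records`, `tile_of` and `open_output` are hypothetical names.

```python
def split_in_passes(read_records, tile_of, open_output, max_open=1000):
    """Route records to per-tile outputs, opening at most `max_open` files per pass.

    read_records: zero-argument callable returning a fresh iterator over the
                  input (the input is re-read once per pass).
    tile_of:      maps a record to a hashable tile key.
    open_output:  maps a tile key to an object with write() and close().
    Returns the number of passes made over the input.
    """
    done = set()       # tiles fully written in an earlier pass
    passes = 0
    while True:
        active = {}    # tile key -> open output for this pass
        skipped = False
        try:
            for rec in read_records():
                key = tile_of(rec)
                if key in done:
                    continue
                out = active.get(key)
                if out is None:
                    if len(active) >= max_open:
                        skipped = True   # this tile waits for a later pass
                        continue
                    out = active[key] = open_output(key)
                out.write(rec)
        finally:
            for out in active.values():
                out.close()
        passes += 1
        done.update(active)
        if not skipped:
            return passes
```

Each output is opened exactly once, at the price of one extra read of the input for every `max_open` tiles.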
I cannot compare it with your splitter: what my code basically produces
is CSV from the entire file, since that makes it far easier to process
(or insert into a database). But I think on the planet it is much faster
than your 18 hours when run in mmap mode (and thus on uncompressed input).
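
To give an idea of what "CSV from the file" means, here is a minimal sketch that writes one row per node. It is not the osmsucker/osmparser code: it uses Python's expat binding rather than mmap, and the function name `nodes_to_csv` and the column layout (id, lat, lon) are assumptions made for the example.

```python
import csv
import io
import xml.parsers.expat

def nodes_to_csv(xml_stream, csv_stream):
    """Write one CSV row (id, lat, lon) per <node> element; return the row count."""
    writer = csv.writer(csv_stream, lineterminator="\n")
    count = 0

    def start_element(name, attrs):
        # attrs is a plain dict of attribute name -> value
        nonlocal count
        if name == "node":
            writer.writerow([attrs["id"], attrs["lat"], attrs["lon"]])
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.ParseFile(xml_stream)   # expects a binary file-like object
    return count

if __name__ == "__main__":
    sample = io.BytesIO(b'<osm><node id="1" lat="52.1" lon="4.3"/></osm>')
    out = io.StringIO()
    nodes_to_csv(sample, out)
    print(out.getvalue(), end="")
```

Ways and relations would get their own files in the same manner; they are left out here to keep the sketch short.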
> The program uses gzip on both input/output to let someone work entirely
> in terms of compressed XML. It could easily be converted to bz2.
Due to the non-parallel nature of gzip this will not scale. It is
even faster to decompress first :(
>> Welcome to the fantastic world of OSM :D I thought I had fixed all those
>> instances two days ago :) Cool that you found another one.
>
> At least - it only finds one and dies right now. :-)
I think 26-03 or 27-03 should have all the fixes.
>> The idea with that situation is for ways:
>> - Download the way; and just reupload it, then it will be solved in
>> the next run (because the way download hides invisible nodes)
>
> Okay - so simply omitting the node would be a reasonable thing to do.
> Thanks!!
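
Omitting the node, as suggested above, amounts to filtering each way's node references against the set of node ids that actually occur in the file. A small sketch, with the hypothetical name `drop_missing_refs` and assuming the known ids fit in a set in memory:

```python
def drop_missing_refs(way_refs, known_node_ids):
    """Split a way's node references into (kept, missing), preserving order."""
    kept = [ref for ref in way_refs if ref in known_node_ids]
    missing = [ref for ref in way_refs if ref not in known_node_ids]
    return kept, missing

if __name__ == "__main__":
    kept, missing = drop_missing_refs([10, 11, 12, 11], {10, 11})
    print(kept, missing)
```

Returning the missing ids as well makes it easy to log which ways still need a re-upload.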
Keep up the good work :)
Stefan