[OSM-dev] Reducing osm2pgsql memory usage using a database method
Artem Pavlenko
artem at mapnik.org
Mon Mar 12 09:27:28 GMT 2007
Hi Frederik,
>
>> Is this using a true XML reader or a simple line matching approach?
>
> I never use "true XML readers" (would have to be a SAX parser here) for
> the planet file. Reasons for this:
>
> * the planet file is not true XML (quoting problems, UTF-8 problems),
> most libraries will complain
I don't understand your argument here. If there are problems with the
planet XML, let's fix them; parsing XML with regexes won't get you far.
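
To illustrate, a streaming (SAX) reader needs very little code and never
holds the whole file in memory. The following is only a minimal sketch in
Python, not what osm2pgsql does; the names OsmCounter, count_elements and
SAMPLE are made up for illustration, and the sample data is invented:

```python
import xml.sax


class OsmCounter(xml.sax.ContentHandler):
    """Counts OSM elements as they stream past; no tree is built."""

    def __init__(self):
        super().__init__()
        self.counts = {"node": 0, "segment": 0, "way": 0}

    def startElement(self, name, attrs):
        # Called once per opening tag; attrs holds id, lat, lon, etc.
        if name in self.counts:
            self.counts[name] += 1


def count_elements(xml_bytes):
    """Parse a (hypothetical) planet extract and return element counts."""
    handler = OsmCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


# Invented sample data; note the escaped quote and the non-ASCII character,
# which a real XML parser handles without special-case code.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.3">
  <node id="1" lat="51.5" lon="-0.1"/>
  <node id="2" lat="51.6" lon="-0.2"/>
  <segment id="10" from="1" to="2"/>
  <way id="100"><seg id="10"/><tag k="name" v="Caf&#233; &quot;Road&quot;"/></way>
</osm>"""

if __name__ == "__main__":
    print(count_elements(SAMPLE))
```

A parser like this rejects malformed input instead of silently
mis-reading it, which is exactly how quoting and UTF-8 problems in the
planet file would get noticed and fixed.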
>
> * a lot of efficiency can be gained by making the assumption that the
> file consists of nodes first, then segments, then ways, which, granted,
> is a "hack" as theoretically the XML could be in any sequence. If I drop
> my regular expressions because I say that the XML format could change
> anytime, then I'd also have to drop assumptions like this, and that
> would probably catapult me far beyond the 100 minute ballpark mentioned.
Ok, it is a hack after all.
>
> * regular expressions are faster (for this specific application and when
> doing it with Perl)
Which application are you talking about? How does it relate to
osm2pgsql? Can you prove it?
> If the only problem of that algorithm is memory consumption, could one
> not simply run it in multiple passes, dividing the globe up in a number
> of bounding boxes and working them one after the other, with a little
> bit of overlap to allow for long ways/segments? The size of the bounding
> boxes could be chosen heuristically based on the file size of the planet
> file and the amount of available memory, so someone with a 4 gig machine
> could still do the whole file in one pass, and if there's only 512 mb
> available it would also work, just slower? - If you always use a DB
> backend for transient then it'll always work but more memory will only
> give you an advantage if efficiently used for database caches.
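
For reference, the multi-pass idea quoted above could be sketched roughly
as follows. This is an illustration only, not something osm2pgsql
implements; the function name plan_bboxes, the heuristic of one byte of
RAM per byte of planet file, the longitude-strip layout and the
0.5 degree overlap are all assumptions:

```python
import math


def plan_bboxes(file_size, memory, overlap_deg=0.5):
    """Split the globe into longitude strips, one strip per pass.

    file_size and memory are in bytes. The number of passes is chosen so
    that each pass handles roughly `memory` bytes of input (an assumed
    heuristic). Each box is (min_lon, min_lat, max_lon, max_lat).
    """
    passes = max(1, math.ceil(file_size / memory))
    width = 360.0 / passes
    boxes = []
    for i in range(passes):
        # Widen each strip slightly so long ways/segments crossing a
        # strip boundary are seen in both neighbouring passes.
        min_lon = max(-180.0, -180.0 + i * width - overlap_deg)
        max_lon = min(180.0, -180.0 + (i + 1) * width + overlap_deg)
        boxes.append((min_lon, -90.0, max_lon, 90.0))
    return boxes


if __name__ == "__main__":
    GIB = 1024 ** 3
    # A 4 GiB file on a 512 MiB machine is processed strip by strip.
    for box in plan_bboxes(4 * GIB, GIB // 2):
        print(box)
```

Equal-width strips ignore the fact that the data is far denser over
Europe than over the oceans, so a real implementation would have to size
the boxes by data density rather than by degrees.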
> All wildly speculating since I'm way out of my depth here,
I understand.
> I was just
> taking exception at the original argument "let's ditch C in favour of
> C++ because there we have hash tables".
'Hash tables' are not the only reason to consider C++.
FYI, osm2pgsql already links to GEOS, which requires a C++ compiler.
I suggest you study the subject more carefully before writing long
emails.
Cheers,
Artem