[OSM-dev] Reducing osm2pgsql memory usage using a database method
Artem Pavlenko
artem at mapnik.org
Sat Mar 10 20:04:00 GMT 2007
On 10 Mar 2007, at 18:43, Jon Burgess wrote:
> On Sat, 2007-03-10 at 14:56 +0000, Artem Pavlenko wrote:
>> Jon,
>>
>>> I've just uploaded an experimental version of osm2pgsql which uses
>>> Postgresql database tables for the transient node and segment
>>> storage.
>>> This drops the memory usage from >1GB to ~60MB. On the downside, the
>>> import time has gone up from 20 to 100 minutes. I'm sure this can be
>>> improved though with some more database Mojo.
>>>
>>> For further details see SVN (utils/osm2pgsql/experimental/
>>> readme.txt)
>>> or
>>> http://trac.openstreetmap.org/browser/utils/osm2pgsql/experimental/
>>> readme.txt
>>>
>> Good stuff.
>>
>> I'm working on new osm.xml and I have some ideas on how to improve
>> osm2pgsql output:
>>
>> 1. We can re-write osm2pgsql in C++ and take advantage of dynamic
>> structures, e.g. std::map, plus safe formatting and casting with
>> boost::lexical_cast, boost::format and more.
>>
>
> I think this makes sense. Handling items like the tag, segment and
> attribute lists should be significantly simpler in c++.
>
>> 2. At the moment there is a lot of redundant data in the output tables.
>> Everything apart from geometries is dumped as 'TEXT'.
>>
>
>> We can have a more flexible design where the table structure and
>> attribute values are configurable (at compile time).
>
> The current export tags table could easily be read in at run time.
> One possibility would be to move some of the rules from osm.xml into
> osm2pgsql, e.g. the roads, leisure, water, text could become separate
> tables instead of requiring select statements in the osm.xml file. I
> guess osm2pgsql could even be taught how to interpret the osm.xml
> file.
>> Consider this for example:
>> To render highway features in the correct order I want to have a z_order
>> field in planet_osm_table calculated as follows:
>
>> int z_order(osm_feature const& feat)
>> {
>>     int layer = 0; // default
>>     try
>>     {
>>         layer = boost::lexical_cast<int>(feat["layer"]);
>>     }
>>     catch (boost::bad_lexical_cast const&)
>>     {
>>         // layer tag has got lots of junk!!!
>>     }
>>     int highway_z = 0; // 0..9
>>     std::string highway = feat["highway"];
>>     if (highway == "motorway" || highway == "motorway_link")
>>     {
>>         highway_z = 9;
>>     }
>>     else if (...) {}
>>     ....
>>
>>     bool bridge = false;
>>     try {
>>         bridge = boost::lexical_cast<bool>(feat["bridge"]);
>>     }
>>     catch (...) {}
>>     // parenthesise the conditional: ?: binds more loosely than +
>>     return 10 * (layer + (bridge ? 1 : 0)) + highway_z;
>> }
>>
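For anyone who wants to try the z_order idea without the rest of osm2pgsql, here is a minimal self-contained sketch. It is not the osm2pgsql code: tag_map stands in for osm_feature, std::stoi replaces boost::lexical_cast so it builds without Boost, every highway rank other than motorway/motorway_link = 9 is made up for illustration, and treating "yes"/"true"/"1" as a bridge is likewise an assumption.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for osm_feature: a plain tag -> value map.
using tag_map = std::map<std::string, std::string>;

// Return the value of a tag, or an empty string if the tag is absent.
inline std::string tag_value(const tag_map& tags, const std::string& key)
{
    auto it = tags.find(key);
    return it == tags.end() ? std::string() : it->second;
}

// Parse the layer tag; anything that is not a whole integer counts as 0.
inline int parse_layer(const std::string& text)
{
    try {
        std::size_t used = 0;
        int value = std::stoi(text, &used);
        return used == text.size() ? value : 0;
    } catch (const std::exception&) {
        return 0; // junk or missing layer tag
    }
}

// Rank of a highway value in 0..9. Only the motorway entries come from the
// thread; the remaining ranks are illustrative assumptions.
inline int highway_rank(const std::string& highway)
{
    static const std::map<std::string, int> ranks = {
        {"motorway", 9}, {"motorway_link", 9},
        {"trunk", 8},    {"primary", 7},
        {"secondary", 6}, {"residential", 3},
    };
    auto it = ranks.find(highway);
    return it == ranks.end() ? 0 : it->second;
}

// Assumed truthy spellings for the bridge tag.
inline bool is_bridge(const std::string& value)
{
    return value == "yes" || value == "true" || value == "1";
}

// Layer (bumped by one for bridges) picks the band of ten,
// the highway rank orders features within that band.
inline int z_order(const tag_map& tags)
{
    int layer = parse_layer(tag_value(tags, "layer"));
    int bridge = is_bridge(tag_value(tags, "bridge")) ? 1 : 0;
    return 10 * (layer + bridge) + highway_rank(tag_value(tags, "highway"));
}
```

With these assumptions a motorway bridge on layer 1 lands in band 2, so it sorts above everything on layers 0 and 1.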
>> Also I want to have consistent numeric feature_type calculated
>> differently depending on tags/values. This will make rendering more
>> efficient and will bring some (needed) sanity to styles in osm.xml.
>>
>> 3. Also we can abstract 'output writing' to have multiple back-ends :
>> mysql, sqlite , shapefiles etc .
>>
>> What do you think?
>>
>
> Makes sense to me. I would consider having an abstraction layer for
> both the data output and also the transient storage (array in RAM,
> database tables, mmapped file)
Yes, even better.
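A rough sketch of how the transient storage side of that abstraction might look. All names here (node_pos, node_store, ram_node_store, make_node_store) are hypothetical and nothing like this exists in osm2pgsql yet; only the in-RAM variant is filled in, and the database and mmapped-file variants would be further subclasses.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical record kept for each node during the import.
struct node_pos {
    double lat;
    double lon;
};

// Interface the importer would talk to, whatever the storage is.
class node_store {
public:
    virtual ~node_store() = default;
    virtual void put(std::int64_t id, node_pos pos) = 0;
    // Returns false if the id is unknown; otherwise fills 'out'.
    virtual bool get(std::int64_t id, node_pos& out) const = 0;
};

// In-RAM implementation backed by std::map.
class ram_node_store : public node_store {
public:
    void put(std::int64_t id, node_pos pos) override { nodes_[id] = pos; }

    bool get(std::int64_t id, node_pos& out) const override
    {
        auto it = nodes_.find(id);
        if (it == nodes_.end()) {
            return false;
        }
        out = it->second;
        return true;
    }

private:
    std::map<std::int64_t, node_pos> nodes_;
};

// Pick a store by name, e.g. from a command-line option.
// Unknown or not-yet-written kinds yield a null pointer.
inline std::unique_ptr<node_store> make_node_store(const std::string& kind)
{
    if (kind == "ram") {
        return std::unique_ptr<node_store>(new ram_node_store);
    }
    return nullptr;
}
```

The importer would only ever hold a node_store pointer, so trading memory for import time becomes a run-time choice rather than a separate build.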
> Maybe it should adopt a plugin architecture with a config file much
> like
> mapnik. This might allow multiple simultaneous outputs, e.g. roads to
> DB, coast to shapefile.
Exactly.
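To make the "roads to DB, coast to shapefile" idea concrete, here is a small sketch of routing features to several back-ends at once. The names (feature, output_backend, recording_backend, output_router) are invented for illustration, and recording_backend only remembers what it was given; real PostGIS or shapefile writers would implement the same interface.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical feature: 'kind' is the class a config file would route on.
struct feature {
    std::string kind;
    std::string name;
};

// Interface every output back-end would implement.
class output_backend {
public:
    virtual ~output_backend() = default;
    virtual void write(const feature& f) = 0;
};

// Placeholder back-end that just records the names it was asked to write.
class recording_backend : public output_backend {
public:
    void write(const feature& f) override { written.push_back(f.name); }
    std::vector<std::string> written;
};

// Sends each feature to the back-end registered for its kind.
class output_router {
public:
    void route(const std::string& kind, std::shared_ptr<output_backend> backend)
    {
        routes_[kind] = std::move(backend);
    }

    // Returns false when no back-end is registered for the feature's kind.
    bool dispatch(const feature& f)
    {
        auto it = routes_.find(f.kind);
        if (it == routes_.end()) {
            return false;
        }
        it->second->write(f);
        return true;
    }

private:
    std::map<std::string, std::shared_ptr<output_backend>> routes_;
};
```

The route() calls are what a config file would eventually generate, one per rule.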
>>
>
> Are you sure you are comparing like-for-like systems? I run on
> Linux/ext3 and do not notice any significant IO issues. Mind you, my
> machine has 2GB of RAM so a lot is cached and I have 4 disks in a
> RAID5
> setup so the IO rate is quite good.
>
You're right. The Linux box is a 512MB AMD x86_64 machine and the Mac is
an Intel Core 2 Duo (running in 32-bit mode).
> Anecdotally, I have heard that OS-X tends to be slightly slower than
> Linux in most tests (I had better run for cover now...).
I've heard that DOS is the fastest filesystem for postgresql data
dir :)
I'm running Fedora Core 6 (64-bit) on the same MacBook, so I'll try to
do a better comparison soon.
Cheers,
Artem