[OSM-dev] Another benchmark (was: XAPI and other solutions)

Pierre-Alain Dorange pdorange at mac.com
Fri Apr 29 13:46:10 BST 2011


Oliver Tonnhofer <olt at omniscale.de> wrote:

> > Interesting, this totally uproots my firm conviction that SAX parsing is
> > always more time- and memory-efficient than tree/DOM-based parsing.
> 
> Sure. DOM based parser need to keep the whole tree in memory which doesn't
> work well with large XML files. SAX and other stream based parser (like
> iterparse for Python) do not keep any information in memory. Stream
> based parser require more work from the developer, you can't use XPath
> expressions for example. So DOM based parsing still makes sense for
> documents with a known size, but for OSM data SAX/iterparse is the way
> to go.

I just ran a small benchmark of ElementTree (DOM) against xml.sax and
ElementTree (iterparse) on OSM data.

1st file XML : 12 MB
        ET (DOM) : 8 seconds (about 150 MB)
        ET (iterparse) : 8 seconds (about 100 MB)
        xml.sax : 3 seconds (about 75 MB)

2nd file XML : 30 MB
        ET (DOM) : 25 seconds (about 280 MB)
        ET (iterparse) : 22 seconds (about 100 MB)
        xml.sax : 8 seconds (about 72 MB)

3rd file XML : 243 MB
        ET (DOM) : n/a
        ET (iterparse) : 328 seconds (about 70 MB)
        xml.sax : 115 seconds (about 50 MB)

Note: ET (DOM) failed, not enough memory (my BSD system has 4 GB).

4th file XML : 1680 MB (1.64 GB)
        ET (DOM) : n/a
        ET (iterparse) : n/a
        xml.sax : 365 seconds (about 50 MB), 20 Knodes/sec

Notes:
- ET (DOM) failed, not enough memory.
- ET (iterparse) also used up all the memory (I don't understand why;
it must be a bug in my code).
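
A possible cause, offered as a guess since the benchmark code is not
shown here: iterparse still builds the full tree as it goes, so memory
keeps growing unless finished elements are cleared. The sketch below
shows the usual workaround of clearing the root after each top-level
element. The function name iter_place_nodes and the SAMPLE data are
made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up OSM extract, used only to exercise the sketch.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="48.85" lon="2.35">
    <tag k="place" v="city"/>
    <tag k="name" v="Paris"/>
    <tag k="population" v="2000000"/>
  </node>
  <node id="2" lat="1.0" lon="2.0"/>
  <way id="3"><nd ref="1"/><tag k="place" v="island"/></way>
</osm>"""


def iter_place_nodes(source):
    """Yield (lon, lat, name, population) for each node tagged place=*."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <osm> root
    for event, elem in context:
        # Only act when a top-level OSM element is complete.
        if event != "end" or elem.tag not in ("node", "way", "relation"):
            continue
        if elem.tag == "node":
            tags = {t.get("k"): t.get("v") for t in elem.findall("tag")}
            if "place" in tags:
                yield (float(elem.get("lon")), float(elem.get("lat")),
                       tags.get("name"), tags.get("population"))
        # Detach everything parsed so far; without this the tree keeps growing.
        root.clear()


places = list(iter_place_nodes(io.BytesIO(SAMPLE)))
```

With the root cleared after each node, way and relation, only the
element currently being parsed stays in memory, which may bring
iterparse closer to the xml.sax figures.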

The task was to extract nodes with place=* tags, along with their
lon-lat, name and population (build a list of them in memory, then
export it to a text file).
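
To make the task concrete, here is a minimal xml.sax sketch of the
extraction step. It is not the benchmark code itself; the class name
PlaceHandler, the function extract_places and the SAMPLE data are
assumptions for illustration.

```python
import io
import xml.sax

# Tiny made-up OSM extract, used only to exercise the sketch.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="48.85" lon="2.35">
    <tag k="place" v="city"/>
    <tag k="name" v="Paris"/>
    <tag k="population" v="2000000"/>
  </node>
  <node id="2" lat="1.0" lon="2.0"/>
  <way id="3"><nd ref="1"/><tag k="place" v="island"/></way>
</osm>"""


class PlaceHandler(xml.sax.ContentHandler):
    """Collect (lon, lat, name, population) for nodes tagged place=*."""

    def __init__(self):
        super().__init__()
        self.places = []
        self._lonlat = None  # set while we are inside a <node>
        self._tags = None

    def startElement(self, name, attrs):
        if name == "node":
            self._lonlat = (float(attrs["lon"]), float(attrs["lat"]))
            self._tags = {}
        elif name == "tag" and self._lonlat is not None:
            # Only tags that belong to the current node are recorded.
            self._tags[attrs["k"]] = attrs["v"]

    def endElement(self, name):
        if name == "node":
            if "place" in self._tags:
                self.places.append(
                    self._lonlat
                    + (self._tags.get("name"), self._tags.get("population")))
            self._lonlat = None
            self._tags = None


def extract_places(source):
    """Parse a file name or file object and return the list of places."""
    handler = PlaceHandler()
    xml.sax.parse(source, handler)
    return handler.places


places = extract_places(io.BytesIO(SAMPLE))
```

The handler has to track by hand which node it is inside, which is the
extra code that stream parsing costs.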

Conclusions:
xml.sax is the winner (fastest), but it also requires more code to
handle the same task.
For code size and simplicity ET (DOM) wins, but it is also the slowest
(note that it is also the fastest of the XML DOM options).
In the middle, ET (iterparse) has simpler code than xml.sax, but it is
not as efficient.
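
To illustrate the code-size point, a DOM version of the same extraction
can be sketched in a few lines. Again this is only an illustration, not
the benchmark code; extract_places_dom and SAMPLE are made-up names.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up OSM extract, used only to exercise the sketch.
SAMPLE = b"""<osm version="0.6">
  <node id="1" lat="48.85" lon="2.35">
    <tag k="place" v="city"/>
    <tag k="name" v="Paris"/>
  </node>
  <node id="2" lat="1.0" lon="2.0"/>
</osm>"""


def extract_places_dom(source):
    """Load the whole document, then pick the nodes tagged place=*."""
    root = ET.parse(source).getroot()  # the entire tree is held in memory
    places = []
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.iter("tag")}
        if "place" in tags:
            places.append((float(node.get("lon")), float(node.get("lat")),
                           tags.get("name"), tags.get("population")))
    return places


places = extract_places_dom(io.BytesIO(SAMPLE))
```

No state tracking is needed, but ET.parse has to load the whole file
first, which is where the memory goes.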

For my project I'll of course switch to xml.sax. Thanks for the advice.

-- 
Pierre-Alain Dorange
OSM experiences : <http://www.leretourdelautruche.com/map/>



