[osmosis-dev] Cloning and Concurrency Support

Brett Henderson brett at bretth.com
Tue Jan 20 02:30:26 GMT 2009


There's been discussion recently between Marcus, Jochen and me about 
the best way to deal with modification of Entities within tasks.

In the current osmosis implementation, all entity classes (i.e. Bound, 
Way, Node and Relation) are treated as immutable.  In other words, they 
are never modified after creation.  If a task wishes to modify an 
existing entity, it always creates a new copy of the Entity.  The main 
reason for this is thread safety: multiple threads can operate on a 
single Entity instance without worrying about concurrency issues.  Many 
core Java classes such as String and Integer take a similar approach.  
In most cases this is unnecessary in osmosis because a pipeline is 
usually a linear set of tasks.  The only time it could become an issue 
is if a --tee task is used and multiple tasks begin acting on the same 
Entity instances at the same time.
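
To make the copy-on-modify pattern concrete, here is a minimal sketch. 
SketchNode and withoutTag are hypothetical names used only for 
illustration; they are not the real osmosis entity classes or methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an immutable entity; NOT the osmosis Node class.
final class SketchNode {
    private final long id;
    private final List<String> tagKeys;

    SketchNode(long id, List<String> tagKeys) {
        this.id = id;
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.tagKeys = Collections.unmodifiableList(new ArrayList<>(tagKeys));
    }

    long getId() {
        return id;
    }

    List<String> getTagKeys() {
        return tagKeys;
    }

    // A "modification" builds and returns a new instance; this one is untouched,
    // so other threads holding a reference to it never see a change.
    SketchNode withoutTag(String key) {
        List<String> copy = new ArrayList<>(tagKeys);
        copy.remove(key); // removes the first matching key, if present
        return new SketchNode(id, copy);
    }
}
```

The cost is visible in withoutTag: every change allocates a new list and 
a new object, which is the overhead discussed below.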

However, it has some disadvantages:
1. There is a performance overhead in creating new object instances.
2. There is additional coding effort compared with modifying entities 
directly.

I've recently added some new classes such as EntityBuilder which assist 
with number 2, but it is still more complicated than dealing with Entity 
objects directly.

I've just completed some tests to get some numbers to find out just how 
much performance impact it has.  All tests were performed using a 
gzipped osm file approximately 100MB in size.  All timings are in 
milliseconds.  I used a task that removes a subset of tags from entities.

Baseline timings without any tag removal task
osmosis --read-xml-0.6 myfile.osm.gz --write-null-0.6
51791
52112
52312

Using a cloning implementation (current svn code)
osmosis --read-xml-0.6 myfile.osm.gz --remove-tags-0.6 keys=created_by 
--write-null-0.6
56397
56187
56227

Using Jochen's initial "drop tags" implementation.
osmosis --read-xml-0.6 myfile.osm.gz --remove-tags-0.6 keys=created_by 
--write-null-0.6 (I changed code to invoke a different implementation)
55642
57376
55665

Using a tweaked implementation to more efficiently remove existing tags 
from the ArrayList (same as used by the cloning implementation to 
compare apples with apples).
osmosis --read-xml-0.6 myfile.osm.gz --remove-tags-0.6 keys=created_by 
--write-null-0.6 (I changed code to invoke a different implementation)
53692
53820
55934
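
The exact tweak is in the attached patch and isn't reproduced here.  As 
an assumption about what "more efficient" removal from an ArrayList can 
look like, the sketch below compacts the list in a single pass instead 
of removing elements one at a time (each single removal shifts the rest 
of the array).  KeyValueSketch and dropTags are hypothetical names.

```java
import java.util.List;
import java.util.Set;

// Hypothetical tag holder; NOT the osmosis Tag class.
final class KeyValueSketch {
    final String key;
    final String value;

    KeyValueSketch(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

final class TagRemovalSketch {
    // Removes every tag whose key is in keysToDrop, modifying the list in place.
    // Returns the number of tags removed.
    static int dropTags(List<KeyValueSketch> tags, Set<String> keysToDrop) {
        int write = 0;
        // Single pass: slide each retained tag down to the next free slot.
        for (int read = 0; read < tags.size(); read++) {
            KeyValueSketch tag = tags.get(read);
            if (!keysToDrop.contains(tag.key)) {
                tags.set(write++, tag);
            }
        }
        int removed = tags.size() - write;
        // Truncate the leftover tail in one operation.
        tags.subList(write, tags.size()).clear();
        return removed;
    }
}
```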

As above but using the new --fast-read-xml-0.6 implementation.
osmosis --fast-read-xml-0.6 myfile.osm.gz --remove-tags-0.6 
keys=created_by --write-null-0.6
44895
44923
45161

The numbers are interesting.  The majority of the overhead is in xml 
parsing, as expected.  Initially I was slightly surprised to see almost 
identical results from the cloning and the first "drop tags" 
implementations.  I then modified the second one to use the same tag 
manipulation code as the cloning implementation, which improved its 
speed.  The results aren't consistent enough to get hard numbers, but 
the tweaked "drop tags" approach appears to add less than half the cpu 
overhead (relative to the baseline) that the cloning implementation 
does.  Finally, I tried the new fast xml reader out of interest to see 
what difference it made, and it cut execution time by over 15%.
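
As a rough check on those comparisons, the sketch below takes the median 
of each set of three runs listed above and derives the overhead relative 
to the baseline.  Using the median rather than the mean is my own 
choice here, to dampen the one slow run (55934) in the tweaked set; 
TimingSummary is a hypothetical helper, not part of osmosis.

```java
import java.util.Arrays;

final class TimingSummary {
    // Middle value of an odd number of runs (all timings in milliseconds).
    static long median(long... runs) {
        long[] sorted = runs.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        long baseline = median(51791, 52112, 52312);
        long cloning = median(56397, 56187, 56227);
        long dropTags = median(55642, 57376, 55665);
        long tweaked = median(53692, 53820, 55934);
        long fastRead = median(44895, 44923, 45161);

        // Overhead attributable to the tag removal task itself.
        System.out.println("cloning overhead ms:   " + (cloning - baseline));
        System.out.println("drop tags overhead ms: " + (dropTags - baseline));
        System.out.println("tweaked overhead ms:   " + (tweaked - baseline));

        // Saving from the fast reader, compared with the same task on the old reader.
        double saving = 100.0 * (tweaked - fastRead) / tweaked;
        System.out.printf("fast reader saving: %.1f%%%n", saving);
    }
}
```

On medians, cloning adds 4115 ms over the baseline and the tweaked 
implementation adds 1708 ms, and the fast reader saves roughly 16.5% 
against the tweaked run.  With only three runs per configuration these 
figures are indicative at best.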

So to summarise:
* The overhead of cloning isn't large in real-world usage.
* But within individual tasks, cloning has a noticeable impact.
* As always, most performance gains are to be realised in the "IO" tasks 
(i.e. xml processing, db processing, etc).

I'm now starting to think that cloning isn't ideal, but I don't have a 
better alternative at the moment.  It would be nice if we could achieve 
guaranteed thread safety while avoiding cloning in the majority of cases 
(i.e. when --tee isn't being used), so long as it doesn't add complexity 
in other areas.  One possibility, already suggested by Jochen, is to do 
all cloning in the --tee task instead.  It would probably solve the 
issue, although I'd like to consider other options if possible.
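
To show roughly what cloning in the --tee task could mean, here is a 
minimal sketch.  Everything in it is hypothetical (MutableEntitySketch, 
SinkSketch and CloningTeeSketch are not osmosis classes), and it assumes 
entities become mutable and expose a copy method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable entity; NOT an osmosis class.
final class MutableEntitySketch {
    final long id;
    final List<String> tagKeys;

    MutableEntitySketch(long id, List<String> tagKeys) {
        this.id = id;
        this.tagKeys = new ArrayList<>(tagKeys);
    }

    MutableEntitySketch copy() {
        return new MutableEntitySketch(id, tagKeys);
    }
}

// Hypothetical downstream task interface.
interface SinkSketch {
    void process(MutableEntitySketch entity);
}

// Fans one stream out to several sinks, copying only at the split point.
final class CloningTeeSketch implements SinkSketch {
    private final List<SinkSketch> sinks;

    CloningTeeSketch(List<SinkSketch> sinks) {
        this.sinks = new ArrayList<>(sinks);
    }

    @Override
    public void process(MutableEntitySketch entity) {
        // All branches but the last receive their own copy. The copies are made
        // before the original is handed on, so no branch can alter another's view.
        for (int i = 0; i < sinks.size() - 1; i++) {
            sinks.get(i).process(entity.copy());
        }
        // The last branch can take the original, since nothing else holds it.
        if (!sinks.isEmpty()) {
            sinks.get(sinks.size() - 1).process(entity);
        }
    }
}
```

A linear pipeline never passes through such a task, so it would pay no 
copying cost at all; only pipelines that split would.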

Anyway, I thought these numbers might be interesting.  The patch I used 
to do this testing is attached.

Brett

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: tag_remove_variations.txt
URL: <http://lists.openstreetmap.org/pipermail/osmosis-dev/attachments/20090120/6f8e160b/attachment.txt>

