[osmosis-dev] Cloning and Concurrency Support
Brett Henderson
brett at bretth.com
Tue Jan 20 02:30:26 GMT 2009
There's been discussion recently between myself, Marcus and Jochen on
the best way to deal with modification of Entities within tasks.
In the current osmosis implementation, all entity classes (ie. Bound,
Way, Node and Relation) are treated as immutable. In other words they
are never modified after creation. If a task wishes to make a
modification to an existing entity it always creates a new copy of the
Entity. The main reason for this is thread safety: immutability allows
multiple threads to operate on a single Entity instance without worrying
about concurrency issues. This is similar to the approach taken by many core
Java classes such as String and Integer. In most cases this is unnecessary
in osmosis because a pipeline is usually a linear set of tasks. The only
time it could become an issue is if a --tee task is used and multiple
tasks begin acting on Entity instances at the same time.
However, treating entities as immutable has some disadvantages:
1. There is a performance overhead in creating new object instances.
2. There is additional coding effort compared with modifying entities
directly.
I've recently added some new classes such as EntityBuilder which help
with the second point, but it is still more complicated than modifying
Entity objects directly.
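
To make the two styles concrete, here is a minimal sketch. It does not use the real osmosis classes: ImmutableNode, MutableNode, withoutTags and removeTags are made-up names, and tags are simplified to a map purely for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class TagRemovalSketch {

    /** Hypothetical immutable entity: never modified after construction. */
    static final class ImmutableNode {
        private final long id;
        private final Map<String, String> tags;

        ImmutableNode(long id, Map<String, String> tags) {
            this.id = id;
            // Defensive copy so later changes to the caller's map cannot leak in.
            this.tags = Collections.unmodifiableMap(new LinkedHashMap<>(tags));
        }

        long getId() { return id; }

        Map<String, String> getTags() { return tags; }

        /** Cloning style: return a new instance without the given keys. */
        ImmutableNode withoutTags(Set<String> keys) {
            Map<String, String> kept = new LinkedHashMap<>(tags);
            kept.keySet().removeAll(keys);
            return new ImmutableNode(id, kept);
        }
    }

    /** Hypothetical mutable entity: tasks edit the tag map in place. */
    static final class MutableNode {
        private final long id;
        private final Map<String, String> tags;

        MutableNode(long id, Map<String, String> tags) {
            this.id = id;
            this.tags = new LinkedHashMap<>(tags);
        }

        long getId() { return id; }

        Map<String, String> getTags() { return tags; }

        /** In-place style: no new object, but unsafe if the instance is shared between threads. */
        void removeTags(Set<String> keys) {
            tags.keySet().removeAll(keys);
        }
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("created_by", "JOSM");
        tags.put("highway", "residential");
        Set<String> drop = Collections.singleton("created_by");

        // Cloning style: the original keeps both tags, the copy loses created_by.
        ImmutableNode original = new ImmutableNode(1L, tags);
        ImmutableNode stripped = original.withoutTags(drop);
        System.out.println(original.getTags().keySet());
        System.out.println(stripped.getTags().keySet());

        // In-place style: the same instance is changed.
        MutableNode node = new MutableNode(1L, tags);
        node.removeTags(drop);
        System.out.println(node.getTags().keySet());
    }
}
```

The cloning style pays for a new object and a new tag collection on every change, which is the cost the tests below try to measure.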
I've just completed some tests to measure just how much performance
impact cloning has. All tests were performed using a gzipped osm file
approximately 100MB in size. All timings are in milliseconds. I used a
task that removes a subset of tags from entities.
Baseline timings without the tag removal task
osmosis --read-xml-0.6 myfile.osm.gz --write-null-0.6
51791
52112
52312
Using a cloning implementation (current svn code)
osmosis --read-xml-0.6 myfile.osm.gz --remove-tags-0.6 keys=created_by
--write-null-0.6
56397
56187
56227
Using Jochen's initial "drop tags" implementation.
osmosis --read-xml-0.6 myfile.osm.gz --remove-tags-0.6 keys=created_by
--write-null-0.6 (I changed code to invoke a different implementation)
55642
57376
55665
Using a tweaked implementation to more efficiently remove existing tags
from the ArrayList (same as used by the cloning implementation to
compare apples with apples).
osmosis --read-xml-0.6 myfile.osm.gz --remove-tags-0.6 keys=created_by
--write-null-0.6 (I changed code to invoke a different implementation)
53692
53820
55934
As above but using the new --fast-read-xml-0.6 implementation.
osmosis --fast-read-xml-0.6 myfile.osm.gz --remove-tags-0.6
keys=created_by --write-null-0.6
44895
44923
45161
The numbers are interesting. The majority of the overhead is in XML parsing,
which is normal. Initially I was slightly surprised to see almost
identical results from the cloning and the first "drop tags"
implementations. I then modified the "drop tags" implementation to use the
same tag manipulation code as the cloning implementation, which improved its
speed. The results aren't consistent enough to get hard numbers, but
measured against the baseline, the tweaked "drop tags" approach appears to
add less than half the overhead of the cloning implementation. Finally, I
tried the new fast xml reader out of interest to see what difference it
made, and it cut execution time by over 15%.
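
For anyone wanting to check the arithmetic, the sketch below summarises the timings listed above. Taking the median of each set of three runs is an assumption made here to damp the outliers, not necessarily how the comparison was originally made; TimingSummary and its helpers are illustrative names.

```java
import java.util.Arrays;

public class TimingSummary {

    /** Median of an odd-sized set of timings, in milliseconds. */
    static long median(long... runs) {
        long[] sorted = runs.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    /** Fraction of execution time saved going from 'before' to 'after'. */
    static double saving(long before, long after) {
        return (before - after) / (double) before;
    }

    public static void main(String[] args) {
        // Timings copied from the test runs above.
        long baseline = median(51791, 52112, 52312);
        long cloning = median(56397, 56187, 56227);
        long dropTags = median(55642, 57376, 55665);
        long tweaked = median(53692, 53820, 55934);
        long fastRead = median(44895, 44923, 45161);

        // Overhead of each tag removal variant over the baseline pipeline.
        long cloningOverhead = cloning - baseline;
        long dropTagsOverhead = dropTags - baseline;
        long tweakedOverhead = tweaked - baseline;

        System.out.println("cloning overhead (ms):   " + cloningOverhead);
        System.out.println("drop tags overhead (ms): " + dropTagsOverhead);
        System.out.println("tweaked overhead (ms):   " + tweakedOverhead);
        System.out.println("tweaked / cloning ratio: "
                + (double) tweakedOverhead / cloningOverhead);
        System.out.println("fast reader saving:      " + saving(tweaked, fastRead));
    }
}
```

On medians, the cloning task adds 4115 ms over the baseline and the tweaked "drop tags" task adds 1708 ms, a ratio of roughly 0.41. The fast reader saves about 16.5% compared with the tweaked run using the standard reader.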
So to summarise:
* The overhead of cloning isn't large in real-world usage.
* But within individual tasks cloning has a noticeable impact.
* As always, most performance gains are to be realised in the "IO" tasks
(i.e. xml processing, db processing, etc).
I'm now starting to think that cloning isn't ideal but don't have a
better alternative at the moment. It would be nice if we could achieve
guaranteed thread safety while avoiding cloning in the majority of cases
(ie. when --tee isn't being used), so long as it doesn't add complexity
in other areas. One possibility, already suggested by Jochen, is to do
all cloning in the --tee task instead. It would probably solve the
issue, although I'd like to consider other options if possible.
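
A rough sketch of what cloning in the tee could look like follows. It is only an illustration under assumptions: Entity, EntitySink and CloningTee here are simplified, hypothetical types, not the osmosis interfaces, and the sketch assumes the upstream task does not keep a reference to an entity after passing it on.

```java
import java.util.ArrayList;
import java.util.List;

public class CloningTeeSketch {

    /** Hypothetical mutable entity with an explicit copy method. */
    static final class Entity {
        final long id;
        final List<String> tagKeys;

        Entity(long id, List<String> tagKeys) {
            this.id = id;
            this.tagKeys = new ArrayList<>(tagKeys);
        }

        Entity copy() {
            return new Entity(id, tagKeys);
        }
    }

    /** Hypothetical downstream task interface. */
    interface EntitySink {
        void process(Entity entity);
    }

    /** Tee that gives every output its own instance, so downstream tasks may mutate freely. */
    static final class CloningTee implements EntitySink {
        private final List<EntitySink> outputs;

        CloningTee(List<EntitySink> outputs) {
            this.outputs = new ArrayList<>(outputs);
        }

        @Override
        public void process(Entity entity) {
            for (int i = 0; i < outputs.size(); i++) {
                boolean last = (i == outputs.size() - 1);
                // All copies are made before the original is handed to the last output.
                outputs.get(i).process(last ? entity : entity.copy());
            }
        }
    }

    public static void main(String[] args) {
        List<Entity> received = new ArrayList<>();
        List<EntitySink> outputs = new ArrayList<>();
        outputs.add(e -> e.tagKeys.remove("created_by")); // mutates its own copy
        outputs.add(received::add);                        // sees the untouched original

        List<String> keys = new ArrayList<>();
        keys.add("created_by");
        keys.add("highway");
        new CloningTee(outputs).process(new Entity(1L, keys));

        System.out.println(received.get(0).tagKeys);
    }
}
```

With this arrangement a linear pipeline never pays for a copy, and a pipeline with a tee pays for one copy per extra output.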
Anyway, I thought these numbers might be interesting. The patch I used
to do this testing is attached.
Brett
-------------- next part --------------
Name: tag_remove_variations.txt
URL: <http://lists.openstreetmap.org/pipermail/osmosis-dev/attachments/20090120/6f8e160b/attachment.txt>