[osmosis-dev] API 0.6 and unique keys
Brett Henderson
brett at bretth.com
Thu Jan 29 11:01:08 GMT 2009
marcus.wolschon at googlemail.com wrote:
> On Wed, 28 Jan 2009 09:20:23 +0100, Jochen Topf <jochen at remote.org> wrote:
>
>> On Wed, Jan 28, 2009 at 08:29:14AM +0100, marcus.wolschon at googlemail.com wrote:
>>
>>> I just learned on OSM-dev that in 0.6 you can no longer
>>> have more than one value for any key on an entity.
>>>
>>> Any chance we can use this to replace the Tag class with
>>> a simple read-only Hashtable in the v0_6.Entity class?
>>>
>> I am not sure this is actually a win. The normal case is just two or
>> three tags per entity. Creating a hash table with the associated memory
>> overhead for so few keys is probably more expensive than the one or two
>> full array scans you do to get the tags out again.
>>
>
> If it's about memory, then storing a simple String array internally instead
> of Tag instances and providing get(key), containsKey(key) and iterator() will
> require even less heap than the current implementation and allow for shorter
> code in filtering on tag names. As the tag list is immutable, the array size
> is fixed anyway.
> What I do with it is routing, searching and painting, thus heavy filtering
> on tags, and I am starting to export a lot of the preprocessing, indexing and
> filtering as osmosis plugins to be usable by many other applications too.
>
Actually I misread your initial email. I thought you wanted to replace
the current Collection of Tags with a Map of Tags keyed by the name. I
didn't realise you wanted to replace Tag entirely although I should have
realised because you'd suggested it once before. Having thought about
this some more I'm getting more uncomfortable with it.
From a performance point of view, it may not matter too much in practice,
although a Map is always going to be heavier than an ArrayList,
definitely from a CPU perspective and probably from a RAM perspective as
well. You'll now be forced to calculate hashCode values for every tag
name string. If you were worried about the performance implications of
cloning Entities (which consist of several pointers and some ArrayLists
which are fast to duplicate), you should be much more worried about this ;-)
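
For comparison, a minimal sketch of the linear scan that a plain list of tags allows. The Tag class here is a simplified stand-in rather than the actual Osmosis class, and findValue is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

public class TagScan {
    // Simplified stand-in for a tag: an immutable key/value pair (assumption).
    static final class Tag {
        final String key;
        final String value;

        Tag(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Linear scan over the tag list: no hash table is allocated, and with
    // only two or three tags per entity the loop body runs a handful of times.
    static String findValue(List<Tag> tags, String key) {
        for (Tag tag : tags) {
            if (tag.key.equals(key)) {
                return tag.value;
            }
        }
        return null; // key not present
    }

    public static void main(String[] args) {
        List<Tag> tags = new ArrayList<>();
        tags.add(new Tag("highway", "residential"));
        tags.add(new Tag("name", "Main Street"));
        System.out.println(findValue(tags, "name"));
    }
}
```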
A low impact solution would be to add some utility methods to the
EntityBuilder classes that allow you to retrieve and update tags via a
Map<String, String>. This would let you access the tags in a Map if you
want to, but only instantiate the Map if it is actually required.
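
The utility-method idea could look roughly like the sketch below. The names TagMapUtil, toMap and toTags are hypothetical and not existing Osmosis API, and the Tag class is again a simplified stand-in. It relies on the 0.6 rule that a key appears at most once per entity.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagMapUtil {
    // Simplified stand-in for a tag: an immutable key/value pair (assumption).
    static final class Tag {
        final String key;
        final String value;

        Tag(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Builds a Map only when a task asks for one; entities that are never
    // queried this way pay nothing. LinkedHashMap keeps the original tag order.
    static Map<String, String> toMap(Collection<Tag> tags) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Tag tag : tags) {
            map.put(tag.key, tag.value);
        }
        return map;
    }

    // Converts an edited Map back into a tag list, e.g. to hand to a builder.
    static List<Tag> toTags(Map<String, String> map) {
        List<Tag> tags = new ArrayList<>(map.size());
        for (Map.Entry<String, String> entry : map.entrySet()) {
            tags.add(new Tag(entry.getKey(), entry.getValue()));
        }
        return tags;
    }

    public static void main(String[] args) {
        List<Tag> tags = new ArrayList<>();
        tags.add(new Tag("highway", "residential"));
        Map<String, String> map = toMap(tags);
        map.put("oneway", "yes");
        System.out.println(toTags(map).size());
    }
}
```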
This is another case where making it easier to do one task has a
detrimental impact elsewhere. The current entity structures are about
as simple as possible; I always expected additional functionality to be
written within tasks and factored into re-usable components. I'd rather
not add things to the Entity classes unless essential.
But ...
There is another option here which I've just thought of (so perhaps
rubbish) but I think it's worth seriously considering. Add support for
a new type of entity to the pipeline (probably as a plugin ...). Create
a new set of entity types that look exactly how you want them to look.
They can store tags as Maps, add extra optional fields such as way
geometries, and anything else that you wish to experiment with. Create
a new Sink and Source interface that deals with those types, and new
Manager objects (such as SinkManager, SourceSinkManager,
RunnableSourceManager, etc) to deal with them. Then write two tasks to
convert from existing entities to the super entities and back again. At
that point you're free to write tasks in any way you see fit using
entity types that you have complete control over. At runtime the only
overhead is a single conversion at the boundary from core entity task to
super entity task which I can almost guarantee will have less than 3%
overhead based on my recent measurements. Over time if we find that the
new entities work well and all new functionality is being built on top
of the super entities then we can roll functionality into the core
entities. I'm prepared to help here because it would be a few hours'
work to set up initially. I like the idea of experimenting with new
entity ideas such as making them interfaces, adding optional fields, etc
but I'm not keen to add them to the core Entities (of which I'm still
the main maintainer) yet. Again, I don't want to be a roadblock to
progress, but I don't want to be lumped with a set of enhancements that
turned out to be a bad idea either. What do you guys think? I don't
want to discuss things on the mailing list forever; I'd like to
experiment with new ideas but ideally not in the core codebase.
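
A rough sketch of the two conversion tasks at the boundary between core entities and "super entities". Every type here (CoreEntity, SuperEntity, CoreSink, SuperSink and the two converter classes) is a hypothetical, heavily simplified stand-in, not the actual Osmosis Sink/Source interfaces or entity classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SuperEntityPipeline {
    // Simplified stand-ins for the core types (assumptions).
    static final class Tag {
        final String key;
        final String value;
        Tag(String key, String value) { this.key = key; this.value = value; }
    }

    static final class CoreEntity {
        final long id;
        final List<Tag> tags;
        CoreEntity(long id, List<Tag> tags) { this.id = id; this.tags = tags; }
    }

    // Experimental entity type: tags held as a Map, free to grow extra fields.
    static final class SuperEntity {
        final long id;
        final Map<String, String> tags;
        SuperEntity(long id, Map<String, String> tags) { this.id = id; this.tags = tags; }
    }

    interface CoreSink { void process(CoreEntity entity); }
    interface SuperSink { void process(SuperEntity entity); }

    // Task 1: receives core entities and passes super entities downstream.
    static final class CoreToSuper implements CoreSink {
        private final SuperSink downstream;
        CoreToSuper(SuperSink downstream) { this.downstream = downstream; }

        public void process(CoreEntity entity) {
            Map<String, String> tags = new LinkedHashMap<>();
            for (Tag tag : entity.tags) {
                tags.put(tag.key, tag.value);
            }
            downstream.process(new SuperEntity(entity.id, tags));
        }
    }

    // Task 2: receives super entities and passes core entities downstream.
    static final class SuperToCore implements SuperSink {
        private final CoreSink downstream;
        SuperToCore(CoreSink downstream) { this.downstream = downstream; }

        public void process(SuperEntity entity) {
            List<Tag> tags = new ArrayList<>(entity.tags.size());
            for (Map.Entry<String, String> entry : entity.tags.entrySet()) {
                tags.add(new Tag(entry.getKey(), entry.getValue()));
            }
            downstream.process(new CoreEntity(entity.id, tags));
        }
    }

    public static void main(String[] args) {
        List<CoreEntity> out = new ArrayList<>();
        // Round trip: core -> super -> core, collecting the result.
        CoreSink chain = new CoreToSuper(new SuperToCore(out::add));
        List<Tag> tags = new ArrayList<>();
        tags.add(new Tag("highway", "residential"));
        chain.process(new CoreEntity(1L, tags));
        System.out.println(out.get(0).tags.get(0).value);
    }
}
```

Tasks written against the SuperSink side would sit between the two converters, so each entity is converted once on the way in and once on the way out.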