[osmosis-dev] API 0.6 and unique keys

Thu Jan 29 11:01:08 GMT 2009

marcus.wolschon at googlemail.com wrote:
> On Wed, 28 Jan 2009 09:20:23 +0100, Jochen Topf <jochen at remote.org> wrote:
>   
>> On Wed, Jan 28, 2009 at 08:29:14AM +0100, marcus.wolschon at googlemail.com
>> wrote:
>>     
>>> I just learned on OSM-dev that in 0.6 you can no longer
>>> have more than one value for any key on an entity.
>>>
>>> Any chance we can use this to replace the Tag-class with
>>> a simple read-only Hashtable in the v0_6.Entity-class?
>>>       
>> I am not sure this is actually a win. The normal case is just two or
>> three tags per entity. Creating a hash table with the associated memory
>> overhead for so few keys is probably more expensive than the one or two
>> full array scans you do to get the tags out again.
>>     
>
> If it's about memory, then storing a simple String-array internally instead
> of Tag-instances and providing get(key), containsKey(key) and iterator() will
> require even less heap than the current implementation and allow for shorter
> code in filtering on tag-names. As the tag-list is immutable, the array-size
> is fixed anyway.
> What I do with it is routing, searching and painting, thus heavy filtering
> on tags, and I am starting to export a lot of the preprocessing, indexing and
> filtering as osmosis-plugins to be usable by many other applications too.
>   
Actually I misread your initial email.  I thought you wanted to replace 
the current Collection of Tags with a Map of Tags keyed by the name.  I 
didn't realise you wanted to replace Tag entirely although I should have 
realised because you'd suggested it once before.  Having thought about 
this some more I'm getting more uncomfortable with it.

From a performance point of view it may not matter too much in practice 
although a Map is always going to be heavier than an ArrayList, 
definitely from a CPU perspective and probably from a RAM perspective as 
well.  You'll now be forced to calculate hashCode values for every tag 
name string.  If you were worried about the performance implications of 
cloning Entities (which consist of several pointers and some ArrayLists 
which are fast to duplicate) you should be much more worried about this ;-)

A low impact solution would be to add some utility methods to the 
EntityBuilder classes that allow you to retrieve and update tags via a 
Map<String, String>.  This would let you access the tags in a Map if you 
want to, but only instantiate the Map if it is actually required.
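
A minimal sketch of that utility-method idea follows.  The Tag class here is a hypothetical stand-in for the real Osmosis class, and the names TagMapSketch, tagsToMap and mapToTags are assumptions made up for illustration, not existing Osmosis API.  The point is only that the Map is built when a caller asks for it, while entities keep storing a plain list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagMapSketch {

    /** Hypothetical stand-in for the Osmosis Tag class: an immutable key/value pair. */
    static final class Tag {
        private final String key;
        private final String value;

        Tag(String key, String value) {
            this.key = key;
            this.value = value;
        }

        String getKey() { return key; }
        String getValue() { return value; }
    }

    /** Builds a Map only when a caller asks for one; insertion order is preserved. */
    static Map<String, String> tagsToMap(Collection<Tag> tags) {
        Map<String, String> map = new LinkedHashMap<>();
        for (Tag tag : tags) {
            // Keys are unique in API 0.6, so no value is overwritten here.
            map.put(tag.getKey(), tag.getValue());
        }
        return map;
    }

    /** Converts an edited Map back into the list form the entity stores. */
    static List<Tag> mapToTags(Map<String, String> map) {
        List<Tag> tags = new ArrayList<>(map.size());
        for (Map.Entry<String, String> entry : map.entrySet()) {
            tags.add(new Tag(entry.getKey(), entry.getValue()));
        }
        return tags;
    }

    public static void main(String[] args) {
        List<Tag> tags = new ArrayList<>();
        tags.add(new Tag("highway", "residential"));
        tags.add(new Tag("name", "Example Street"));

        // Only tasks that filter heavily on tags pay for the Map.
        Map<String, String> map = tagsToMap(tags);
        map.put("maxspeed", "50");

        for (Tag tag : mapToTags(map)) {
            System.out.println(tag.getKey() + "=" + tag.getValue());
        }
    }
}
```

Tasks that never look tags up by key skip the conversion entirely and carry no hashing cost.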

This is another case where making it easier to do one task has a 
detrimental impact elsewhere.  The current entity structures are about 
as simple as possible; I always expected additional functionality to be 
written within tasks and factored into re-usable components.  I'd rather 
not add things to the Entity classes unless essential.

But ...

There is another option here which I've just thought of (so perhaps 
rubbish) but I think it's worth seriously considering.  Add support for 
a new type of entity to the pipeline (probably as a plugin ...).  Create 
a new set of entity types that look exactly how you want them to look.  
They can store tags as Maps, add extra optional fields such as way 
geometries, and anything else that you wish to experiment with.  Create 
a new Sink and Source interface that deals with those types, and new 
Manager objects (such as SinkManager, SourceSinkManager, 
RunnableSourceManager, etc) to deal with them.  Then write two tasks to 
convert from existing entities to the super entities and back again.  At 
that point you're free to write tasks in any way you see fit using 
entity types that you have complete control over.  At runtime the only 
overhead is a single conversion at the boundary from core entity task to 
super entity task which I can almost guarantee will have less than 3% 
overhead based on my recent measurements.  Over time if we find that the 
new entities work well and all new functionality is being built on top 
of the super entities then we can roll functionality into the core 
entities.  I'm prepared to help here because it would be a few hours' 
work to set up initially.  I like the idea of experimenting with new 
entity ideas such as making them interfaces, adding optional fields, etc 
but I'm not keen to add them to the core Entities (of which I'm still 
the main maintainer) yet.  Again, I don't want to be a roadblock to 
progress, but I don't want to be lumped with a set of enhancements that 
turned out to be a bad idea either.  What do you guys think?  I don't 
want to discuss things on the mailing list forever, I'd like to 
experiment with new ideas but ideally not in the core codebase.
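
To make the super entity proposal concrete, here is a rough sketch of the two boundary tasks.  Every type in it (Tag, CoreEntity, SuperEntity, CoreSink, SuperSink and the two converter tasks) is a simplified, hypothetical stand-in invented for illustration; none of it mirrors the real Osmosis interfaces or Manager classes, which a real plugin would have to follow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SuperEntitySketch {

    /** Hypothetical stand-in for a core tag. */
    static final class Tag {
        final String key;
        final String value;
        Tag(String key, String value) { this.key = key; this.value = value; }
    }

    /** Hypothetical core entity: tags held as a simple list. */
    static final class CoreEntity {
        final long id;
        final List<Tag> tags;
        CoreEntity(long id, List<Tag> tags) { this.id = id; this.tags = tags; }
    }

    /** Hypothetical super entity: tags held as a Map, with room for extra fields. */
    static final class SuperEntity {
        final long id;
        final Map<String, String> tags;
        SuperEntity(long id, Map<String, String> tags) { this.id = id; this.tags = tags; }
    }

    /** Simplified sink interfaces, one per entity family. */
    interface CoreSink { void process(CoreEntity entity); }
    interface SuperSink { void process(SuperEntity entity); }

    /** Boundary task: accepts core entities and passes super entities downstream. */
    static final class CoreToSuperTask implements CoreSink {
        private final SuperSink downstream;
        CoreToSuperTask(SuperSink downstream) { this.downstream = downstream; }

        @Override
        public void process(CoreEntity entity) {
            Map<String, String> tags = new LinkedHashMap<>();
            for (Tag tag : entity.tags) {
                tags.put(tag.key, tag.value);
            }
            downstream.process(new SuperEntity(entity.id, tags));
        }
    }

    /** Boundary task: accepts super entities and passes core entities downstream. */
    static final class SuperToCoreTask implements SuperSink {
        private final CoreSink downstream;
        SuperToCoreTask(CoreSink downstream) { this.downstream = downstream; }

        @Override
        public void process(SuperEntity entity) {
            List<Tag> tags = new ArrayList<>(entity.tags.size());
            for (Map.Entry<String, String> entry : entity.tags.entrySet()) {
                tags.add(new Tag(entry.getKey(), entry.getValue()));
            }
            downstream.process(new CoreEntity(entity.id, tags));
        }
    }

    public static void main(String[] args) {
        CoreSink printer = e -> System.out.println(e.id + " has " + e.tags.size() + " tag(s)");
        // core -> super -> core: any super entity tasks would sit between the two converters.
        CoreSink pipeline = new CoreToSuperTask(new SuperToCoreTask(printer));

        List<Tag> tags = new ArrayList<>();
        tags.add(new Tag("highway", "residential"));
        pipeline.process(new CoreEntity(1L, tags));
    }
}
```

Each entity is converted once on the way in and once on the way out, so the core classes stay untouched while the super entity types can change freely.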