[OSM-talk] [Tagging] Introducing Taginfo
jochen at remote.org
Wed Oct 13 08:38:09 BST 2010
On Tue, Oct 12, 2010 at 10:58:34PM +0200, Sebastian Klein wrote:
> Jochen Topf wrote:
>> On Tue, Oct 05, 2010 at 04:46:26PM +0200, Pieren wrote:
>>> If I could have one request, it would be nice to see the amount of different
>>> contributors using the same tag. This to distinguish between quantity and
>>> popularity. I know it might be challenging since we should only count the
>>> user of the tag creation in the element history...
>> On http://taginfo.openstreetmap.de/keys there is a 'users' column. But this
>> doesn't look at the history, only the current use. It still gives you some
>> idea, but it's not perfect. But reading the history is not an option at the
>> moment, because this would need far more resources.
>> The number of users is also taken into account when creating the tag cloud
>> for the home page. This way some tags from imports which are very common in
>> the database but have a small user count are downgraded. :-)
> Is it planned to have users count for the individual key pages? It can
> be interesting to see how popular common_key=some_exotic_value really is.
> Sometimes it is used frequently, but by a single mapper only.
Users are counted for keys only and not for key=value combinations, because
there are just too many key=value combinations and too many users to do this
counting efficiently. At least I haven't come up with an idea for how to do
this. Maybe somebody else can.
Currently, for every key I create a hash containing all users that use this
key. When I am through all the tags, I count how many elements there are in
each hash, and that's the number of users for this key. This is rather
inefficient and could probably be improved using some clever hashing at
the price of some inaccuracies (which don't matter too much in this case;
all we really want to know is roughly how many users there are).
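
To make the two options concrete, here is a minimal Python sketch. It is not Taginfo's actual code. The first function is the exact approach described above (one set of user ids per key). The second is one possible form of "clever hashing": linear counting, a known technique that hashes each user into a fixed-size bitmap and estimates the distinct count from the share of bits still zero. The sample data, the names and the bitmap size of 1024 bits are all assumptions made for illustration.

```python
import hashlib
import math
from collections import defaultdict

# Hypothetical input: one (key, user_id) pair per tag on a current object.
tags = [
    ("highway", 1), ("highway", 2), ("highway", 1),
    ("name", 3),
    ("tiger:cfcc", 7), ("tiger:cfcc", 7),
]


def exact_user_counts(pairs):
    """Exact approach: keep a set of user ids per key, then take its size."""
    users_by_key = defaultdict(set)
    for key, user_id in pairs:
        users_by_key[key].add(user_id)
    return {key: len(users) for key, users in users_by_key.items()}


class LinearCounter:
    """Approximate distinct counter backed by a fixed-size bitmap.

    Memory use is bits / 8 bytes no matter how many users are added.
    The estimate degrades as the bitmap fills up.
    """

    def __init__(self, bits=1024):
        if bits % 8 != 0:
            raise ValueError("bits must be a multiple of 8")
        self.bits = bits
        self.bitmap = bytearray(bits // 8)

    def add(self, item):
        # Hash the item to a bit position and set that bit.
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        pos = int.from_bytes(digest, "big") % self.bits
        self.bitmap[pos // 8] |= 1 << (pos % 8)

    def estimate(self):
        set_bits = sum(bin(byte).count("1") for byte in self.bitmap)
        zero_bits = self.bits - set_bits
        if zero_bits == 0:
            # Bitmap is saturated; all we know is a lower bound.
            return float(self.bits)
        # Linear counting estimator: n ~ -m * ln(zero_bits / m)
        return -self.bits * math.log(zero_bits / self.bits)


def approx_user_counts(pairs, bits=1024):
    """Same result shape as exact_user_counts, but with estimated counts."""
    counters = defaultdict(lambda: LinearCounter(bits))
    for key, user_id in pairs:
        counters[key].add(user_id)
    return {key: c.estimate() for key, c in counters.items()}
```

With the sample data, `exact_user_counts(tags)` reports two users for `highway` and one each for the other keys; `approx_user_counts(tags)` returns floating-point estimates close to those values. The trade-off is the one mentioned above: each key costs a fixed 128 bytes here instead of a set that grows with the number of users, in exchange for an estimate rather than an exact number.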
But even if this is done in a more efficient manner, we can't do that
for 50 million different key=value combinations. We might be able to do
it for the more popular combinations, after all if a key=value combination
only appears twice in the whole database, it doesn't really matter if that
was from one or two users.
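
A rough Python sketch of that idea, under stated assumptions: the first pass counts how often each key=value combination occurs, and the second pass tracks users only for combinations at or above a cutoff. The function name, the sample data and the cutoff `min_uses` are hypothetical; nothing like this exists in Taginfo as far as this mail says.

```python
from collections import Counter, defaultdict


def user_counts_for_popular(tags, min_uses=3):
    """Count distinct users only for key=value combinations used often enough.

    tags is an iterable of (key, value, user_id) triples. It is turned
    into a list here so it can be scanned twice; a real run would more
    likely re-read the input instead of holding it in memory.
    """
    tags = list(tags)

    # Pass 1: how often does each key=value combination occur?
    uses = Counter((key, value) for key, value, _ in tags)
    popular = {kv for kv, n in uses.items() if n >= min_uses}

    # Pass 2: collect users, but only for the popular combinations.
    users = defaultdict(set)
    for key, value, user_id in tags:
        if (key, value) in popular:
            users[(key, value)].add(user_id)

    return {kv: len(ids) for kv, ids in users.items()}


# Hypothetical sample data.
sample = [
    ("highway", "residential", 1),
    ("highway", "residential", 2),
    ("highway", "residential", 1),
    ("amenity", "exotic", 5),
    ("amenity", "exotic", 5),
]
```

Here `user_counts_for_popular(sample)` only reports `highway=residential`, because `amenity=exotic` falls below the cutoff. Note that the first pass still needs one counter per combination, so this only helps if those plain counts are affordable while the per-combination user sets are not.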
Currently Taginfo needs about 10 GB of RAM to do all the statistics it does.
That's already too much in my opinion. So until somebody finds a clever way to
reduce the memory needed for these kinds of statistics, they cannot be done.
Jochen Topf jochen at remote.org http://www.remote.org/jochen/ +49-721-388298