[OSM-dev] Tagstat History/Change over time

Lars Francke lars.francke at gmail.com
Wed Jul 21 17:56:02 BST 2010


> I'm wondering if anyone out there could talk about how resource-intensive it
> would be to set up a system to show the usage of any arbitrary tag/value
> pair (or maybe just the top 5000?) over the course of time. Presumably it
> would be something like what tagstat does, but the results would be saved
> every week so that graphs could be made.
> My thought is that maybe something like this could be used to spot bot
> vandalism. Also, it might be helpful to know if a particular set of tags is
> falling into disuse or is gaining in popularity.
> And it would look cool.

I'm doing that for the new OSMdoc version (I know I've talked a lot
about it and have nothing new to show yet). I take a daily snapshot,
based on the history dump. I'm using the Hadoop stack for this
(Hadoop, Hive, HBase, ...), and it takes two to three servers to run
efficiently. Unfortunately I have no way to host this, so I just
run it at home from time to time. That setup is very elaborate,
though. The same should be possible on a smaller scale in a regular
PostgreSQL database, with some processing of the history dump and/or
the daily diffs.
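
To make the smaller-scale idea a bit more concrete, here is a rough
sketch in plain Python of the counting step. Everything in it is an
assumption for illustration: the record layout (element id, version,
day, visible flag, tags), the sample data and the function names are
made up, and it is not how OSMdoc or tagstat does it. For each snapshot
day it takes the latest version of every element up to that day and
counts the tag/value pairs on the elements that are still visible.

```python
from collections import Counter
from datetime import date

# Hypothetical, simplified history records:
# (element_id, version, day, visible, tags)
HISTORY = [
    ("n1", 1, date(2010, 7, 1), True, {"highway": "residential"}),
    ("n1", 2, date(2010, 7, 10), True, {"highway": "service"}),
    ("n2", 1, date(2010, 7, 5), True,
     {"highway": "residential", "name": "Main St"}),
    ("n2", 2, date(2010, 7, 15), False, {}),  # element deleted
]


def tag_counts_at(history, snapshot_day):
    """Count tag/value pairs as of snapshot_day (inclusive)."""
    latest = {}
    for element_id, version, day, visible, tags in history:
        if day > snapshot_day:
            continue  # this version did not exist yet
        current = latest.get(element_id)
        if current is None or version > current[0]:
            latest[element_id] = (version, visible, tags)

    counts = Counter()
    for _version, visible, tags in latest.values():
        if visible:  # deleted elements contribute nothing
            counts.update(tags.items())
    return counts


def tag_history(history, snapshot_days):
    """Map each snapshot day to its Counter of (key, value) pairs."""
    return {day: tag_counts_at(history, day) for day in snapshot_days}
```

Calling tag_history(HISTORY, [...]) with a list of days gives one
Counter per day, which could then be written to a table with one row
per (day, key, value, count) and graphed. Rescanning the whole history
for every snapshot is only workable for small extracts; for the full
data you would want to process it incrementally from the daily diffs.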

I too think this would be great, and that it would be nice to try
some machine learning algorithms on that data set, for example to
classify changesets as spam.

Cheers,
Lars



