[OSM-talk] Tag statistics

Frederik Ramm frederik at remote.org
Thu Mar 1 10:15:16 GMT 2007


> Is there an existing tool that I can use to do a global osm search
> and then do a replace? Or can the web api be used to do that (the db
> search and extract) in a self-made Perl script?

There is no existing tool I am aware of.

I have started writing a Perl framework that does the following:

* Run through the planet file and issue "callbacks" to one or more  
checking modules for each way, segment, or node found there;
* The checking module will look at the item and may ask the framework  
to change or delete the item just seen or add new items;
* The framework will, in that case, fetch the item from the API  
(because it cannot be sure that the planet file was current), and if  
the item is still unchanged from the planet file, commit the change  
requested by the checking module.
* The framework will also, optionally, allow checking modules to  
randomly access elements from the planet file as if it were talking  
to the API; this is done by keeping an in-memory index of IDs seen  
and seeking to the appropriate position in the file when requested.  
This is the closest you can get to a "database mode" without going
through the trouble of importing the planet file into a database.
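The scan-and-callback loop in the first three points could be sketched roughly as follows. It is written in Python rather than Perl so it runs standalone, and it is only an illustration under stated assumptions: the planet file is taken to be OSM XML with `node`, `segment` and `way` elements. `Framework`, `request_change`, `fetch_from_api`, `commit_to_api` and the toy `capitalise_names` module are all hypothetical names. The API is faked with a dictionary, and "unchanged" is approximated by comparing attributes and tags.

```python
import copy
import io
import xml.etree.ElementTree as ET

ELEMENT_TYPES = ("node", "segment", "way")


def snapshot(elem):
    """Comparable (attributes, tags) view of one OSM element."""
    tags = tuple(sorted((t.get("k"), t.get("v")) for t in elem.findall("tag")))
    return tuple(sorted(elem.attrib.items())), tags


class Framework:
    def __init__(self, fetch_from_api, commit_to_api):
        # Hypothetical stand-ins for the real API calls:
        #   fetch_from_api(kind, id) -> Element or None
        #   commit_to_api(kind, id, new_element) -> None
        self.fetch_from_api = fetch_from_api
        self.commit_to_api = commit_to_api
        self.modules = []
        self.pending = []
        self.log = []

    def register(self, module):
        """A checking module is any callable taking (framework, kind, element)."""
        self.modules.append(module)

    def request_change(self, kind, elem, new_elem):
        # Remember what the planet file said so later edits can be detected.
        self.pending.append((kind, elem.get("id"), snapshot(elem), new_elem))

    def run(self, planet_stream):
        # Stream the file; the whole planet never has to be held in memory.
        for _event, elem in ET.iterparse(planet_stream, events=("end",)):
            if elem.tag not in ELEMENT_TYPES:
                continue
            for module in self.modules:
                module(self, elem.tag, elem)
            self._apply_pending()
            elem.clear()  # drop children and attributes of the checked element

    def _apply_pending(self):
        for kind, eid, seen, new_elem in self.pending:
            current = self.fetch_from_api(kind, eid)
            if current is not None and snapshot(current) == seen:
                self.commit_to_api(kind, eid, new_elem)
                self.log.append(("committed", kind, eid))
            else:
                # Edited or deleted since the planet dump: leave it alone.
                self.log.append(("skipped", kind, eid))
        self.pending.clear()


def capitalise_names(fw, kind, elem):
    """Toy checking module: ask for all-lowercase names to be capitalised."""
    for tag in elem.findall("tag"):
        if tag.get("k") == "name" and tag.get("v").islower():
            new_elem = copy.deepcopy(elem)
            new_elem.find("tag[@k='name']").set("v", tag.get("v").capitalize())
            fw.request_change(kind, elem, new_elem)


PLANET = b"""<osm>
  <node id="1" lat="49.0" lon="8.4"><tag k="name" v="foo"/></node>
  <node id="2" lat="49.1" lon="8.5"><tag k="name" v="bar"/></node>
</osm>"""

# Fake "live" database in which node 2 was edited after the planet dump.
live = {("node", e.get("id")): e for e in ET.fromstring(PLANET)}
live[("node", "2")] = ET.fromstring(
    '<node id="2" lat="49.1" lon="8.5"><tag k="name" v="baz"/></node>')
committed = {}


def fake_commit(kind, eid, new_elem):
    committed[(kind, eid)] = new_elem


fw = Framework(lambda kind, eid: live.get((kind, eid)), fake_commit)
fw.register(capitalise_names)
fw.run(io.BytesIO(PLANET))
print(fw.log)  # [('committed', 'node', '1'), ('skipped', 'node', '2')]
```

Node 2 is skipped because the live copy no longer matches what the planet file said. That is the "only commit if still unchanged" rule from the third point.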
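The random-access point (an in-memory index of IDs plus seeking) could look something like the sketch below. It is not the framework's code: `PlanetIndex` is a hypothetical name, and the sketch assumes every node, segment and way starts on a new line of the planet file. The index costs one dictionary entry per element, which is the memory trade-off mentioned above.

```python
import io
import re
import xml.etree.ElementTree as ET

# Start of a node/segment/way element, capturing its kind and id.
START = re.compile(rb'^\s*<(node|segment|way)\s[^>]*\bid="(-?\d+)"')
# A start tag that closes itself on the same line, e.g. <node .../>.
SELF_CLOSING = re.compile(rb'^\s*<\w+\s[^>]*/>')


class PlanetIndex:
    """Hypothetical helper: (kind, id) -> byte offset in the planet file."""

    def __init__(self, f):
        self.f = f  # a file opened in binary mode
        self.offsets = {}
        offset = 0
        f.seek(0)
        for line in f:
            m = START.match(line)
            if m:
                key = (m.group(1).decode(), m.group(2).decode())
                self.offsets[key] = offset
            offset += len(line)

    def get(self, kind, eid):
        """Seek to the element and parse just that element, or return None."""
        offset = self.offsets.get((kind, str(eid)))
        if offset is None:
            return None
        self.f.seek(offset)
        closing = b"</" + kind.encode() + b">"
        chunk = []
        for line in self.f:
            chunk.append(line)
            if len(chunk) == 1 and SELF_CLOSING.match(line):
                break  # no children, e.g. <node id="1" .../>
            if line.rstrip().endswith(closing):
                break
        return ET.fromstring(b"".join(chunk).strip())


PLANET = b"""<?xml version="1.0"?>
<osm version="0.4">
  <node id="1" lat="49.0" lon="8.4"/>
  <node id="2" lat="49.1" lon="8.5">
    <tag k="name" v="Karlsruhe"/>
  </node>
  <way id="7">
    <seg id="1"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""

idx = PlanetIndex(io.BytesIO(PLANET))
print(idx.get("node", 2).find("tag").get("v"))  # Karlsruhe
```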

Checking modules will follow a simple interface and can be  
contributed by anyone; simple checking modules might correct tag  
names that are obviously mis-spelled and report others that are not  
in the list of known tags, for possible manual intervention. It is  
also possible to have an interactive checking module that would ask  
the operator ("found unknown tag xy with value zz - d)elete, e)dit  
tag name, a)dd tag to list of allowed tags, s)kip" or something).  
Complex checking modules might also check referential integrity or
relations between elements ("found way with unordered segments"), or
even automatically split ways that fork, although such operations may
require more memory.
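A simple tag-spelling module of the kind described above might look like this sketch. `KNOWN_KEYS` and `OBVIOUS_TYPOS` are made-up examples, not an official tag list, and `check_tags` is a hypothetical name. Only keys listed in the explicit typo table are renamed automatically. Every other unknown key is only reported, with a suggestion from Python's `difflib` for whoever reviews it.

```python
import difflib

# Made-up examples; a real module would load a maintained list of tags.
KNOWN_KEYS = {"highway", "name", "amenity", "railway", "waterway", "landuse"}
OBVIOUS_TYPOS = {"hihgway": "highway", "amentiy": "amenity", "nmae": "name"}


def check_tags(tags):
    """Return (corrected_tags, report) for one element's tags.

    Keys found in OBVIOUS_TYPOS are renamed automatically, unless the
    corrected key already exists. Any other unknown key is left as it is
    and only reported, with a close match as a hint for a human.
    """
    corrected = {}
    report = []
    for key, value in tags.items():
        fixed = OBVIOUS_TYPOS.get(key)
        if fixed is not None and fixed not in tags:
            report.append(f"renamed {key!r} -> {fixed!r}")
            key = fixed
        elif key not in KNOWN_KEYS:
            hints = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            hint = f" (did you mean {hints[0]!r}?)" if hints else ""
            report.append(f"unknown tag {key!r}={value!r}{hint}")
        corrected[key] = value
    return corrected, report


fixed, report = check_tags({"hihgway": "residential", "name": "Main Street", "foo": "bar"})
for line in report:
    print(line)
```

Renaming only from an explicit table, rather than accepting whatever `difflib` suggests, keeps the automated part conservative. The fuzzy matches go to a human, as in the interactive mode described above.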

The script will differ from efforts like Maplint in that it (a) can  
reasonably operate on the full database, not only a subset, and (b)  
can make automated changes to the database. It will differ from the  
various database-based efforts in that it doesn't require  
installation of a database and a lengthy import process. It will be  
especially well-suited for isolated checks where only one segment,  
way, or node has to be looked at; complex referential analyses will  
be better served by a database-based approach.


Automatic changes to wiki-like systems are a sensitive issue. We  
would not want inexperienced programmers to write faulty checking
modules for this framework and let them run amok. I will build
reasonable safety measures into the framework, but the potential for
accidental or deliberate damage remains.

Then again, it would be naive *not* to release the framework to SVN
because of the damage it could do in the wrong hands; that damage can
already be done without it. And in the worst case, it is still a wiki,
so we should be able to revert changes.

Maybe the tool should only be used by a group of experienced people,  
to whom others submit their requests in plain language.

I'll let you know when there's progress.


Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00.09' E008°23.33'
