[OSM-talk] Tag statistics
Frederik Ramm
frederik at remote.org
Thu Mar 1 10:15:16 GMT 2007
Hi,
> Is there an existing tool that I can use to do a global osm search
> and then do a replace? Or can the web api be used to do that (the db
> search and extract) in a self-made Perl script?
There is no existing tool I am aware of.
I have started writing a Perl framework that does the following:
* Run through the planet file and issue "callbacks" to one or more
checking modules for each way, segment, or node found there;
* The checking module will look at the item and may ask the framework
to change or delete the item just seen or add new items;
* The framework will, in that case, fetch the item from the API
(because it cannot be sure that the planet file was current), and if
the item is still unchanged from the planet file, commit the change
requested by the checking module.
* The framework will also, optionally, allow checking modules to
randomly access elements from the planet file as if it were talking
to the API; this is done by keeping an in-memory index of IDs seen
and seeking to the appropriate position in the file when requested.
This is the closest you can get to a "database mode" without going
to the trouble of importing the planet file into a database.
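
The scanning pass and the in-memory index described above could look roughly like the sketch below. It is written in Python for illustration, not in the Perl of the framework itself, and it is not the framework's code. It assumes the planet file is XML in which every node, segment, and way starts on its own line and any child elements sit on the following lines. The names OPEN, read_element, scan, and fetch are made up for this example.

```python
import io
import re
import xml.etree.ElementTree as ET

# Opening tag of a top-level element; group 3 is "/" when it is self-closing.
OPEN = re.compile(rb'\s*<(node|segment|way)\b([^>]*?)(/?)>')

def read_element(f, first):
    """Parse the element whose opening line is `first`, reading on as needed."""
    m = OPEN.match(first)
    buf = first
    if not m.group(3):  # not self-closing: read up to the closing tag
        closing = b"</" + m.group(1) + b">"
        while closing not in buf:
            line = f.readline()
            if not line:
                raise ValueError("element is truncated")
            buf += line
    return m.group(1).decode(), ET.fromstring(buf)

def scan(f, checkers):
    """Run every checker on every element; return {(kind, id): byte offset}."""
    index = {}
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            return index
        if OPEN.match(line):  # skips the XML header, <osm> wrapper, blanks
            kind, elem = read_element(f, line)
            index[(kind, elem.get("id"))] = pos
            for check in checkers:
                check(kind, elem)

def fetch(f, index, kind, elem_id):
    """Random access: seek to a remembered offset, then restore the position."""
    here = f.tell()
    f.seek(index[(kind, elem_id)])
    elem = read_element(f, f.readline())[1]
    f.seek(here)
    return elem

# Tiny made-up planet extract, for demonstration only.
SAMPLE = b"""<?xml version="1.0"?>
<osm>
<node id="1" lat="1.0" lon="2.0"/>
<node id="2" lat="1.0" lon="2.1">
  <tag k="name" v="x"/>
</node>
<segment id="7" from="1" to="2"/>
<way id="9">
  <seg id="7"/>
  <tag k="highway" v="residential"/>
</way>
</osm>
"""

planet = io.BytesIO(SAMPLE)
seen = []
index = scan(planet, [lambda kind, elem: seen.append((kind, elem.get("id")))])
way = fetch(planet, index, "way", "9")
print(seen)
print(way.find("tag").get("v"))
```

Only the byte offset of each element is kept in memory, so the elements themselves are re-read from the file on demand.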
Checking modules will follow a simple interface and can be
contributed by anyone; simple checking modules might correct tag
names that are obviously mis-spelled and report others that are not
in the list of known tags, for possible manual intervention. It is
also possible to have an interactive checking module that would ask
the operator ("found unknown tag xy with value zz - d)elete, e)dit
tag name, a)dd tag to list of allowed tags, s)kip" or something).
Complex checking modules might also check referential integrity or
relations between elements ("found way with unordered segments") or
even do things like automatically splitting ways that fork etc.,
although these operations may need more memory.
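
As an illustration of the simplest kind of checking module, here is a sketch in Python that corrects known misspellings of tag names and reports tag names that are not on the list of known tags. The interface (a function taking a dict of tags), the tag list, and the table of misspellings are all assumptions made for this example; the real module interface is not defined yet.

```python
# Hypothetical data: a real list of known tags would be far longer.
KNOWN_TAGS = {"highway", "name", "railway", "natural"}
CORRECTIONS = {"hihgway": "highway", "nmae": "name"}

def check_tags(tags):
    """Return (corrected tags or None if nothing changed, unknown tag names)."""
    fixed, unknown = {}, []
    for key, value in tags.items():
        target = CORRECTIONS.get(key, key)
        if target != key and target in tags:
            # The correct spelling is already present on this item, so
            # renaming would overwrite it: leave the tag for manual review.
            target = key
        if target not in KNOWN_TAGS:
            unknown.append(target)
        fixed[target] = value
    return (fixed if fixed != tags else None), unknown

print(check_tags({"hihgway": "residential", "name": "A"}))
print(check_tags({"highway": "primary", "foo": "bar"}))
```

A result of None for the first value means the item needs no change, so the framework would not have to contact the API for it.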
The script will differ from efforts like Maplint in that it (a) can
reasonably operate on the full database, not only a subset, and (b)
can make automated changes to the database. It will differ from the
various database-based efforts in that it doesn't require
installation of a database and a lengthy import process. It will be
especially well-suited for isolated checks where only one segment,
way, or node has to be looked at; complex referential analyses will
be better served by a database-based approach.
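
The commit step mentioned earlier (fetch the item from the API, compare it with the planet file copy, and write only if it is unchanged) might be sketched as follows, again in Python. The callables fetch_current and upload are placeholders for whatever talks to the server, and items are plain dicts here; none of this is the actual API.

```python
import copy

def commit_if_unchanged(planet_item, fetch_current, upload, change):
    """Apply `change` only if the live item still equals the planet copy."""
    current = fetch_current(planet_item["id"])
    if current != planet_item:
        return False  # edited since the planet dump was made: skip it
    upload(change(copy.deepcopy(current)))
    return True

# In-memory stand-in for the server, for demonstration only.
server = {
    1: {"id": 1, "tags": {"hihgway": "residential"}},
    2: {"id": 2, "tags": {"name": "New name"}},
}

def fetch_current(item_id):
    return copy.deepcopy(server[item_id])

def upload(item):
    server[item["id"]] = item

def fix_highway(item):
    item["tags"]["highway"] = item["tags"].pop("hihgway")
    return item

unchanged = {"id": 1, "tags": {"hihgway": "residential"}}
stale = {"id": 2, "tags": {"name": "Old name"}}
print(commit_if_unchanged(unchanged, fetch_current, upload, fix_highway))
print(commit_if_unchanged(stale, fetch_current, upload, lambda item: item))
```

This narrows the window for overwriting someone else's edit but does not close it, since the item can still change between the fetch and the upload.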
However:
Automatic changes to wiki-like systems are a sensitive issue. We
would not want inexperienced programmers to write faulty checking
modules for this framework and let them run amok. I will come up with
reasonable safety measures built into the framework, but the
potential for - accidental or deliberate - damage remains.
Then again, it would be naive to *not* release the framework to SVN
because of the damage it could do in the wrong hands; the same damage
can already be done without it. And in the worst case, it is still a
wiki, so we should be able to revert changes.
Maybe the tool should only be used by a group of experienced people,
to whom others submit their requests in plain language.
I'll let you know when there's progress.
Bye
Frederik
--
Frederik Ramm ## eMail frederik at remote.org ## N49°00.09' E008°23.33'