[Tagging] Mechanical Edit: fix japanese train stations wikipedia/names fields

Fabien SK fabiensk at gmail.com
Sun Oct 14 16:09:53 GMT 2012


Hi everyone,

I intend to write a script to complete the information on the Japanese
train station nodes. It would
- add the «wikipedia» tag if it does not exist
- fix «wikipedia» tags that use an outdated format (for example:
wikipedia:ja = http://...)
- complete the name tags from the existing values
Any comments would be welcome.
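
A minimal sketch of the «wikipedia» tag fix from the second item, in
Python. It assumes the outdated values are URLs of the form
http://<lang>.wikipedia.org/wiki/<title> and that the target form is
wikipedia = <lang>:<title> with spaces instead of underscores. The
function name fix_wikipedia_tag is hypothetical, and anything it does
not recognise is returned unchanged so it can be reviewed by hand.

```python
from urllib.parse import unquote, urlparse

WP_SUFFIX = ".wikipedia.org"


def fix_wikipedia_tag(key, value):
    """Rewrite an old-style Wikipedia URL tag as ('wikipedia', 'lang:Title').

    Hypothetical helper: the pair is returned unchanged when the value is
    not a recognisable Wikipedia article URL.
    """
    parsed = urlparse(value)
    if parsed.scheme not in ("http", "https"):
        return key, value  # already a page name, or something unexpected
    host = parsed.netloc.lower()
    if not host.endswith(WP_SUFFIX) or not parsed.path.startswith("/wiki/"):
        return key, value  # unknown URL shape: do not guess
    lang = host[: -len(WP_SUFFIX)]
    # Decode percent-escapes and use spaces instead of underscores (assumed style).
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    if not lang or "." in lang or not title:
        return key, value  # e.g. a mobile host such as ja.m.wikipedia.org
    return "wikipedia", f"{lang}:{title}"
```

Values that are already in the <lang>:<title> form have no http(s)
scheme, so they pass through untouched.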

I think I would proceed like this:
- I create an OSM account for this task
- I get a recent dump for Japan
- Using osmosis, I extract all the nodes having railway=station (about
9000 nodes)
- My script will filter the nodes having incomplete information
- It will process the filtered nodes list by batches of X nodes (where X
is a reasonable number for a commit). I could try to create batches
containing nodes in the same area
- it will retrieve the latest version of each node (by id) using the API
- If the Wikipedia link is missing, it will download the Japanese
Wikipedia page (easy, the page name is [station name] + «eki»). If it
cannot be retrieved, or if I cannot be sure that the page is not a
disambiguation page, I give up on that node
- it could check whether the coordinates in the WP page (if present) are
about the same as the node's. If not, it is probably the wrong station,
and I give up on that node
- if the wikipedia tag is wrong, it can fix it (if the value is a URL,
it can set it to the page name)
- it will complete the name tags if needed: «name:en» and «name:ja_rm»
have the same value, so either can be copied from the other. They can
also be deduced from «ja_kana» (but it's not always perfect), and vice
versa. «name» can be set to «name:ja (name:ja_rm)» if these two tags
are present.
- the modified nodes of a batch will be written to an XML file. I could
review them in JOSM before submitting them.
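
The coordinate check and the name completion from the steps above could
be sketched like this in Python. The function names, the 500 m
tolerance and the choice to fill only missing tags (never overwrite an
existing value) are assumptions, not something decided in this
proposal. The kana-based deduction is left out because it is not always
reliable.

```python
import math


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def coordinates_agree(node_latlon, wp_latlon, max_m=500.0):
    """True if the page has no coordinates, or they lie within max_m of the node."""
    if wp_latlon is None:
        return True  # nothing to compare against
    return distance_m(*node_latlon, *wp_latlon) <= max_m


def complete_names(tags):
    """Return a copy of `tags` with missing name tags filled in."""
    out = dict(tags)
    en, rm = out.get("name:en"), out.get("name:ja_rm")
    # name:en and name:ja_rm carry the same value: copy whichever exists.
    if en and not rm:
        out["name:ja_rm"] = en
    elif rm and not en:
        out["name:en"] = rm
    ja, rm = out.get("name:ja"), out.get("name:ja_rm")
    # Build «name» as "name:ja (name:ja_rm)" only when it is missing.
    if ja and rm and not out.get("name"):
        out["name"] = f"{ja} ({rm})"
    return out
```

A node whose Wikipedia coordinates fail coordinates_agree would simply
be skipped, matching the "I give up" rule above.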

I have no experience with mechanical edits in OSM, so any
comment is welcome to make this safe for both the servers and the
existing data.
I don't know if it is technically necessary to split the commit into
batches, but I thought it would help with manual review.
My Japanese is not good enough to write to the jp mailing list :-)
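
For the batching by area, one possible sketch in Python: group the
nodes into a coarse latitude/longitude grid, then cut each cell into
batches of at most X nodes. The function name, the default X = 50 and
the 0.5 degree cell size are placeholders chosen for illustration.

```python
import math
from collections import defaultdict


def batches_by_area(nodes, batch_size=50, cell_deg=0.5):
    """Yield lists of node ids, grouped by grid cell.

    `nodes` is an iterable of (node_id, lat, lon) tuples. Each batch holds
    at most `batch_size` ids, all from the same cell_deg x cell_deg cell.
    """
    cells = defaultdict(list)
    for node_id, lat, lon in nodes:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[cell].append(node_id)
    for cell in sorted(cells):  # stable order, south-west cells first
        ids = cells[cell]
        for start in range(0, len(ids), batch_size):
            yield ids[start:start + batch_size]
```

Each yielded batch could then be written to its own XML file and
reviewed in JOSM before upload.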

Cheers,
Fabien


