[Imports] Importing Japanese train stations from Wikipedia
Fabien SK
fabiensk at gmail.com
Thu Oct 11 16:39:41 BST 2012
Hi,
I have written a Python script to complete missing tags for Japanese
train stations (like missing romaji names, i.e. names in the Latin
alphabet) from Wikipedia. I would like some confirmation that it is safe
to import this information.
At first, I did everything manually:
- I downloaded a region in JOSM
- I found the stations with missing attributes
- most of the time, the Japanese name was present, so I used it to find
the Japanese Wikipedia page (by just adding 駅 / "station" at the end of
the name; see the sketch after this list)
- then I completed the name fields (romaji, English, French, kana,
name) if needed. I also looked at the English page, because the romaji
name was sometimes better there (long vowels written with a macron)
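The page lookup boils down to something like this (a rough Python 3
sketch; the function name is mine and the real script may differ):

    from urllib.parse import quote

    def ja_wikipedia_url(japanese_name):
        # Station articles on the Japanese Wikipedia are usually
        # titled "<name>駅" ("<name> station").
        return "https://ja.wikipedia.org/wiki/" + quote(japanese_name + "駅")

    # e.g. ja_wikipedia_url("渋谷") gives
    # https://ja.wikipedia.org/wiki/%E6%B8%8B%E8%B0%B7%E9%A7%85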
Doing all this by hand was quite time-consuming, so I automated the task:
- I still have to download the stations in JOSM (using the relations
saves a lot of time)
- I export that to an XML file
- I find the nodes that are stations
- if a romaji name is present (in the romaji, English or French field),
I use it to complete the others (see the sketch after this list)
- I download the Japanese and English Wikipedia pages and extract the info
- I complete the missing attributes
- I put the modified nodes into another XML file
- I open it in JOSM
- I double-check manually that it looks fine and commit
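The romaji step looks roughly like this (a simplified sketch: the tag
names follow the usual Japanese tagging convention, name:ja_rm for
romaji, and the actual script may use different ones):

    import xml.etree.ElementTree as ET

    # Tags that may hold a romaji name (an assumption on my part).
    ROMAJI_TAGS = ["name:ja_rm", "name:en", "name:fr"]

    def fill_missing_romaji(path_in, path_out):
        tree = ET.parse(path_in)
        for node in tree.getroot().iter("node"):
            tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
            if tags.get("railway") != "station":
                continue
            # Use whichever romaji variant is present to fill the others.
            romaji = next((tags[k] for k in ROMAJI_TAGS if k in tags), None)
            if romaji is None:
                continue  # no romaji at all: the Wikipedia lookup happens here
            for k in ROMAJI_TAGS:
                if k not in tags:
                    ET.SubElement(node, "tag", k=k, v=romaji)
                    node.set("action", "modify")  # so JOSM sees the change
        tree.write(path_out, encoding="utf-8", xml_declaration=True)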
Note that it does not handle disambiguation pages (when two stations
have the same name). And it is not really clean, as it works by
extracting data from the HTML.
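For instance, a crude way to at least detect a disambiguation page
would be a substring test on the raw HTML (a hypothetical check, not
something the script does today):

    from urllib.request import urlopen

    def looks_like_disambiguation(url):
        html = urlopen(url).read().decode("utf-8")
        # Japanese Wikipedia disambiguation pages carry the
        # 曖昧さ回避 ("disambiguation") marker.
        return "曖昧さ回避" in html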
I would like to know what you think about this, concerning the legality
of the import (I hope the station information was not put into
Wikipedia illegally), and also whether you have any ideas to improve
the process (for example, to remove the need to download the nodes
manually in JOSM).
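On that last point, I was wondering whether querying the Overpass API
directly could replace the manual JOSM download; something like this
(just an idea, untested, and the bounding box is a made-up example):

    from urllib.request import urlopen
    from urllib.parse import urlencode

    # Fetch all railway=station nodes in a bounding box
    # (south, west, north, east) as OSM XML, the format JOSM exports.
    query = '[out:xml];node["railway"="station"](34.6,135.3,34.8,135.6);out meta;'
    url = "https://overpass-api.de/api/interpreter?" + urlencode({"data": query})
    osm_xml = urlopen(url).read()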
Cheers,
Fabien
PS: my Japanese is pretty poor :-)