[Imports] Importing japanese train stations from Wikipedia

Paul Norman penorman at mac.com
Thu Oct 11 22:54:31 BST 2012


> From: Fabien SK [mailto:fabiensk at gmail.com]
> Sent: Thursday, October 11, 2012 8:40 AM
> To: imports at openstreetmap.org
> Subject: [Imports] Importing japanese train stations from Wikipedia
> 
> Hi,
> 
> I have written a Python script to fill in missing tags for Japanese
> train stations (such as missing romaji names, i.e. names in the Latin
> alphabet) using Wikipedia. I would like some confirmation that it is
> safe to import this information.
> 
> At first, I did everything manually:
> - I downloaded a region in JOSM
> - I found the stations with missing attributes
> - most of the time, the Japanese name was present, so I used it to find
> the Japanese Wikipedia page (by just adding 駅 / "station" at the end of
> the name)
> - then I completed the name fields (romaji, English, French, kana,
> name) if needed. I also looked at the English page because the romaji
> name was sometimes better there (long vowels written with a macron)
> 
> This was quite slow, so I automated the task:
> - I still have to download the stations in JOSM (using the relations
> saves a lot of time)
> - I export that to an XML file
> - I find the nodes that are stations
> - if a romaji name is present (romaji, English, French), I use it to
> complete the others
> - I download the Japanese and English Wikipedia pages and extract the
> info
> - I complete the missing attributes
> - I put the modified nodes into another XML file
> - I open it in JOSM
> - I double-check manually that it looks fine, then commit
> 
> Note that it does not handle disambiguation pages (when two stations
> have the same name). It is also not really clean, as it works by
> extracting data from HTML.
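
As a rough illustration of the local part of this workflow (finding station nodes in the exported file and completing name tags from a romaji name that is already present), something like the following might work. It is only a sketch: the tag keys name:ja_rm, name:en and name:fr, and the function names, are assumptions for illustration and are not taken from the script described above. The Wikipedia lookup is left out entirely; the optional `extra` argument merely stands in for whatever values that step would produce.

```python
import xml.etree.ElementTree as ET

# Assumed tag keys for romaji-style names; the original message does not list them.
ROMAJI_KEYS = ("name:ja_rm", "name:en", "name:fr")


def is_station(node):
    """True if the node carries railway=station."""
    return any(
        t.get("k") == "railway" and t.get("v") == "station"
        for t in node.findall("tag")
    )


def complete_names(node, extra=None):
    """Add missing romaji-style name tags; return True if the node changed.

    `extra` is a hypothetical dict of values obtained elsewhere
    (for example from Wikipedia); it takes priority over copying.
    """
    tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
    # First romaji-style name already present on the node, if any.
    known = next((tags[k] for k in ROMAJI_KEYS if tags.get(k)), None)
    changed = False
    for key in ROMAJI_KEYS:
        value = (extra or {}).get(key) or known
        if value and key not in tags:
            ET.SubElement(node, "tag", k=key, v=value)
            changed = True
    if changed:
        # JOSM treats objects with action="modify" as locally edited.
        node.set("action", "modify")
    return changed


def process(osm_xml):
    """Parse a JOSM export and complete names on station nodes."""
    root = ET.fromstring(osm_xml)
    changed_ids = [
        n.get("id")
        for n in root.iter("node")
        if is_station(n) and complete_names(n)
    ]
    return root, changed_ids
```

The result can be written back with `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` and opened in JOSM for the manual check before anything is uploaded.
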
> 
> I would like to know what you think about this, concerning the
> legality of the import (I hope that the station information was not
> put into Wikipedia illegally), and also whether you have any ideas for
> improving the process (for example, removing the need to manually
> download the nodes in JOSM).
> 

Wouldn't any Wikipedia information be under CC BY-SA and therefore unsuitable for importing?
