[OSM-ja] Fwd: [Tagging] Mechanical Edit: fix japanese train stations wikipedia/names fields
Satoshi IIDA
nyampire @ gmail.com
Sun Oct 14 17:40:15 GMT 2012
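This is Iida.
On the Tagging mailing list, there is a proposal to enrich the name tags
of Japanese railway stations based on the content of their Wikipedia pages.
http://lists.openstreetmap.org/pipermail/tagging/2012-October/011673.html
Please see the original mail for details; roughly, the following steps
are planned:
- Extract the existing station nodes from a planet dump
- Add tags semi-manually, with the help of a script
- Create a dedicated import account and perform the import with it
- About 9000 nodes are currently affected
- The target tags are as follows: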
* name:en
* name:ja
* name:ja_rm (same as "name:en")
* name:ja_kana (if available)
* wikipedia
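This is a wonderful effort, but it is also a change with a wide scope
and a large impact.
Any feedback, such as "it would be better to do it this way" or "what
happens in this case?", would be appreciated.
Below is a personal suggestion.
* I would like to see a concrete example of the tag scheme, so how about
doing a test import limited to a single railway line?
A description of the tags on a wiki page would also be fine.
(Do a test import restricted to one line, to check the tag scheme
and the values,
or make an OSM wiki page showing a tagging sample. - from Satoshi)
Fabien, who made the proposal, may also post here in English ;)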
---------- Forwarded message ----------
From: Fabien SK <fabiensk @ gmail.com>
Date: 2012/10/15
Subject: [Tagging] Mechanical Edit: fix japanese train stations
wikipedia/names fields
To: tagging @ openstreetmap.org
Hi everyone,
I intend to write a script to complete the information on the Japanese
train station nodes. It would
- add the ≪wikipedia≫ tag if it does not exist
- fix the ≪wikipedia≫ tags with outdated format (for example:
wikipedia:ja = http://...)
- complete the name tags from the existing values
Any comment would be welcome.
Here is how I think I would do it:
- I create an OSM account for this task
- I get a recent dump for Japan
- Using osmosis, I extract all the nodes having railway=station (about
9000 nodes)
- My script will filter the nodes having incomplete information
- It will process the filtered nodes list by batches of X nodes (where X
is a reasonable number for a commit). I could try to create batches
containing nodes in the same area
- it will retrieve the latest version of each node (by id) using the API
- If the Wikipedia link is missing, it will download the ja WP page
(easy, the page name is [station name] + ≪eki≫). If it cannot be
retrieved, or if I cannot be sure that the page is not a disambiguation
page, I give up
- it could check whether the coordinates in the WP page (if present) are
about the same as the node's. If not, wrong station, I give up
- if the wikipedia tag is wrong, it can fix it (if the value is a URL,
it can set it to the page name)
- it will complete the name tags if needed: ≪name:en≫ and ≪name:ja_rm≫
have the same value. They can also be deduced from ≪name:ja_kana≫ (but
it's not always perfect), and vice versa. ≪name≫ can be set to the value
≪name:ja (name:ja_rm)≫ if these two tags are present.
- the modified nodes of a batch will be put in an XML. I could review
them in JOSM before submitting them.
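The tag-fixing steps in the list above could be sketched roughly as
follows. This is only an illustration under assumptions, not the actual
script: the function names, the 500 m tolerance, and the choice never to
overwrite an existing ≪name≫ are all hypothetical, and the fixed
≪wikipedia≫ value is assumed to follow the "lang:Page title" form.

```python
import math
from urllib.parse import unquote, urlparse


def normalize_wikipedia_tag(tags):
    """Return a copy of tags with the wikipedia link in 'lang:Title' form."""
    fixed = dict(tags)
    # Outdated format: wikipedia:ja = http://... (only used if no wikipedia tag)
    if "wikipedia" not in fixed and "wikipedia:ja" in fixed:
        old = fixed.pop("wikipedia:ja")
        is_url = old.startswith(("http://", "https://"))
        fixed["wikipedia"] = old if is_url else f"ja:{old}"
    value = fixed.get("wikipedia")
    if value and value.startswith(("http://", "https://")):
        url = urlparse(value)
        lang = url.netloc.split(".")[0]  # "ja" from ja.wikipedia.org
        title = unquote(url.path.split("/wiki/", 1)[-1]).replace("_", " ")
        fixed["wikipedia"] = f"{lang}:{title}"
    return fixed


def complete_names(tags):
    """Fill in name:en / name:ja_rm from each other, and name if missing."""
    fixed = dict(tags)
    romaji = fixed.get("name:en") or fixed.get("name:ja_rm")
    if romaji:
        fixed.setdefault("name:en", romaji)
        fixed.setdefault("name:ja_rm", romaji)
    ja = fixed.get("name:ja")
    if ja and romaji:
        # Assumption: an existing name is left alone rather than overwritten.
        fixed.setdefault("name", f"{ja} ({romaji})")
    return fixed


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def same_station(node_latlon, page_latlon, max_m=500.0):
    """True if the node and the WP page coordinates are within max_m metres."""
    return distance_m(*node_latlon, *page_latlon) <= max_m
```

A node whose tags fail `same_station` would simply be skipped, which
matches the "I give up" rule in the list.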
I have no experience with mechanical edits in OSM, so any comment is
welcome to make it safe for both the servers and the existing data.
I don't know if it is technically necessary to split the commit into
batches, but I thought that it would be nice for manual review.
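One possible way to form batches of nearby nodes is to sort them by a
coarse latitude/longitude grid cell and then cut the sorted list into
fixed-size chunks. The sketch below assumes each node is a dict with
"id", "lat" and "lon" keys; the function name, the batch size of 50 and
the 0.5 degree cell are hypothetical values chosen for illustration.

```python
import math


def batches_by_area(nodes, batch_size=50, cell_deg=0.5):
    """Yield lists of at most batch_size nodes, ordered by coarse grid cell."""
    def cell(node):
        # Index of the grid cell that contains the node.
        return (math.floor(node["lat"] / cell_deg),
                math.floor(node["lon"] / cell_deg))

    ordered = sorted(nodes, key=cell)
    for start in range(0, len(ordered), batch_size):
        yield ordered[start:start + batch_size]
```

A batch can still straddle two neighbouring cells when a cell does not
fill a whole chunk, so this only keeps batches roughly local.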
My Japanese is too poor to write on the Japanese mailing list :-)
Cheers,
Fabien
--
Satoshi IIDA
mail: nyampire @ gmail.com
twitter: @nyampire