[OSM-dev] [Tagging] Retrieving Wikipedia Entries Automatically
zjshen14 at gmail.com
Sat Feb 26 14:58:13 GMT 2011
Hi Serge Wroclawski,
On Sat, Feb 26, 2011 at 10:47 PM, Serge Wroclawski <emacsen at gmail.com> wrote:
> On Sat, Feb 26, 2011 at 9:28 AM, Zhijie Shen <zjshen14 at gmail.com> wrote:
> > Hi developers,
> Please do not cross-post messages. Doing so does not make having a
> reasonable conversation any easier; it makes it nearly impossible,
> because the conversation is fragmented over several lists.
> > If you remember, I exchanged emails with you two days ago to discuss
> > the OpenStreetMap wiki.
> Do you have a URL to this conversation?
The previous conversation started there:
> > Now I have a quick solution, a Wikipedia
> > entry crawler, to get more Wikipedia entries automatically. I am
> > sharing it here in the hope that it is useful. The single Java class file
> > can be downloaded here.
> What is "here"?
> > The crawler implements the Sink interface of Osmosis, whose OSM XML file
> > parsing functionality is leveraged. It extracts the name of each entity
> > (e.g., node, way) from the name tag (hence entities without a name are
> > ignored), uses it as the parameter to search for candidate Wikipedia
> > entries by calling the Wikipedia API, and then judges which entry among
> > the returned results is the true one for the corresponding entity.
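For readers of the archive, the three steps described above (read the name tag, search Wikipedia with it, pick one result) might look roughly like the sketch below. It is not the crawler from this thread: the class name WikipediaCandidateSketch, its method names, and the exact-title matching rule are assumptions made for illustration. The Osmosis Sink wiring is left out so the example runs on its own, and no network request is made. The URL targets the MediaWiki search API (action=query, list=search).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; not the class shared in this thread.
public class WikipediaCandidateSketch {

    // Step 1: read the "name" tag; entities without a usable name are skipped.
    public static Optional<String> nameOf(Map<String, String> tags) {
        String name = tags.get("name");
        if (name == null || name.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(name.trim());
    }

    // Step 2: build a MediaWiki search request URL for that name.
    // The URL is only constructed here, never fetched.
    public static String buildSearchUrl(String lang, String name) {
        String query = URLEncoder.encode(name, StandardCharsets.UTF_8);
        return "https://" + lang + ".wikipedia.org/w/api.php"
                + "?action=query&list=search&format=json&srsearch=" + query;
    }

    // Step 3: judge the candidates. Assumed rule: accept only a title that
    // equals the name, ignoring case. The real crawler's rule is not stated.
    public static Optional<String> pickCandidate(String name, List<String> titles) {
        for (String title : titles) {
            if (title.equalsIgnoreCase(name)) {
                return Optional.of(title);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up tags and search results, for demonstration only.
        Map<String, String> tags = Map.of("name", "Marina Bay", "place", "locality");
        Optional<String> name = nameOf(tags);
        if (name.isPresent()) {
            System.out.println(buildSearchUrl("en", name.get()));
            List<String> titles = List.of("Marina Bay Sands", "Marina Bay");
            System.out.println(pickCandidate(name.get(), titles).orElse("(no confident match)"));
        }
    }
}
```

A strict rule like this one trades coverage for fewer wrong links; a looser rule (prefix or fuzzy matching) would find more entries but raise the error rate discussed in the reply below.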
> This is an interesting idea and we're all likely impressed by your
> enthusiasm, but there's a huge, huge opportunity for mistakes with
> this kind of import, and since you're talking about mucking with the
> entire globe, it seems like we should take this conversation slowly.
Yes, I agree with you. This automatic import will definitely cause mistakes.
Actually, I wrote this crawler for my research project. I'm eager for data
and can tolerate some mistakes. As I wrote on my wiki page, I shared this
only to inspire ideas for improving the collaboration between
OpenStreetMap and Wikipedia.
> Where's your source code?
The source code is here:
> - Serge
School of Computing
National University of Singapore