[OSM-dev] [Tagging] Retrieving Wikipedia Entries Automatically

Zhijie Shen zjshen14 at gmail.com
Sat Feb 26 14:58:13 GMT 2011


Hi Serge Wroclawski,

On Sat, Feb 26, 2011 at 10:47 PM, Serge Wroclawski <emacsen at gmail.com> wrote:

> On Sat, Feb 26, 2011 at 9:28 AM, Zhijie Shen <zjshen14 at gmail.com> wrote:
> > Hi developers,
>
> Zhijie,
>
> Please do not cross post messages. Doing so does not make having a
> reasonable conversation any easier, it makes having it nearly
> impossible because the conversation is fragmented over several lists.
>
> > If you remember, I exchanged emails with you two days ago to discuss the
> > wiki tag of OpenStreetMap.
>
> Do you have a URL to this conversation?
>
The previous conversation started here:
http://lists.openstreetmap.org/pipermail/dev/2011-February/021969.html

>
> > Now I have a quick solution, a Wikipedia entry crawler, to get more
> > Wikipedia entries automatically. I am eager to share it with you, and I
> > hope it is useful. The single Java class file can be downloaded here.
>
> What is "here"?
>
> > The crawler implements the Sink interface of Osmosis, which lets it reuse
> > Osmosis's OSM XML file parsing. It extracts the name of each entity
> > (e.g., node, way) from the name tag (hence entities without a name are
> > omitted), uses it as the parameter to search for candidate Wikipedia
> > entries by calling the Wikipedia API, and then judges which entry among
> > the returned results is the true one for the corresponding entity.
>
> This is an interesting idea and we're all likely impressed by your
> enthusiasm, but there's a huge, huge opportunity for mistakes with
> this kind of import, and since you're talking about mucking with the
> entire globe, it seems like we should take this conversation slowly.
>
Yes, I agree with you. This automatic import will definitely cause mistakes.
Actually, I wrote this crawler for my research project, where I am eager for
data and can tolerate some mistakes. As I wrote on my wiki page, I shared it
only to inspire ideas for improving the collaboration between
OpenStreetMap and Wikipedia.

>
> Where's your source code?
>
The source code is here:
http://www.comp.nus.edu.sg/~z-shen/WikiEntryCrawler.java.
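
For readers who only want the gist of the search-and-judge step, here is a
rough sketch. It is a simplified illustration and not the code in the linked
class: the class name, the method names, and the exact-title matching rule
are all assumptions made for this example, and the candidate titles in main
are made up. It leaves out the Osmosis Sink plumbing and the HTTP call, and
only shows how a name tag value could become a MediaWiki search request and
how one candidate could be chosen from the returned titles.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: names and heuristic are illustrative only and are
// not taken from WikiEntryCrawler.java.
class WikiCandidateSketch {

    // Build a MediaWiki search request (action=query, list=search) for the
    // value of an entity's name tag. Requires Java 10+ for this encode overload.
    public static String buildSearchUrl(String name) {
        return "https://en.wikipedia.org/w/api.php?action=query&list=search"
                + "&format=json&srlimit=5&srsearch="
                + URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // Normalise a title for comparison: drop a trailing "(...)" qualifier,
    // trim whitespace, and lower-case.
    public static String normalise(String title) {
        return title.replaceAll("\\s*\\([^)]*\\)\\s*$", "")
                .trim()
                .toLowerCase(Locale.ROOT);
    }

    // Assumed heuristic: accept the first candidate whose normalised title
    // equals the normalised name. Return null when nothing matches, so a
    // doubtful entity stays untagged instead of getting a wrong entry.
    public static String pickEntry(String name, List<String> candidateTitles) {
        String wanted = normalise(name);
        for (String title : candidateTitles) {
            if (normalise(title).equals(wanted)) {
                return title;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("Marina Bay"));
        // Made-up candidate titles, standing in for an API response.
        System.out.println(pickEntry("Raffles Place",
                Arrays.asList("Raffles Hotel", "Raffles Place (Singapore)")));
    }
}
```

Returning no match rather than the top search hit trades coverage for fewer
wrong links, which seems the safer default given the concerns raised above.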

>
> - Serge
>



-- 
Zhijie Shen
School of Computing
National University of Singapore
<http://www.comp.nus.edu.sg/%7Ez-shen/>