[Imports] [serge at wroclawski.org: Distributed Imports]

Mon Mar 8 12:57:05 GMT 2010

Hi,
I don't think this got sent out to the list here, as points from it can
be added to the page I'm working on.  Perhaps someone can help add
points, perhaps expand the 'TIGER lessons' section, or add a new
section to summarize what the problem is.  (I already know what the
problem is, of course, but the average reader won't.)

http://wiki.openstreetmap.org/wiki/OSM_Import_Database

My solution (the Google Docs chart listing the NTS tile areas) was the
'time-punch-card' version of the Coraine web-interface, but it does the
same thing: essentially, it keeps track of exactly what was imported &
what is left to deal with.

The working alternative to the Google Docs chart is a wiki-chart,
where users enter what data they are working on, with a statement
indicating the status (as well as messaging the talk-ca@ list).
Also effective, but it's still equivalent to a digital punch-card
clock that is not actually connected to anything.
(Like a time-card in a work setting, where a traditional punch-clock is
not as effective as a finger-print scanner that is connected to the
entire payroll & company system.)

Coraine lists it wonderfully, with sections dedicated to listing data
as 'not-importable' and 'not imported', as well as 'automatically
imported' & 'imported manually', and with a control for changing the
opacity of the overlay layer.

So anyway, the system is already in place, but the challenge we have
is that there are too few people who know how to handle the back-end
database, and who can answer the question of "How do we update the
underlying database with new information & different datasets &
sources?", as well as the question of "How can I donate my dataset to
this system (create a continuous link) so that it will show up in that
left-side scrolling list?"

These people have the programming skills that most of us simply don't
have.   So hopefully, we can find out who else out there is available
to assist (such as the person who made this system work).

Cheers,
Sam

On Mon, Mar 8, 2010 at 4:01 AM, Serge Wroclawski <serge at wroclawski.org> wrote:
> ----- Forwarded message from Serge Wroclawski <serge at wroclawski.org> -----
>
> Date: Tue, 29 Dec 2009 10:13:06 -0500
> From: Serge Wroclawski <serge at wroclawski.org>
> To: Thea Clay <thea at cloudmade.com>
> Subject: Distributed Imports
>
> Thea,
>
> I wanted to follow up a bit on our conversation yesterday about a
> "distributed import" project.
>
> This is going to be a long mail. I apologize in advance.
>
>
> === The Problem ===
>
> There are a number of problems OSM has encountered in relation to
> imports (and let's just stick to the US).
>
> 1. Resource Mismatch
>
> There is a backlog of imports to be done. Even where we have the data
> and the license, there's a certain amount of effort needed to add the
> data properly in OSM, much of which simply has to be done manually.
>
> At the same time, if you ask the average OSMer, they'd probably tell you
> they'd like to help OSM more with imports, but they don't have the
> skillset necessary. You might even find that an OSMer would like to help
> more in general, but doesn't have the resources to go mapping.
>
> So we have work that isn't done, and a workforce that isn't able to
> help.
>
> 2. Import Methods and Licensing
>
> Right now the methods for importing are sort of haphazard. It feels
> like imports were something of a special case, and as such weren't
> widely discussed, but also generally frowned upon due to the issues
> around them.
>
> There's a relatively inactive import working group and a wiki page.
>
> And finally there's the issue of licensing and re-licensing the data. Any
> data that goes into OSM needs to have all the license issues worked out
> first. This shouldn't be a cumbersome process, but right now it's
> ill-defined.
>
> 3. OSM Culture
>
> OSM can often feel like an impenetrable bundle of complexity.
> Ambassadors helped break down the initial barriers, but the difficulties
> start when you go home and start actually trying to put this data in
> OSM.
>
> "How do I represent this?" "What's the right way to tag that?", etc.
>
> It would be good if OSM had a system of mentorship where newbies could
> come in, try to work on something, then be given feedback on their work,
> at least in the beginning.
>
> This wouldn't be mandatory, but it would probably help a great deal of
> people join OSM.
>
> Then for more experienced users, it would give them an opportunity to
> keep OSM consistent.
>
>
> My proposed solution to this problem is "Distributed Imports".
>
>
> === History of Distributed Proofreading ===
>
> In 2005, I was out of work for an extended period of time. I was doing a
> little freelance writing, but had tons of time on my hands, and I
> stumbled on Distributed Proofreading.
>
> Project Gutenberg had a problem: They had lots of books, magazines and
> articles, but little time to OCR them. The specifications for inclusion
> in Gutenberg are quite strict, so it was beyond the scope of a normal
> user to work on it, and even if they could, most people didn't have the
> original material, knowledge of copyright law, scanners, etc.
>
> Distributed Proofreaders (DP) was created as a side project to help take
> up some of the overflow. The way it works is each book is scanned in and
> goes through a standard OCR process.
>
> This takes care of some 95% of the work. The rest has to be done by hand.
> There's spelling to correct, formatting to fix, etc. It's a pretty
> detailed process.
>
> A user does a small chunk of work (say, a page), and then an editor comes
> in and goes over the page, giving feedback to the original user on what
> they forgot, missed, etc. There's also opportunity in the interface for a
> proofreader to point out questions/problems in the document using special
> markup.
>
> Once the first editor has looked at it, another editor takes a look, and
> then it's all gone over by the project leader before submission to
> Project Gutenberg.
>
> You can find more detail about the process on their FAQ:
> http://www.pgdp.net/c/faq/ProoferFAQ.php
>
> I think we can do something similar with OSM.
>
>
> === From a User's Perspective ===
>
> Let's call our new project "Distributed Imports". A user logs into DI and
> sees a list of imports in progress. Each project will be labeled in terms
> of difficulty.
>
> Once they select a project, they'll get a page from the project lead on
> the project: further detail about it, features that are important, things
> to keep/remove, tags to use, etc.
>
> From there they can check out a chunk of data and they'll be presented
> with an editor showing the existing OSM data with the new data overlaid
> on top. They can then edit the page to completion and click done.
>
> Once they've done one unit of data, they can do another, or go on to
> another project, etc.
>
> === From a Project Lead's Perspective ===
>
> Someone who wants to get an import into OSM would go to the DI website
> and sign up. If all they want to do is "donate" the data, there would be
> an interface for that and a project administrator would work with them to
> secure the rights to it.
>
> Then the data would be put into the system (even if it's not yet an
> active project).
>
> Once it's ready to be imported, someone would act as project lead. They'd
> create the first pass at an automatic import and would write up the style
> guide, as well as work with editors on it and make sure the process is
> going smoothly.
>
> === Editors ===
>
> Experienced users could become editors. They would be there to help
> manage a project as it gets edited, correct it where necessary, and
> provide feedback to the user.
>
> On a small import and in the beginning, project leads will probably also
> be the only editors.
>
> They'd have a different interface, which would (at least) show the
> original data and the result side-by-side (or some other way that makes
> sense).
>
> === Terminology ===
>
> To clarify here's a quick glossary of what I mean by certain terms:
>
> Importer - Someone who is doing the work to edit the data for OSM
>
> Editor - Someone who is reviewing the work by an importer and providing
>         feedback to them (as well as coordinating with the project lead)
>
> Project Lead - The person responsible for a dataset
>
> A Project - A project is a single dataset. For example, DC GIS has given
>            us access to, I believe, 300 datasets (50 or so we're
>            interested in importing). Each dataset would be a project.
>
> A Work Unit - The checked out data that an importer is working on at a
>              time.
>
>
> If we take this project forward, we may want to have some new terms. For
> example, I can see saying "Reviewer" rather than "Editor", etc.
>
> === The Technology ===
>
> The technology behind a DI would be a mix of some things that we could do
> pretty easily, along with some technologies that don't (AFAIK) exist yet.
>
> The way I imagine it, there would need to be a new instance of the OSM
> database stood up, as well as a catalog of the datasets and the tools to
> go from the original data to the OSMed version.
>
> An import would be sort of like a batch processing job, where the
> project lead could use one of the existing tools (like shp-to-osm) with
> a config file, or something similar.
>
> Then that shapefile would need to be chunked out into work units. This
> is really where I'm unsure how best to proceed. We want a work unit to
> be 10-30 minutes' worth of work, and I'm not sure how to go about making
> these chunks.
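
One possible way to make such chunks, offered only as a sketch and not as
anything the proposal specifies: split the dataset's bounding box into
quadrants recursively until each piece holds no more than a fixed number
of features. The names WorkUnit, split_into_work_units and max_features
are hypothetical, and using a feature count as a stand-in for "10-30
minutes of work" is an assumption that would need tuning per dataset.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """Hypothetical work unit: a bounding box plus the features inside it."""
    bbox: tuple          # (min_x, min_y, max_x, max_y)
    feature_ids: list


def split_into_work_units(items, bbox, max_features=50, max_depth=16):
    """Recursively quarter `bbox` until each piece has <= max_features.

    `items` is a list of (feature_id, x, y) tuples, where (x, y) is one
    representative point per feature (an assumption of this sketch).
    `max_depth` stops the recursion when many features share one point.
    """
    if not items:
        return []
    if len(items) <= max_features or max_depth == 0:
        return [WorkUnit(bbox, [fid for fid, _, _ in items])]

    min_x, min_y, max_x, max_y = bbox
    mid_x = (min_x + max_x) / 2
    mid_y = (min_y + max_y) / 2
    quadrants = {
        (False, False): (min_x, min_y, mid_x, mid_y),
        (True, False): (mid_x, min_y, max_x, mid_y),
        (False, True): (min_x, mid_y, mid_x, max_y),
        (True, True): (mid_x, mid_y, max_x, max_y),
    }
    # Each feature lands in exactly one quadrant, so none is lost or doubled.
    buckets = {key: [] for key in quadrants}
    for fid, x, y in items:
        buckets[(x >= mid_x, y >= mid_y)].append((fid, x, y))

    units = []
    for key, sub_bbox in quadrants.items():
        units.extend(
            split_into_work_units(buckets[key], sub_bbox,
                                  max_features, max_depth - 1)
        )
    return units
```

Dense areas end up with small boxes and sparse areas with large ones, so
the amount of data per unit stays roughly even. Features that cross a box
edge (long roads, large polygons) are not handled here and would need
separate treatment.
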
>
> Then there'd be a database of work units, who is working on what, etc.
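
A minimal sketch of what such a tracking database might look like, using
SQLite from the Python standard library. The table layout, the status
names ('available', 'checked_out', 'in_review') and the function names
are all assumptions made for illustration; nothing here comes from an
existing OSM tool.

```python
import sqlite3

# Hypothetical schema: one row per work unit, with who holds it and its state.
SCHEMA = """
CREATE TABLE work_units (
    id             INTEGER PRIMARY KEY,
    project        TEXT NOT NULL,
    status         TEXT NOT NULL DEFAULT 'available',
    importer       TEXT,
    checked_out_at TEXT
);
"""


def open_tracker(path=":memory:"):
    """Create a fresh tracker database (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_units(conn, project, count):
    """Register `count` new, unassigned work units for a project."""
    with conn:
        conn.executemany(
            "INSERT INTO work_units (project) VALUES (?)",
            [(project,)] * count,
        )


def check_out(conn, project, importer):
    """Give the importer the next available unit, or None if none are left."""
    with conn:
        row = conn.execute(
            "SELECT id FROM work_units "
            "WHERE project = ? AND status = 'available' "
            "ORDER BY id LIMIT 1",
            (project,),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE work_units SET status = 'checked_out', importer = ?, "
            "checked_out_at = datetime('now') "
            "WHERE id = ? AND status = 'available'",
            (importer, row[0]),
        )
        return row[0]


def submit(conn, unit_id, importer):
    """Hand a checked-out unit to review; only its current holder may do so."""
    with conn:
        cur = conn.execute(
            "UPDATE work_units SET status = 'in_review' "
            "WHERE id = ? AND importer = ? AND status = 'checked_out'",
            (unit_id, importer),
        )
        return cur.rowcount == 1
```

Unlike a wiki chart, the status here changes as a side effect of actually
checking out and submitting work, so the record cannot drift from what
people are really doing.
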
>
> We'd need a special version of Potlatch (probably PL2) which would be
> able to display the data in the way we want, if possible highlighting
> the data to be reviewed, etc.
>
> And then when the user saves the data, it shouldn't go directly into OSM,
> but back to an Editor, who then can modify it, and the data would be sent
> into OSM as, e.g., an OSMChange file. This means either the editor needs to
> be modified or we need a modified server that can take updates and store
> them "off to the side".
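
To make the last step concrete, here is a rough sketch of turning a
reviewed work unit into an OSMChange document. It covers only newly
created nodes, and the function name build_osmchange, the input layout
and the generator string are hypothetical. New elements get negative
placeholder ids, which is the usual convention for objects that do not
exist on the server yet.

```python
import xml.etree.ElementTree as ET


def build_osmchange(new_nodes, generator="distributed-imports-sketch"):
    """Return an osmChange XML string that creates the given nodes.

    `new_nodes` is a list of dicts like
    {"lat": 38.9, "lon": -77.03, "tags": {"amenity": "library"}}
    (a layout assumed for this sketch only).
    """
    root = ET.Element("osmChange", {"version": "0.6", "generator": generator})
    create = ET.SubElement(root, "create")
    for placeholder, node in enumerate(new_nodes, start=1):
        element = ET.SubElement(create, "node", {
            "id": str(-placeholder),         # negative id marks a new object
            "lat": f"{node['lat']:.7f}",
            "lon": f"{node['lon']:.7f}",
        })
        for key, value in sorted(node.get("tags", {}).items()):
            ET.SubElement(element, "tag", {"k": key, "v": value})
    return ET.tostring(root, encoding="unicode")
```

A real upload would also need a changeset, plus ways, relations and
modifications of existing objects, none of which this sketch attempts.
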
>
> === Other Benefits ===
>
> The benefits of this idea go beyond imports, in my mind. It would let OSM
> integrate new people and let them get comfortable with the tools without
> needing to go collect data on their own.
>
> It would give them an opportunity to understand the culture of OSM
> without getting on the (sometimes noisy) lists.
>
> And it wouldn't disrupt the existing OSM infrastructure. Most OSM users
> wouldn't even need to know this project exists.
>
>
>
> So, that's my idea...
>
>
> - Serge
>
> ----- End forwarded message -----
>