[Talk-us] Imports and Mass Edits in the US
Greg Troxel
gdt at ir.bbn.com
Tue Dec 18 02:12:26 GMT 2012
The result is that folks like myself and others are frustrated by the
import process, and folks who have good, useful datasets are frustrated
by the import process.
[import/mechanical-edit committee proposal]
I agree with your broad sentiments.
Having observed some recent discussion, I think we have two fundamental
problems:
1) the import guidelines don't adequately describe what is actually
expected (reasonably so) by the more experienced people
2) people who want to import are very enthusiastic and often do not
fully appreciate the difficulty of doing it right and the benefits of
review and care, and each new would-be importer needs to have the
norms communicated to them
I have a concern that while there is wide agreement that imports must be
careful, there is also a view (which I perceive to be a minority view)
that all imports are harmful. For the committee and "import with care"
effort to be socially successful, I think it has to be separated from
the "do not import at all" view. I think your note expresses that
separation (or rather, only expresses the view that imports must be done
with care, and I am speculating that you did that on purpose), but I
wanted to mention this explicitly.
I realize my proposals below may come across as strict, but I am
actually in favor of careful imports of high-quality data, when done by
people with a sense of stewardship for the affected area. (I'm in
Massachusetts, and most of the MassGIS data is very high quality, so
that's my implicit reference point.) So I am not trying to stop
imports; rather, I think that with more care and especially more delays
for review, we'll get a better outcome in terms of the ratio of map
utility to total volunteer time.
My thinking is heavily influenced by the experience of leading a
~20-person software team, with a loose analogy of preparing changes on
branches and then merging to master with approval. I know imported data
isn't software, but in terms of preparing bits and then changing the
shared code/data base, I think it's quite analogous.
Overall I suggest three concrete steps:
1) document the actual expectations on the wiki. Specifically
a) The conversion process has to be described well enough to be
considered High Level Design from a software viewpoint so that
someone else could write the conversion scripts. This should
address datum/projection issues. Most importantly, it should
address how the import avoids new data that conflicts with old
data. The plan should describe which tools will be used to put the
data in the main database.
b) The actual data to be uploaded (with all pre-upload cleanup
actually done, not the notion that each file will get manual
cleanup before uploading) has to be posted for review.
c) No data can be uploaded until the per-import page has met the
standards, and the scripts and converted data that will be uploaded
have been published, and there's been a 14-day review period, which
is reset by any substantive change in the page or any change in the
script or data.
d) (probably) the data should be uploaded to some test server
(assuming there is one) so that people can see what happens in the
database and with rendering. Each person doing uploads should be
expected to do the test server upload.
e) Once the two weeks have passed, and there is rough consensus
that the plan and data are adequate, a small amount of data (but
bigger than can be examined 100% by hand) can be uploaded. The
idea is to have something that is not that big in case there is
trouble, but for which the process will be representative of the
rest. An example would be a single town in Massachusetts, with
thousands of buildings or address points or hundreds of roads.
f) After the initial small upload, there is another 14-day review
period, during which people can find issues with the data. If
there are significant issues, the proposal, script and data should
be fixed, and the 14-day review period in step c starts anew.
2) Add the notion that when people talk about imports, the committee
contacts them privately and makes sure they really understand point
1. Probably also a public note in response, briefer. Someone from
the committee should stay in touch about judging when the consensus
in (e) has happened. Overall, aside from documenting the norms, I
see this as the main job of the committee.
3) For areas where it makes sense, consider sending private messages
via the web site to registered active mappers in the area. For
example, once the MassGIS buildings import had entered the 14-day
review period (with all concerns addressed), it might make
sense to message every Mass mapper who has edited in the last 90
days and point out the wiki page and that it's being discussed on
talk-us at .
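
To make the review clock in 1c and 1f concrete, here is a rough sketch
of how I read it. The function names and the example dates are made up
for illustration; the only things taken from the steps above are the
14-day period, the reset on any substantive page change or any
script/data change, and the need for rough consensus before uploading.

```python
from datetime import date, timedelta

REVIEW_DAYS = 14  # length of the review period in steps (c) and (f)


def review_ends(published, changes):
    """Return the first day on which uploading could begin.

    published: date the wiki page, scripts and converted data were all posted.
    changes:   dates of substantive page changes or any script/data change;
               each one restarts the clock.
    """
    clock_start = max([published, *changes])
    return clock_start + timedelta(days=REVIEW_DAYS)


def may_upload(today, published, changes, rough_consensus):
    """Uploading needs both a completed review period and rough consensus."""
    return rough_consensus and today >= review_ends(published, changes)


if __name__ == "__main__":
    posted = date(2012, 12, 1)          # hypothetical posting date
    script_fix = [date(2012, 12, 10)]   # hypothetical script change
    print(review_ends(posted, []))
    print(review_ends(posted, script_fix))
    print(may_upload(date(2012, 12, 20), posted, script_fix, True))
```

The point of taking the latest change as the start of the clock is
that reviewers always get the full period to look at exactly what will
be uploaded.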
(Everyone knows that this discussion was triggered by the MassGIS
buildings import.) I should emphasize that I'm not trying to pick on
Jason here. I think data was uploaded too soon, and for too many towns.
But, I have looked at the map of data that's been imported, and driven
around today with the data on my Nuvi (last night's us-northeast
geofabrik extract - thanks again to Frederik for providing those) plus
the not-yet-imported data for my town and all the towns between it and
Cambridge. Aside from one glitch in not-yet-imported data (in one
town), everything looked excellent in terms of accuracy. I would see a
small building on the map, find it odd, and then look at the real world
and in fact it was there, every time. There were a few houses that are
not in the data (probably due to tree cover). The only building on the
map that isn't there was one in my town that was torn down one year ago.
I could not tell when I crossed from new data to data from the previous
lidar import. I did not see a single overlapping/messy building. So
while there are (entirely fair) process concerns about long enough
review periods (which are NOT documented on the import guidelines
page!), the actual uploaded data looks good to me. I'm unaware of
anyone pointing out a specific significant issue (or really any issue)
with the uploaded data.
Greg