[Imports] Importing Kerala, India road network from Facebook's ML generated data
Christoph Hormann
osm at imagico.de
Thu Aug 30 09:51:05 UTC 2018
On Wednesday 29 August 2018, Blake Girardot wrote:
> Greetings,
>
> An import specific wiki page has been created for this import:
>
> https://wiki.openstreetmap.org/wiki/Kerala_Road_Import
>
Ok, that looks much better already. Based on that, here is my review.
The data files are from August 21 and contain the conflation with OSM
data which raises the question i already hinted at - you will likely
have significant problems with editing conflicts. This applies to both
hard conflicts you need to resolve and semantic incompatibilities - if
for example buildings have been added in the meantime which happen to
intersect some of the roads to be imported.
Looking over the data it seems that the number of roads to be added is
relatively small compared to the number of roads that already exists in
most parts (which is a good thing obviously). Where roads are added
and where not looks arbitrary in a lot of cases when looking at
satellite images - there is no visible reason in many cases why a
certain road is in the data and a nearby much more prominent road is
not. This is not bad per se, just curious.
The data contains a source=digitalglobe tag which should only be added
to the changesets and not the data. The import=yes tag should
also be removed before upload.
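
As an illustration of the tag cleanup described above, here is a minimal sketch that assumes the data files are plain OSM XML. The function name `strip_import_tags` and the sample way are hypothetical and not taken from the import files; only the two tag values come from the review.

```python
import xml.etree.ElementTree as ET

# Key/value pairs that should not end up on the uploaded objects.
# (The source information belongs on the changeset instead.)
TAGS_TO_DROP = {("source", "digitalglobe"), ("import", "yes")}

def strip_import_tags(osm_xml: str) -> str:
    """Return the OSM XML with the unwanted <tag> children removed from every way."""
    root = ET.fromstring(osm_xml)
    for way in root.iter("way"):
        # Copy the list first so removing children does not disturb the iteration.
        for tag in list(way.findall("tag")):
            if (tag.get("k"), tag.get("v")) in TAGS_TO_DROP:
                way.remove(tag)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample way, for illustration only.
SAMPLE = """<osm version="0.6">
  <way id="-1">
    <nd ref="-1"/><nd ref="-2"/>
    <tag k="highway" v="residential"/>
    <tag k="source" v="digitalglobe"/>
    <tag k="import" v="yes"/>
  </way>
</osm>"""

cleaned = strip_import_tags(SAMPLE)
```

Matching on key and value together leaves any unrelated source=* tag with a different value untouched.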
So far the general remarks, now regarding the data quality. This is
based not on looking at all of the data but at a cross section -
covering both samples near the coast and in the mountains.
Positional accuracy
First, the available images show a fairly large variance in alignment, in
particular in the mountain areas, and there is no indication that the
images the road geometries are generated from have better positional
accuracy than the others. In fact i found several places where the image
matching the roads had visibly the largest off-nadir angle and
therefore likely the largest positional error. Offsets between the
different images available are frequently about 20m, sometimes more
than 50m. Existing road data has a similar level of accuracy, in a few
cases also more than 50m offset to the average alignment of the image
layers.
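
Offsets like these can be quantified by picking the same feature (a road junction, for example) in two image layers and computing the distance between the two positions. A minimal sketch using the haversine formula follows; the function name `offset_m` and the coordinates are hypothetical and not measurements from the import data.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres

def offset_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon positions (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical positions of the same junction as seen in two different image layers.
d = offset_m(10.0, 76.0, 10.0002, 76.0)
```

At these scales a flat-earth approximation would do just as well; haversine is used here only because it needs no projection setup.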
Geometry data
Regarding the geometries - i would estimate about 10-15 percent of the
road geometries are clearly faulty, the most common cases were:
* nonsense geometries resulting from conflating roads with existing data
with significant relative offset.
* intersections between roads without nodes
* roads drawn where there is evidently no road
I would estimate an additional 10 percent where without additional data
(ground level photos or local knowledge) you can't reliably verify if
there is a road (i.e. the geometry looks guessed and doubtful but you
can neither verify nor falsify it reliably).
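
The second fault type in the list above (intersections between roads without nodes) is the one that lends itself most readily to an automated check. A minimal sketch, assuming the ways are available as lists of node ids plus a table of planar coordinates; the data layout and the function names are hypothetical. It only finds proper crossings, so T-junctions where one road ends on another without a shared node are not detected, and legitimate bridges or tunnels would need to be filtered out separately.

```python
from itertools import combinations

def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1 or 0."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def _segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a point interior to both."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def crossings_without_node(ways, nodes):
    """ways: {way_id: [node_id, ...]}, nodes: {node_id: (x, y)}.

    Returns one (way_a, way_b) entry per pair of segments that cross
    geometrically without sharing a node.
    """
    found = []
    for (wa, na), (wb, nb) in combinations(ways.items(), 2):
        for a1, a2 in zip(na, na[1:]):
            for b1, b2 in zip(nb, nb[1:]):
                if {a1, a2} & {b1, b2}:
                    continue  # the segments share a node, so they are connected here
                if _segments_cross(nodes[a1], nodes[a2], nodes[b1], nodes[b2]):
                    found.append((wa, wb))
    return found

# Hypothetical example: two diagonals crossing at (1, 1).
nodes = {1: (0, 0), 2: (2, 2), 3: (0, 2), 4: (2, 0), 5: (1, 1)}
unconnected = crossings_without_node({"A": [1, 2], "B": [3, 4]}, nodes)
connected = crossings_without_node({"C": [1, 5, 2], "D": [3, 5, 4]}, nodes)
```

This compares every segment pair and is therefore quadratic; for a real data set a spatial index would be needed.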
Tagging
The tags assigned to the generated roads look wrong in the majority of
cases. Specifically i would estimate:
* unclassified: wrong in about 60 percent of cases (in particular where
the road has no connecting function)
* residential: evidently wrong in about 70 percent of cases (mostly
because there are no residential buildings near it or because it is
clearly just a service road)
* service: too few for an accurate estimate but mostly wrong (in
particular roads with a connecting function)
* path/footway: too few for an accurate estimate but most are likely
wide enough for cars, hence wrong
* track: too few as well but this might actually be correct in the
majority of cases
Conclusions and recommendations:
* there is no basis for the tags chosen - replacing them all with
highway=road would be a big improvement.
* running the import as is would create significant technical debt
because it would conflate data with different alignments all of unknown
accuracy. Improving the overall accuracy later or just mapping other
stuff with better accuracy would require a lot of hand work
(essentially checking and correcting every road manually) which would
be much more work in total than importing the data.
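
The retagging suggested in the first point is mechanically simple. A minimal sketch, assuming the tags of each way are held in a plain dictionary; the function name `retag_as_road` is hypothetical.

```python
def retag_as_road(tags):
    """Return a copy of the tags with any highway=* value replaced by 'road'."""
    new_tags = dict(tags)
    if "highway" in new_tags:
        new_tags["highway"] = "road"
    return new_tags

# Hypothetical example way tags.
retagged = retag_as_road({"highway": "residential", "surface": "paved"})
```

Objects without a highway key are returned unchanged.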
The second point is of course something you also have with manual
mapping to some extent but
* you don't accumulate that much debt in such a short time.
* you have the possibility to significantly improve the accuracy by
aligning images locally based on ground reference data or by taking
into account other image sources. With the errors mentioned above the
difference this can make is significant.
Overall my assessment of this is that the work required to bring the
data shown to a level of quality similar to good quality manual mapping
is probably similar to - if not larger than - the work of mapping the
roads manually.
For Facebook this is not so relevant because (a) they have made the
overall decision they want to take this approach independent of its
efficiency in the individual case and (b) they are doing a mixed
calculation that a large part of the required work is either done
through free labour from the community rather than paid work by their
staff or not at all.
The Kerala community needs to contemplate and discuss if their goals in
the long term in mapping their region (read long term here as 5-10
years) are compatible with that approach and actually more work
efficient for them than mapping by local mappers (which can still be
supported by algorithm help). I don't know the answer to this question
but i have not seen a serious discussion of this question by the local
community either.
-- end of review --
As a general remark and recommendation to local communities approached
by international corporations for approving import or organized editing
plans: Making such approval contingent on training and hiring locals
to perform the work could be a useful approach on several levels - both
to support the local economy and to ensure work is performed with
proper knowledge of the local geography as well as to support a
sustained growth of the local community.
--
Christoph Hormann
http://www.imagico.de/