[Imports] Importing Kerala, India road network from Facebook's ML generated data

Thu Aug 30 09:51:05 UTC 2018

On Wednesday 29 August 2018, Blake Girardot wrote:
> Greetings,
>
> An import specific wiki page has been created for this import:
>
> https://wiki.openstreetmap.org/wiki/Kerala_Road_Import
>

Ok, that looks much better already.  Based on that, here is my review.

The data files are from August 21 and contain the conflation with OSM 
data which raises the question i already hinted at - you will likely 
have significant problems with editing conflicts.  This applies to both 
hard conflicts you need to resolve and semantic incompatibilities - if 
for example buildings have been added in the meantime which happen to 
intersect some of the roads to be imported.

Looking over the data it seems that the number of roads to be added is 
relatively small compared to the number of roads that already exists in 
most parts (which is a good thing obviously).  Where roads are added 
and where not looks arbitrary in a lot of cases when looking at 
satellite images - there is no visible reason in many cases why a 
certain road is in the data and a nearby much more prominent road is 
not.  This is not bad per se, just curious.

The data contains a source=digitalglobe tag which should only be added 
to the changesets and not the data.  The import=yes tag should 
also be removed before upload.
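
A minimal sketch of how these two tags could be stripped from the data 
files before upload, assuming the data is in plain OSM XML.  The function 
name and the sample data are made up for illustration, only the two 
tags come from the data reviewed here:

```python
import xml.etree.ElementTree as ET

# Tags that should not end up on the imported objects themselves.
# The two key/value pairs are from the review, the rest is illustrative.
UNWANTED = {("source", "digitalglobe"), ("import", "yes")}

def strip_import_tags(osm_xml: str) -> str:
    """Return OSM XML with the unwanted tags removed from every object."""
    root = ET.fromstring(osm_xml)
    for element in root:  # nodes, ways and relations are direct children
        for tag in list(element.findall("tag")):
            if (tag.get("k"), tag.get("v")) in UNWANTED:
                element.remove(tag)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample way, not taken from the actual import files.
SAMPLE = """<osm version="0.6">
  <way id="-1">
    <nd ref="-10"/><nd ref="-11"/>
    <tag k="highway" v="residential"/>
    <tag k="source" v="digitalglobe"/>
    <tag k="import" v="yes"/>
  </way>
</osm>"""

cleaned = strip_import_tags(SAMPLE)
print(cleaned)
```

The source information would then go on the changeset instead, however 
the upload tool in use handles changeset tags.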

So far the general remarks, now regarding the data quality.  This is 
based not on looking at all of the data but at a cross section - 
covering both samples near the coast and in the mountains.

Positional accuracy

First, the available images show a fairly large variance in alignment, 
in particular in the mountain areas, and there is no indication that the 
images the road geometries are generated from have better positional 
accuracy than the others.  In fact i found several places where the 
image matching the roads had visibly the largest off-nadir angle and 
therefore likely the largest positional error.  Offsets between the 
different images available are frequently about 20m, sometimes more 
than 50m.  Existing road data has a similar level of accuracy, in a few 
cases also more than 50m offset to the average alignment of the image 
layers.
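
To put a number on such an offset you can pick the same feature (a 
road junction for example) in two image layers and compute the distance 
between the two positions.  A rough sketch using the haversine formula 
on a spherical earth - the function name and the coordinates are 
hypothetical, not measurements from this review:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean earth radius, spherical approximation

def offset_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon positions."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2) - radians(lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Hypothetical: the same junction as seen in two different image layers,
# differing by 0.00018 degrees in latitude (about 20 m).
d = offset_m(10.00000, 76.30000, 10.00018, 76.30000)
print(f"offset between layers: {d:.1f} m")
```

This only gives the relative offset between two layers, it says nothing 
about which of them (if any) is closer to the true position.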

Geometry data

Regarding the geometries - i would estimate about 10-15 percent of the 
road geometries are clearly faulty, the most common cases were:

* nonsense geometries resulting from conflating roads with existing data 
with significant relative offset.
* intersections between roads without nodes
* roads drawn where there is evidently no road

I would estimate an additional 10 percent where without additional data 
(ground level photos or local knowledge) you can't reliably verify if 
there is a road (i.e. the geometry looks guessed and doubtful but you 
can neither verify nor falsify it reliably).
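
The second kind of error in the list above (intersections without 
nodes) can be found automatically.  A minimal sketch, assuming each way 
is given as a list of (node id, lon, lat) tuples and treating the 
coordinates as planar, which is good enough at the scale of a single 
crossing.  All names are illustrative, and bridges or tunnels, where a 
crossing without node is legitimate, are not handled:

```python
from itertools import product

def _orient(a, b, c):
    """Sign of the cross product (b-a) x (c-a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _cross(p1, p2, q1, q2):
    """True if the two segments cross at a point interior to both."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0 and
            _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)

def unconnected_crossings(way_a, way_b):
    """Return segment pairs (as node id tuples) that cross without a node.

    A crossing at a shared node lies on a segment end point, so it is
    not an interior crossing and is not reported.
    """
    hits = []
    segs_a = list(zip(way_a, way_a[1:]))
    segs_b = list(zip(way_b, way_b[1:]))
    for (a1, a2), (b1, b2) in product(segs_a, segs_b):
        if _cross(a1[1:], a2[1:], b1[1:], b2[1:]):
            hits.append(((a1[0], a2[0]), (b1[0], b2[0])))
    return hits

# Hypothetical ways: two roads forming an X without a common node ...
bad_a = [(1, 0.0, 0.0), (2, 2.0, 2.0)]
bad_b = [(3, 0.0, 2.0), (4, 2.0, 0.0)]
# ... and the same X with node 5 shared at the crossing point.
ok_a = [(1, 0.0, 0.0), (5, 1.0, 1.0), (2, 2.0, 2.0)]
ok_b = [(3, 0.0, 2.0), (5, 1.0, 1.0), (4, 2.0, 0.0)]

print(unconnected_crossings(bad_a, bad_b))
print(unconnected_crossings(ok_a, ok_b))
```

For a real data set you would use a spatial index rather than comparing 
all pairs of segments, this is only meant to show the test itself.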

Tagging

The tags assigned to the generated roads look wrong in the majority of 
cases.  Specifically i would estimate:

* unclassified: wrong in about 60 percent of cases (in particular where 
the road has no connecting function)
* residential: evidently wrong in about 70 percent of cases (mostly 
because there are no residential buildings near it or because it is 
clearly just a service road)
* service: too few for an accurate estimate but mostly wrong (in 
particular roads with a connecting function)
* path/footway: too few for an accurate estimate but most are likely 
wide enough for cars, hence wrong
* track: too few as well but this might actually be correct in the 
majority of cases

Conclusions and recommendations:

* there is no basis for the tags chosen - replacing them all with 
highway=road would be a big improvement.
* running the import as is would create significant technical debt 
because it would conflate data with different alignments all of unknown 
accuracy.  Improving the overall accuracy later or just mapping other 
stuff with better accuracy would require a lot of hand work 
(essentially checking and correcting every road manually) which would 
be much more work in total than importing the data.
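
The first recommendation is simple to apply to the import files.  A 
sketch, again assuming plain OSM XML and with made-up names and sample 
data.  This is meant for the new ways in the import files only, not for 
anything that already exists in OSM:

```python
import xml.etree.ElementTree as ET

def retag_as_road(osm_xml: str) -> str:
    """Set every highway=* value on ways in the given data to 'road'."""
    root = ET.fromstring(osm_xml)
    for way in root.iter("way"):
        for tag in way.findall("tag"):
            if tag.get("k") == "highway":
                tag.set("v", "road")
    return ET.tostring(root, encoding="unicode")

# Hypothetical import data with two generated classifications.
IMPORT_SAMPLE = """<osm version="0.6">
  <way id="-1"><nd ref="-10"/><nd ref="-11"/>
    <tag k="highway" v="residential"/></way>
  <way id="-2"><nd ref="-12"/><nd ref="-13"/>
    <tag k="highway" v="track"/></way>
</osm>"""

retagged = retag_as_road(IMPORT_SAMPLE)
print(retagged)
```

highway=road makes it explicit that the classification is unknown, so 
mappers with local knowledge can assign the correct value later.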

The second point is of course something you also have with manual 
mapping to some extent, but with manual mapping:

* you don't accumulate that much debt in such a short time.
* you have the possibility to significantly improve the accuracy by 
aligning images locally based on ground reference data or by taking 
into account other image sources.  With the errors mentioned above the 
difference this can make is significant.

Overall my assessment of this is that the work required to bring the 
data shown to a level of quality similar to good quality manual mapping 
is probably similar to - if not larger than - the work of mapping the 
roads manually.  
For Facebook this is not so relevant because (a) they have made the 
overall decision they want to take this approach independent of its 
efficiency in the individual case and (b) they are doing a mixed 
calculation that a large part of the required work is either done 
through free labour from the community rather than paid work by their 
staff or not at all.

The Kerala community needs to contemplate and discuss if their goals in 
the long term in mapping their region (read long term here as 5-10 
years) are compatible with that approach and actually more work 
efficient for them than mapping by local mappers (which can still be 
supported by algorithm help). I don't know the answer to this question 
but i have not seen a serious discussion of this question by the local 
community either.

-- end of review --

As a general remark and recommendation to local communities approached 
by international corporations for approving import or organized editing 
plans:  Making such approval contingent on training and hiring locals 
to perform the work could be a useful approach on several levels - both 
to support the local economy and to ensure work is performed with 
proper knowledge of the local geography as well as to support a 
sustained growth of the local community.

-- 
Christoph Hormann
http://www.imagico.de/