[OSM-dev] Amazing OSM

Tue Nov 6 07:51:51 GMT 2012

It is well known that OSM is an excellent research and development arena
for a wide range of institutions. Of course, the major effort comes from
the huge number of enthusiastic editors and the people taking care of the
source data services. Many thanks to them. But the real value of all that
effort becomes visible through the work of the many developers who build
applications on OSM data, ranging from simple static maps of a small area
containing just a few object classes up to the most complex, streaming-based
mappings of the whole Planet. I am one of those working on that side, the
application side, of OSM.

Mapping based on a parametric (vs. raster) data format implies some form of
data preparation (anomaly/error detection, repair, defragmentation, data
reduction, format change …). Data preparation is highly application
dependent and, as a rule, based on subjective and heuristic criteria
(except in some trivial cases). For example, the “8”, a self-crossing area
border line, is an error to some but not to others. The same goes for a
hole/inner area border line touching the container/outer border line (or
sharing a section with it, or even lying partly outside it), partly
overlapping complex and complicated areas/multipolygons, lakes in/over
lakes, replications, ordinary road sections tagged as roundabouts (or the
contrary), line sections with consecutive nodes A,B,A,B,A, almost
overlapping areas … just to mention some. Many of these anomalies you never
see in raster mappings (blue is blue, brown is brown …). But once all the
intense preparation is done, the highly rewarding part of the mapping
begins. You can do the most amazing things with your mapping. This
excitement prompted me to write these notes. Besides, there have been
questions such as how many points and how many poly-lines are in the OSM
source DB, how many replications, and so on.
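
Two of the anomalies named above, the A,B,A,B,A node pattern and the
self-crossing “8” border, are easy to illustrate. The following is only a
minimal sketch, not the detection procedure behind this post: the function
names are hypothetical, coordinates are treated as planar, rings are assumed
to be closed (first node repeated at the end), and the crossing test is a
simple O(n^2) pairwise check that only reports proper crossings, not mere
touching.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y); an assumption for this sketch


def has_backtrack(node_ids: Sequence[int]) -> bool:
    """True if the node sequence contains an A,B,A pattern (it folds back)."""
    return any(node_ids[i] == node_ids[i + 2] for i in range(len(node_ids) - 2))


def _orient(p: Point, q: Point, r: Point) -> float:
    """Signed area test: >0 if r is left of p->q, <0 if right, 0 if collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _properly_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments a-b and c-d cross at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def is_self_crossing(ring: List[Point]) -> bool:
    """Pairwise check of non-adjacent segments of a closed ring."""
    segs = list(zip(ring, ring[1:]))
    n = len(segs)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last segment share the closing node
            if _properly_cross(*segs[i], *segs[j]):
                return True
    return False


figure_eight = [(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)]
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(has_backtrack([1, 2, 1, 2, 1]), is_self_crossing(figure_eight),
      is_self_crossing(square))
```

Whether a positive result counts as an error or is left alone remains an
application decision, as described above.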

Let me present some (maybe boring) data from the end-of-September Planet
dump (I do not keep data logs and protocols for every preparation run). The
object classes taken into account are: land/sea (area objects created from
the coastline class), lakes, forests, rivers, channels, farms, industry,
parks, residential and buildings. Further, the line-work classes: motorway,
trunk, primary, secondary, tertiary roads, living-street, path, railroad,
country border, state border, ferry, tramway, river-lines, channel-lines
and streets. Point objects are not in focus here.

The input/source number of poly-lines      135 657 272
The input/source number of points        1 667 326 987
The number of replicated poly-lines            132 528
The number of replicated points             10 250 786
The number of detected errors                   60 468
The number of corrected errors                  47 039
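
To put these figures in proportion, the ratios can be worked out directly.
This is plain arithmetic on the six numbers listed above; the variable names
are made up for the illustration, and the points-per-poly-line average
assumes that the point count refers to poly-line vertices.

```python
# Figures copied from the list above (end-of-September 2012 Planet dump).
polylines = 135_657_272
points = 1_667_326_987
replicated_polylines = 132_528
replicated_points = 10_250_786
errors_detected = 60_468
errors_corrected = 47_039


def pct(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


print(f"replicated poly-lines: {pct(replicated_polylines, polylines):.3f} %")
print(f"replicated points:     {pct(replicated_points, points):.3f} %")
print(f"errors corrected:      {pct(errors_corrected, errors_detected):.1f} %")
# Assumption: every counted point is a vertex of some poly-line.
print(f"points per poly-line:  {points / polylines:.1f}")
```

So roughly 0.1% of the poly-lines and 0.6% of the points are replications,
and about 78% of the detected errors were corrected automatically.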

Some notes:

-When detecting and removing replications, the order of the procedures is
essential. It makes a considerable difference whether you first detect and
remove poly-line replications and then point replications, or the other way
round. Also, after removing all the mentioned replications, many replicated
poly-lines may still remain after linear connection (two different
poly-line sets may still create the same poly-line once they are connected).

-Some other redundant data is not counted as replication. For example,
common border sections between area fragments in the same class. Or, some
editors (familiar with the white pixel paradigm on alignments) make a
slight/thin overlap on common borders.

-Not all uncorrected erroneous objects are ignored. For example, many
ordinary road sections tagged as roundabouts are just moved back to the
ordinary sections in the same class.

-Many errors are hidden when a single class is analyzed but show up when
several classes are analyzed simultaneously. For example, many lakes (or
lake sections) are inside/over the water of the land/sea object class (so,
be careful with purely diff-based updates). And so on.
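
The first note, about replications that only appear after poly-lines are
connected, can be sketched as follows. This is an illustration under
assumptions, not the procedure used for the figures above: poly-lines are
modelled as tuples of node ids, two poly-lines count as replicas if they
list the same nodes in the same or in reverse order, and the helper names
are hypothetical. Closed rings that start at different nodes are not
handled here.

```python
from typing import Iterable, List, Sequence, Tuple

Polyline = Tuple[int, ...]  # sequence of node ids


def canonical(line: Sequence[int]) -> Polyline:
    """Direction-independent key: the smaller of the forward/reversed tuple."""
    fwd = tuple(line)
    return min(fwd, fwd[::-1])


def drop_replicas(lines: Iterable[Sequence[int]]) -> List[Polyline]:
    """Keep the first occurrence of every distinct poly-line."""
    seen = set()
    kept = []
    for line in lines:
        key = canonical(line)
        if key not in seen:
            seen.add(key)
            kept.append(tuple(line))
    return kept


def join_chain(pieces: Sequence[Sequence[int]]) -> Polyline:
    """Connect pieces given in order, each starting where the previous ended."""
    joined = list(pieces[0])
    for piece in pieces[1:]:
        if piece[0] != joined[-1]:
            raise ValueError("pieces do not connect end to start")
        joined.extend(piece[1:])
    return tuple(joined)


set_a = [(1, 2, 3), (3, 4, 5)]
set_b = [(1, 2), (2, 3, 4, 5)]

# No piece is a replica of another, so nothing is removed before joining...
print(len(drop_replicas(set_a + set_b)))
# ...but after connection both sets give the same poly-line.
print(canonical(join_chain(set_a)) == canonical(join_chain(set_b)))
```

In this sketch a second replication pass after the connection step is what
catches the duplicate, which is one reason the order of procedures matters.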

The data preparation typically reduces the data amount by 25-30% (before
the data scale levels are generated).

But then, after all that heavy work, when the mapping system is ready, comes
the rewarding part. It is really amazing what you can do, and only
imagination is the limit:

Besides the usual LBSs, navigation/routing, sub-mappings … you can:

-With a single click see the huge Amazonas river system blinking (though
there are still some gaps on smaller side-rivers).

-In a matter of seconds calculate the total length of all Planet
coastlines, the coastline of a continent, of a huge lake …

-In the same way estimate the fresh-water amount on the Planet, estimate
the water-surface of the Danube river system …

-Simulate what the planet looks like at night, assuming all cities have
approximately the same brightness …

-Do safety monitoring within huge sea areas to avoid collisions among a
practically unlimited number of floating objects, inside territorial waters
(a corridor belt along the coastline), around critical objects (like
platforms) and so on.
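
As a rough illustration of two items from the list above, the coastline
length and the corridor belt along the coastline, here is a minimal sketch.
It assumes a spherical Earth (haversine distance), coastlines given as
(latitude, longitude) sequences in degrees, and a corridor test that only
looks at coastline nodes rather than at the segments between them. The
function names and the 22.2 km width are illustrative choices, not part of
any real system described in this post.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]   # (latitude, longitude) in degrees
EARTH_RADIUS_KM = 6371.0088    # mean Earth radius, spherical approximation


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def polyline_length_km(coords: Sequence[LatLon]) -> float:
    """Sum of the great-circle lengths of all segments of a poly-line."""
    return sum(haversine_km(p, q) for p, q in zip(coords, coords[1:]))


def inside_corridor(obj: LatLon, coastline: Sequence[LatLon],
                    width_km: float) -> bool:
    """Node-only approximation: is obj within width_km of any coastline node?"""
    return any(haversine_km(obj, node) <= width_km for node in coastline)


coast = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]   # toy coastline on the equator
print(round(polyline_length_km(coast), 1))
print(inside_corridor((0.1, 1.0), coast, width_km=22.2))
print(inside_corridor((1.0, 1.0), coast, width_km=22.2))
```

For Planet-scale data the same sums would be run per coastline poly-line
and added up; a production corridor check would also need segment
distances and a spatial index, which this sketch leaves out.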

Finally, what is really exciting is that experience shows OSM is becoming
richer and richer in data and details, and the number of irresolvable
errors (or those that need manual intervention) is decreasing.

Best regards, Sandor.
B.