[OSM-dev] Anomaly detection. Ref Adam, Deric
Sandor Seres
sandors39 at gmail.com
Fri Jun 15 11:04:36 BST 2012
Hi. Having read Adam's anomaly detection spec and Deric's comments, I take
the liberty of adding some comments and suggestions for Adam.
I assume you are using the notion of anomaly as a polite synonym for error.
As a political strategy this is fine, but we should not close our eyes to
the types and numbers of errors in OSM data. Talking about errors and
their number may irritate some people, and you may easily be marked as
someone advertising in favor of commercial data sources. I also fully
disagree with statements like “…make errors, we don’t mind…”. This might be
fine from the editors’ point of view (they are rarely in conflict), but
some errors, or simply their number, can be a nightmare for people
developing/maintaining mapping systems, navigation systems, LBSs and so on.
Besides, whether something is an error (as opposed to correct) often
depends on subjective criteria. Something may appear as an error to one
person and be just fine for others.
The subject you have selected is essential for OSM users but too
general and abstract. You may create different error classifications, such
as unintentional errors (the editor is not aware of them) versus vandalism
(intentional errors meant to harm), or formal errors (show stoppers) versus
logical errors (some can live with them), and the like. But again, there is
no sharp border between such classes. Note that you even intend to develop
an “engine” (a running system) for detecting anomalies (things that differ
from normal). I am afraid you are not aware of what, and how many, fine
details you are going to meet when developing such an engine. I don’t want
to discourage you; in the end, the decision is up to you and your mentor.
Anyway, I would suggest focusing on a more specific and currently relevant
OSM data problem. There are many of them. Just to mention some examples:
-Side conflict (or self crossing) of area borders. There are many of them.
What is more, this event is inevitable in vector smoothing/reduction when
creating scale levels for an area object class. Some GIS DB systems do not
tolerate such events and either stop working or reject such cases in a
control procedure. There are certain solutions to the problem, but with side
effects and mostly in commercial versions. For example, one (rather
expensive) system solves this problem by partitioning the area between self
crossings into smaller areas. A side effect is breaks in thin, long areas
like rivers, fiords and so on. You could focus on exchanging (and
re-orienting) the border sections between consecutive self crossing
points. This may be a good solution with no side effects.
-Detect erroneous roundabouts (RA) in a road class, for instance in primary
roads. There are many, many of them. There are ordinary road sections
tagged as RAs, RAs tagged as ordinary road sections (or not tagged as RAs),
formal errors in RAs, missing connections between RAs and ordinary road
sections, and so on. While many of these errors are hidden in raster
mapping, in vector-based mapping they cause serious problems.
-And just to mention one more case, the river/channel fragmentation
problem. Naturally, rivers and channels form water-way systems,
just like roads and streets. But these are in a highly fragmented format in
the source data. What is more, fragments often go missing (disappear)
from version to version. So how to detect and repair the errors
and build a water-way system is still an open problem.
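As a rough illustration of the first example above (self crossing area
borders), here is a minimal Python sketch of a brute-force detector. It
assumes the border is a closed ring of planar (x, y) points given without
repeating the first point; the names self_crossings, _orient and
_properly_cross are hypothetical and not taken from any OSM tool. It reports
only proper crossings (edges that merely touch or overlap are ignored) and
compares all edge pairs, so it is meant to explain the idea, not to handle
large data sets.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc (> 0 means counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _properly_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1p2 and q1q2 cross at a single interior point."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Each segment's endpoints must lie strictly on opposite sides of the other.
    return d1 * d2 < 0 and d3 * d4 < 0


def self_crossings(ring: List[Point]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of ring edges that cross each other.

    Edge i runs from ring[i] to ring[(i + 1) % n]; the ring is assumed
    closed, with the first point NOT repeated at the end.
    """
    n = len(ring)
    hits = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip neighbouring edges: they share an endpoint by construction.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _properly_cross(ring[i], ring[(i + 1) % n],
                               ring[j], ring[(j + 1) % n]):
                hits.append((i, j))
    return hits
```

For a bow-tie ring such as [(0, 0), (2, 2), (2, 0), (0, 2)] this reports
that edges 0 and 2 cross; a plain square gives an empty list. The reported
index pairs are also the natural starting point for the re-orienting idea
above, since they delimit the border section between two crossings.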
You may find even more specific and motivating “anomalies” in some of my
(internal) notes on OSM data error detection and repair from some months
ago. You may download and freely use them from here (for the best view,
save them in the original format; the Google viewer anti-aliases the already
anti-aliased sections, with a poor result):
https://docs.google.com/open?id=0B6qGm3k2qWHqeTNYcVVpRy0zSVk
https://docs.google.com/open?id=0B6qGm3k2qWHqRlhfSU9MV2YxOHM
https://docs.google.com/open?id=0B6qGm3k2qWHqeHg2VEZXQVNNVjg
Of course, individual errors/anomalies in the source data come and go,
but as a whole they are constantly there. Therefore, any effort to provide
better data quality is of great importance and value for us OSM users.
I wish you (and your mentor) great success with your project. Best regards,
Sandor.