# [OSRM-talk] Map Matching Plugin Questions

Patrick Niklaus patrick.niklaus at student.kit.edu
Thu May 7 10:40:43 UTC 2015

> - Did you implement all of the described HMM break conditions (route
> localization, low probability routes, GPS outliers)? After reading the
> code in OSRM, I was only able to find the "low probability routes"
> condition. Did I overlook something?

The localization is implemented by choosing the candidates before we
start the algorithm. For each input point we adaptively choose between
5 and 10 candidates based on the distance to the previous input point.
That part of the algorithm can be found in "plugins/match.hpp". The
outliers test is not implemented; I'm not sure it would add much value
over the limited search radius for candidates combined with the
pruning based on transition probability.

>
> - As far as I understand, MAX_DISTANCE_DELTA corresponds to the delta
> when comparing the route length and great circle distance for the "low
> probability routes" condition. The paper states a delta of 2000m, the
> implementation uses a delta of 200m. Feature or bug?
>

I found that 2000m is a little bit on the conservative side. At least
for my data 200m worked pretty well (sampling period was approximately
7s).
Please note that most parameters are tuned for sampling periods of
around 5 to 10 seconds.
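
As a rough illustration of the "low probability routes" condition
discussed above, here is a minimal Python sketch (not the OSRM code).
It assumes the check compares the routed length between two consecutive
candidates with the great circle distance between the corresponding
input points. The function names and the haversine helper are
hypothetical; only the name MAX_DISTANCE_DELTA and the 200m / 2000m
values come from this thread.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean Earth radius, metres
MAX_DISTANCE_DELTA = 200.0   # metres; the paper uses 2000


def great_circle_distance(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two WGS84 points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_low_probability_transition(route_length_m, prev_point, curr_point,
                                  max_delta=MAX_DISTANCE_DELTA):
    """True if the routed length deviates too much from the straight line.

    prev_point / curr_point are (lat, lon) tuples of consecutive input
    samples; route_length_m is the network distance between the two
    candidates being compared (hypothetical input, computed elsewhere).
    """
    straight = great_circle_distance(*prev_point, *curr_point)
    return abs(route_length_m - straight) > max_delta
```

With samples roughly 111m apart, a 150m route passes and a 500m route
is pruned at the 200m threshold, while the paper's 2000m threshold
would keep both.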

> - What exactly does the "confidence" return value mean?
>

Since we are dealing with real world data, matching will fail for some
traces. That might be because the trace is too noisy or the data from
OpenStreetMap has problems like connectivity errors. To get a handle
on that I gathered some empirical data on mismatched traces and tried
to find a good feature to classify matchings as valid or invalid. The
feature that worked best for me was the ratio between trace length and
matching length (the intuition here is that invalid matchings tend to
contain "loops" where detours are taken). I used that labeled data to
fit a Laplacian distribution and constructed a naive Bayes classifier
based on that.
The "confidence" is the probability P(x \in valid). The values are
only based on ~800 labeled traces with a specific sampling rate, so
take that value with a grain of salt for your data.
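
To make the idea concrete, here is a minimal Python sketch of such a
classifier (not the OSRM code). It models the length-ratio feature with
one Laplace distribution per class and applies Bayes' rule. All
numeric parameters (locations, scales, priors) are made-up
placeholders, not the fitted values, and the direction of the ratio
(matching length divided by trace length) is an assumption.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


# Hypothetical parameters; real values would be fitted on labeled traces.
VALID = {"mu": 1.0, "b": 0.05, "prior": 0.5}
INVALID = {"mu": 1.5, "b": 0.5, "prior": 0.5}


def confidence(trace_length_m, matching_length_m):
    """P(valid | ratio) via Bayes' rule with a single feature.

    The feature is matching_length / trace_length (assumed direction):
    invalid matchings with detour "loops" push the ratio above 1.
    """
    ratio = matching_length_m / trace_length_m
    p_valid = laplace_pdf(ratio, VALID["mu"], VALID["b"]) * VALID["prior"]
    p_invalid = (laplace_pdf(ratio, INVALID["mu"], INVALID["b"])
                 * INVALID["prior"])
    return p_valid / (p_valid + p_invalid)
```

With these placeholder parameters, a matching as long as the trace
gets a confidence close to 1, and a matching twice as long as the
trace gets a confidence close to 0.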

What is missing is a good parameter selection based on the sample rate
of the input. It's not clear when I will have time again to do that
(for now massaging the data to fit the current constraints works quite
well).