[OSM-dev] GPX for the future

Wed Sep 5 15:10:28 BST 2007

On 9/5/07, Richard Fairhurst <richard at systemed.net> wrote:
> Someone at Microsoft did a talk at Where 2.0 called "What to do with
> thousands of GPS tracks"
> (http://conferences.oreillynet.com/cs/where2007/view/e_sess/13408), to
> which my first thought is "merely thousands?".
>
> So GPS tracks on OSM are currently stored in two ways:
>
> - as files on the server (accessible as, say,
> http://www.openstreetmap.org/trace/37369/data)
> - as points in the db
>
> The former isn't causing a problem AFAICS: storage isn't an issue. The
> latter may be, and is likely to get worse.
>
> We have probably all exacerbated this by being super-conscientious.
> Received opinion within OSM is that you set your GPS to 1 point/sec
> where possible, which makes for lovely-looking traces but means that
> the amount of redundancy in the GPS database is massive, and not
> trivial to eliminate.
>
> If we take the position that the files form the "complete" record, and
> the db forms the delivery mechanism to users, then we can look at
> processing the data to make it more efficient.
>
> The obvious way to do this is to simplify on import. In other words,
> the full tracklog is still stored as a file, but surplus info is
> removed from the database: so if you have a straight line
>
>      .   .   .   .
>
> then the middle two points are redundant and can be removed.
>
> Douglas-Peucker is the standard polyline simplification algorithm but,
> as a recursive algorithm, is pretty processor-intensive. But there are
> simpler ways of doing it, e.g. iterate over each point and keep a note
> of the 'heading', and only store a point when it diverges by n
> degrees. We would obviously want to keep n very low so that fidelity
> is still retained for tracing, and at the same time include a minimum
> time threshold (so tracks where the average is every 10s, for example,
> aren't simplified any further).
>
> Because we still have the data stored in files, it doesn't stop us
> from doing funky stuff (e.g. calculating average speeds for a given
> road) in the future if we want to. It just makes the delivery faster
> for our main purpose right now.
>
> Am I smoking crack or would this help?

I'd rather it wasn't done this way. It means that (even if you could
order the GPS information, e.g. by time) you can't tell whether
there's no reception between the points or it's a straight line that's
been simplified. Also, there's no upper limit to the size of the
table, which will just keep growing as more traces are added.
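
For reference, the heading-based simplification described in the quoted
message might look roughly like the sketch below. It is only an
illustration: the function names, the (lat, lon) tuple format and the
2-degree default are assumptions of mine, and the minimum time threshold
Richard mentions is left out. Note that the output carries no marker
saying whether a long gap was a simplified straight line or lost
reception, which is the ambiguity I mean.

```python
import math

def bearing(p, q):
    """Initial heading in degrees (0..360) from p to q; points are (lat, lon) in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0

def simplify_by_heading(points, max_turn_deg=2.0):
    """Keep a point only where the heading has drifted more than
    max_turn_deg from the heading at the last kept point.
    The first and last points are always kept."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    ref = bearing(points[0], points[1])  # heading of the current "straight" run
    for prev, cur in zip(points[1:], points[2:]):
        h = bearing(prev, cur)
        # Smallest absolute angle between the two headings (handles 359 -> 1).
        diff = abs((h - ref + 180.0) % 360.0 - 180.0)
        if diff > max_turn_deg:
            kept.append(prev)  # the turn happens at prev
            ref = h
    kept.append(points[-1])
    return kept
```

With four points in a straight line, e.g. (0, 0), (0, 1), (0, 2), (0, 3),
only the two end points survive, as in Richard's diagram.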

An alternative* would be to store the GPS points as "tiles" at a lower
resolution (e.g. 10cm x 10cm boxes) and on each import increase a
density counter for that tile. The resolution is still good enough for
OSM, and the table would have a known upper bound on its size (rows =
number of tiles in the planet, or fewer if you don't add lots of empty
records). JOSM could draw the tiles as greyscale where a count of one
is dark and 100 is much brighter, and roads would be distinguishable
and traceable.
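
A minimal sketch of the density-counter idea, under some assumptions of
mine: tiles are a plain grid in degrees (the tile size is a parameter,
not a recommendation), the "table" is an in-memory Counter that only
holds non-empty tiles, and the grey mapping is linear up to a count of
100. All names here are hypothetical.

```python
import math
from collections import Counter

TILE_DEG = 0.001  # assumed tile size in degrees; purely illustrative

def tile_of(lat, lon, tile_deg=TILE_DEG):
    """Integer (row, col) index of the grid tile containing the point."""
    return (math.floor(lat / tile_deg), math.floor(lon / tile_deg))

def import_trace(density, points, tile_deg=TILE_DEG):
    """Bump the density counter of every tile a trace point falls in."""
    for lat, lon in points:
        density[tile_of(lat, lon, tile_deg)] += 1

def grey_level(count, max_count=100):
    """Map a density count to 0..255: empty is black, max_count or more is white."""
    if count <= 0:
        return 0
    return round(255 * min(count, max_count) / max_count)

density = Counter()  # sparse: empty tiles take no space
import_trace(density, [(51.50005, -0.12005), (51.50006, -0.12004)])
```

Re-importing more traces over the same road only increases existing
counters, so the number of rows is bounded by the number of tiles
rather than by the number of points uploaded.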

Quad tile references** could then very quickly return the tiles and
their densities, and a typical API call would return much less data
than at present***.
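
As a rough illustration of what a quad tile reference could be: the
sketch below interleaves the bits of a scaled longitude and latitude
into one integer, so that every tile inside a given quadrant shares a
key prefix and can be fetched as one contiguous key range. The scaling,
the 16-bit depth and the function names are assumptions for the
example, not a description of what the OSM server actually does.

```python
def quadtile_key(lat, lon, bits=16):
    """Interleave the bits of scaled lon (x) and lat (y) into one integer key."""
    n = (1 << bits) - 1
    x = round((lon + 180.0) / 360.0 * n)
    y = round((lat + 90.0) / 180.0 * n)
    key = 0
    for i in range(bits - 1, -1, -1):  # most significant bit first
        key = (key << 1) | ((x >> i) & 1)
        key = (key << 1) | ((y >> i) & 1)
    return key

def key_range_for_prefix(prefix, prefix_bits, bits=16):
    """Inclusive (lo, hi) range of keys whose top prefix_bits levels equal prefix."""
    shift = 2 * (bits - prefix_bits)
    lo = prefix << shift
    return lo, lo + (1 << shift) - 1
```

A bounding-box request would then turn into a handful of such key
ranges, each of which is a simple indexed range scan on the tile table.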

Cheers,
Andy

* I'm not a coder, and I don't know if this would work
** Definitely talking out of my ass with this one
*** It's fun to pretend you're helping when you aren't really :-)