[OSM-dev] OSM API lookups to complement minutely diffs?

Thu Sep 15 21:11:54 UTC 2016

On Thu, Sep 15, 2016 at 4:53 PM, Stefan Keller <sfkeller at gmail.com> wrote:

> I'm setting up a Kafka publish-subscribe messaging system delivering
> minutely diffs.
>
> AFAIK augmented diffs are rather an experimental feature and I'd like
> to avoid the latency time and blackouts of overpass which runs in same
> server. So I'm concentrating on the main OSM API.
>

I've also experimented with this sort of thing and ran into hiccups similar
to yours.
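
For reference, a minimal sketch of the polling side of a minutely-diff
consumer. It assumes the usual planet replication layout: a state.txt in
Java-properties style with a sequenceNumber key, and diffs stored under a
nine-digit, zero-padded path split into three directories. The base URL and
the function names are my own placeholders, not part of any official client.

```python
# Assumed base URL for minutely replication; adjust to your mirror.
REPLICATION_BASE = "https://planet.openstreetmap.org/replication/minute"


def parse_state(text):
    """Parse a replication state file (Java-properties style) into a dict."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the leading "#<date>" comment line.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # Properties files escape colons in values as "\:".
        state[key] = value.replace("\\:", ":")
    return state


def diff_url(sequence_number, base=REPLICATION_BASE):
    """Build the .osc.gz URL for a given replication sequence number."""
    padded = f"{sequence_number:09d}"
    return f"{base}/{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osc.gz"
```

A consumer would fetch state.txt once a minute, compare sequenceNumber with
the last one it processed, and download every diff in between.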

Also, if you want to reduce latency further, don't forget about the
streaming replication service:
http://wiki.openstreetmap.org/wiki/Osmosis/Replication#Streaming_Replication_Wire_Protocol

Last I checked it wasn't running, and I think TomH would be hard pressed to
get it to work again, but if you end up building something that would be
useful for a wide array of people it might be worth revisiting that.

> Now, osmChange XML like
> http://www.osm.org/api/0.6/changeset/42143238/download obviously does
> not include all info (e.g. tags) from ref nodes/ways/relations.
>
> I'm not looking for specific info but I'd like to get at least all
> data about nodes/ways/relations which are created/modified/deleted in
> a changeset.
>
> Is it OK to do API lookups like this
> https://www.osm.org/api/0.6/nodes?nodes=59906080,4400821613 even for
> minutely diffs? Any alternatives?

This is where I stopped working on this idea. I ended up making *lots* of
requests to the API server to fill in missing data and I didn't think it
would be very nice to my fellow mappers if I scaled that up and made it run
24/7.
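
To give a sense of where those requests come from, here is a rough sketch
(not the code I actually ran) that takes an osmChange document, finds the
node IDs that ways reference but that are not themselves part of the diff,
and groups them into multi-fetch URLs of the form you quoted. The batch size
of 100 and the helper names are arbitrary assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed API base; the /nodes?nodes=... call is the one quoted above.
API_BASE = "https://www.openstreetmap.org/api/0.6"


def missing_node_ids(osc_xml):
    """Return node IDs referenced by ways but absent from the diff itself."""
    root = ET.fromstring(osc_xml)
    present = {node.get("id") for node in root.iter("node")}
    referenced = {nd.get("ref") for nd in root.iter("nd")}
    return sorted(referenced - present, key=int)


def batched_urls(ids, batch_size=100):
    """Group IDs into multi-fetch URLs so each request covers many nodes."""
    return [
        f"{API_BASE}/nodes?nodes={','.join(ids[i:i + batch_size])}"
        for i in range(0, len(ids), batch_size)
    ]
```

Even with batching, every minutely diff that touches ways produces at least
one extra request, and the same applies again to relation members.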

At this point I started experimenting with ways of maintaining a full
history database to help make these sorts of lookups more performant, but
then ran out of time to work on this project.

I think a more interesting path forward for this kind of problem is to
resurrect the aforementioned streaming replication protocol and modify it
to include changeset and "augmented diff"-style information. My intuition
says that making smart SQL queries against a read replica database rather
than adding API load might be more efficient and useful for lots of
consumers. I'm happy to be proven wrong, though.