[OSM-dev] Deriving Change Sets

Sun Jul 1 07:01:01 BST 2007

Brett Henderson wrote:
> Frederik Ramm wrote:
>> This is something I would really love to see get off the ground. (I 
>> remember an evening at the Essen meeting where I complained about the 
>> weekly dump, and Nick Black said something like "well we could do it 
>> daily", and I said "daily is not enough", and he went "well one could 
>> do hourly with proper equipment" and I said "dumps, dumps, dumps, I 
>> don't want no stupid dumps, I want live data..." - a discussion ensued 
>> about what you'd possibly need live data for, but until today I 
>> maintain that we should just provide data as live as possible without 
>> asking what people want to use it for.)
> At the risk of rambling, I feel the same way.  Dumps are extremely 
> valuable, simple to implement, and probably the right solution for many 
> problems.  But they have their limitations such as:
> * Data is always out of date.
> * Attempting to increase dump frequency adds significant load to the 
> data source (i.e. the database and other OSM server infrastructure).
> * Data is re-transmitted every time a new dump is requested adding to 
> network utilisation.
> * At the receiving end, the complete dataset must be processed every 
> time a new dump is utilised.

+1

> A method of synchronisation avoids the problems above.
> 
> A regular synchronisation mechanism enables some of the following 
> possibilities:
> * Some end users such as mapnik can produce more up-to-date maps and can 
> avoid significant processing by only importing changes.
> * In order to alleviate load from the current API and primary database, 
> current users of the API such as tiles at home *may* (this may be 
> controversial :-) be able to switch to using a "near live" feed.
> * Tasks can respond to changes more effectively.  To use tiles at home as 
> an example, the replication task from primary database to rendering 
> database could examine the nodes and automatically flag tiles that need 
> re-rendering thus eliminating the need to manually request tile re-renders.
> * Read-only tasks without hard real-time requirements can be moved off 
> the API (and core database) thus allowing the core infrastructure to 
> scale to a larger number of users.

+1
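
The tile-flagging idea could look roughly like the sketch below. It is
only an illustration under assumptions: it uses the usual slippy-map
tile formula, takes the zoom level as a parameter (12 is just a
default picked here), and the function names tile_for_node and
dirty_tiles are made up for this example rather than taken from any
existing tiles at home code.

```python
import math


def tile_for_node(lat, lon, zoom=12):
    """Return the (x, y) slippy-map tile containing a WGS84 position.

    Hypothetical helper; the zoom default is an assumption.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so positions on the far edge still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def dirty_tiles(changed_nodes, zoom=12):
    """Collect the set of tiles touched by an iterable of (lat, lon) pairs."""
    return {tile_for_node(lat, lon, zoom) for lat, lon in changed_nodes}
```

For a moved node, both the old and the new position would need to be
passed in so that both tiles get flagged. Segments or ways that cross
a tile without having a node in it are not covered by this sketch.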

>> But I always thought - as long as "near live" feeds are what one wants 
>> - it would be much cheaper in terms of processing power to simply log 
>> each change as performed by the API.
> Agree.

That's what I've been proposing in the IRC channel.
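
A minimal sketch of logging each change as the API performs it,
assuming an append-only log with a running sequence number that
consumers use to pick up where they left off. Everything here
(ChangeLog, record, since, the entry fields) is hypothetical and kept
in memory for illustration; it is not how the current API is written.

```python
import json


class ChangeLog:
    """Append-only log of changes, one JSON document per entry."""

    def __init__(self):
        self._entries = []

    def record(self, action, element_type, element_id, data=None):
        """Append one change and return its sequence number (starting at 1)."""
        seq = len(self._entries) + 1
        entry = {
            "seq": seq,
            "action": action,      # e.g. "create", "modify", "delete"
            "type": element_type,  # e.g. "node", "segment", "way"
            "id": element_id,
            "data": data,
        }
        self._entries.append(json.dumps(entry))
        return seq

    def since(self, last_seq):
        """Return all entries with a sequence number greater than last_seq."""
        return [json.loads(line) for line in self._entries[last_seq:]]
```

A consumer would remember the last sequence number it has applied and
call since() with it on each poll, so only new changes are transferred
and processed.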