[OSM-dev] Osmosis Plans

Robert (Jamie) Munro rjmunro at arjam.net
Tue Sep 25 14:49:25 BST 2007

Brett Henderson wrote:
> spaetz wrote:
>> On Tue, Sep 25, 2007 at 02:31:26PM +1000, Brett Henderson wrote:
>>> 1. Planet dumping.
>> If the simple planet.c is faster, it might do the job just as well. Don't know
> Yep, no issue here.  If planet.c and osmosis use the same query 
> mechanism the speed should be very similar but that's probably not the 
> case.  Osmosis is currently using temp files for way generation to avoid 
> query timeouts, there's probably a better way to do this.  I've avoided 
> using multiple queries per table but given MySQL constraints it may be 
> the best approach.
>>> 4. Database snapshotting.
>> Haven't used it myself, because, as you say, it isn't useful with inconsistent history tables.
>>> 5. Changeset derivation from database.
>> I am not sure, as I haven't tried it. Would it be possible to get e.g. daily diffs quickly from the main db? What's your estimate of how long such a diff generation would take?
>>> 6. Changeset application to offline mysql database.
>> I see that as very useful. We could update e.g. the zappy API a lot quicker in shorter intervals.
>>> 7. Polygon extraction.
>>> 8. User activity reporting.
>>> 9. Replication to alternative schema.
>>> I'm keen to hear people's thoughts.  I'm not sure what I should focus 
>>> on.  I believe the replication features would be useful to help the 
>>> project scale to a much larger size.
>> Personally, I would find it interesting to get daily diffs out and see what people do with them. I don't know if disk space is an issue; the planets are stored on dev, and that does not have infinite disk space.
> I produced some stats a while ago on an older database (up to August) 
> with full history, this is what I found.  To summarise, at that point in 
> time it took less than 5 minutes to generate an osc (osmChange) file 
> containing a day's worth of changes.  Note that it always runs far 
> quicker on my desktop than the production system.

If it only takes 5 minutes we can run it much more often than once a
day. IMHO, if we can run hourly dumps in less than half an hour, then we
should. Seeing as it looks (below) like we can run them in probably less
than a minute, I think we should consider going even more frequently,
perhaps every 10 minutes, combining those diffs into one changeset file
every hour, and combining the hourly files every day. Keep enough files
around to quickly go from the last planet dump to the latest 10 minute diff.
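
To make the scheme concrete, here is a rough sketch in Python of how a
client might pick which files to apply. It is only an illustration: the
function names are made up, it assumes daily files start at midnight,
hourly files on the hour and 10 minute files on 10 minute boundaries, and
it says nothing about how osmosis would actually name or produce them.

```python
from datetime import datetime, timedelta

# Granularities from the proposal, largest first (assumed layout).
STEPS = [timedelta(days=1), timedelta(hours=1), timedelta(minutes=10)]


def is_aligned(t, step):
    """True if t falls on a boundary of step, measured from midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return (t - midnight) % step == timedelta(0)


def plan_diffs(dump_time, latest):
    """Return the (start, end) intervals to apply, largest files first.

    Hypothetical helper: walks from the planet dump time to the latest
    diff, always taking the biggest file that starts at the current
    position and does not overshoot the end.
    """
    plan = []
    t = dump_time
    while t < latest:
        for step in STEPS:
            if is_aligned(t, step) and t + step <= latest:
                plan.append((t, t + step))
                t += step
                break
        else:
            # Neither a day, an hour nor 10 minutes fits from here.
            raise ValueError("times must fall on 10 minute boundaries")
    return plan
```

For a planet dump taken at 22:40 on 24 Sep and a latest diff ending at
01:20 on 26 Sep, this picks two 10 minute files, one hourly file, one
daily file, one hourly file and two 10 minute files: 7 files to apply
instead of the 160 needed with 10 minute files alone.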

Robert (Jamie) Munro

> Selecting the biggest file in the interval (2007072004-2007072008.osc) 
> and breaking into 1 hour intervals produces the following stats.
> File                       Size     Duration (run 3 times)
> 2007072004-2007072005.osc  3160144  4.872s, 4.655s, 4.985s
> 2007072005-2007072006.osc  4156641  5.249s, 5.185s, 4.729s
> 2007072006-2007072007.osc  5766073  9.204s, 8.871s, 8.638s
> 2007072007-2007072008.osc  4584598  4.250s, 4.650s, 3.983s
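
As a quick sanity check on the "less than a minute" estimate, the
timings quoted above can be averaged with a few lines of Python. The
numbers are copied straight from the table; the variable names are just
for illustration.

```python
# Durations in seconds of the three runs per hourly file (from the table).
runs = {
    "2007072004-2007072005.osc": [4.872, 4.655, 4.985],
    "2007072005-2007072006.osc": [5.249, 5.185, 4.729],
    "2007072006-2007072007.osc": [9.204, 8.871, 8.638],
    "2007072007-2007072008.osc": [4.250, 4.650, 3.983],
}

# Mean duration per file and the slowest single run overall.
means = {name: sum(times) / len(times) for name, times in runs.items()}
worst = max(max(times) for times in runs.values())

for name, mean in means.items():
    print(f"{name}: mean {mean:.3f}s")
print(f"slowest single run: {worst:.3f}s")
```

The slowest single run is 9.204s, so even the busiest hour in that
interval stays well under a minute on the machine that produced these
figures.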
