[OSM-dev] Osmosis Plans

Brett Henderson brett at bretth.com
Tue Sep 25 10:14:20 BST 2007


spaetz wrote:
> On Tue, Sep 25, 2007 at 02:31:26PM +1000, Brett Henderson wrote:
>   
>> 1. Planet dumping.
>>     
> If the simple planet.c is faster, it might do the job just as well. Don't know
>   
Yep, no issue here.  If planet.c and osmosis use the same query 
mechanism the speed should be very similar, but that's probably not the 
case.  Osmosis currently uses temp files for way generation to avoid 
query timeouts; there's probably a better way to do this.  I've avoided 
using multiple queries per table, but given MySQL constraints it may be 
the best approach.
>> 4. Database snapshotting.
>>     
> Haven't used it myself, because, as you say, it isn't useful with inconsistent history tables.
>
>   
>> 5. Changeset derivation from database.
>>     
>
> I am not sure, as I haven't tried it. Would it be possible to get e.g. daily diffs quickly from the main db? What's your estimate of how long such a diff generation would take?
>
>   
>> 6. Changeset application to offline mysql database.
>>     
>
> I see that as very useful. We could update e.g. the zappy API a lot quicker in shorter intervals.
>
>   
>> 7. Polygon extraction.
>> 8. User activity reporting.
>> 9. Replication to alternative schema.
>>     
>
>
>   
>> I'm keen to hear people's thoughts.  I'm not sure what I should focus 
>> on.  I believe the replication features would be useful to help the 
>> project scale to a much larger size.
>>     
>
> Personally, I would find it interesting to get daily diffs out and see what people do with them. I don't know if disk space is an issue; the planets are stored on dev, which does not have infinite disk space.
>   

I produced some stats a while ago on an older database (up to August) 
with full history; this is what I found.  To summarise, at that point in 
time it took less than 5 minutes to generate an osc (osmChange) file 
containing a day's worth of changes.  Note that it always runs far 
quicker on my desktop than on the production system.

As for disk space, in the short term it would probably make sense to run 
it alongside the normal planet creation processes and delete daily files 
after a week or so.  This would provide an opportunity to test it out, 
iron out any kinks and generally test its usefulness.  I'm happy to help 
out in any way I can.
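
As a rough illustration of the cleanup suggested above, here is a minimal Python sketch that deletes .osc files older than a given number of days. The function name, the use of file modification time as the age, and the 7-day default are assumptions for illustration only; nothing like this exists in osmosis itself.

```python
import time
from pathlib import Path


def prune_old_diffs(directory, max_age_days=7, now=None):
    """Delete *.osc files in `directory` whose modification time is older
    than `max_age_days`. Returns the names of the files removed.

    Hypothetical helper: age is judged by mtime, not by the dates
    embedded in the file name.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(Path(directory).glob("*.osc")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Only files ending in .osc are touched, so planet dumps stored in the same directory would be left alone.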

I produced a set of monthly changesets from the start of 2006.  The file 
sizes are as follows:
File                        Size
20060101.osm            53452410
20060101-20060201.osc   29113922
20060201-20060301.osc   39577139
20060301-20060401.osc   25482918
20060401-20060501.osc  279046515
20060501-20060601.osc 1401380463
20060601-20060701.osc  747336625
20060701-20060801.osc  862656139
20060801-20060901.osc  777061744
20060901-20061001.osc  838770700
20061001-20061101.osc 1261867463
20061101-20061201.osc  327296919
20061201-20070101.osc  959647780
20070101-20070201.osc 1223347868
20070201-20070301.osc  298255681
20070301-20070401.osc  326782626
20070401-20070501.osc  802615880
20070501-20070601.osc  851776524
20070601-20070701.osc  848936402
20070701-20070801.osc 1012877148

The most recent one (20070701-20070801.osc) is quite large so I'll focus 
on it.  It took 43m26s to produce.

Splitting into 7 day intervals and ignoring the last couple of days 
produces the following stats.
File                  Size      Duration
20070701-20070708.osc 164027848 10m37s
20070708-20070715.osc 220498572 10m10s
20070715-20070722.osc 347431979 14m42s
20070722-20070729.osc 253686702 13m40s

Selecting the biggest file in the interval (20070715-20070722.osc) and 
breaking into 1 day intervals produces the following stats.
File                  Size     Duration
20070715-20070716.osc 16684056 1m41s
20070716-20070717.osc 44242820 2m55s
20070717-20070718.osc 62863288 3m02s
20070718-20070719.osc 45727744 2m31s
20070719-20070720.osc 54787426 2m27s
20070720-20070721.osc 77041093 3m07s
20070721-20070722.osc 63953029 2m44s
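
To compare rows across these tables it can help to turn each one into a generation rate. The following is a small Python sketch, not part of osmosis: the function names are made up, and it assumes the Size column is in bytes and that durations are written either as minutes plus seconds (3m07s) or as plain seconds (10.229s).

```python
import re


def parse_duration(text):
    """Convert a duration such as '43m26s', '3m07s' or '10.229s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)  # minutes part is optional
    return minutes * 60 + float(match.group(2))


def throughput(size_bytes, duration):
    """Output bytes written per second of generation time."""
    return size_bytes / parse_duration(duration)


# Largest daily file above: 77041093 bytes in 3m07s (187 seconds),
# which works out to about 412,000 bytes per second.
rate = throughput(77041093, "3m07s")
print(f"{rate:.0f} bytes/s")
```

The rate only describes how fast output was produced; it says nothing about how much of the time was spent querying versus writing.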

Selecting the biggest file in the interval (20070720-20070721.osc) and 
breaking into 4 hour intervals produces the following stats.
File                      Size     Duration
2007072000-2007072004.osc  8932383 10.229s
2007072004-2007072008.osc 17590822 17.930s
2007072008-2007072012.osc 12615890 16.762s
2007072012-2007072016.osc 14360116 21.107s
2007072016-2007072020.osc 13339044 23.095s
2007072020-2007072100.osc 10841897 31.845s (Note: running a second time took 21.129s)

Selecting the biggest file in the interval (2007072004-2007072008.osc) 
and breaking into 1 hour intervals produces the following stats.
File                      Size    Duration (run 3 times)
2007072004-2007072005.osc 3160144 4.872s,4.655s,4.985s
2007072005-2007072006.osc 4156641 5.249s,5.185s,4.729s
2007072006-2007072007.osc 5766073 9.204s,8.871s,8.638s
2007072007-2007072008.osc 4584598 4.250s,4.650s,3.983s
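
The drill-down above (month, then 7 day, 1 day, 4 hour and 1 hour intervals) follows a regular naming pattern, so the interval boundaries and file names can be generated mechanically. This is only a Python sketch of that bookkeeping; the function names are hypothetical and it does not invoke osmosis or touch the database.

```python
from datetime import datetime, timedelta


def split_interval(start, end, step):
    """Yield (begin, finish) pairs of length `step` covering [start, end).

    A partial interval at the end is dropped, matching the tables above,
    where the last couple of days of the month are ignored.
    """
    begin = start
    while begin + step <= end:
        yield begin, begin + step
        begin += step


def osc_name(begin, finish, hourly=False):
    """Build a file name like 20070701-20070708.osc, or
    2007072004-2007072005.osc when `hourly` is set."""
    fmt = "%Y%m%d%H" if hourly else "%Y%m%d"
    return f"{begin.strftime(fmt)}-{finish.strftime(fmt)}.osc"


# Weekly intervals for July 2007.
for begin, finish in split_interval(datetime(2007, 7, 1),
                                    datetime(2007, 8, 1),
                                    timedelta(days=7)):
    print(osc_name(begin, finish))
```

Passing timedelta(hours=4) or timedelta(hours=1) with hourly=True gives names in the same form as the 4 hour and 1 hour tables.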




