[OSM-dev] Osmosis Plans
brett at bretth.com
Tue Sep 25 10:14:20 BST 2007
> On Tue, Sep 25, 2007 at 02:31:26PM +1000, Brett Henderson wrote:
>> 1. Planet dumping.
> If the simple planet.c is faster, it might do the job just as well. Don't know
Yep, no issue here. If planet.c and osmosis used the same query
mechanism the speed should be very similar, but that's probably not the
case. Osmosis currently uses temp files for way generation to avoid
query timeouts; there's probably a better way to do this. I've avoided
using multiple queries per table, but given MySQL constraints it may be
the best approach.
>> 4. Database snapshotting.
> Haven't used it myself, because, as you say, it isn't useful with inconsistent history tables.
>> 5. Changeset derivation from database.
> I am not sure, as I haven't tried it. Would it be possible to get e.g. daily diffs quickly from the main db? What's your estimate of how long such a diff generation would take?
>> 6. Changeset application to offline mysql database.
> I see that as very useful. We could update e.g. the zappy API a lot quicker in shorter intervals.
>> 7. Polygon extraction.
>> 8. User activity reporting.
>> 9. Replication to alternative schema.
>> I'm keen to hear people's thoughts. I'm not sure what I should focus
>> on. I believe the replication features would be useful to help the
>> project scale to a much larger size.
> Personally, I would find it interesting to get daily diffs out and see what people do with them. I don't know if disk space is an issue; the planets are stored on dev, which does not have infinite disk space.
I produced some stats a while ago on an older database (up to August)
with full history; the results are below. To summarise, at that point in
time it took less than 5 minutes to generate an osc (osmChange) file
containing a day's worth of changes. Note that it always runs far
quicker on my desktop than on the production system.
As for disk space, in the short term it would probably make sense to run
it alongside the normal planet creation processes and delete daily files
after a week or so. This would provide an opportunity to test it out,
iron out any kinks and generally test its usefulness. I'm happy to help
out in any way I can.
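
As a rough illustration of the "delete daily files after a week or so" idea, here is a minimal Python sketch. The function name prune_old_diffs, the *.osc glob and the use of file modification time are all assumptions made for the example; none of this is part of osmosis or the existing planet scripts.

```python
import time
from pathlib import Path


def prune_old_diffs(directory, max_age_days=7, now=None):
    """Delete *.osc files older than max_age_days and return their names.

    Hypothetical helper: age is judged by file modification time, which
    is an assumption about how the daily files would be produced.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(Path(directory).glob("*.osc")):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run from cron after each daily diff is written, something like prune_old_diffs("/path/to/diffs") would keep roughly a week of files around; the path is a placeholder.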
I produced a set of monthly changesets from the start of 2006. The file
sizes are as follows:
The most recent one (20070701-20070801.osc) is quite large so I'll focus
on it. It took 43m26s to produce.
Splitting into 7 day intervals and ignoring the last couple of days
produces the following stats.
File                    Size (bytes)   Duration
20070701-20070708.osc   164027848      10m37s
20070708-20070715.osc   220498572      10m10s
20070715-20070722.osc   347431979      14m42s
20070722-20070729.osc   253686702      13m40s
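
The interval splitting and file naming used above can be sketched as follows. This is only an illustration in Python: interval_names is a made-up helper, not an osmosis feature, and it just reproduces the YYYYMMDD-YYYYMMDD.osc naming pattern while dropping any partial interval at the end.

```python
from datetime import date, timedelta


def interval_names(start, end, step_days):
    """Return .osc file names for consecutive whole intervals in [start, end).

    A trailing partial interval is dropped, matching "ignoring the last
    couple of days" for the 7 day split of July.
    """
    step = timedelta(days=step_days)
    names = []
    current = start
    while current + step <= end:
        following = current + step
        names.append(f"{current:%Y%m%d}-{following:%Y%m%d}.osc")
        current = following
    return names


# 7 day intervals across July 2007
print(interval_names(date(2007, 7, 1), date(2007, 8, 1), 7))
```

For July 2007 this yields the four weekly file names listed in the table above.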
Selecting the biggest file in the interval (20070715-20070722.osc) and
breaking into 1 day intervals produces the following stats.
File                    Size (bytes)   Duration
20070715-20070716.osc   16684056       1m41s
20070716-20070717.osc   44242820       2m55s
20070717-20070718.osc   62863288       3m02s
20070718-20070719.osc   45727744       2m31s
20070719-20070720.osc   54787426       2m27s
20070720-20070721.osc   77041093       3m07s
20070721-20070722.osc   63953029       2m44s
Selecting the biggest file in the interval (20070720-20070721.osc) and
breaking into 4 hour intervals produces the following stats.
File                        Size (bytes)   Duration
2007072000-2007072004.osc   8932383        10.229s
2007072004-2007072008.osc   17590822       17.930s
2007072008-2007072012.osc   12615890       16.762s
2007072012-2007072016.osc   14360116       21.107s
2007072016-2007072020.osc   13339044       23.095s
2007072020-2007072100.osc   10841897       31.845s (Note: running a second time
Selecting the biggest file in the interval (2007072004-2007072008.osc)
and breaking into 1 hour intervals produces the following stats.
File                        Size (bytes)   Duration (run 3 times)
2007072004-2007072005.osc   3160144        4.872s,4.655s,4.985s
2007072005-2007072006.osc   4156641        5.249s,5.185s,4.729s
2007072006-2007072007.osc   5766073        9.204s,8.871s,8.638s
2007072007-2007072008.osc   4584598        4.250s,4.650s,3.983s
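
For anyone wanting to turn these figures into a rough throughput number, here is a small Python sketch. The helpers parse_duration and throughput are invented for the example, and taking the median of the three runs is my own choice for smoothing out run-to-run variation, not something the measurements above prescribe.

```python
import re
from statistics import median


def parse_duration(text):
    """Parse durations written like '10m37s' or '4.872s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


def throughput(size_bytes, runs):
    """Bytes per second, using the median of comma-separated run times."""
    seconds = median(parse_duration(run) for run in runs.split(","))
    return size_bytes / seconds


# Hourly figures copied from the table above
hourly = {
    "2007072004-2007072005.osc": (3160144, "4.872s,4.655s,4.985s"),
    "2007072005-2007072006.osc": (4156641, "5.249s,5.185s,4.729s"),
    "2007072006-2007072007.osc": (5766073, "9.204s,8.871s,8.638s"),
    "2007072007-2007072008.osc": (4584598, "4.250s,4.650s,3.983s"),
}

for name, (size, runs) in hourly.items():
    print(f"{name}: {throughput(size, runs) / 1e6:.2f} MB/s")
```

The same parse_duration handles the minute-based durations in the weekly and daily tables, so the loop can be pointed at those rows as well.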