[OSM-dev] Minute Diffs Broken
brett at bretth.com
Tue May 5 10:36:44 BST 2009
Tom Hughes wrote:
> Brett Henderson wrote:
>> That does look interesting. I'd hope to use that outside the main
>> database though. My thoughts were to use triggers to populate short
>> term flag tables which a single threaded process would read, use as
>> keys to select modified data into an offline database, then clear.
>> This offline database could then use a queueing system such as PgQ (I
>> haven't seen it before, will have to check it out) to send events to
>> the various consumers of the data. I'd like to minimise access to
>> the central database if possible because 1. it will scale better, and
>> 2. it adds less burden to existing DBAs.
> It is highly unlikely that anything which requires modifications to
> the database schema and/or adding triggers or anything like that to
> the database will be possible, at least in the short to medium term.
Come on Tom, where's your sense of adventure ;-)
> We're only just getting things stable again and I have no desire to
> start fiddling with things just yet - we need time to let what we have
> bed in properly.
I understand your concerns. I wouldn't have even mentioned it if I had
valid alternatives. At this point I'm feeling somewhat stymied though.
I had a system that worked well under 0.5, but I can't offer the same
service under 0.6.
> On top of that if we're going to start talking about replication style
> solutions then we will need to look carefully at all the available
> systems and consider what will best lend itself to what will doubtless
> be our need to scale to multiple database servers in the future. That
> isn't something we can do quickly.
If you're referring to multi-mastered clustered databases then that is a
whole different problem that shouldn't be confused with what I'm trying
to achieve. I'm simply trying to provide a way for people to access
regular updates in a read-only fashion where data integrity is the
highest priority. By allowing delays in delivery (I'd like to get it
down to a couple of minutes but I'm not aiming for anything like
real-time) it becomes a simpler problem with hopefully a simpler
solution. Any multiple database system is likely to be a long way off
so I can't wait for that.
I have a couple of questions for you in particular.
1. Are there any known issues with the current API that could cause
delays in excess of 5 minutes? Or is it just a fact of life with large
changesets? I guess what I'm asking is whether a delay longer than 5
minutes is a regular occurrence with a large changeset, or whether
something strange is going on.
2. What appear to be the current system bottlenecks? Is the database
already approaching processing capacity or is rails the limiting factor?
3. Is there any way I can change your mind on making db changes ;-)
If 1 is just an intermittent issue then the current problems may be
solvable without changes at the osmosis end. If not then I have to
make a change of some kind.
If the existing db is already a bottleneck then I have to tread very
carefully with what I do. If not then I have some more flexibility.
Having said that, I believe osmosis adds very little load on the database
judging by the munin graphs.
As for 3, I won't be asking you to start adding a bunch of triggers and
tables to the database just yet. Any change would have to go through
significant testing to measure its impact. Just as I spent a lot of
time testing before introducing the existing osmosis diffs, I'd be doing
the same for a more reliable replication mechanism. But if there's no
chance of it happening then I won't bother. It might be worth noting
that there are currently 5 osmosis processes reading from the database;
it is possible to reduce that to 1 with a smarter solution.
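
For anyone who wants the flag-table idea quoted at the top spelled out, here
is a rough sketch of how it might look. It is only an illustration, not a
proposal for the actual schema: SQLite stands in for the real database purely
so the example is self-contained, and every table, trigger and function name
(nodes, node_flags, drain_flags and so on) is made up for the example.
Triggers record the ids of modified rows in a short-term flag table, and a
single reader copies those rows into an offline database and then clears the
flags.

```python
import sqlite3


def create_main_db():
    """Hypothetical 'central' database: one data table plus a flag table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL);
        -- Short-term flag table: holds only the ids of modified rows.
        CREATE TABLE node_flags (id INTEGER PRIMARY KEY);
        CREATE TRIGGER nodes_ins AFTER INSERT ON nodes
        BEGIN
            INSERT OR IGNORE INTO node_flags (id) VALUES (NEW.id);
        END;
        CREATE TRIGGER nodes_upd AFTER UPDATE ON nodes
        BEGIN
            INSERT OR IGNORE INTO node_flags (id) VALUES (NEW.id);
        END;
    """)
    return db


def create_offline_db():
    """Hypothetical offline copy that consumers would read from."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lat REAL, lon REAL)")
    return db


def drain_flags(main, offline):
    """Single reader: copy flagged rows to the offline db, then clear the flags.

    Returns the number of rows copied.
    """
    ids = [row[0] for row in main.execute("SELECT id FROM node_flags ORDER BY id")]
    if not ids:
        return 0
    marks = ",".join("?" * len(ids))
    rows = main.execute(
        f"SELECT id, lat, lon FROM nodes WHERE id IN ({marks})", ids
    ).fetchall()
    offline.executemany(
        "INSERT OR REPLACE INTO nodes (id, lat, lon) VALUES (?, ?, ?)", rows
    )
    offline.commit()
    # Clear only the flags that were just processed.
    main.execute(f"DELETE FROM node_flags WHERE id IN ({marks})", ids)
    main.commit()
    return len(rows)


if __name__ == "__main__":
    main_db = create_main_db()
    offline_db = create_offline_db()
    main_db.execute("INSERT INTO nodes VALUES (1, 51.5, -0.1)")
    main_db.execute("UPDATE nodes SET lat = 51.6 WHERE id = 1")
    print(drain_flags(main_db, offline_db), "row(s) copied")
```

The sketch ignores deletes and ways/relations, and it does nothing about a
row being modified again between the read and the clear. A real version would
need to read and clear the flags under suitable transaction isolation so that
no change is lost.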