[Rebuild] Status update on the redaction bot

Andy Allan gravitystorm at gmail.com
Thu Jul 12 15:06:49 BST 2012


Hi All,

It's been a busy first 24 hours!

First off, I discovered a bug in the bot. Both way_nodes and
relation_members are ordered, and both have a sequence_id column. What
I hadn't been expecting is that the sequences would start from
different numbers - way_nodes start from sequence_id 1, whereas
relation_members start from sequence_id 0. It hadn't been picked up in
testing, since the same assumption was built into the history extract
loader, and so the two bugs cancelled each other out! Still, simple to
fix.
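
To illustrate the off-by-one, here is a minimal Python sketch. It is not the bot's actual code. The SEQUENCE_BASE table and the to_list_index helper are hypothetical names, and the only facts taken from the above are the two starting values.

```python
# Starting sequence_id per table, as described above.
# The dict and helper names are hypothetical, for illustration only.
SEQUENCE_BASE = {
    "way_nodes": 1,          # way_nodes start from sequence_id 1
    "relation_members": 0,   # relation_members start from sequence_id 0
}


def to_list_index(table, sequence_id):
    """Convert a stored sequence_id into a 0-based list position."""
    return sequence_id - SEQUENCE_BASE[table]


# The first entry of either table lands at position 0.
first_way_node = to_list_index("way_nodes", 1)
first_relation_member = to_list_index("relation_members", 0)
```

Assuming the same base for both tables shifts one of them by a position, which is how a loader and a bot sharing that assumption could hide each other's mistake.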

Second, we discovered a problem with the way that Osmosis generates
the replication diffs. We found out that whenever a row in the history
tables (nodes, ways or relations) changes, it treats that row as
needing to be put into the diff. Unfortunately, since we introduced the
redaction_id column in March, that's been a bit of a ticking time bomb!
If you redact, say, version 2 of a node with 7 versions, then the next
diff contains version 2, since Osmosis detects the update to the
row. We suspended the diffs immediately yesterday (thanks Grant) and
worked on a patch for Osmosis this morning (thanks to Frederik and the
rest of #osm-dev). Diffs have now resumed (thanks Tom), without any
garbage being spewed into them any more.
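
The idea can be sketched in Python as below. This is not the actual Osmosis patch (Osmosis is written in Java and its change detection works differently). The belongs_in_diff function and the dict-per-row layout are assumptions, made only to show why a change to redaction_id alone should not put an old version into a diff.

```python
def belongs_in_diff(old_row, new_row, ignored_columns=("redaction_id",)):
    """Decide whether a history row should appear in the next diff.

    old_row is None for a newly inserted version. Rows are plain dicts
    here, which is an assumption made for the sake of the example.
    """
    if old_row is None:
        return True  # a genuinely new version always goes into the diff

    def visible(row):
        # Compare everything except the columns we choose to ignore.
        return {k: v for k, v in row.items() if k not in ignored_columns}

    return visible(old_row) != visible(new_row)


# Redacting version 2 of a node only touches redaction_id.
before = {"node_id": 42, "version": 2, "redaction_id": None}
after = {"node_id": 42, "version": 2, "redaction_id": 1}

redaction_only = belongs_in_diff(before, after)
new_version = belongs_in_diff(None, {"node_id": 42, "version": 8, "redaction_id": None})
```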

Third, it turns out that certain very old relations have unexpected
sequence_ids stored in the database. Gah! relation_members.sequence_id
normally resets to 0 for each version of the relation, but for some very
old ones the sequence starts at higher numbers (actually, the sequence_id
keeps increasing from one version to the next instead of resetting) due
to the way the code used to work. This caused the generation of sparse
arrays, which the bot rightly stopped on. It took several hours to
figure out, track down and test the fix. Thanks to Matt for helping
with the investigations.
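
A minimal Python sketch of the sparse-array problem and one way around it follows. It is not the bot's actual fix. The (version, sequence_id, member) tuple layout and both function names are hypothetical.

```python
from collections import defaultdict


def members_by_position(rows, version):
    """Naive approach: use sequence_id directly as the list index."""
    seqs = {seq: member for v, seq, member in rows if v == version}
    return [seqs.get(i) for i in range(max(seqs) + 1)]


def members_by_rank(rows):
    """Order members within each version by sequence_id, ignoring gaps."""
    grouped = defaultdict(list)
    for version, seq, member in rows:
        grouped[version].append((seq, member))
    return {
        version: [member for _, member in sorted(items, key=lambda item: item[0])]
        for version, items in grouped.items()
    }


# A very old relation whose sequence_id keeps counting across versions.
old_relation = [
    (1, 0, "way/1"),
    (1, 1, "way/2"),
    (2, 2, "way/1"),
    (2, 3, "way/3"),
]

sparse = members_by_position(old_relation, 2)
dense = members_by_rank(old_relation)
```

Here sparse has two empty slots before the real members of version 2, whereas dense depends only on the relative order of the sequence_ids within each version.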

So now we're back on track, and the bot is running again. If you are
interested in the detailed logs it produces, they are copied over to
http://gravitystorm.dev.openstreetmap.org/redactions/logs-live/ every
minute or so. I'm still keeping a close eye on it as it progresses, so
it's not running flat out yet.

Cheers,
Andy


