[Rebuild] Strategy for running the bot in "regions"

Mon Jun 25 16:27:39 BST 2012

On 25 June 2012 11:42, Andy Allan <gravitystorm at gmail.com> wrote:

> I guess also the bot will need to deal with making its selection,
> doing the processing, and finding that the changeset gets rejected due
> to version-mismatches as people continue editing as it does its job.

So, putting all the ideas together and thrashing them through with Matt,
here's the proposal for how the bot should approach handling the
entire planet. The key points, I think, are that it uses a "candidate
list" to avoid processing entities only ever touched by acceptors, and
that it uses regions both to order the work and to allow us to run
multiple copies in parallel.

Feedback of course is welcome!

============================

* Candidate list of all possible entities needing work
** Built either from wtfe or by processing the history dump
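A minimal sketch of deriving the candidate list from the history dump, assuming each version has already been reduced to a hypothetical HistoryRow (kind, id, version, user_id) and that the acceptors are known as a set of user IDs. It applies only the rule stated above (skip entities only ever touched by acceptors); the real licence-change rules have more cases than this.

```python
from typing import Iterable, NamedTuple


class HistoryRow(NamedTuple):
    """One version of one entity, as it might be read from the history dump (hypothetical shape)."""
    kind: str      # "node", "way" or "relation"
    id: int
    version: int
    user_id: int


def build_candidates(rows: Iterable[HistoryRow], acceptors: set[int]) -> set[tuple[str, int]]:
    """Return every (kind, id) with at least one version by a user who is not an acceptor.

    Entities whose whole history is by acceptors never enter the set,
    so the bot never has to fetch or process them.
    """
    candidates: set[tuple[str, int]] = set()
    for row in rows:
        if row.user_id not in acceptors:
            candidates.add((row.kind, row.id))
    return candidates
```

Because it is a single streaming pass, the same function would work whether the rows come from wtfe output or from a parse of the history dump.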

* Region list
** Stored as an ordered list of 1 degree tiles
** Ordered by processing a list of bounding boxes generated by LWG or CWG
** The bot takes the next tile from the list (skipping any that neighbour
a tile already marked in progress), marks it as in progress, then marks
it as done when complete
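To make the region list concrete, here is a rough sketch of turning ordered bounding boxes into 1-degree tiles and picking the next tile to work on. Assumptions not in the proposal: tiles are keyed by the whole-degree (lon, lat) of their south-west corner, the bboxes don't cross the antimeridian, "neighbour" means any of the 8 surrounding tiles (longitude wraps at ±180), and the status strings and function names are hypothetical.

```python
import math
from typing import Iterable, Optional

# A 1-degree tile, identified by the whole-degree (lon, lat) of its south-west corner.
Tile = tuple[int, int]


def tiles_for_bbox(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> list[Tile]:
    """All 1-degree tiles overlapping a bbox, west to east, then south to north."""
    x0, y0 = math.floor(min_lon), math.floor(min_lat)
    # ceil(...) - 1 so a box ending exactly on a degree line doesn't pull in the next tile
    x1 = max(x0, math.ceil(max_lon) - 1)
    y1 = max(y0, math.ceil(max_lat) - 1)
    return [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]


def build_region_list(bboxes: Iterable[tuple[float, float, float, float]]) -> list[Tile]:
    """Order tiles by the first bbox (in priority order) that covers them; drop repeats."""
    ordered: list[Tile] = []
    seen: set[Tile] = set()
    for bbox in bboxes:
        for tile in tiles_for_bbox(*bbox):
            if tile not in seen:
                seen.add(tile)
                ordered.append(tile)
    return ordered


def neighbours(tile: Tile) -> set[Tile]:
    """The 8 surrounding tiles, wrapping longitude at the antimeridian."""
    x, y = tile
    return {
        ((x + dx + 180) % 360 - 180, y + dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    }


def next_region(regions: list[Tile], status: dict[Tile, str]) -> Optional[Tile]:
    """First unstarted tile that doesn't border a tile another bot has in progress.

    `status` maps tile -> "in_progress" / "done" / "failed"; absent means unstarted.
    """
    busy = {t for t, s in status.items() if s == "in_progress"}
    for tile in regions:
        if tile not in status and not (neighbours(tile) & busy):
            return tile
    return None
```

For example, with the tiles of a 4x1 degree box and the first tile in progress, the second tile is skipped as a neighbour and the third is handed out.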

* Postgres meta-information database containing the above
** Candidate list of entities
** Regions of the world, in order of processing
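One possible shape for the meta-information database, as a sketch only. The proposal says Postgres; this uses Python's built-in sqlite3 purely so the example runs without a server, and every table, column and function name here is hypothetical. It covers the two things listed above plus the state changes the bot would make: regions moving pending → in_progress → done/failed, and candidates being marked processed.

```python
import sqlite3

# Hypothetical layout; sqlite3 stands in for the proposed Postgres database.
SCHEMA = """
CREATE TABLE IF NOT EXISTS candidates (
    kind      TEXT    NOT NULL,            -- 'node', 'way' or 'relation'
    id        INTEGER NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0,  -- set to 1 once handled by either pass
    PRIMARY KEY (kind, id)
);
CREATE TABLE IF NOT EXISTS regions (
    position INTEGER PRIMARY KEY,          -- processing order
    lon      INTEGER NOT NULL,             -- south-west corner of the 1-degree tile
    lat      INTEGER NOT NULL,
    status   TEXT NOT NULL DEFAULT 'pending'
             CHECK (status IN ('pending', 'in_progress', 'done', 'failed')),
    UNIQUE (lon, lat)
);
"""


def open_meta_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_candidates(conn, entities):
    """entities: iterable of (kind, id) pairs; duplicates are ignored."""
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR IGNORE INTO candidates (kind, id) VALUES (?, ?)", entities)


def load_regions(conn, tiles):
    """tiles: (lon, lat) pairs already in processing order."""
    with conn:
        conn.executemany(
            "INSERT INTO regions (position, lon, lat) VALUES (?, ?, ?)",
            [(i, lon, lat) for i, (lon, lat) in enumerate(tiles)],
        )


def set_region_status(conn, lon, lat, status):
    with conn:
        conn.execute("UPDATE regions SET status = ? WHERE lon = ? AND lat = ?", (status, lon, lat))


def region_status(conn, lon, lat):
    row = conn.execute("SELECT status FROM regions WHERE lon = ? AND lat = ?", (lon, lat)).fetchone()
    return row[0] if row else None


def mark_processed(conn, entities):
    with conn:
        conn.executemany("UPDATE candidates SET processed = 1 WHERE kind = ? AND id = ?", entities)


def unprocessed_candidates(conn):
    """What the second-pass bot would work through."""
    return conn.execute(
        "SELECT kind, id FROM candidates WHERE processed = 0 ORDER BY kind, id"
    ).fetchall()
```

With several bots sharing the real Postgres database, claiming a region would also need to be atomic (e.g. a row lock inside a transaction), which this single-connection sketch doesn't attempt.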

* The bot then
** Picks an area, marking it as in progress
** Binary chops it into much smaller fragments
** Makes map calls on these fragments, binary chopping further as
required, to build a list of entities
** Uses the candidate list to filter the results
** Applies the exclusion lists, adopted users, etc.
** Fetches history through direct db calls or via the moderators'
non-redacted call
** Processes the histories and builds changesets
** Breaks down changesets into manageable sizes
** Tries applying changesets
*** If it works, marks the redactions and logs the candidates as processed
*** If it fails, either retries (to deal with changed data) or bails
** Logs the generated changeset XML and redaction lists to plain-text logs
** Marks regions as completed (or failed) and entities as processed in
meta-information db

** Multiple bots can work in parallel, each taking the next unstarted
area from the list that does not neighbour an existing "in-progress"
area (this is to try to avoid bots causing unnecessary retries for each
other by stomping on each other's data).
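The "binary chop" and candidate-filter steps above might look roughly like this. Assumptions: `fetch` is a hypothetical wrapper around the map call that returns (kind, id) pairs and raises the hypothetical TooManyResults when a box holds too much data; the 0.25 square-degree pre-chop default is my understanding of the map call's area limit and should be checked against the API's capabilities response; `min_size` is an arbitrary cut-off so a pathologically dense spot fails instead of recursing forever.

```python
from typing import Callable, Iterable

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


class TooManyResults(Exception):
    """Raised by the (hypothetical) fetch function when a bbox returns too much data."""


def split(b: BBox) -> tuple[BBox, BBox]:
    """Halve a bbox across its longer side."""
    min_lon, min_lat, max_lon, max_lat = b
    if max_lon - min_lon >= max_lat - min_lat:
        mid = (min_lon + max_lon) / 2
        return (min_lon, min_lat, mid, max_lat), (mid, min_lat, max_lon, max_lat)
    mid = (min_lat + max_lat) / 2
    return (min_lon, min_lat, max_lon, mid), (min_lon, mid, max_lon, max_lat)


def entities_in(b: BBox, fetch: Callable[[BBox], Iterable[tuple[str, int]]],
                min_size: float = 1e-4) -> set[tuple[str, int]]:
    """Collect (kind, id) pairs in b, halving the box whenever fetch gives up."""
    try:
        return set(fetch(b))
    except TooManyResults:
        if b[2] - b[0] < min_size and b[3] - b[1] < min_size:
            raise  # can't usefully split further; caller should mark the region failed
        left, right = split(b)
        # entities on the dividing line may come back twice; the set union dedupes them
        return entities_in(left, fetch, min_size) | entities_in(right, fetch, min_size)


def work_list(region: BBox, fetch, candidates: set[tuple[str, int]],
              max_area: float = 0.25) -> set[tuple[str, int]]:
    """Chop a region into pieces no bigger than max_area, fetch each, keep only candidates."""
    pending, found = [region], set()
    while pending:
        b = pending.pop()
        if (b[2] - b[0]) * (b[3] - b[1]) > max_area:
            pending.extend(split(b))
        else:
            found |= entities_in(b, fetch)
    return found & candidates
```

Chopping adaptively means sparse areas cost one call per 0.25 square-degree piece, while only dense areas pay for extra splits.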

* Second-pass bot then
** Ignores the regions
** Works through the candidates processing changesets and redactions
** This picks up deleted items not returned by map calls, failed
changesets, and floating relations
** Logs failed changesets as needing manual intervention
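Both passes share the "break into manageable sizes, try, retry on changed data, else bail" loop. Here is a sketch of it under these assumptions: `upload` and `rebuild` are hypothetical callables (upload raises the hypothetical VersionConflict on a version mismatch; rebuild re-fetches history and regenerates the chunk), and the chunk size and attempt count are placeholder values, not agreed limits.

```python
import logging
from typing import Callable, Iterator, Sequence

log = logging.getLogger(__name__)


class VersionConflict(Exception):
    """Raised by the (hypothetical) upload function when the data changed underneath us."""


def chunks(changes: Sequence, size: int) -> Iterator[Sequence]:
    """Break a long list of changes into pieces of at most `size`."""
    for i in range(0, len(changes), size):
        yield changes[i:i + size]


def apply_with_retries(changes: Sequence,
                       rebuild: Callable[[Sequence], Sequence],
                       upload: Callable[[Sequence], None],
                       max_changes: int = 500,
                       max_attempts: int = 3) -> list[Sequence]:
    """Upload changes chunk by chunk, rebuilding a chunk from fresh data after a conflict.

    Returns the chunks that still failed after max_attempts, i.e. the ones
    to log as needing manual intervention.
    """
    failed = []
    for chunk in chunks(changes, max_changes):
        for attempt in range(1, max_attempts + 1):
            try:
                upload(chunk)
                break  # success: move on to the next chunk
            except VersionConflict:
                log.info("version conflict, attempt %d of %d", attempt, max_attempts)
                if attempt < max_attempts:
                    chunk = rebuild(chunk)  # someone edited meanwhile; regenerate and retry
        else:  # loop finished without a break: every attempt conflicted
            log.warning("giving up on %d changes; needs manual intervention", len(chunk))
            failed.append(chunk)
    return failed
```

Returning the failed chunks, rather than raising, lets the first pass carry on with the rest of its region and leaves those entities unprocessed in the meta-information db for the second pass to find.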

==============

Cheers,
Andy