[openstreetmap/openstreetmap-website] Backfill enhanced changeset stats (PR #6347)

Tom Hughes notifications at github.com
Tue Aug 26 23:24:54 UTC 2025


tomhughes left a comment (openstreetmap/openstreetmap-website#6347)

It would have been better if I'd tested that the results were correct though, which they aren't :-(

You can get rid of the nested grouping but you do still need the union, like this:

```sql
WITH changes AS (
    SELECT
      nodes.changeset_id,
      CASE WHEN nodes.version = 1 THEN 1 ELSE 0 END AS num_created_nodes,
      CASE WHEN nodes.version > 1 AND nodes.visible THEN 1 ELSE 0 END AS num_modified_nodes,
      CASE WHEN nodes.version > 1 AND NOT nodes.visible THEN 1 ELSE 0 END AS num_deleted_nodes,
      0 AS num_created_ways,
      0 AS num_modified_ways,
      0 AS num_deleted_ways,
      0 AS num_created_relations,
      0 AS num_modified_relations,
      0 AS num_deleted_relations
    FROM nodes
    WHERE nodes.changeset_id = ANY($1::bigint[])
  UNION ALL
    SELECT
      ways.changeset_id,
      0 AS num_created_nodes,
      0 AS num_modified_nodes,
      0 AS num_deleted_nodes,
      CASE WHEN ways.version = 1 THEN 1 ELSE 0 END AS num_created_ways,
      CASE WHEN ways.version > 1 AND ways.visible THEN 1 ELSE 0 END AS num_modified_ways,
      CASE WHEN ways.version > 1 AND NOT ways.visible THEN 1 ELSE 0 END AS num_deleted_ways,
      0 AS num_created_relations,
      0 AS num_modified_relations,
      0 AS num_deleted_relations
    FROM ways
    WHERE ways.changeset_id = ANY($1::bigint[])
  UNION ALL
    SELECT
      relations.changeset_id,
      0 AS num_created_nodes,
      0 AS num_modified_nodes,
      0 AS num_deleted_nodes,
      0 AS num_created_ways,
      0 AS num_modified_ways,
      0 AS num_deleted_ways,
      CASE WHEN relations.version = 1 THEN 1 ELSE 0 END AS num_created_relations,
      CASE WHEN relations.version > 1 AND relations.visible THEN 1 ELSE 0 END AS num_modified_relations,
      CASE WHEN relations.version > 1 AND NOT relations.visible THEN 1 ELSE 0 END AS num_deleted_relations
    FROM relations
    WHERE relations.changeset_id = ANY($1::bigint[])
),
total AS (
  SELECT
    changes.changeset_id,
    SUM(changes.num_created_nodes) AS num_created_nodes,
    SUM(changes.num_modified_nodes) AS num_modified_nodes,
    SUM(changes.num_deleted_nodes) AS num_deleted_nodes,
    SUM(changes.num_created_ways) AS num_created_ways,
    SUM(changes.num_modified_ways) AS num_modified_ways,
    SUM(changes.num_deleted_ways) AS num_deleted_ways,
    SUM(changes.num_created_relations) AS num_created_relations,
    SUM(changes.num_modified_relations) AS num_modified_relations,
    SUM(changes.num_deleted_relations) AS num_deleted_relations
  FROM changes
  GROUP BY changes.changeset_id
)
UPDATE changesets
SET num_created_nodes      = total.num_created_nodes,
    num_modified_nodes     = total.num_modified_nodes,
    num_deleted_nodes      = total.num_deleted_nodes,
    num_created_ways       = total.num_created_ways,
    num_modified_ways      = total.num_modified_ways,
    num_deleted_ways       = total.num_deleted_ways,
    num_created_relations  = total.num_created_relations,
    num_modified_relations = total.num_modified_relations,
    num_deleted_relations  = total.num_deleted_relations
FROM total
WHERE changesets.id = total.changeset_id
 ```

It turns out though that there doesn't seem to be much point as a naive version using subqueries is just as fast:

```sql
UPDATE changesets
   SET num_created_nodes = (SELECT COUNT(*) FROM nodes WHERE changeset_id = changesets.id AND version = 1),
       num_modified_nodes = (SELECT COUNT(*) FROM nodes WHERE changeset_id = changesets.id AND version > 1 AND visible),
       num_deleted_nodes = (SELECT COUNT(*) FROM nodes WHERE changeset_id = changesets.id AND version > 1 AND NOT visible),
       num_created_ways = (SELECT COUNT(*) FROM ways WHERE changeset_id = changesets.id AND version = 1),
       num_modified_ways = (SELECT COUNT(*) FROM ways WHERE changeset_id = changesets.id AND version > 1 AND visible),
       num_deleted_ways = (SELECT COUNT(*) FROM ways WHERE changeset_id = changesets.id AND version > 1 AND NOT visible),
       num_created_relations = (SELECT COUNT(*) FROM relations WHERE changeset_id = changesets.id AND version = 1),
       num_modified_relations = (SELECT COUNT(*) FROM relations WHERE changeset_id = changesets.id AND version > 1 AND visible),
       num_deleted_relations = (SELECT COUNT(*) FROM relations WHERE changeset_id = changesets.id AND version > 1 AND NOT visible)
WHERE id = ANY($1::bigint[])
```

Both take about 5s to do the first 10000 changesets when the cache is hot for the indexes.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/openstreetmap/openstreetmap-website/pull/6347#issuecomment-3226054386
You are receiving this because you are subscribed to this thread.

Message ID: <openstreetmap/openstreetmap-website/pull/6347/c3226054386 at github.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstreetmap.org/pipermail/rails-dev/attachments/20250826/a63d834a/attachment-0001.htm>


More information about the rails-dev mailing list