<p></p>

<p><b>@tomhughes</b> requested changes on this pull request.</p>


<p dir="auto">I'm not sure why you say union is hard? We have <a href="https://github.com/brianhempel/active_record_union">https://github.com/brianhempel/active_record_union</a> installed already so you can just use the <code class="notranslate">union</code> or <code class="notranslate">union_all</code> method on a relation to merge it with another one?</p><hr>


<p>In <a href="https://github.com/openstreetmap/openstreetmap-website/pull/5761#discussion_r2001813159">lib/user_activities.rb</a>:</p>

<pre style='color:#555'>> +  # Common SQL for activity days

+  private_class_method def self.activity_days_sql(_user_id)

+    <<~SQL.squish

+      SELECT DATE_TRUNC('day', changesets.created_at) AS day FROM changesets WHERE user_id = :user_id

+      UNION ALL

+      SELECT DATE_TRUNC('day', diary_entries.created_at) AS day FROM diary_entries WHERE user_id = :user_id AND visible = true

+      UNION ALL

+      SELECT DATE_TRUNC('day', changeset_comments.created_at) AS day FROM changeset_comments WHERE author_id = :user_id

+      UNION ALL

+      SELECT DATE_TRUNC('day', note_comments.created_at) AS day FROM note_comments WHERE author_id = :user_id

+      UNION ALL

+      SELECT DATE_TRUNC('day', diary_comments.created_at) AS day FROM diary_comments WHERE user_id = :user_id AND visible = true

+      UNION ALL

+      SELECT DATE_TRUNC('day', gpx_files.timestamp) AS day FROM gpx_files WHERE user_id = :user_id

+    SQL

+  end

</pre>

<p dir="auto">The problem with doing the union first, and only applying the order and limit to the result is that we can't use any indexes and will wind up fetching all the user's activity for all time and the sorting the combined result before applying the limit as this plan shows:</p>

<pre class="notranslate"><code class="notranslate">                                                                       QUERY PLAN                                                                       

--------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=2981.15..2981.16 rows=5 width=8)

   ->  Sort  (cost=2981.15..2995.09 rows=5575 width=8)

         Sort Key: (date_trunc('day'::text, changesets.created_at)) DESC

         ->  Append  (cost=0.57..2888.55 rows=5575 width=8)

               ->  Index Only Scan using changesets_user_id_created_at_idx on changesets  (cost=0.57..1406.24 rows=4373 width=8)

                     Index Cond: (user_id = 3980)

               ->  Index Scan using diary_entry_user_id_created_at_index on diary_entries  (cost=0.42..5.02 rows=4 width=8)

                     Index Cond: (user_id = 3980)

                     Filter: visible

               ->  Index Only Scan using index_changeset_comments_on_author_id_and_created_at on changeset_comments  (cost=0.43..19.49 rows=96 width=8)

                     Index Cond: (author_id = 3980)

               ->  Index Only Scan using index_note_comments_on_author_id_and_created_at on note_comments  (cost=0.43..151.67 rows=418 width=8)

                     Index Cond: (author_id = 3980)

               ->  Index Scan using index_diary_comments_on_user_id_and_id on diary_comments  (cost=0.41..263.85 rows=268 width=8)

                     Index Cond: (user_id = 3980)

                     Filter: visible

               ->  Index Scan using gpx_files_user_id_idx on gpx_files  (cost=0.43..206.02 rows=416 width=8)

                     Index Cond: (user_id = 3980)

(18 rows)

</code></pre>

<p dir="auto">if we use subqueries to sort and limit each branch before the union like this:</p>

<div class="highlight highlight-source-sql" dir="auto"><pre class="notranslate">(<span class="pl-k">SELECT</span> DATE_TRUNC(<span class="pl-s"><span class="pl-pds">'</span>day<span class="pl-pds">'</span></span>, <span class="pl-c1">changesets</span>.<span class="pl-c1">created_at</span>) <span class="pl-k">AS</span> day <span class="pl-k">FROM</span> changesets <span class="pl-k">WHERE</span> user_id <span class="pl-k">=</span> :user_id <span class="pl-k">ORDER BY</span> created_at <span class="pl-k">DESC</span> <span class="pl-k">LIMIT</span> :<span class="pl-k">limit</span>)

<span class="pl-k">UNION ALL</span>

(<span class="pl-k">SELECT</span> DATE_TRUNC(<span class="pl-s"><span class="pl-pds">'</span>day<span class="pl-pds">'</span></span>, <span class="pl-c1">diary_entries</span>.<span class="pl-c1">created_at</span>) <span class="pl-k">AS</span> day <span class="pl-k">FROM</span> diary_entries <span class="pl-k">WHERE</span> user_id <span class="pl-k">=</span> :user_id <span class="pl-k">AND</span> visible <span class="pl-k">=</span> true <span class="pl-k">ORDER BY</span> created_at <span class="pl-k">DESC</span> <span class="pl-k">LIMIT</span> :<span class="pl-k">limit</span>)

<span class="pl-k">UNION ALL</span>

(<span class="pl-k">SELECT</span> DATE_TRUNC(<span class="pl-s"><span class="pl-pds">'</span>day<span class="pl-pds">'</span></span>, <span class="pl-c1">changeset_comments</span>.<span class="pl-c1">created_at</span>) <span class="pl-k">AS</span> day <span class="pl-k">FROM</span> changeset_comments <span class="pl-k">WHERE</span> author_id <span class="pl-k">=</span> :user_id <span class="pl-k">ORDER BY</span> created_at <span class="pl-k">DESC</span> <span class="pl-k">LIMIT</span> :<span class="pl-k">limit</span>)

<span class="pl-k">UNION ALL</span>

(<span class="pl-k">SELECT</span> DATE_TRUNC(<span class="pl-s"><span class="pl-pds">'</span>day<span class="pl-pds">'</span></span>, <span class="pl-c1">note_comments</span>.<span class="pl-c1">created_at</span>) <span class="pl-k">AS</span> day <span class="pl-k">FROM</span> note_comments <span class="pl-k">WHERE</span> author_id <span class="pl-k">=</span> :user_id <span class="pl-k">ORDER BY</span> created_at <span class="pl-k">DESC</span> <span class="pl-k">LIMIT</span> :<span class="pl-k">limit</span>)

<span class="pl-k">UNION ALL</span>

(<span class="pl-k">SELECT</span> DATE_TRUNC(<span class="pl-s"><span class="pl-pds">'</span>day<span class="pl-pds">'</span></span>, <span class="pl-c1">diary_comments</span>.<span class="pl-c1">created_at</span>) <span class="pl-k">AS</span> day <span class="pl-k">FROM</span> diary_comments <span class="pl-k">WHERE</span> user_id <span class="pl-k">=</span> :user_id <span class="pl-k">AND</span> visible <span class="pl-k">=</span> true <span class="pl-k">ORDER BY</span> created_at <span class="pl-k">DESC</span> <span class="pl-k">LIMIT</span> :<span class="pl-k">limit</span>)

<span class="pl-k">UNION ALL</span>

(<span class="pl-k">SELECT</span> DATE_TRUNC(<span class="pl-s"><span class="pl-pds">'</span>day<span class="pl-pds">'</span></span>, <span class="pl-c1">gpx_files</span>.<span class="pl-c1">timestamp</span>) <span class="pl-k">AS</span> day <span class="pl-k">FROM</span> gpx_files <span class="pl-k">WHERE</span> user_id <span class="pl-k">=</span> :user_id <span class="pl-k">ORDER BY</span> id <span class="pl-k">DESC</span> <span class="pl-k">LIMIT</span> :<span class="pl-k">limit</span>)</pre></div>

<p dir="auto">then we get a much better plan where we only read N items from each activity class before sorting the result and picking the top N overall items:</p>

<pre class="notranslate"><code class="notranslate">                                                                                  QUERY PLAN                                                                                  

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Limit  (cost=25.51..25.53 rows=5 width=8)

   ->  Sort  (cost=25.51..25.59 rows=29 width=8)

         Sort Key: "*SELECT* 1_1".day DESC

         ->  Append  (cost=0.57..25.03 rows=29 width=8)

               ->  Subquery Scan on "*SELECT* 1_1"  (cost=0.57..2.68 rows=5 width=8)

                     ->  Limit  (cost=0.57..2.18 rows=5 width=16)

                           ->  Index Only Scan Backward using changesets_user_id_created_at_idx on changesets  (cost=0.57..1406.24 rows=4373 width=16)

                                 Index Cond: (user_id = 3980)

               ->  Subquery Scan on "*SELECT* 2"  (cost=0.42..5.42 rows=4 width=8)

                     ->  Limit  (cost=0.42..5.02 rows=4 width=16)

                           ->  Index Scan Backward using diary_entry_user_id_created_at_index on diary_entries  (cost=0.42..5.02 rows=4 width=16)

                                 Index Cond: (user_id = 3980)

                                 Filter: visible

               ->  Subquery Scan on "*SELECT* 3"  (cost=0.43..1.92 rows=5 width=8)

                     ->  Limit  (cost=0.43..1.42 rows=5 width=16)

                           ->  Index Only Scan Backward using index_changeset_comments_on_author_id_and_created_at on changeset_comments  (cost=0.43..19.49 rows=96 width=16)

                                 Index Cond: (author_id = 3980)

               ->  Subquery Scan on "*SELECT* 4"  (cost=0.43..2.74 rows=5 width=8)

                     ->  Limit  (cost=0.43..2.24 rows=5 width=16)

                           ->  Index Only Scan Backward using index_note_comments_on_author_id_and_created_at on note_comments  (cost=0.43..151.67 rows=418 width=16)

                                 Index Cond: (author_id = 3980)

               ->  Subquery Scan on "*SELECT* 5"  (cost=0.41..5.83 rows=5 width=8)

                     ->  Limit  (cost=0.41..5.33 rows=5 width=16)

                           ->  Index Scan Backward using diary_comment_user_id_created_at_index on diary_comments  (cost=0.41..263.85 rows=268 width=16)

                                 Index Cond: (user_id = 3980)

                                 Filter: visible

               ->  Subquery Scan on "*SELECT* 6"  (cost=0.43..4.99 rows=5 width=8)

                     ->  Limit  (cost=0.43..4.49 rows=5 width=16)

                           ->  Index Scan Backward using index_gpx_files_on_user_id_and_id on gpx_files  (cost=0.43..337.88 rows=416 width=16)

                                 Index Cond: (user_id = 3980)

(30 rows)

</code></pre>

<p dir="auto">The only real problem is that when you do that you can't support offset as you don't know what offset to start at in each table, but do we actually need offset? We're only interested in the most recent activity? If we do need it than the subqueries would each need to use offset+limit as their limit which is still better until offset gets too large at least.</p>

<p dir="auto">Note that CTEs are an alternative to inline subqueries here - the result should be the same.</p>


<hr>


<p>In <a href="https://github.com/openstreetmap/openstreetmap-website/pull/5761#discussion_r2001819539">lib/user_activities.rb</a>:</p>

<pre style='color:#555'>> +      SELECT DATE_TRUNC('day', diary_comments.created_at) AS day FROM diary_comments WHERE user_id = :user_id AND visible = true

+      UNION ALL

+      SELECT DATE_TRUNC('day', gpx_files.timestamp) AS day FROM gpx_files WHERE user_id = :user_id

+    SQL

+  end

+

+  # Fetch activities for specific days

+  private_class_method def self.fetch_activities(user_id, days)

+    [

+      fetch_changesets(user_id, days),

+      fetch_diary_entries(user_id, days),

+      fetch_changeset_comments(user_id, days),

+      fetch_note_comments(user_id, days),

+      fetch_diary_comments(user_id, days),

+      fetch_gpx_files(user_id, days)

+    ].flatten

</pre>

<p dir="auto">I'm not really sure why joining the relations returned by these functions with <code class="notranslate">union</code> or <code class="notranslate">union_all</code> is so hard? It would mean one database round trip instead of six?</p>


<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;">—<br />Reply to this email directly, <a href="https://github.com/openstreetmap/openstreetmap-website/pull/5761#pullrequestreview-2695950194">view it on GitHub</a>, or <a href="https://github.com/notifications/unsubscribe-auth/AAK2OLIVV33D5KWNY3KAYVL2VBWJTAVCNFSM6AAAAABYQQXBLGVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDMOJVHE2TAMJZGQ">unsubscribe</a>.<br />You are receiving this because you are subscribed to this thread.<img src="https://github.com/notifications/beacon/AAK2OLLSRONAWSI7SOM36K32VBWJTA5CNFSM6AAAAABYQQXBLGWGG33NNVSW45C7OR4XAZNRKB2WY3CSMVYXKZLTORJGK5TJMV32UY3PNVWWK3TUL5UWJTVAWDXXE.gif" height="1" width="1" alt="" /><span style="color: transparent; font-size: 0; display: none; visibility: hidden; overflow: hidden; opacity: 0; width: 0; height: 0; max-width: 0; max-height: 0; mso-hide: all">Message ID: <span><openstreetmap/openstreetmap-website/pull/5761/review/2695950194</span><span>@</span><span>github</span><span>.</span><span>com></span></span></p>

<script type="application/ld+json">[

{

"@context": "http://schema.org",

"@type": "EmailMessage",

"potentialAction": {

"@type": "ViewAction",

"target": "https://github.com/openstreetmap/openstreetmap-website/pull/5761#pullrequestreview-2695950194",

"url": "https://github.com/openstreetmap/openstreetmap-website/pull/5761#pullrequestreview-2695950194",

"name": "View Pull Request"

},

"description": "View this Pull Request on GitHub",

"publisher": {

"@type": "Organization",

"name": "GitHub",

"url": "https://github.com"

}

}

]</script>