I see that a full table scan is planned; however, it's apparently never executed, and yet the UPDATE takes a long time anyway. Why?
Here's the EXPLAIN output:
Update on public.hone_cohortuser  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=309.154..309.156 rows=0 loops=1)
  ->  Hash Join  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=33.922..52.839 rows=42329 loops=1)
        Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, 'COMPLETED'::character varying(254), hone_cohortuser.ctid, u0.ctid
        Inner Unique: true
        Hash Cond: ((hone_cohortuser.cohort_id = u0.cohort_id) AND (hone_cohortuser.user_id = u0.user_id))
        ->  Seq Scan on public.hone_cohortuser  (cost=0.00..4309.98 rows=83498 width=42) (actual time=0.009..6.899 rows=83498 loops=1)
              Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, hone_cohortuser.ctid
        ->  Hash  (cost=2792.57..2792.57 rows=25850 width=14) (actual time=32.784..32.785 rows=47630 loops=1)
              Output: u0.ctid, u0.cohort_id, u0.user_id
              Buckets: 65536 (originally 32768)  Batches: 1 (originally 1)  Memory Usage: 2745kB
              ->  HashAggregate  (cost=2534.07..2792.57 rows=25850 width=14) (actual time=24.829..28.675 rows=47645 loops=1)
                    Output: u0.ctid, u0.cohort_id, u0.user_id
                    Group Key: u0.cohort_id, u0.user_id
                    Batches: 1  Memory Usage: 3857kB
                    ->  Seq Scan on public.hone_programparticipant u0  (cost=0.00..2295.03 rows=47808 width=14) (actual time=0.006..14.322 rows=48036 loops=1)
                          Output: u0.ctid, u0.cohort_id, u0.user_id
                          Filter: ((u0.learner_group_status)::text = 'COMPLETED'::text)
                          Rows Removed by Filter: 41086
Planning Time: 0.768 ms
Execution Time: 309.481 ms
Here's the query:
UPDATE "hone_cohortuser"
SET "learner_program_status" = 'COMPLETED'
WHERE EXISTS (
    SELECT 1 AS "a"
    FROM "hone_programparticipant" U0
    WHERE U0."cohort_id" = "hone_cohortuser"."cohort_id"
      AND U0."learner_group_status" = 'COMPLETED'
      AND U0."user_id" = "hone_cohortuser"."user_id"
    LIMIT 1
)
What I ended up doing to optimize it was to execute a SELECT id... first and then run the UPDATE with the IDs placed directly in the WHERE clause. This reduced the total time to ~10% of the previous benchmark when no updates are needed.
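The two-step approach can be sketched roughly like this (table and column names taken from the query above; the status filter in step 1 is my assumption about why the no-op case gets faster, since Postgres otherwise rewrites a matched row even when the new value equals the old one):

```sql
-- Step 1: collect only the ids that actually need changing.
-- The IS DISTINCT FROM filter (an assumption, not from the original post)
-- skips rows that already hold the target value.
SELECT cu.id
FROM hone_cohortuser cu
WHERE EXISTS (
    SELECT 1
    FROM hone_programparticipant u0
    WHERE u0.cohort_id = cu.cohort_id
      AND u0.user_id = cu.user_id
      AND u0.learner_group_status = 'COMPLETED'
)
AND cu.learner_program_status IS DISTINCT FROM 'COMPLETED';

-- Step 2: update only those rows, interpolating the ids from step 1.
UPDATE hone_cohortuser
SET learner_program_status = 'COMPLETED'
WHERE id IN (/* ids from step 1 */);
```

When nothing needs updating, step 1 returns an empty list and step 2 can be skipped entirely, which would explain the ~10% figure for the no-op benchmark.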
*If anything about the data model feels off to you, it is. This query translates a new data model into a legacy table for backward compatibility.
UPDATE does not produce any result rows unless you use a RETURNING clause.
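For example, to see which rows an UPDATE actually touched (the id value here is illustrative):

```sql
UPDATE hone_cohortuser
SET learner_program_status = 'COMPLETED'
WHERE id = 42          -- illustrative id
RETURNING id, learner_program_status;
```

Without RETURNING, the statement only reports a row count (e.g. "UPDATE 1").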