
I see that a full table scan is planned; however, it's never executed, and the UPDATE takes a long time anyway. Why?

Here's the EXPLAIN (ANALYZE, VERBOSE) output:

Update on public.hone_cohortuser  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=309.154..309.156 rows=0 loops=1)
  ->  Hash Join  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=33.922..52.839 rows=42329 loops=1)
        Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, 'COMPLETED'::character varying(254), hone_cohortuser.ctid, u0.ctid
        Inner Unique: true
        Hash Cond: ((hone_cohortuser.cohort_id = u0.cohort_id) AND (hone_cohortuser.user_id = u0.user_id))
        ->  Seq Scan on public.hone_cohortuser  (cost=0.00..4309.98 rows=83498 width=42) (actual time=0.009..6.899 rows=83498 loops=1)
              Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, hone_cohortuser.ctid
        ->  Hash  (cost=2792.57..2792.57 rows=25850 width=14) (actual time=32.784..32.785 rows=47630 loops=1)
              Output: u0.ctid, u0.cohort_id, u0.user_id
              Buckets: 65536 (originally 32768)  Batches: 1 (originally 1)  Memory Usage: 2745kB
              ->  HashAggregate  (cost=2534.07..2792.57 rows=25850 width=14) (actual time=24.829..28.675 rows=47645 loops=1)
                    Output: u0.ctid, u0.cohort_id, u0.user_id
                    Group Key: u0.cohort_id, u0.user_id
                    Batches: 1  Memory Usage: 3857kB
                    ->  Seq Scan on public.hone_programparticipant u0  (cost=0.00..2295.03 rows=47808 width=14) (actual time=0.006..14.322 rows=48036 loops=1)
                          Output: u0.ctid, u0.cohort_id, u0.user_id
                          Filter: ((u0.learner_group_status)::text = 'COMPLETED'::text)
                          Rows Removed by Filter: 41086
Planning Time: 0.768 ms
Execution Time: 309.481 ms

Here's the query:

UPDATE
  "hone_cohortuser"
SET
  "learner_program_status" = 'COMPLETED'
WHERE
  EXISTS(
    SELECT
      1 AS "a"
    FROM
      "hone_programparticipant" U0
    WHERE
      (
        U0."cohort_id" = ("hone_cohortuser"."cohort_id")
        AND U0."learner_group_status" = 'COMPLETED'
        AND U0."user_id" = ("hone_cohortuser"."user_id")
      )
    LIMIT
      1
  )

What I ended up doing to optimize it was to run a SELECT id... query first and then run the UPDATE with those IDs placed directly in the WHERE clause. That reduced the total time to ~10% of the previous benchmark in the case where no updates are needed.
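
Roughly, the two steps looked like this (a sketch; the exact shape of the SELECT and how the IDs get inlined by the application are approximate):

-- Step 1: collect the ids of the rows that need updating
SELECT
  "id"
FROM
  "hone_cohortuser"
WHERE
  EXISTS(
    SELECT
      1
    FROM
      "hone_programparticipant" U0
    WHERE
      (
        U0."cohort_id" = ("hone_cohortuser"."cohort_id")
        AND U0."learner_group_status" = 'COMPLETED'
        AND U0."user_id" = ("hone_cohortuser"."user_id")
      )
  );

-- Step 2: run the UPDATE with the collected ids inlined
UPDATE
  "hone_cohortuser"
SET
  "learner_program_status" = 'COMPLETED'
WHERE
  "id" IN (/* ids from step 1 */);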

*If anything about the data model feels off to you, it is. This query translates a new data model into a legacy table for backward compatibility.

  • According to the execution plan, 42329 rows are updated. 300 milliseconds is not bad for that. In case you wonder why the top node has "actual rows" = 0, that is because an UPDATE does not produce any results unless you use a RETURNING clause (see the RETURNING sketch after these comments).
    – Laurenz Albe
    Commented Apr 9 at 0:01
  • What indexes do you have? Commented Apr 9 at 0:09
  • I don't see any 'never executed' seq scans there.
    – jjanes
    Commented Apr 9 at 0:33
  • Thanks @LaurenzAlbe! Shortly after posting I realized that all these rows were actually being updated, but seeing "actual rows" = 0 definitely tripped me up.
    – Marcos
    Commented Apr 9 at 1:11
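
For reference on the first comment, a minimal sketch of the RETURNING behavior (the id value is illustrative):

UPDATE
  "hone_cohortuser"
SET
  "learner_program_status" = 'COMPLETED'
WHERE
  "id" = 1
RETURNING
  "id", "learner_program_status";
-- Without the RETURNING clause the UPDATE emits no result rows, so the top
-- plan node reports "actual rows" = 0 even though rows were modified.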

1 Answer


After running some more tests I realized the UPDATE is actually being executed on 40k+ rows, as pointed out by @Laurenz Albe.

The straightforward fix is to add a "learner_program_status" != 'COMPLETED' condition to the WHERE clause, since the redundant overhead was caused by rows where 'COMPLETED' was being updated to 'COMPLETED'.
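
In other words, the corrected query would look roughly like this (only the added condition differs from the original):

UPDATE
  "hone_cohortuser"
SET
  "learner_program_status" = 'COMPLETED'
WHERE
  "learner_program_status" != 'COMPLETED'
  AND EXISTS(
    SELECT
      1 AS "a"
    FROM
      "hone_programparticipant" U0
    WHERE
      (
        U0."cohort_id" = ("hone_cohortuser"."cohort_id")
        AND U0."learner_group_status" = 'COMPLETED'
        AND U0."user_id" = ("hone_cohortuser"."user_id")
      )
    LIMIT
      1
  )

Note that if "learner_program_status" can be NULL, the != comparison evaluates to NULL for those rows and they get skipped; "learner_program_status" IS DISTINCT FROM 'COMPLETED' would include them.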

