I see that a full table scan is planned; however, it's apparently never executed, and yet the UPDATE takes a long time anyway. Why?
Here's the EXPLAIN output:
Update on public.hone_cohortuser  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=309.154..309.156 rows=0 loops=1)
  ->  Hash Join  (cost=3180.32..8951.51 rows=83498 width=564) (actual time=33.922..52.839 rows=42329 loops=1)
        Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, 'COMPLETED'::character varying(254), hone_cohortuser.ctid, u0.ctid
        Inner Unique: true
        Hash Cond: ((hone_cohortuser.cohort_id = u0.cohort_id) AND (hone_cohortuser.user_id = u0.user_id))
        ->  Seq Scan on public.hone_cohortuser  (cost=0.00..4309.98 rows=83498 width=42) (actual time=0.009..6.899 rows=83498 loops=1)
              Output: hone_cohortuser.id, hone_cohortuser.created_at, hone_cohortuser.updated_at, hone_cohortuser.cohort_id, hone_cohortuser.user_id, hone_cohortuser.onboarding_completed_at_datetime, hone_cohortuser.ctid
        ->  Hash  (cost=2792.57..2792.57 rows=25850 width=14) (actual time=32.784..32.785 rows=47630 loops=1)
              Output: u0.ctid, u0.cohort_id, u0.user_id
              Buckets: 65536 (originally 32768)  Batches: 1 (originally 1)  Memory Usage: 2745kB
              ->  HashAggregate  (cost=2534.07..2792.57 rows=25850 width=14) (actual time=24.829..28.675 rows=47645 loops=1)
                    Output: u0.ctid, u0.cohort_id, u0.user_id
                    Group Key: u0.cohort_id, u0.user_id
                    Batches: 1  Memory Usage: 3857kB
                    ->  Seq Scan on public.hone_programparticipant u0  (cost=0.00..2295.03 rows=47808 width=14) (actual time=0.006..14.322 rows=48036 loops=1)
                          Output: u0.ctid, u0.cohort_id, u0.user_id
                          Filter: ((u0.learner_group_status)::text = 'COMPLETED'::text)
                          Rows Removed by Filter: 41086
Planning Time: 0.768 ms
Execution Time: 309.481 ms
Here's the query:
UPDATE "hone_cohortuser"
SET "learner_program_status" = 'COMPLETED'
WHERE EXISTS (
    SELECT 1 AS "a"
    FROM "hone_programparticipant" U0
    WHERE U0."cohort_id" = "hone_cohortuser"."cohort_id"
      AND U0."learner_group_status" = 'COMPLETED'
      AND U0."user_id" = "hone_cohortuser"."user_id"
    LIMIT 1
)
What I ended up doing to optimize it was to execute a SELECT id... first and then run the UPDATE with the IDs placed directly in the WHERE clause. This reduced the total time to ~10% of the previous benchmark when no updates are needed.
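The two-step approach can be sketched roughly like this (table and column names taken from the query above; the status filter in step 1 is my assumption about why the no-op case gets faster, since Postgres otherwise rewrites a matched row even when the new value equals the old one):

```sql
-- Step 1: collect only the ids that actually need changing.
-- The IS DISTINCT FROM filter (an assumption, not from the original post)
-- skips rows that already hold the target value.
SELECT cu.id
FROM hone_cohortuser cu
WHERE EXISTS (
    SELECT 1
    FROM hone_programparticipant u0
    WHERE u0.cohort_id = cu.cohort_id
      AND u0.user_id = cu.user_id
      AND u0.learner_group_status = 'COMPLETED'
)
AND cu.learner_program_status IS DISTINCT FROM 'COMPLETED';

-- Step 2: update only those rows, interpolating the ids from step 1.
UPDATE hone_cohortuser
SET learner_program_status = 'COMPLETED'
WHERE id IN (/* ids from step 1 */);
```

When nothing needs updating, step 1 returns an empty list and step 2 can be skipped entirely, which would explain the ~10% figure for the no-op benchmark.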
*If anything about the data model feels off to you, it is. This query translates a new data model into a legacy table for backward compatibility.
UPDATE does not produce any result rows unless you use a RETURNING clause.
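For example, to see which rows an UPDATE actually touched (the id value here is illustrative):

```sql
UPDATE hone_cohortuser
SET learner_program_status = 'COMPLETED'
WHERE id = 42          -- illustrative id
RETURNING id, learner_program_status;
```

Without RETURNING, the statement only reports a row count (e.g. "UPDATE 1").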