I have this query I'm running in MySQL:
SELECT
count(*)
FROM
library AS l
JOIN plays AS p ON p.user_id = l.user_id AND
l.path = p.path
WHERE
l.user_id = 20977 AND
p.time >= '2022-10-17';
When EXPLAIN ANALYZE is run:
| -> Aggregate: count(0) (cost=1085653.55 rows=6692) (actual time=12576.265..12576.266 rows=1 loops=1)
-> Nested loop inner join (cost=1084984.37 rows=6692) (actual time=40.604..12566.569 rows=56757 loops=1)
-> Index lookup on l using user_id_2 (user_id=20977) (cost=116747.95 rows=106784) (actual time=13.153..3783.204 rows=59631 loops=1)
-> Filter: ((p.user_id = 20977) and (p.`time` >= TIMESTAMP'2022-10-17 00:00:00')) (cost=8.24 rows=0) (actual time=0.135..0.147 rows=1 loops=59631)
-> Index lookup on p using path (path=l.`path`) (cost=8.24 rows=8) (actual time=0.090..0.146 rows=1 loops=59631)
|
1 row in set (12.76 sec)
I obviously want to make this faster!
Table definitions
CREATE TABLE `library` (
`user_id` int NOT NULL,
`name` varchar(20) COLLATE utf8mb4_general_ci NOT NULL,
`path` varchar(512) COLLATE utf8mb4_general_ci NOT NULL,
`title` varchar(512) COLLATE utf8mb4_general_ci NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`edited` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`db_id` int NOT NULL,
`tag` varchar(64) COLLATE utf8mb4_general_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
CREATE TABLE `plays` (
`user_id` int DEFAULT NULL,
`name` varchar(20) CHARACTER SET utf8 DEFAULT NULL,
`path` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
`time` datetime DEFAULT CURRENT_TIMESTAMP,
`play_id` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
ALTER TABLE `library`
ADD PRIMARY KEY (`db_id`),
ADD KEY `user_id_loc` (`user_id`,`name`,`path`(191)),
ADD KEY `edited` (`edited`),
ADD KEY `created` (`created`),
ADD KEY `title` (`title`),
ADD KEY `user_id` (`user_id`),
ADD INDEX `user_id_by_title` (`user_id`, `title`);
ALTER TABLE `plays`
ADD PRIMARY KEY (`play_id`),
ADD KEY `user_id` (`user_id`,`name`,`path`(255)),
ADD KEY `user_id_2` (`user_id`,`name`),
ADD KEY `time` (`time`),
ADD KEY `path` (`path`),
ADD KEY `user_id_3` (`user_id`,`name`,`path`,`time`);
It looks like the killer is the looping over 59631 rows.
Would an index on (user_id, time)
make it faster?
Interestingly, the user_id_2
index is actually an index on (user_id, title)
, rather than the plain user_id
index. I'm not sure why user_id_2
is chosen given title
isn't used in the query.
ERROR 1072 (42000): Key column 'id' doesn't exist in table library
). I can't run it to try to reproduce the issue. This makes me doubt if any of the code is relevant to your query. Please post the actual code you're having trouble with. Or better yet, make a dbfiddle.id
. However I posted a previous question on SO where a lot of feedback was about column naming, so I tried to find and replace the column name, but I missed one out. I'll fix that so it's alluser_id
now.plays
table should only be brought back havingp.time >= '2022-10-17'
. Do you mean just the fact it should not be aLEFT JOIN
because of the implicit criteria? Agree - but I was just trying to use the query I have - if this is the cause of the slowdown - great!LEFT
inLEFT JOIN
shouldn't be there. It misguides your analysis. From the optimizer's perspective there are big differences between an outer join (less options to optimize) and an inner join (more avenues for optimization).