There are two tables: one holds the actual fact data (table1), and the other (table2) holds tracker-style information on dates. I am trying to get all the fact data from table1 where the date field is less than MAX(date) from table2. This is the query I am using:

SELECT * FROM TABLE1 WHERE date < (SELECT MAX(date) FROM TABLE2)

When I check the execution plan, I see that SQL Server is trying to optimize the query by using an inner join instead of a straightforward filter condition:

[Image: execution plan for the subquery in the WHERE condition]

When I replace the same subquery with a literal value, the execution plan shows a straightforward filter:

SELECT * FROM TABLE1 WHERE date < '2023-03-01'

[Image: execution plan for the literal value in the WHERE condition]

My understanding is that an inner join is a costly operation compared with a filter. How can I optimize the WHERE condition so that the subquery avoids the inner join, runs before the main query, and supplies its result as a value in the final WHERE condition?

  • Are the two queries comparable? They're filtering TABLE1 in opposite directions, so they likely return entirely different sets of rows. Commented Jun 13, 2023 at 3:48
  • It's true, the queries are different, so that's the first thing to check. Anyway, as I understand it, using a static value (i.e. 2023-03-01) allows the query planner to work smarter, as opposed to using an unknown runtime value such as SELECT MAX(date) FROM TABLE2. "An inner join is a costly operation when compared with a filter": that sounds right, but I'm not sure it's true. It's actually a very strange choice, given that it doesn't have a list of dates to join to, only one. Are you absolutely certain the queries are functionally the same?
    – Nick.Mc
    Commented Jun 13, 2023 at 4:11
  • Apologies, that was my mistake: I had different operators. I have corrected it now, and both queries return the same output. I just want to filter the data based on the value that table2 returns. A stored procedure, where I could store that value in a variable, is an option in principle, but for business reasons I cannot use one: these queries are built on the fly when an API endpoint is called, the logic is common to multiple tables, and implementing a stored procedure for one table could become a huge investment. Commented Jun 13, 2023 at 4:52
  • Please share the execution plan via brentozar.com/pastetheplan. Please also show the table and index definitions. Commented Jun 13, 2023 at 10:17
  • This showplan is Synapse DW dedicated pools. You should specify that to not cause confusion - there is no broadcast operator in normal SQL Server. In general, unless you know a lot about query optimizers and execution engines, it is not recommended that you try to muck with the plan shapes. In this case, the order of the operations is just fine - you want to take a scalar operation and broadcast it to the other execution nodes for a DW gen2 dedicated plan. Commented Jun 13, 2023 at 13:57

1 Answer

"My understanding is that an inner join is a costly operation when compared with a filter"

This is categorically not true. Joins can often be efficient when used correctly.

The join is not what makes this query slow. The join is presented with only a single outer row, whose single value is pushed down to the lower side as an outer reference.
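One way to check this for yourself is to evaluate the scalar first in an ordinary batch (no stored procedure needed) and compare that plan and timing with the subquery version. This is only a sketch: @maxDate is a hypothetical variable name, and the date type is an assumption about the column.

```sql
-- Assumption: TABLE1.date and TABLE2.date are of type date.
DECLARE @maxDate date;

-- Evaluate the scalar once, before the main query runs.
SET @maxDate = (SELECT MAX([date]) FROM TABLE2);

-- Filter TABLE1 on the precomputed value.
SELECT *
FROM TABLE1
WHERE [date] < @maxDate;
```

If both forms take about the same time, the join is not the bottleneck and the access path on TABLE1 is the thing to look at.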

The real reason is that the tables are not indexed properly (or at all). You probably need the following indexes for this to work well:

CREATE CLUSTERED INDEX CX ON TABLE1 (date);
CREATE NONCLUSTERED INDEX IX ON TABLE2 (date);

Table1 needs to have a clustered index because you are querying all columns. You can obviously also use a non-clustered index with INCLUDE on all columns but that seems pointless. If you narrow down the columns you are querying then you may be able to get away with an NCI and INCLUDE.
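As a rough illustration of the narrower alternative, a covering nonclustered index might look like the following. The column names col1 and col2 and the index name are hypothetical, and this is SQL Server syntax; check whether your platform supports INCLUDE before relying on it.

```sql
-- Assumption: only col1 and col2 (hypothetical names) are selected.
CREATE NONCLUSTERED INDEX IX_TABLE1_date
    ON TABLE1 ([date])
    INCLUDE (col1, col2);

-- Every column the query touches is in the index, so it can be
-- answered from the index alone.
SELECT [date], col1, col2
FROM TABLE1
WHERE [date] < (SELECT MAX([date]) FROM TABLE2);
```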

Table2 does not need a clustered index for this query as you are only querying the one column. But it may make sense to make this a clustered index anyway.

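Primary keys are backed by a clustered index by default, so if TABLE1 already has one you may need to rebuild it as nonclustered before you can create the clustered index on date.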
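If TABLE1 already has a clustered primary key, the rebuild could be sketched roughly as below. PK_TABLE1 and id are hypothetical names for the existing constraint and its key column, so substitute your own.

```sql
-- Assumption: PK_TABLE1 and id are hypothetical names for the existing
-- primary key constraint and its key column.
ALTER TABLE TABLE1 DROP CONSTRAINT PK_TABLE1;

-- Recreate the primary key as nonclustered, freeing the clustered slot.
ALTER TABLE TABLE1
    ADD CONSTRAINT PK_TABLE1 PRIMARY KEY NONCLUSTERED (id);

-- The clustered index on date can now be created.
CREATE CLUSTERED INDEX CX ON TABLE1 ([date]);
```

Note that dropping the constraint fails while foreign keys reference it, and rebuilding the clustered index on a large table rewrites the whole table, so this is best tried on a copy first.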
