How to use the right index to optimize a database to get rid of unwanted hash matches?

Question

Lately we have been learning about database management with Microsoft SQL Server Management Studio, in particular how to use indexes to optimize queries and data management. Part of this is keeping Hash Matches and Table Scans to a minimum. While practicing, I came across a query where I was able to get rid of several Table Scans but could not figure out how to get rid of the Hash Matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

We had to show the category name, the number of products sold, and the amount of money exchanged.

The image shows the current state of the query's execution plan:

Slightly optimized execution plan

For this I wrote the following query:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

This worked fine, and I was able to optimize it so that Table Scans no longer appeared, using the following indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then, for the Hash Matches, I tried indexes on the individual columns:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes that combine the join columns with the other columns the query uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where applying an index would make these Hash Matches go away or be replaced by something like a Nested Loops join.
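One way to check whether Nested Loops would actually be cheaper here is to force them with a query hint and compare the I/O and timing numbers against the unhinted plan. The following is only a diagnostic sketch on the same query, not a recommended fix; the `LOOP JOIN` hint applies to the joins only, so a Hash Match used for the aggregation may still appear.

```sql
-- Diagnostic sketch: report I/O and timing so the hinted and unhinted plans can be compared.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS TotalAmountUSD
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders AS O
        ON OD.orderid = O.orderid
    JOIN Production.Products AS P
        ON OD.productid = P.productid
    JOIN Production.Categories AS C
        ON P.categoryid = C.categoryid
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.categoryname
-- Forces every join in the query to use Nested Loops; for testing only.
OPTION (LOOP JOIN);
```

If the forced plan reads more pages or runs longer than the original, the optimizer's choice of a hash join was probably reasonable for this data volume.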

This can help, please read: sqlshack.com/sql-server-execution-plan-operators-part-3 — sa-es-ir, Commented Jul 5 at 3:55
Rather than images, please share query plans via brentozar.com/pastetheplan. Please also show full tables and index definitions. — Charlieface, Commented Jul 5 at 4:19
GROUP BY C.categoryid, C.CategoryName might be a good idea, means the compiler can infer uniqueness over the Categories table. — Charlieface, Commented Jul 5 at 4:21
I don't have any specific advice for you here. But I would like to introduce you to one of my favorite games. It's called "what's that knob for?". It's played when some so-called "best practice" says that something's bad. But! if it were always bad, it wouldn't be in the product, right? So the game is determining under what circumstances that thing isn't bad. — Ben Thul, Commented 2 days ago
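Charlieface's `GROUP BY` suggestion above could look roughly like this. It is a sketch that assumes `categoryid` is the primary key of `Production.Categories`, which the question does not show.

```sql
-- Group by the key column as well as the name, so uniqueness
-- over Production.Categories can be inferred from its key.
-- Assumption: categoryid is the primary key of Production.Categories.
SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS TotalAmountUSD
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders AS O
        ON OD.orderid = O.orderid
    JOIN Production.Products AS P
        ON OD.productid = P.productid
    JOIN Production.Categories AS C
        ON P.categoryid = C.categoryid
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.categoryid, C.categoryname;
```

Note that if two categories ever shared the same name, this version would return them as separate rows, whereas grouping by name alone would merge them.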

SQLpro · Accepted Answer · 2024-07-05 06:59:32Z

In your case, the best index would be on a view (an indexed view):

CREATE VIEW V_STATS
WITH SCHEMABINDING
AS
SELECT C.categoryname
       ,COUNT_BIG(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS TotalAmountUSD
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;
GO

CREATE UNIQUE CLUSTERED INDEX XCV_STATS ON V_STATS (categoryname);
GO

Test it!
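To test it, the view can be queried directly. Depending on the SQL Server edition, the optimizer may not use the view's index automatically, in which case the `NOEXPAND` table hint is needed. The sketch below assumes the view was created in the default `dbo` schema.

```sql
-- Read the precomputed aggregates from the indexed view.
-- Assumption: V_STATS lives in the dbo schema.
-- NOEXPAND tells the optimizer to use the view's index
-- instead of expanding the view into its base tables.
SELECT categoryname
       ,QTY
       ,TotalAmountUSD
FROM dbo.V_STATS WITH (NOEXPAND);
```

With the hint in place, the execution plan should show a scan of `XCV_STATS` rather than the joins and Hash Matches against the four base tables.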
