lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.
For context we were working with 4 tables from an example database:
[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]
And had to show The category name, the amount of products sold and the amount of money exchanged.
In the image you can find the current state of the consult
Slightly optimized execution plan
For this I did the next consult:
SELECT C.categoryname
,COUNT(*) AS QTY
,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
JOIN Sales.Orders O
ON (OD.orderid = O.orderid)
JOIN Production.Products P
ON (OD.productid = P.productid)
JOIN Production.Categories C
ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;
Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:
CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);
Then with the hash tables I tried by using indexes of the individual values:
CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);
And indexes of links between the joins and their uses:
CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);
I expected to find the point where you apply an index so these hash matches would go away or be replaced for something like a Nested Loop.
GROUP BY C.categoryid, C.CategoryName
might be a good idea, means the compiler can infer uniqueness over theCategories
table.