Revisions to How to use the right index to optimize a database to get rid of unwanted hash matches?

edited tags

Link

edited Jul 5 at 13:14

The Impaler

48.1k
10
46
82

added 1 character in body

Source Link

edited Jul 5 at 3:32

Iker-JK

1
2

lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

And had to show The category name, the amount of products sold and the amount of money exchanged.

In the image you can find the current state of the consult

Slightly optimized execution plan

For this I did the next consult:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then with the hash tables I tried by using indexes of the individual values:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes of links between the joins and their uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where you apply an index so thisthese hash matches would go away or be replaced for something like a Nested Loop.

lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

And had to show The category name, the amount of products sold and the amount of money exchanged.

In the image you can find the current state of the consult

Slightly optimized execution plan

For this I did the next consult:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then with the hash tables I tried by using indexes of the individual values:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes of links between the joins and their uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where you apply an index so this hash matches would go away or be replaced for something like a Nested Loop.

lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

And had to show The category name, the amount of products sold and the amount of money exchanged.

In the image you can find the current state of the consult

Slightly optimized execution plan

For this I did the next consult:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then with the hash tables I tried by using indexes of the individual values:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes of links between the joins and their uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where you apply an index so these hash matches would go away or be replaced for something like a Nested Loop.

added 5 characters in body

Source Link

edited Jul 5 at 3:31

Iker-JK

1
2

lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

And had to show The category name, the amount of products sold and the amount of money exchanged.

In the image you can find the current state of the consult

enter image description hereSlightly optimized execution plan

For this I did the next consult:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then with the hash tables I tried by using indexes of the individual values:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes of links between the joins and their uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where you apply an index so this hash matches would go away or be replaced for something like a Nested Loop.

lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

And had to show The category name, the amount of products sold and the amount of money exchanged.

In the image you can find the current state of the consult

enter image description here

For this I did the next consult:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then with the hash tables I tried by using indexes of the individual values:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes of links between the joins and their uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where you apply an index so this hash matches would go away or be replaced for something like a Nested Loop.

lately we have been learning about database managind through microsoft sql server management studio and were learning of how to use the index function to optimize a consult and data management, and a part of this is that we should keep hash matches and Table Scans to a minimum. While doing so I came in contanct with a consult which I was able to get rid of various Table Scans but was unable to figure out how to get rid of these hash matches.

For context we were working with 4 tables from an example database:

[Sales].[OrderDetails]
[Sales].[Orders]
[Production].[Products]
[Production].[Categories]

And had to show The category name, the amount of products sold and the amount of money exchanged.

In the image you can find the current state of the consult

Slightly optimized execution plan

For this I did the next consult:

SELECT C.categoryname
       ,COUNT(*) AS QTY
       ,SUM(OD.unitprice * OD.qty) AS 'TotalAmountUSD'
FROM Sales.OrderDetails AS OD
    JOIN Sales.Orders O 
        ON (OD.orderid = O.orderid)
    JOIN Production.Products P 
        ON (OD.productid = P.productid)
    JOIN Production.Categories C 
        ON (P.categoryid = C.categoryid)
WHERE O.orderdate BETWEEN CAST('2015-01-01' AS DATE) AND CAST('2015-12-31' AS DATE)
GROUP BY C.CategoryName;

Which worked fine. And I was able to optimize it so Table Scans didn't appear with the next Indexes:

CREATE INDEX IDX_OrderDetails_AllColumns ON Sales.OrderDetails (orderid, productid, unitprice, qty);
CREATE INDEX IDX_Categories_AllColumns ON Production.Categories (categoryid, categoryname);
CREATE INDEX IDX_Products_AllColumns ON Production.Products (productid, categoryid);
CREATE INDEX IDX_Orders_OrderDate_OrderId ON Sales.Orders (orderdate) INCLUDE (orderid);

Then with the hash tables I tried by using indexes of the individual values:

CREATE INDEX IDX_Orders_OrderDate ON Sales.Orders(orderdate);
CREATE INDEX IDX_OrderDetails_OrderId ON Sales.OrderDetails(orderid);
CREATE INDEX IDX_Products_CategoryId ON Production.Products(categoryid);
CREATE INDEX IDX_Categories_CategoryName ON Production.Categories(categoryname);

And indexes of links between the joins and their uses:

CREATE INDEX IDX_OrderDetails_OrderID_UnitPrice_QTY ON Sales.OrderDetails(orderid, unitprice, qty);
CREATE INDEX IDX_OrdersDetails_OrderID_ProductID_UnitPrice_QTY ON Sales.OrderDetails(orderid, productid, unitprice, qty);

I expected to find the point where you apply an index so this hash matches would go away or be replaced for something like a Nested Loop.

Source Link

asked Jul 5 at 3:22

Iker-JK

1
2

Loading

Collectives™ on Stack Overflow

Return to Question