I was asked for a demonstration of how searching on an indexed column is faster than searching for a string prefix, so I created a quick test but the results were surprising and I can't see why.
The database consists of one table (Products
) with productName
and Brand
columns and a few other columns, just to bulk up the data with an index on Brand
:
CREATE TABLE [dbo].[Products]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[ProductName] [nvarchar](500) NOT NULL,
[Brand] [nvarchar](100) NOT NULL,
[Field1] [nvarchar](50) NULL,
[Field2] [nvarchar](50) NULL,
[Field3] [nvarchar](50) NULL,
[Field4] [nvarchar](50) NULL,
[Field5] [nvarchar](50) NULL,
Ix_Brands index(Brand),
CONSTRAINT [PK_Products]
PRIMARY KEY CLUSTERED ([ID] ASC)
)
I then get the products by brand using 4 different methods and time how long each one takes.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
namespace SpeedTest
{
internal class Program
{
static void Main(string[] args)
{
var connectionString = "data source=.<Redacted>";
string[] Brands = new string[] { "Tesco", "Asda", "Boots", "Morrisons", "Amazon", "Ebay" };
var rnd = new Random();
DateTime startTime;
using (var con = new SqlConnection(connectionString))
{
var cmd = new SqlCommand("delete from products", con);
con.Open();
cmd.ExecuteNonQuery();
Console.WriteLine("Creating 100,000 products");
for (int i = 0; i < 100000; i++)
{
var brand = Brands[rnd.Next(Brands.Length)];
cmd.CommandText = $"insert into products(productName, brand, field1, field2, field3, field4, field5) values ('{brand}_{Guid.NewGuid()}', '{brand}', '{Guid.NewGuid()}', '{Guid.NewGuid()}', '{Guid.NewGuid()}', '{Guid.NewGuid()}', '{Guid.NewGuid()}')";
cmd.ExecuteNonQuery();
}
Console.WriteLine("Getting products by brand via ADO and product name prefix");
startTime = DateTime.Now;
foreach (var brand in Brands)
{
cmd.CommandText = $"select * from products where productName like '{brand}_%'";
var da = new SqlDataAdapter(cmd);
var dt = new DataTable();
da.Fill(dt);
}
Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");
Console.WriteLine("Getting products by brand via ADO and indexed brand column");
startTime = DateTime.Now;
foreach (var brand in Brands)
{
cmd.CommandText = $"select * from products where brand='{brand}'";
var da = new SqlDataAdapter(cmd);
var dt = new DataTable();
da.Fill(dt);
}
Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");
con.Close();
}
var db = new SpeedTestEntities();
Console.WriteLine("Getting products by brand via entity framework and product name prefix");
startTime = DateTime.Now;
foreach (var brand in Brands)
{
var products = db.Products.Where(p => p.ProductName.StartsWith(brand + "_")).ToList();
}
Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");
Console.WriteLine("Getting products by brand via entity framework and indexed brand column");
startTime = DateTime.Now;
foreach (var brand in Brands)
{
var products = db.Products.Where(p => p.Brand.Equals(brand, StringComparison.OrdinalIgnoreCase)).ToList();
}
Console.WriteLine($"Time taken: {(DateTime.Now - startTime).TotalMilliseconds}ms");
Console.ReadLine();
}
}
}
The Entity Framework results were about what I was expecting, but the ADO results showed searching on the indexed column was slower than on the product name prefix which surely can't be right:
Creating 100,000 products
Getting products by brand via ADO and product name prefix
Time taken: 558.9306ms
Getting products by brand via ADO and indexed brand column
Time taken: 642.5258ms
Getting products by brand via entity framework and product name prefix
Time taken: 3266.8438ms
Getting products by brand via entity framework and indexed brand column
Time taken: 204.932ms
I must have messed up somewhere but I can't see where. My demonstration of why we should be adding indexed columns to our database tables rather than prefixing other strings is going badly.
Can anyone rescue me and see what's going on here?
EDIT: it turns out that both ADO queries are doing an index scan on PK_Products
. Both execution plans are the same. This surprises me and I thought adding an indexed column would certainly be faster but apparently not.
select *
so there's gonna be a lot of key lookups when doing brand index, especially if it's not unique, sql server might decide to use the PK for all your searches. Also, it should be N'brand', or even better you should parametrize properly if not already doing it.Stopwatch
, it is much more accurate than DateTime.