The Rambling DBA: Creating an Indexed View with a Self-Join (Kinda)

The following is based on a post on the MSDN Forums regarding building an indexed view in SQL Server.

Indexed Views are another tool in the toolset for squeezing performance out of SQL Server. When applied correctly they can be very powerful, but they have so many limitations, that they really are very difficult to use. The poster on the above thread hit one of these limitations that really seems kind of nonsensical when you are looking at the code, that being that a Indexed View can't have a self-join in its definition. If you would like to know all about indexed views and how they work, as well as some examples you can play with in AdventureWorks take a look at the following whitepaper on MSDN:

Improving Performance with SQL Server 2005 Indexed Views ...

First to reproduce the problem we'll need to create some tables and data. Don't worry, these will be reusable for this whole exercise:

SET NOCOUNT ON; GO USE tempdb GO SET ANSI_NULLS ON SET ANSI_PADDING ON SET ANSI_WARNINGS ON SET CONCAT_NULL_YIELDS_NULL ON SET NUMERIC_ROUNDABORT OFF SET QUOTED_IDENTIFIER ON SET ARITHABORT ON GO CREATE TABLE dbo.dimColor (ColorID INT IDENTITY PRIMARY KEY, ColorName VARCHAR(10)) GO INSERT INTO dbo.dimColor VALUES ('red') INSERT INTO dbo.dimColor VALUES ('blue') INSERT INTO dbo.dimColor VALUES ('black') INSERT INTO dbo.dimColor VALUES ('silver') GO CREATE TABLE dbo.dimManufacturer (ManufacturerID INT IDENTITY PRIMARY KEY, ManufacturerName VARCHAR(10)) GO INSERT INTO dbo.dimManufacturer VALUES ('Ford') INSERT INTO dbo.dimManufacturer VALUES ('Chevrolet') GO CREATE TABLE dbo.Cars (CarID INT IDENTITY PRIMARY KEY, ManufacturerID INT, FirstColorID INT, SecondColorID INT) GO INSERT INTO dbo.Cars VALUES (1, 1, 1) INSERT INTO dbo.Cars VALUES (1, 1, 2) INSERT INTO dbo.Cars VALUES (1, 1, 3) INSERT INTO dbo.Cars VALUES (1, 1, 4) INSERT INTO dbo.Cars VALUES (1, 2, 1) INSERT INTO dbo.Cars VALUES (1, 2, 2) INSERT INTO dbo.Cars VALUES (1, 2, 3) INSERT INTO dbo.Cars VALUES (1, 2, 4) INSERT INTO dbo.Cars VALUES (1, 3, 1) INSERT INTO dbo.Cars VALUES (1, 3, 2) INSERT INTO dbo.Cars VALUES (1, 3, 3) INSERT INTO dbo.Cars VALUES (1, 3, 4) INSERT INTO dbo.Cars VALUES (1, 4, 1) INSERT INTO dbo.Cars VALUES (1, 4, 2) INSERT INTO dbo.Cars VALUES (1, 4, 3) INSERT INTO dbo.Cars VALUES (1, 4, 4) INSERT INTO dbo.Cars VALUES (2, 1, 1) INSERT INTO dbo.Cars VALUES (2, 1, 2) INSERT INTO dbo.Cars VALUES (2, 1, 3) INSERT INTO dbo.Cars VALUES (2, 1, 4) INSERT INTO dbo.Cars VALUES (2, 2, 1) INSERT INTO dbo.Cars VALUES (2, 2, 2) INSERT INTO dbo.Cars VALUES (2, 2, 3) INSERT INTO dbo.Cars VALUES (2, 2, 4) INSERT INTO dbo.Cars VALUES (2, 3, 1) INSERT INTO dbo.Cars VALUES (2, 3, 2) INSERT INTO dbo.Cars VALUES (2, 3, 3) INSERT INTO dbo.Cars VALUES (2, 3, 4) INSERT INTO dbo.Cars VALUES (2, 4, 1) INSERT INTO dbo.Cars VALUES (2, 4, 2) INSERT INTO dbo.Cars VALUES (2, 4, 3) INSERT INTO dbo.Cars VALUES (2, 4, 4) GO

To reproduce the original error reported in the forums post, we'll first create a view that has the "self-join" in its definition:

CREATE VIEW dbo.CarDetails WITH SCHEMABINDING AS SELECT c.CarID, m.ManufacturerName, pc.ColorName AS [FirstColor], sc.ColorName AS [SecondColor] FROM dbo.Cars c JOIN dbo.dimManufacturer m ON c.ManufacturerID = m.ManufacturerID INNER JOIN dbo.dimColor pc ON c.FirstColorID = pc.ColorID INNER JOIN dbo.dimColor sc ON c.SecondColorID = sc.ColorID GO CREATE UNIQUE CLUSTERED INDEX CarDetails_CarID ON dbo.CarDetails (CarID) GO

If you try to create this view, you will get an exception like the following

Msg 1947, Level 16, State 1, Line 1 Cannot create index on view "tempdb.dbo.CarDetails". The view contains a self join on "tempdb.dbo.dimColor".

So what exactly is it complaining about, after all there isn't a self join in the normal way that we would think of it? Well the self join statement is somewhat ambiguous, as is the self join limitation as listed in the whitepaper on MSDN. You actually can't join the same table two times in a indexed view, even if it is through another table. So what do you do if you have an actual structure like the one posted above where you have two foreign key columns to the same table and you need the double join to actually make sense of the data?

I personally was confused by the error, so I built the above example and posted it to the MVP private groups to see if someone else could help shed some light onto the problem. Assistance and a solution were provided by Steve Kass and Aaron Bertrand, and the following example will show you how to code around this limitation.

CREATE TABLE dbo.Two (i INT) INSERT INTO dbo.Two VALUES (1) INSERT INTO dbo.Two VALUES (2) GO CREATE VIEW dbo.CarDetails_Imed WITH SCHEMABINDING AS SELECT c.CarID, m.ManufacturerName, CASE WHEN i = 1 THEN 'FirstColor' ELSE 'SecondColor' END AS whichColor, CASE WHEN i = 1 AND c.FirstColorID = pc.ColorID THEN pc.ColorName WHEN i = 2 AND c.SecondColorID = pc.ColorID THEN pc.ColorName END AS [Color] FROM dbo.Cars c JOIN dbo.dimManufacturer m ON c.ManufacturerID = m.ManufacturerID CROSS JOIN dbo.Two INNER JOIN dbo.dimColor pc ON (c.FirstColorID = pc.ColorID OR c.SecondColorID = pc.ColorID) WHERE CASE WHEN i = 1 AND c.FirstColorID = pc.ColorID THEN pc.ColorName WHEN i = 2 AND c.SecondColorID = pc.ColorID THEN pc.ColorName END IS NOT NULL GO CREATE UNIQUE CLUSTERED INDEX CarDetailsImed_CarID ON dbo.CarDetails_Imed (CarID, ManufacturerName, whichColor, Color) GO CREATE VIEW dbo.CarDetails_Final WITH SCHEMABINDING AS SELECT CarID, ManufacturerName, [FirstColor], [SecondColor] FROM ( SELECT CarID, ManufacturerName, whichColor, Color FROM dbo.CarDetails_Imed WITH (NOEXPAND) ) src PIVOT ( MAX(Color) FOR whichColor IN ([FirstColor], [SecondColor]) ) pvt GO

The first thing Steve did was add a new table called dbo.Two into the mix. This table has, you guessed it, two rows holding values 1 and 2 respectively. Then he rewrote the query to use a cross join, so the output is going to be double the size in rows. A slight error in code was corrected by Aaron initially, and then Steve provided the fix as well with the idea that you could pivot the data to get the output to match that of the original view since the cross join is going to be a normalized return instead of a denormalized output as in the original view.

This view can now be indexed and used as an intermediate view to the actual results view which will pivot the data to provide the formatted output that would come from the original view that could not be indexed. One nice thing is that you can put the table hint WITH(NOEXPAND) directly into the output view DDL definition so that queries don't have to use this option explicitly. It has already been defined in the view itself. Since the intermediate view is now indexable, you can also create any needed covering indexes to satisfy the queries against the output view and assist with the pivot.

I know this is a big hack to make something work, but it is one way to solve the problem. One final note is that since this uses the PIVOT operator it is not possible as coded above in SQL Server 2000. However, you can write a 2000 compliant pivot query that would still make use of the indexed intermediate view.

The Rambling DBA

Wednesday, December 3, 2008

Creating an Indexed View with a Self-Join (Kinda)

Improving Performance with SQL Server 2005 Indexed Views ...

No comments:

Post a Comment

Blog Archive

Tagged Posts

My Favorite Blogs

About Me