Wednesday, December 3, 2008

Creating an Indexed View with a Self-Join (Kinda)

The following is based on a post on the MSDN Forums regarding building an indexed view in SQL Server.

Indexed Views are another tool in the toolset for squeezing performance out of SQL Server.  When applied correctly they can be very powerful, but they have so many limitations, that they really are very difficult to use.  The poster on the above thread hit one of these limitations that really seems kind of nonsensical when you are looking at the code, that being that a Indexed View can't have a self-join in its definition.  If you would like to know all about indexed views and how they work, as well as some examples you can play with in AdventureWorks take a look at the following whitepaper on MSDN:

Improving Performance with SQL Server 2005 Indexed Views ...

First to reproduce the problem we'll need to create some tables and data.  Don't worry, these will be reusable for this whole exercise:

SET NOCOUNT ON;
GO
USE tempdb
GO
SET ANSI_NULLS ON
SET
ANSI_PADDING ON
SET
ANSI_WARNINGS ON
SET
CONCAT_NULL_YIELDS_NULL ON
SET
NUMERIC_ROUNDABORT OFF
SET
QUOTED_IDENTIFIER ON
SET
ARITHABORT ON
GO
CREATE TABLE dbo.dimColor (ColorID INT IDENTITY PRIMARY KEY, ColorName VARCHAR(10))
GO
INSERT INTO dbo.dimColor VALUES ('red')
INSERT INTO dbo.dimColor VALUES ('blue')
INSERT INTO dbo.dimColor VALUES ('black')
INSERT INTO dbo.dimColor VALUES ('silver')
GO
CREATE TABLE dbo.dimManufacturer (ManufacturerID INT IDENTITY PRIMARY KEY, ManufacturerName VARCHAR(10))
GO
INSERT INTO dbo.dimManufacturer VALUES ('Ford')
INSERT INTO dbo.dimManufacturer VALUES ('Chevrolet')
GO
CREATE TABLE dbo.Cars (CarID INT IDENTITY PRIMARY KEY, ManufacturerID INT, FirstColorID INT, SecondColorID INT)
GO
INSERT INTO dbo.Cars VALUES (1, 1, 1)
INSERT INTO dbo.Cars VALUES (1, 1, 2)
INSERT INTO dbo.Cars VALUES (1, 1, 3)
INSERT INTO dbo.Cars VALUES (1, 1, 4)
INSERT INTO dbo.Cars VALUES (1, 2, 1)
INSERT INTO dbo.Cars VALUES (1, 2, 2)
INSERT INTO dbo.Cars VALUES (1, 2, 3)
INSERT INTO dbo.Cars VALUES (1, 2, 4)
INSERT INTO dbo.Cars VALUES (1, 3, 1)
INSERT INTO dbo.Cars VALUES (1, 3, 2)
INSERT INTO dbo.Cars VALUES (1, 3, 3)
INSERT INTO dbo.Cars VALUES (1, 3, 4)
INSERT INTO dbo.Cars VALUES (1, 4, 1)
INSERT INTO dbo.Cars VALUES (1, 4, 2)
INSERT INTO dbo.Cars VALUES (1, 4, 3)
INSERT INTO dbo.Cars VALUES (1, 4, 4)
INSERT INTO dbo.Cars VALUES (2, 1, 1)
INSERT INTO dbo.Cars VALUES (2, 1, 2)
INSERT INTO dbo.Cars VALUES (2, 1, 3)
INSERT INTO dbo.Cars VALUES (2, 1, 4)
INSERT INTO dbo.Cars VALUES (2, 2, 1)
INSERT INTO dbo.Cars VALUES (2, 2, 2)
INSERT INTO dbo.Cars VALUES (2, 2, 3)
INSERT INTO dbo.Cars VALUES (2, 2, 4)
INSERT INTO dbo.Cars VALUES (2, 3, 1)
INSERT INTO dbo.Cars VALUES (2, 3, 2)
INSERT INTO dbo.Cars VALUES (2, 3, 3)
INSERT INTO dbo.Cars VALUES (2, 3, 4)
INSERT INTO dbo.Cars VALUES (2, 4, 1)
INSERT INTO dbo.Cars VALUES (2, 4, 2)
INSERT INTO dbo.Cars VALUES (2, 4, 3)
INSERT INTO dbo.Cars VALUES (2, 4, 4)
GO

To reproduce the original error reported in the forums post, we'll first create a view that has the "self-join" in its definition:

CREATE VIEW dbo.CarDetails
WITH SCHEMABINDING
AS
SELECT
c.CarID,
 
m.ManufacturerName,
 
pc.ColorName AS [FirstColor],
 
sc.ColorName AS [SecondColor]
FROM dbo.Cars c
JOIN dbo.dimManufacturer m ON c.ManufacturerID = m.ManufacturerID
INNER JOIN dbo.dimColor pc ON c.FirstColorID = pc.ColorID
INNER JOIN dbo.dimColor sc ON c.SecondColorID = sc.ColorID
GO
CREATE UNIQUE CLUSTERED INDEX CarDetails_CarID ON dbo.CarDetails (CarID)
GO

If you try to create this view, you will get an exception like the following

Msg 1947, Level 16, State 1, Line 1 Cannot create index on view "tempdb.dbo.CarDetails". The view contains a self join on "tempdb.dbo.dimColor".

So what exactly is it complaining about, after all there isn't a self join in the normal way that we would think of it? Well the self join statement is somewhat ambiguous, as is the self join limitation as listed in the whitepaper on MSDN. You actually can't join the same table two times in a indexed view, even if it is through another table. So what do you do if you have an actual structure like the one posted above where you have two foreign key columns to the same table and you need the double join to actually make sense of the data?

I personally was confused by the error, so I built the above example and posted it to the MVP private groups to see if someone else could help shed some light onto the problem.  Assistance and a solution were provided by Steve Kass and Aaron Bertrand, and the following example will show you how to code around this limitation.

CREATE TABLE dbo.Two (i INT)
INSERT INTO dbo.Two VALUES (1)
INSERT INTO dbo.Two VALUES (2)
GO
CREATE VIEW dbo.CarDetails_Imed
WITH SCHEMABINDING
AS
SELECT
  
c.CarID,
  
m.ManufacturerName,
  
CASE WHEN i = 1 THEN 'FirstColor' ELSE 'SecondColor' END AS whichColor,
  
CASE WHEN i = 1 AND c.FirstColorID = pc.ColorID THEN pc.ColorName
       
WHEN i = 2 AND c.SecondColorID = pc.ColorID THEN pc.ColorName END AS [Color]
FROM dbo.Cars c
JOIN dbo.dimManufacturer m ON c.ManufacturerID = m.ManufacturerID
CROSS JOIN dbo.Two
INNER JOIN dbo.dimColor pc ON (c.FirstColorID = pc.ColorID OR c.SecondColorID = pc.ColorID)
WHERE
  
CASE WHEN i = 1 AND c.FirstColorID = pc.ColorID THEN pc.ColorName
       
WHEN i = 2 AND c.SecondColorID = pc.ColorID THEN pc.ColorName END IS NOT NULL
GO
CREATE UNIQUE CLUSTERED INDEX CarDetailsImed_CarID ON dbo.CarDetails_Imed (CarID, ManufacturerName, whichColor, Color)
GO
CREATE VIEW dbo.CarDetails_Final
WITH SCHEMABINDING
AS
SELECT
CarID, ManufacturerName, [FirstColor], [SecondColor]
FROM
(
SELECT
  
CarID,
  
ManufacturerName,
  
whichColor,
  
Color
FROM dbo.CarDetails_Imed WITH (NOEXPAND)
)
src
PIVOT
(
MAX(Color)
FOR whichColor IN ([FirstColor], [SecondColor])
)
pvt
GO

The first thing Steve did was add a new table called dbo.Two into the mix.  This table has, you guessed it, two rows holding values 1 and 2 respectively.  Then he rewrote the query to use a cross join, so the output is going to be double the size in rows.  A slight error in code was corrected by Aaron initially, and then Steve provided the fix as well with the idea that you could pivot the data to get the output to match that of the original view since the cross join is going to be a normalized return instead of a denormalized output as in the original view.

This view can now be indexed and used as an intermediate view to the actual results view which will pivot the data to provide the formatted output that would come from the original view that could not be indexed. One nice thing is that you can put the table hint WITH(NOEXPAND) directly into the output view DDL definition so that queries don't have to use this option explicitly. It has already been defined in the view itself. Since the intermediate view is now indexable, you can also create any needed covering indexes to satisfy the queries against the output view and assist with the pivot.

I know this is a big hack to make something work, but it is one way to solve the problem. One final note is that since this uses the PIVOT operator it is not possible as coded above in SQL Server 2000. However, you can write a 2000 compliant pivot query that would still make use of the indexed intermediate view.

No comments:

Post a Comment