SQL Server SELECT: Performance Issue: UNPIVOT with a Large Number of Columns

I love the PIVOT and UNPIVOT commands. They have saved me a ton of time on quite a few projects. UNPIVOT especially, since many times when working on BI projects I'm handed a monster spreadsheet by someone on the business side. Business people like seeing pivoted data. Database people like seing normalized data. I won't get into the keywords/commands here since there is plenty of material out there, most of which do a better job than I ever could of explaining their use.

However, I will tell you that SQL Server does not like "UNPIVOTing" a large number of columns as I recently found out. I should note that the query I have contains a few nested SELECTs (which I'm sure is part of the problem). I don't know the internals of what the database engine is doing when you using these commands but I do know they produce some pretty ugly execution plans. Since I just started this project and have a year of work to deliver in about 2 months (I'm not kidding) I haven't had the time to deconstruct the execution (nor should I really be writing about this topic).

What I have figured out is that if I break up a 280+ column UNPIVOT into 20 column chunks (an unpivot for every 20 columns - yes this is a maintenance nightmare) then I get reasonable performance (a couple of minutes to execute) whereas my original query never completed.

There's no way I'm going to be ok with having 15 queries instead of one to solve a performance issue but this is what I know for now and I'm able to present a working proof-of-concept. I hope to have a better understanding of the issue and as a result, the solution soon. I'll keep you posted. In the meantime, if you know what's going on please share . . .

1 comment:

Dave ElishApril 26, 2010 at 9:57 AM
UPDATE: There's always a solution! It turns out that most of my pain was related to nested SELECTs (derived tables). Many of these SELECTs were returning the same value for all rows (due to the denormalized nature of the source data) so I was able to extract these queries into single-row table variables. From there I was able to get these values into the "main" query by doing a select against a one row in-memory table.

The conclusion here is that I was wrong about performance issues with the UNPIVOT command. My solution "unpivots" a 200+ column table within my performance requirements (sub 1 minute for about 60,000 rows and 200+ columns).

Monday, March 29, 2010

Performance Issue: UNPIVOT with a Large Number of Columns

1 comment: