
Sunday, March 11, 2018

Estimated Number of Rows = 1 Query Very Slow or Not Returning Data



A query that should be fast but isn't (the underlying tables aren't enormous, it has been fast in the past, etc. - I'll let you determine what should or should not be fast) is often, but not always, slow because of a "bad" query plan. Determining what is and what is not a good query plan isn't something we're going to get into here, but if you're at your wits' end with a query that has run for hours and should be much faster, there's a good chance you don't have an optimal plan.

If you've landed on this page it's likely that you've already looked at your query plan and seen that some operators are showing Estimated Number of Rows = 1 when you know there are definitely more rows than that.

A key ingredient of a good query plan is accurate cardinality estimation. Cardinality refers to the uniqueness of the values contained within a particular column or index and has a strong influence on the SQL Server Query Optimizer. It drives decisions such as which join operator (e.g. hash match, nested loops, merge) to use, which can significantly impact query performance.

If you're seeing Estimated Number of Rows = 1, there's a good chance that your statistics need to be updated, that the Cardinality Estimator is failing you, or both.

To resolve the issue, first try updating statistics on the table(s) in question. Use the WITH FULLSCAN option to cover your bases.

 UPDATE STATISTICS table_or_indexed_view_name WITH FULLSCAN;
 -- Clear the plan cache so queries are recompiled against the fresh statistics
 ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

After you've updated your statistics, take another look at your query plan. Has the Estimated Number of Rows value changed to something more accurate? If so, try running your query again. If not, it might be that you're having issues with the Cardinality Estimator.

The Cardinality Estimator received an overhaul in SQL Server 2014, and while it's better in most cases, it is not in all. In fact, I stumbled upon a bug in the CE just last year that Microsoft acknowledged and fixed. If you suspect the CE is not working as it should, there are still a few things you can try.

First, determine whether you're having an issue that someone else has already encountered and that Microsoft has already fixed. You can do this by enabling trace flag 4199, which turns on all query optimizer fixes. Try it first at the query level using the following option.

 OPTION(QUERYTRACEON 4199)  
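For example, appended to a query it would look something like this (a minimal sketch - the table and column names are placeholders, not from any particular schema, and QUERYTRACEON generally requires sysadmin permissions):

 SELECT o.OrderId, c.CustomerName
 FROM dbo.Orders AS o
     INNER JOIN dbo.Customers AS c
         ON o.CustomerId = c.CustomerId
 OPTION (QUERYTRACEON 4199);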

If this works and you want to apply this globally, you can execute the following.

 DBCC TRACEON (4199, -1);
 -- Clear the plan cache so existing plans are recompiled with the trace flag in effect
 ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE;

If you still haven't solved the problem, you can have SQL Server use the old, or "legacy," CE as they call it. This can be done at the query level using the following hint after your query

 OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'))

or at the database level using the following script.

 ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON; 
 ALTER DATABASE SCOPED CONFIGURATION CLEAR PROCEDURE_CACHE; 

And if none of that works...
Take another look at your query plan. Do you have Merge Joins that require sorts on large data sets? Are there Nested Loops joins that are taking forever? (Try using Live Query Statistics to find your bottlenecks.) You can try using join hints to force a HASH join, as in the sketch below, and see if that helps.
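A minimal, hypothetical example of forcing a hash join with an inline join hint (the fact/dimension table names are placeholders):

 SELECT f.OrderId, d.CalendarDate
 FROM dbo.FactOrders AS f
     INNER HASH JOIN dbo.DimDate AS d
         ON f.DateKey = d.DateKey;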

I know all too well how painful this type of situation can be. While we were very grateful that Microsoft fixed our problem, it took months to identify it and then months more before it was available in a service pack. Unfortunately for us, using the Legacy CE was not an option. Hopefully for you, one of these fixes will work!

Friday, August 10, 2012

SSIS Event Handlers: Behavior In Package Hierarchy


I recently stumbled onto what seemed to be an odd performance problem in an SSIS project I was working on. The problem was that I had a number of straightforward packages that ran extremely fast but sometimes, without explanation, ran really slow.

Luckily the project included a robust logging system that provided, among other things, the execution duration of every package and Control Flow task.  The logs showed that the Data Flows in the packages in question were running in 10 to 20 seconds yet the packages were at times taking upwards of 20 minutes. Furthermore, I found that when called on their own, each package did not run much longer than its single data flow, yet when run as a child of another package (via an execute package task) the execution duration more than doubled. In one example I had a package that ran in 10 seconds on its own, 23 seconds as a child, 56 seconds as a child 3 levels deep, and almost 3 minutes when run as a child four levels deep. In that same example the Data Flow took only 7 seconds. So why was the package so slow when run as a child of one or more packages?

Time for SQL Server Profiler.

The logging mechanisms in place were stored procedures being called by Execute SQL tasks and were being passed things like a batch Id, Package Id, Source Name, Source GUID, Start Time, End Time, etc. Once I began profiling it was clear that there were way too many Pre and Post execute logging stored procedure calls.

Let me take a step back and explain the project/package design. The project contained about 50 SSIS packages, some of which were called by Execute Package tasks within the project. For example, there was a Master package that called staging packages, some of those staging packages called lower-level staging packages, and those in turn called generic logging packages. There were as many as four packages in any given package execution hierarchy. In each package, regardless of where in the package hierarchy it fell, there were four event handlers:
  1. OnError: Handled exceptions that were then logged to the log database.
  2. OnVariableValueChanged: Handled the change in those variables we were interested in and logged to the log database.
  3. OnPreExecute: Handled the logging of the pre-execution data (e.g. start time, source name, etc.) for all package tasks.
  4. OnPostExecute: Handled the logging of post-execution data (e.g. end time) for all package tasks.
Getting back to what I was seeing in Profiler (many calls to the pre- and post-execute logging stored procedures), it only made sense that the OnPreExecute and OnPostExecute event handlers were the ones triggering all of those stored procedure calls. So why would those event handlers fire so many times?

I started researching event handler propagation to understand what happens in SSIS when packages have package-level event handlers and a design in which packages call other packages. I quickly ran into Jamie Thomson's "SSIS: Understanding event propagation" blog post. In his post he talks about three things:
  1. Events being caught by more than one event handler.
  2. Event handlers firing events.
  3. The System::Propagate variable.

Numbers 1 and 2 were what I was most interested in. Number 3 is something I've written about in the past, in a blog post titled "SSIS Foreach Loop Container: Continue on Error", when explaining how to continue a Foreach Loop container when a child task fails.

I took Jamie's example and modified it to fit my situation: a four-package hierarchy with a single Control Flow task (an empty Script Task, just to get the event handler to fire) in the lowest-level package and an OnPostExecute event handler in all packages. Instead of an empty Control Flow task, each parent package used an Execute Package task to call the package one level down (e.g. Package_Level2 called Package_Level1, Package_Level3 called Package_Level2, etc.). The screenshots below depict the design.

Package_Level1 Control Flow

Package_Level1 OnPostExecute Event Handler

Package_Level2 Control Flow w/ Execute Package Edit Dialog


When running these sample packages I found that each subsequent level in the hierarchy more than doubles the number of events fired. When running Package_Level1, the OnPostExecute event handler is executed twice: once for the Level1_EmptyControlFlowTask task and then again for the OnPostExecute of the package itself. When running Package_Level2, the OnPostExecute event handler is fired a total of 8 times. See the table below. Package_Level1 bubbles up the Level1_EventHandler (twice), the Level1_EmptyControlFlowTask, and the Package_Level1 OnPostExecute events to the Package_Level2 event handler. In addition to those 4 events we still have the original two events we saw when executing Package_Level1 on its own. And we have the Package_Level2 OnPostExecute events for the Level2_ExecuteChildPackage task (the call to execute the Package_Level1 package) and the Package_Level2 package execution.

Sequence   Event Handler Package   Task                          Event
-----------------------------------------------------------------------
1          Package_Level1          Level1_EmptyControlFlowTask   OnPostExecute
2          Package_Level2          Level1_EventHandler           OnPostExecute
3          Package_Level2          Level1_EmptyControlFlowTask   OnPostExecute
4          Package_Level1          Package_Level1                OnPostExecute
5          Package_Level2          Level1_EventHandler           OnPostExecute
6          Package_Level2          Package_Level1                OnPostExecute
7          Package_Level2          Level2_ExecuteChildPackage    OnPostExecute
8          Package_Level2          Package_Level2                OnPostExecute

When we execute Package_Level4 we see a total of 52 events fired, all triggered by the single Level1_EmptyControlFlowTask task in Package_Level1.
Below is the formula for determining how many additional events will be fired by the top-level package in a package hierarchy, per Control Flow task, when every package in the hierarchy has a single OnPostExecute event handler task.

y = 2^n + x

x = previous package level’s additional events
n = level in package hierarchy (starting at base package level 1)
y = total additional events fired at level n

Package Level   Calculation   Additional Events Fired   Total Events Fired
---------------------------------------------------------------------------
1               2^1 + 0       2                         2
2               2^2 + 2       6                         8
3               2^3 + 6       14                        22
4               2^4 + 14      30                        52
5               2^5 + 30      62                        114
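If it helps to see the arithmetic, the following recursive CTE (just an illustrative sketch of the formula above, not anything from the actual project) reproduces the table:

 WITH EventCounts AS
 (
     -- Level 1: the base package fires 2 events per Control Flow task
     SELECT 1 AS PackageLevel, 2 AS AdditionalEvents, 2 AS TotalEvents
     UNION ALL
     -- Each level adds 2^n events plus the previous level's additional events
     SELECT PackageLevel + 1,
            POWER(2, PackageLevel + 1) + AdditionalEvents,
            TotalEvents + POWER(2, PackageLevel + 1) + AdditionalEvents
     FROM EventCounts
     WHERE PackageLevel < 5
 )
 SELECT PackageLevel, AdditionalEvents, TotalEvents
 FROM EventCounts;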

As you can see, having nested packages with package level event handlers can create quite a bit of overhead. I initially failed to see this in my logs because my logging stored procedures were ignoring the event handlers that were fired by other event handlers. However, the process still had the overhead of all the events firing and the stored procedure calls, the data just wasn’t being persisted to the database.

Event handler behavior can be very tricky. Don’t be like me, understand what’s happening before implementing your event handler logic.

Monday, September 12, 2011

OLE DB Destination or SQL Server Destination

SSIS offers a number of Data Flow destination components. When working with SQL Server databases as a destination there are two obvious choices:
  1. OLE DB Destination
  2. SQL Server Destination

Microsoft claims up to a 15% performance increase when using the SQL Server destination, so it seems like a good choice. However, there is one rather large caveat: SSIS must be running on the same server as the destination SQL Server instance. It cannot write to a remote SQL Server.

That said, I have seen many solutions that have both SSIS and the destination database on the same server so it seems to make sense to use it in these situations. Personally, I choose not to. By using the SQL Server destination you're locking yourself into having to make a code change if you decide to split your SSIS and database instances across multiple servers. Having SSIS code dependent on a solution's hardware architecture is something I always try to avoid.

Sunday, October 3, 2010

Nested Loops Join - No Join Predicate

I was recently troubleshooting a severe query performance issue when I ran into a query plan that used a nested loops join to combine data from two tables. The join was 97% of the cost. When I looked closer I saw that there were no join predicates specified. The join in the query looked pretty straightforward to me and used a single column on each side. Why would the optimizer do what equated to a Cartesian join?

It turns out the query was a little more complicated than it looked. I was actually joining two views that were part of a view-based (non-persisted) dimension model. I was joining on logical primary and foreign keys which made sense.

When I dug into the views I found that one of the key columns was based on a user-defined function that converted a datetime to an integer value so that it could be used to join to a date dimension view. Once I changed the underlying view to apply the same logic as the UDF at the view level, so that a call to the UDF was unnecessary, the query executed within a second as expected.
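To illustrate the pattern (with made-up names, not the client's actual views): wrapping the join column in a scalar UDF leaves the optimizer with nothing to estimate, while expressing the same conversion inline inside the view definition lets the join compare plain columns.

 -- Before (hypothetical): the scalar UDF on the join column leaves the
 -- optimizer without a usable join predicate
 SELECT f.OrderId, d.DateName
 FROM dbo.vw_FactOrders AS f
     INNER JOIN dbo.vw_DimDate AS d
         ON dbo.fn_DateToIntKey(f.OrderDate) = d.DateKey;

 -- After (hypothetical): the fact view exposes an integer key column computed
 -- with the UDF's logic inline, e.g.
 --     CONVERT(INT, CONVERT(CHAR(8), OrderDate, 112)) AS OrderDateKey
 SELECT f.OrderId, d.DateName
 FROM dbo.vw_FactOrders AS f
     INNER JOIN dbo.vw_DimDate AS d
         ON f.OrderDateKey = d.DateKey;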

Other behavior that I noticed was that if I changed the inner join to a left join, the optimizer came up with a different much more efficient plan. This appears to be a flaw in the optimizer but I would like to speak to someone at Microsoft before making that claim.

The lesson learned here is that if you have a poorly performing query due to a nested loops join with no join predicate(s), it's not necessarily the case that you're missing join criteria in your query (as all the posts I was able to find seemed to suggest). The culprit could be a UDF on a join column.

Sunday, August 29, 2010

OPTION (FORCE ORDER)

I was recently faced with a difficult performance issue. In an effort to save a client some time and money, I convinced them that a view-based dimensional model was the way to go over a table/ETL-based solution. I will blog about when this option makes sense and the benefits of going this route sometime in the near future. Anyhow, based on this decision I ended up with a performance issue that I wouldn't have otherwise had.

The view-based dimensional database consisted of 30-40 fact and dimension views. As I created these views I made sure to test them for performance issues and tuned accordingly. I tested each view on its own and I tested individual joins between fact and dimension views.

Everything was working fine until I began creating sample queries to show end-users possible queries they could write against the new database. What I found was that when a particular dimension view was joined to a fact view along with one or more other dimension views I had unexpected performance issues. Queries that should take a few seconds to execute were taking 45-50 seconds.

In looking at the execution plans, I saw that the optimizer was not doing what I had expected based on the plans I had seen for the views executed on their own. I saw weird things, like changing INNER joins to my views into LEFT OUTER joins resolving or partially resolving the issue.

I couldn't understand why the optimizer would treat the SQL in my views differently now that they were joined to other tables or views. To be honest, I'm still not sure. What I am sure of is that there is a query hint that forces the optimizer to preserve the table/join order contained in the view definition, effectively building the plan for the view portion of the SQL irrespective of any other objects in the statement. This is the FORCE ORDER hint, and it can be added to the end of the SQL statement using the following syntax.

    OPTION(FORCE ORDER)

Below is an example.

SELECT *
FROM table1 t1
    INNER JOIN view1 v1 ON t1.col1 = v1.col3
OPTION (FORCE ORDER)

Monday, June 21, 2010

DISTINCT or GROUP BY?

GROUP BY and DISTINCT can return the same results when looking for uniqueness in a resultset. Consider the Sales table below.

Store   Product      Date
-------------------------------------
A       Monopoly     1/10/2010
B       Connect 4    1/12/2010
A       Monopoly     1/14/2010
A       Battleship   1/16/2010
B       Othello      1/16/2010
 
If you wanted to know what games each store sold from 1/10/2010 through 1/16/2010 you could write either of the following queries:

SELECT DISTINCT
    Store,
    Product
FROM
    Sales

-OR-

SELECT
    Store,
    Product
FROM
    Sales
GROUP BY
    Store,
    Product

So which one?? I hate to admit it, but back when I first got into SQL performance tuning I was working in an Oracle 8i environment. If I remember correctly, GROUP BY performed better than DISTINCT for one reason or another. Well, in the days of SQL Server 2005 and 2008, GROUP BY and DISTINCT perform equivalently. Therefore, my rule of thumb is to use DISTINCT when no aggregations on additional, non-unique columns are necessary. If you just want unique combinations of store and game, use DISTINCT. If you want to know how many of each game were sold by store, then use GROUP BY, as in the query below.
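For example, a count of sales per store and game (using the same Sales table as above) needs GROUP BY:

SELECT
    Store,
    Product,
    COUNT(*) AS TimesSold
FROM
    Sales
GROUP BY
    Store,
    Product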

Monday, March 29, 2010

Performance Issue: UNPIVOT with a Large Number of Columns

I love the PIVOT and UNPIVOT commands. They have saved me a ton of time on quite a few projects. UNPIVOT especially, since many times when working on BI projects I'm handed a monster spreadsheet by someone on the business side. Business people like seeing pivoted data. Database people like seeing normalized data. I won't get into the keywords/commands here since there is plenty of material out there, most of which does a better job than I ever could of explaining their use.

However, I will tell you that SQL Server does not like "UNPIVOTing" a large number of columns, as I recently found out. I should note that the query I have contains a few nested SELECTs (which I'm sure is part of the problem). I don't know the internals of what the database engine does when you use these commands, but I do know they produce some pretty ugly execution plans. Since I just started this project and have a year of work to deliver in about 2 months (I'm not kidding), I haven't had the time to deconstruct the execution plan (nor should I really be writing about this topic).

What I have figured out is that if I break up a 280+ column UNPIVOT into 20-column chunks (an UNPIVOT for every 20 columns - yes, this is a maintenance nightmare), something like the sketch below, then I get reasonable performance (a couple of minutes to execute) whereas my original query never completed.
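A scaled-down, hypothetical version of the chunking approach (three columns per UNPIVOT instead of twenty, and made-up table/column names) looks like this:

SELECT Id, ColumnName, ColumnValue
FROM (SELECT Id, Col01, Col02, Col03 FROM dbo.WideSource) AS src1
UNPIVOT (ColumnValue FOR ColumnName IN (Col01, Col02, Col03)) AS u1

UNION ALL

SELECT Id, ColumnName, ColumnValue
FROM (SELECT Id, Col04, Col05, Col06 FROM dbo.WideSource) AS src2
UNPIVOT (ColumnValue FOR ColumnName IN (Col04, Col05, Col06)) AS u2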

There's no way I'm going to be ok with having 15 queries instead of one to solve a performance issue but this is what I know for now and I'm able to present a working proof-of-concept. I hope to have a better understanding of the issue and as a result, the solution soon. I'll keep you posted. In the meantime, if you know what's going on please share . . .

Wednesday, January 6, 2010

Should Foreign Key Columns be Indexed?

People seem to be confused when it comes to indexing of foreign key columns. For one, I believe some people assume that when you create a foreign key constraint (in SQL Server) an index is also created on the column(s) that make up the key/constraint. This is not the case.

That leads us to the original question . . . Should we create indexes on our foreign keys??

I believe that in most cases the answer is yes.

There are a few things to consider here:
1) Constraint checks
2) Cascading updates and deletes
3) Joins

1) If you delete a row in a table whose primary key or unique constraint is referenced by one or more foreign keys, SQL Server will search the foreign key tables to determine if the row can be deleted (assuming you have no cascade behavior defined). An index on the foreign key column(s) will speed this search.

2) The same is also true for update and delete cascades. If either is defined, the child/foreign key rows must be identified and updated and/or deleted. Again, an index helps find these rows more efficiently.

3) More often than not foreign key columns will be joined to primary key columns in queries. Joins are always more efficient when an index exists on both sides of the join.
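Creating such an index is a one-liner. A hypothetical example, where Orders.CustomerId references Customers.CustomerId:

CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId);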

If you ask me, it's a no-brainer to index your foreign key columns. How often do you have a database with foreign key relationships that don't fall under one of the 3 above scenarios??

Tuesday, December 8, 2009

SQL Server 2005/2008 Maximum Server Memory - What should it be?

While questions like this don't have black-or-white answers, there is some simple guidance here . . .

You probably want to change it from the default of 2,147,483,647 MB, which is roughly two petabytes (in other words, effectively unlimited). I believe the idea here is to allow SQL Server to use as much memory as it wants. That's all well and good until it begins to starve the OS of memory, which will in turn bring SQL Server to its knees.

There are implications related to OS editions, SQL Server editions, 32- versus 64-bit, AWE, etc. I won't get into all of that here, but I will say that it's a good idea to set a maximum value and leave at least half a GB (and if possible 1+ GB) for the operating system.

For example, if you're on a budget and running 32-bit Windows Server 2003 Enterprise with 4 GB of RAM (the max RAM allowed on this OS), I would suggest setting this value to somewhere between 3,072 and 3,584 MB (3 to 3.5 GB).
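The setting can be changed through SSMS or with sp_configure; for the hypothetical 3 GB example above:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max server memory (MB)', 3072;
RECONFIGURE;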

Here are some recommendations from Glenn Berry's blog.

Thursday, October 8, 2009

SQL Server Management Studio Default Isolation Level Can Cause Locking/Blocking

Many unsophisticated and/or underfunded organizations have people executing ad hoc queries against production SQL Server instances with little or no oversight. I am currently in an organization where employees on the operations team, as well as the marketing and business intelligence teams, have the ability to execute queries via SQL Server Management Studio directly against production OLTP databases. If a query takes an hour to run, so be it. They need their data and they will wait. These users know nothing about locks, blocking, resource contention, IO bottlenecks, etc., and therefore have no idea what problems they might be causing.

Unfortunately, the default behavior of SQL Server Management Studio is to set each connection's isolation level to READ COMMITTED. This can cause blocking under a number of different scenarios. I won't detail them here but believe me, it can be a problem.
However, this problem can be avoided. SSMS provides the ability to set the default isolation level for all queries. The option can be found by navigating to Tools -> Options and then to Query Execution -> SQL Server -> Advanced.


I suggest setting the TRANSACTION ISOLATION LEVEL to READ UNCOMMITTED and the DEADLOCK_PRIORITY to Low. Setting the DEADLOCK_PRIORITY to Low will mean that, in the case of a deadlock, the SSMS query will most likely be aborted. Also note the LOCK TIMEOUT value of 30,000 milliseconds. This is the equivalent of 30 seconds and means that if an SSMS query encounters a lock for more than 30 seconds, the query will be aborted.
Note that you can also set the Isolation Level for the active connection by issuing the SET TRANSACTION ISOLATION LEVEL command. For example:
     SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

This command can also be used in stored procedures instead of using NOLOCK hints on all tables.
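A minimal sketch of that pattern (the procedure and table names are made up):

CREATE PROCEDURE dbo.usp_GetStoreSales
AS
BEGIN
    -- One statement at the top of the procedure instead of NOLOCK hints on every table
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT Store, Product, SaleDate
    FROM dbo.Sales;
END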

**Keep in mind that using this transaction isolation level can lead to inconsistent data in your result set since you can read uncommitted data. 
 

Below are the descriptions of the different Isolation Levels from SQL Server BOL.

READ COMMITTED
Specifies that shared locks are held while the data is being read to avoid dirty reads, but the data can be changed before the end of the transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.


READ UNCOMMITTED
Implements dirty read, or isolation level 0 locking, which means that no shared locks are issued and no exclusive locks are honored. When this option is set, it is possible to read uncommitted or dirty data; values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the four isolation levels.


REPEATABLE READ
Locks are placed on all data that is used in a query, preventing other users from updating the data, but new phantom rows can be inserted into the data set by another user and are included in later reads in the current transaction. Because concurrency is lower than the default isolation level, use this option only when necessary.


SERIALIZABLE
Places a range lock on the data set, preventing other users from updating or inserting rows into the data set until the transaction is complete. This is the most restrictive of the four isolation levels. Because concurrency is lower, use this option only when necessary. This option has the same effect as setting HOLDLOCK on all tables in all SELECT statements in a transaction.

Monday, August 24, 2009

Checkpoint Files - Performance Issues w/ Loop Containers

During an interview today I asked a candidate if she had experience using SSIS Checkpoint functionality. Although I haven't confirmed what she told me, I suspect that it is true. She said that she has experienced performance issues related to using Checkpoint files in packages containing loop containers. My guess is that SSIS tracks the value of the variable being used for the loop counter in the checkpoint file. Writing a value to disk for every iteration in a loop probably has very significant performance effects.

Clustered Index Update - Why?

I was just troubleshooting a long-running query that happened to be an UPDATE statement and found that a Clustered Index Update operation was a good portion of the total query execution plan. Unfortunately, the table I'm updating has a composite primary key that is also the clustered index. However, the statement was not updating any columns in the key.

So why would there be a Clustered Index Update operation? Simply put, the clustered index is the table. The leaf level of the clustered index contains the actual data rows, stored in the order dictated by the clustering key. Hence, any update to a table is an update to its clustered index.

Also, keep in mind that a clustered index update that affects the clustering key also has to update all non-clustered indexes, since every non-clustered index references the clustering key. However, this is not applicable to the above scenario, as my UPDATE statement was not updating the clustered index/key.