I recently stumbled onto what seemed to be an odd
performance problem in an SSIS project I was working on. The problem was that I
had a number of straightforward packages that ran extremely fast but sometimes,
without explanation, ran really slow.
Luckily the project included a robust logging system that
provided, among other things, the execution duration of every package and
Control Flow task. The logs showed that
the Data Flows in the packages in question were running in 10 to 20 seconds yet
the packages were at times taking upwards of 20 minutes. Furthermore, I found that when called on their own, each package did not run much longer than its single data flow, yet when run as a child of another package (via an execute package task) the execution duration more than doubled. In one example I had a package that ran in 10 seconds on its own, 23 seconds as a child, 56 seconds as a child 3 levels deep, and almost 3 minutes when run as a child four levels deep. In that same example the Data Flow took only 7 seconds. So why was the package so slow when run as a child of one or more packages?
Time for SQL Server Profiler.
Time for SQL Server Profiler.
The logging mechanisms in place were stored procedures being
called by Execute SQL tasks and were being passed things like a batch Id,
Package Id, Source Name, Source GUID, Start Time, End Time, etc. Once I began profiling it was clear that there were way too many Pre and Post execute
logging stored procedure calls.
Let me take a step back and explain the project/package
design. The project contained about 50 SSIS packages, some of which were being
called by Execute Package tasks within the project. For example, there was a
Master package which called some staging packages and some staging packages
that called lower level staging packages, and those in turn called some generic logging packages. There were as many as four packages
in any given package execution hierarchy. In each package, regardless of where in the
package hierarchy it fell, there were four event handlers:
- OnError: Handled exceptions that were then logged to the log database.
- OnVariableValueChanged: Handled the change in those variables we were interested in and logged to the log database.
- OnPreExecute: Handled the logging of the pre-execution data (e.g. start time, source name, etc.) for all package tasks.
- OnPostExecute: Handled the logging of post-execution data (e.g. end time) for all package tasks.
Getting back to what I was seeing in Profiler (many calls to
the Pre and Post execute logging stored procedures), it only made sense that
the OnPreExecute and OnPostExecute event handlers were being fired each time
the logging stored procedures were being called. So why would those event
handlers fire so many times?
I started to research event handlers propagation to
understand what happens in SSIS when we have package level event handlers and a
package design in which packages call other packages. I quickly ran into Jamie
Thomson’s “SSIS:
Understanding event propagation” blog post. In his post he talks about
three things:
- Events being caught by more than one event handler.
- Event handlers firing events.
- The System::Propagate variable.
Numbers 1 and 2 were what I was most interested in. Number 3
is something I’ve written about in the past in a blog post title “SSIS
Foreach Loop Container: Continue on Error” when explaining how to continue
a Foreach Loop container when a child task fails.
I took Jamie’s example and modified it to fit my situation
which was a four package hierarchy with a single Control Flow task (empty
script task just to get the event handler to fire) in the lowest level package
and an OnPostExecute event handler in all packages. Instead of an empty Control
Flow task in the parent packages, an Execute Package task was used to call the
package one level down (e.g. Package_Level2 called Package_Level1,
PacakageLevel3 called Package_Level2, etc.). Screen shots below depict the
design.
Package_Level1 OnPostExecute Event Handler
Package_Level2 Control Flow w/ Execute Package Edit Dialog
When running this sample package(s) I found that each
subsequent level in the hierarchy more than doubles the number of events fired.
When running Package_Level1 the OnPostExecute event handler is executed twice.
Once for the Level1_EmptyControlFlowTask task and then again for the OnPostExecute
of the package itself. When running
Package_Level2 the OnPostExecute event handler is fired a total of 8 times. See
the table below. Package_Level1 bubbles up the Level1_EventHandler (twice), the
Level1_EmptyControlFlowTask, and the Package_Level1 OnPostExecute events to the
Package_Level2 event handler. In addition to those 4 events we still have the
original two events we say when executing Package_Level1 on its own. And, we
have the Package_Level2 OnPostExecute events for the Level2_ExecuteChildPackage
(call to execute Pacakge_Level1 package) and the Package_Level2 package
execution.
Sequence
|
Event Handler Package
|
Task
|
Event
|
1
|
Package_Level1
|
Level1_EmptyControlFlowTask
|
OnPostExecute
|
2
|
Package_Level2
|
Level1_EventHandler
|
OnPostExecute
|
3
|
Package_Level2
|
Level1_EmptyControlFlowTask
|
OnPostExecute
|
4
|
Package_Level1
|
Package_Level1
|
OnPostExecute
|
5
|
Package_Level2
|
Level1_EventHandler
|
OnPostExecute
|
6
|
Package_Level2
|
Package_Level1
|
OnPostExecute
|
7
|
Package_Level2
|
Level2_ExecuteChildPackage
|
OnPostExecute
|
8
|
Package_Level2
|
Package_Level2
|
OnPostExecute
|
When we execute Package_Level4 we see a total of 52 events
fired, all triggered by the single Level1_EmptyControlFlowTask task in
Package_Level1.
Below is the formula for determining how many additional events
will be fired by the top-level package in a package hierarchy, per Control Flow
task when there is a single OnPostExecute event handler task in all packages in
the hierarchy.
y = 2^n + x
x = previous package level’s
additional events
n = level in package hierarchy
(starting at base package level 1)
y = total additional events fired
at level n
Package Level
|
Calculation
|
Additional Events Fired
|
Total Events Fired
|
1
|
2^1 + 0
|
2
|
2
|
2
|
2^2 + 2
|
6
|
8
|
3
|
2^3 + 6
|
14
|
22
|
4
|
2^4 + 14
|
30
|
52
|
5
|
2^5 + 30
|
62
|
114
|
As you can see, having nested packages with package level
event handlers can create quite a bit of overhead. I initially failed to see
this in my logs because my logging stored procedures were ignoring the event
handlers that were fired by other event handlers. However, the process still had
the overhead of all the events firing and the stored procedure calls, the data
just wasn’t being persisted to the database.
Event handler behavior can be very tricky. Don’t be like me,
understand what’s happening before implementing your event handler logic.
Wonderful explanation! I never thought i would get a good guidance Hierarchy Structure
ReplyDeleteNice article!!!!!!!!!
ReplyDelete