Current File Name: Extract.xls
Historical File Name: Extract_YYYYMMDD_MMSS.xls
I figured this would be easy. The first thing I did was look for the OverwriteFile or ReplaceFileproperty. What I found was that the only non-connection specific property the Excel Connection Manager exposed was FirstRowHasColumnName. At that point I figured I would experiment and find out what would happen if I tried to write to a pre-existing file. The result was that the data being written to the file was simply appended to the pre-existing data in the file. Definitely not the behavior I was looking for.
I then hit "the Google" as George W. so fondly calls it. I found solutions that proposed deleting data from worksheet(s), deleting worksheets, and creating new worksheets using the Jet 4.0 SQL syntax to "DROP" and "CREATE" tables. In Jet/Excel terms, a table is an Excel worksheet. I found that the DROP didn't actually delete the worksheet but deleted rows from the worksheet. This could have worked except that the next time the worksheet is written to, the new rows start after the last row of the previously deleted data, leaving a bunch of empty rows in the beginning of the worksheet. This approach did not cut the mustard. So ended my search for the wheel I was attempting to not reinvent.
The approach I came up with does the following.
- Creates a shell (only the headers) Excel file with the name Extract_wip.xls.
- Uses a Data Flow to populate the wip Excel file.
- Copies the wip file to an Archive folder but renames to Extract_YYYYMMDD_MMSS.xls.
- Renames the wip file to Extract.xls, overwriting the previously created Extract.xls files (if one exists).
This is a screen shot of the Control Flow.
*If you don't have an archival/historical requirement you can remove that functionality and make this solution even simpler.
Connection Managers
ExcelFile
The Connection Manager used in the final step of the process to rename the wip file to the final file name. Nothing fancy here although I would suggest making this configurable.
ExcelFileArchive
The Connection Manager used to copy the wip file to an archive folder with the date and time in the file name. Below is the expression used to generate the file name. You will want to set the Connection Manager's ConnectionString expression to this.
"C:\\ExcelExtract\\Archive\\Extract_" + RIGHT( "0000" + (DT_STR, 4 , 1252) DATEPART( "year" , GETDATE() ) , 4 ) + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "month" , GETDATE() ) , 2 ) + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "day" , GETDATE() ) , 2 ) + "_" + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "hour" , GETDATE() ) , 2 ) + RIGHT( "00" + (DT_STR, 2 , 1252) DATEPART( "minute" , GETDATE() ) , 2 ) + ".xls"
ExcelFileForDataFlow
This Connection Manager should point to the wip file. You'll want to have a ConnectionString expression similar to the following.
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\ExcelExtract\Extract_wip.xls;Extended Properties=\"EXCEL 8.0;HDR=YES\";"
*HDR is whether the file has a header row.
ExcelFileWIP
This Connection Manager points to the wip file and is used to copy and rename the populated Excel file.
SomeOLEDBDataSource
This is used by the data flow. There could be one or more Connection Managers for your data flow.
Control Flow Tasks
Connection Type: EXCEL
Connection: ExcelFileForDataFlow
SQL Statement:
CREATE TABLE [ThisWillBeYourWorksheetName](
[SomeCharVal1] nvarchar(25),
[SomeCharVal2] nvarchar(255),
[SomeCharVal3] nvarchar(50),
[SomeDecimalVal1] double precision,
[SomeDecimalVal2] double precision,
[SomeIntegerVal1] integer)
*It is important that you use the JET syntax including the appropriate data types when writing your CREATE TABLE script.
This task will create a new Excel file with a worksheet named using the name in your CREATE TABLE script. It will use the file name found in the ExcelFileForDataFlow Connection Manager.
*If a file by the same name exists it will add a worksheet to that file. This is why we want to make sure this file does not exist when we start our process. Hence the wip file.
This is the Data Flow that writes to your Excel File. It doesn't matter what the data flow does, just make sure that it writes to an Excel Destination that uses the ExcelFileForDataFlow Connection Manager.
Destination Connection: ExcelFileForArchive
OverwriteDestination: True
Operation: Copy File
Source Connection: ExcelFileWIP
This File System Task copies the wip file to an Archive (or any other) folder using the Extract_YYYYMMDD_MMSS.xls naming convention.
Destination Connection: ExcelFile
OverwriteDestination: True
Operation: Rename File
Source Connection: ExcelFileWIP
This File System Task renames the wip file to the Extract.xls name, effectively overwriting the previously created Extract.xls file (if one exists).
Summary
While it seems like it would make sense for the Excel Connection Manager to allow for the overwriting of an existing file, it doesn't. The above process is the simplest way I have found to reproduce this seemingly lacking functionality.
Excellent article.
ReplyDeleteNot sure how this would work. When you use ExcelFileForDataFlow as destination for df_PopulateExcelFile it creates a WIP file and because it creates the WIP file the CreateExcelFileAndHeader task fails. If you delete the excel file the package wont run because the WIP file used as destination does not exist. Or am I missing something.
ReplyDelete