I've recently begun using Visual Studio 2010 database projects (DBPro). I think it's a great tool and having been a .NET developer for many years it is nice to see SQL Server development get the respect it deserves in a full-fledged (well almost) Visual Studio project type.
While development productivity has been great with features like object level dependency checking, automated deployment with VSDBCMD, database dependency, data generation, pre and post build and deployment scripts, etc. I've recently encountered a rather glaring shortcoming that keeps Database Projects from having the full functionality of a C#, VB.NET, ASP.NET, etc. Visual Studio project.
The problem is related to the build process. Currently, to build a database project you need to have Visual Studio present on the machine that you are executing the build on. That might not seem like a big deal to many people since you might assume that any developer doing a build would have Visual Studio installed. Not the case. I'm currently working in a Java shop that uses SQL Server as the relational database platform. There are about 12+ developers that run any number of ANT targets to build their development environments on a frequent basis.
ANT can call MSBuild targets. That's not the problem. The problem is that to build a Visual Studio database project you must have certain MSBuild targets and a number of Microsoft assemblies in the GAC that are only installed with Visual Studio 2010. This is not the case for other Visual Studio project types. In most cases all that is needed is .NET 4.0.
While we're not dead in the water we have to rely on a solution that is less than ideal although not altogether terrible. What we have to do is build the project on one of the database developer's machines that has Visual Studio present. We then take the build output (.dbschema, .deploymanifest, etc.) and commit it to source control. From there any developer can run the ANT targets which then call VSDBCMD and deploy the database to their local development environment.
What we have works but it is a departure from how a standard Visual Studio project would work. It also requires the commital of "compiled" code which is never a good idea. The biggest struggle is making sure we the database developers build in the right configuration (Release) and commit the correct build output. It's an easy thing get wrong and can cause a lot of wasted troubleshooting time down stream if it is done incorrectly.
I have spoke to Microsoft about the issue. I am waiting to hear back from a Microsoft support escalation engineer that is in term getting in touch with the DBPro project manager for alternative options. Additionally, I have submitted a request on Microsoft Connect. Feel free to check if any progress has been made on the Microsoft Connect website.
SELECT [tips tricks and commentary] FROM [my experiences] WHERE subject IN ('database', 'data warehouse', 'business intelligence')
Wednesday, July 28, 2010
Monday, July 26, 2010
Report Functional Requirements - Too Often Overlooked
A template for what follows can be found on my personal website.
Reports are often an afterthought and commonly believed to be an easy thing to create. As a result they are often overlooked during the requirements gathering and planning phases of a project and when they are considered they tend to be the red-headed stepchild and don't get the attention they deserve. This is usually a time consuming and costly mistake.
Reporting can often be complex and requirements interpreted many different ways by many stakeholders (e.g. business users, business managers, developers, architects, etc.). Reporting tools have come a long way and can do many different things many different ways. The tools can actually serve as a fully functioning UI (user interface) in many cases. Not to mention, data itself is just that - data. It can be manipulated in infinite ways to provide information that not only looks different but is actually different and leads to different values and therefore conclusions.
Most, if not all of the time, report writing will be an iterative process whether your an agile shop or not. The key here is to not iterate to the point that a single report costs $20k to create (yes I have seen this) and/or that the developers, business analysts, end-users, etc. are completely frustrated and just want to say it's done so they don't have to deal with it anymore (yes I have seen this as well).
I believe there are a few key here that can help avoid the $20k meaningless report.
It doesn't matter what tool(s) you are using. It can be Crystal Reports, Cognos Report Studio, Jasper Reports, SQL Server Reporting Services, etc. They all have the same basic functionality and therefore the same requirements needed to develop a report using them.
Reports are often an afterthought and commonly believed to be an easy thing to create. As a result they are often overlooked during the requirements gathering and planning phases of a project and when they are considered they tend to be the red-headed stepchild and don't get the attention they deserve. This is usually a time consuming and costly mistake.
Reporting can often be complex and requirements interpreted many different ways by many stakeholders (e.g. business users, business managers, developers, architects, etc.). Reporting tools have come a long way and can do many different things many different ways. The tools can actually serve as a fully functioning UI (user interface) in many cases. Not to mention, data itself is just that - data. It can be manipulated in infinite ways to provide information that not only looks different but is actually different and leads to different values and therefore conclusions.
Most, if not all of the time, report writing will be an iterative process whether your an agile shop or not. The key here is to not iterate to the point that a single report costs $20k to create (yes I have seen this) and/or that the developers, business analysts, end-users, etc. are completely frustrated and just want to say it's done so they don't have to deal with it anymore (yes I have seen this as well).
I believe there are a few key here that can help avoid the $20k meaningless report.
- Gather the minimum set of requirements necessary to create the fundamental pieces of a report. Use a template to collect this information. This will ensure that you have consistent information from which you can create a consistent set of reports.
- Require the report requester to provide mock-ups. When possible, have them create the mock-ups in Excel and use real formulas for calculated columns. The greatest value mock-ups provide is that they force the requester to think about what they are asking for. They make the requester realize things like their idea was malformed and what they're asking for doesn't make sense and/or isn't possible. It also helps them think through things they wouldn't have thought about until the first iteration of report was delivered which helps cut down in the number of iterations and the need for developers and/or BAs to asked "stupid" questions that can annoy and embarrass the requester.
It doesn't matter what tool(s) you are using. It can be Crystal Reports, Cognos Report Studio, Jasper Reports, SQL Server Reporting Services, etc. They all have the same basic functionality and therefore the same requirements needed to develop a report using them.
- A meaningful title for the report (e.g. Active Customers Name & Address)
- A concise description of the report that includes
- The primary "thing" being reported on (e.g. customer)
- The subject of the attributes being report (e.g. name and address)
- The main criteria (e.g. status = active)
- For example, "The Active Customers Name and Address report includes the names and addresses of all customers with a Status of "Active" and is grouped by State of residence."
- A list of the columns with
- Database table and column name if possible
- Label text for the report
- Any calculation
- If it's a link to another report or external source and if so a description of how the link should work
- If it's a sortable column
- Right, left, or middle alignment
- Value formatting (MM-DD-YYYY, YYYY-MM-DD, $100, ($100.00), etc.)
- Layout
- Column order
- Landscape or portrait
- Cross-tab or row/column
- Headers and footers
- Image/logo
- Creation date
- Page number and format (p. 1, page 1 of 10, etc.)
- Selected parameter value
- Grouping
- If grouping, what are the group levels?
- How should each group be sorted?
- Should the group have subtotals and if so which columns and what is the calculation if complex (e.g. weighted average)?
- Should there be a page break and/or line before/between/after the group?
- Sorting
- How should the report be sorted (which column(s), in what order, and which direction - ascending or descending)?
- Should the user be able to specify the sort order (dynamic sorting)?
- If grouping is present, what order should the groups be sorted?
- Is run-time sorting allowed on a column after report execution?
- Filter Criteria
- What filter criteria (WHERE clause) are there if any? For example, should only "active" customers be included. Anothe example would be "completed" orders.
- Should these criterion be evaluated using an AND operator or an OR operator?
- The column(s) that the filter is applied to should be clearly stated.
- Parameters/Filters
- Label/text for the parameter
- Drop-down, text, yes/no
- Are multiple selections allowed?
- What are the values or where do they come from (static list of values or from a table)
- Should we assume that all criteria are applied to the result set or are the "OR" conditions?
- Does one parameter drive the value list of another parameter (e.g. Country changes values in state/region parameter drop-down)?
- What column or field in the result set should this parameter/filter be applied to?
- Schedule
- If the report is deployed to a "report server" should it have one or more standard schedules on which it runs?
- Export format
- If the report is deployed to a "report server" what is the default format (Excel, Word, CSV, PDF) that it should be written to? If there are scheduled instances, what format should those be written to?
- Recipients (burst or data-driven - what is the logic?)
- If a scheduled report instance is burstable (Cognos) or a data-driven email subscription (SSRS) who are the recipients? Is it a static list or is it data-driven?
- Misc
- Charts or graphs?
- Drill-downs?
- Links to other reports (drill-through)?
Thursday, July 15, 2010
Data Warehouse vs. Data Mart
II'm currently working on a Business Intelligence project and depending on who I am talking to we are either building a Data Mart or a Data Warehouse and everyone's definitions of both seem to be vastly different. Since this makes communication a bit difficult at times I took it up on myself to attempt to clarify the difference between a mart and warehouse.
My project is following the Kimball approach of dimensional modeling so I tailored my definitions to this approach. Keep in mind that the detailed definitions differ depending based on whether you're talking about the top-down (Inmon) approach or the bottom-up (Kimball) approach. The differences are mostly around the modeling techniques and whether or not the data warehouse is in 3NF or de-normalized.
My non-technical description of data mart and data warehouse are as follows.
A data mart is typically a single star schema representing a specific subject area. Sales is a good example of this and Manufacturing is another. A data warehouse contains the subject specific data marts and attempts to tie an entire organization’s subject data together using conformed dimensions (attribute data that can be used consistently across multiple facts/marts/star schemas).
That is the very simple description I provided the group I am working with. That said, I still needed to determine if we were building a data mart or a data warehouse. From my definition you would think it would be an easy thing to figure out. Since I tend to overthink most things in life I struggled a bit with this.
The database I'm building to support the Business Intelligence (BI) solution consists of subject areas such as Orders, Complaints, Deliveries, and about 4 more subject areas. They will each have their own fact table and share a number of dimensions. Based on that I would say it is a data warehouse because it is a set of data marts/start schemas providing data across numerous subject areas and sharing conformed dimensions. However, this solution is part of a commercial software product that supports Restaurant Order Taking and is not being developed for or within a specific organization. So now I believe it's a question of perspective. From the customers/restauranteur's perspective, the order taking application and the new BI add-on is simply a small part of a much larger picture. The data contained with the mart/warehouse might be considered a single subject area to this organization. The subject area would be Order Taking or more specifically Web and/or Call Center Order Taking. The customer might have other data around accounting, human resources, inventory, supply chain, etc. The solution I'm building could just be part of a bottom-up enterprise data warehouse at some point at which point it would be considered a data mart by the purchasing organization.
So which one is it? I believe you can argue that it's both. However, I believe we must look at it from our perspective of my client who is a software products company. My client sells a suite of order taking products to the restaurant industry. From within the software product company this is a data warehouse that brings together all aspects (subject areas) of their products.
You could argue that it doesn't matter and that this is purely a symantics or philisophical debate. Or possibly I've missed the boat. What do you think?
In relation to the product and the company that owns and sells this product to restauranteurs, this solution can be considered a data warehouse as I stated above but what about looking at it from the customers (restaurant owner) perspective?
will be an add-on to I would say it is a data warehouse because it captures data across the solution and consists of many subject areas.
My project is following the Kimball approach of dimensional modeling so I tailored my definitions to this approach. Keep in mind that the detailed definitions differ depending based on whether you're talking about the top-down (Inmon) approach or the bottom-up (Kimball) approach. The differences are mostly around the modeling techniques and whether or not the data warehouse is in 3NF or de-normalized.
My non-technical description of data mart and data warehouse are as follows.
A data mart is typically a single star schema representing a specific subject area. Sales is a good example of this and Manufacturing is another. A data warehouse contains the subject specific data marts and attempts to tie an entire organization’s subject data together using conformed dimensions (attribute data that can be used consistently across multiple facts/marts/star schemas).
That is the very simple description I provided the group I am working with. That said, I still needed to determine if we were building a data mart or a data warehouse. From my definition you would think it would be an easy thing to figure out. Since I tend to overthink most things in life I struggled a bit with this.
The database I'm building to support the Business Intelligence (BI) solution consists of subject areas such as Orders, Complaints, Deliveries, and about 4 more subject areas. They will each have their own fact table and share a number of dimensions. Based on that I would say it is a data warehouse because it is a set of data marts/start schemas providing data across numerous subject areas and sharing conformed dimensions. However, this solution is part of a commercial software product that supports Restaurant Order Taking and is not being developed for or within a specific organization. So now I believe it's a question of perspective. From the customers/restauranteur's perspective, the order taking application and the new BI add-on is simply a small part of a much larger picture. The data contained with the mart/warehouse might be considered a single subject area to this organization. The subject area would be Order Taking or more specifically Web and/or Call Center Order Taking. The customer might have other data around accounting, human resources, inventory, supply chain, etc. The solution I'm building could just be part of a bottom-up enterprise data warehouse at some point at which point it would be considered a data mart by the purchasing organization.
So which one is it? I believe you can argue that it's both. However, I believe we must look at it from our perspective of my client who is a software products company. My client sells a suite of order taking products to the restaurant industry. From within the software product company this is a data warehouse that brings together all aspects (subject areas) of their products.
You could argue that it doesn't matter and that this is purely a symantics or philisophical debate. Or possibly I've missed the boat. What do you think?
In relation to the product and the company that owns and sells this product to restauranteurs, this solution can be considered a data warehouse as I stated above but what about looking at it from the customers (restaurant owner) perspective?
will be an add-on to I would say it is a data warehouse because it captures data across the solution and consists of many subject areas.