Thursday, July 15, 2010

Data Warehouse vs. Data Mart

II'm currently working on a Business Intelligence project and depending on who I am talking to we are either building a Data Mart or a Data Warehouse and everyone's definitions of both seem to be vastly different. Since this makes communication a bit difficult at times I took it up on myself to attempt to clarify the difference between a mart and warehouse.

My project is following the Kimball approach of dimensional modeling so I tailored my definitions to this approach. Keep in mind that the detailed definitions differ depending based on whether you're talking about the top-down (Inmon) approach or the bottom-up (Kimball) approach. The differences are mostly around the modeling techniques and whether or not the data warehouse is in 3NF or de-normalized.

My non-technical description of data mart and data warehouse are as follows.

A data mart is typically a single star schema representing a specific subject area. Sales is a good example of this and Manufacturing is another.  A data warehouse contains the subject specific data marts and attempts to tie an entire organization’s subject data together using conformed dimensions (attribute data that can be used consistently across multiple facts/marts/star schemas).

That is the very simple description I provided the group I am working with. That said, I still needed to determine if we were building a data mart or a data warehouse. From my definition you would think it would be an easy thing to figure out. Since I tend to overthink most things in life I struggled a bit with this.

The database I'm building to support the Business Intelligence (BI) solution consists of subject areas such as Orders, Complaints, Deliveries, and about 4 more subject areas. They will each have their own fact table and share a number of dimensions. Based on that I would say it is a data warehouse because it is a set of data marts/start schemas providing data across numerous subject areas and sharing conformed dimensions. However, this solution is part of a commercial software product that supports Restaurant Order Taking and is not being developed for or within a specific organization. So now I believe it's a question of perspective. From the customers/restauranteur's perspective, the order taking application and the new BI add-on is simply a small part of a much larger picture. The data contained with the mart/warehouse might be considered a single subject area to this organization. The subject area would be Order Taking or more specifically Web and/or Call Center Order Taking. The customer might have other data around accounting, human resources, inventory, supply chain, etc. The solution I'm building could just be part of a bottom-up enterprise data warehouse at some point at which point it would be considered a data mart by the purchasing organization.

So which one is it? I believe you can argue that it's both. However, I believe we must look at it from our perspective of my client who is a software products company. My client sells a suite of order taking products to the restaurant industry. From within the software product company this is a data warehouse that brings together all aspects (subject areas) of their products.

You could argue that it doesn't matter and that this is purely a symantics or philisophical debate. Or possibly I've missed the boat. What do you think?

In relation to the product and the company that owns and sells this product to restauranteurs, this solution can be considered a data warehouse as I stated above but what about looking at it from the customers (restaurant owner) perspective?

will be an add-on to I would say it is a data warehouse because it captures data across the solution and consists of many subject areas.

No comments:

Post a Comment