The problem I am referring to is finding a source for core organizational data such as products, customers, store locations, raw material types, and business units - the data that ultimately makes up the dimensional model's dimension tables. Often times this data is spread across numerous systems or better yet lives in one or more spreadsheets that are maintained by numerous people and are missing data as well has contain duplicates. In addition, the data that is there is often out of date or simply invalid.
A data warehouse is just about worthless if the dimensions by which the measures will be evaluated are of low quality. It could even be detrimental if business decisions are being made based on invalid data.
Before moving on to a technical solution to these data issues it is very import that everyone involved in the success (or failure) of an MDM and/or data warehousing solution understand that technology will not make the problem of unreliable, disparate data go away. Technology acts as an enabling platform for well thought out, planned, and accepted data governance and stewardship. A brief definition of the two can be found here.
MDS assumes you have these things figured out and that you want to implement them on a technology platform that provides one or more of the following:
- Data modeling (entities, attributes, hierarchies, etc.)
- Data browsing
- Manual data maintenance (add, edit, delete) via a modern UI
- Bulk import and export
- Data and Model versioning
- Audit capabilities (who changed what when)
- Business rules for data validation
- Workflow based data validation
- Ability to integrate (both ways) with other systems
- Fine-grained security
It's my experience that building an MDM database is easy. It's the UI, complex data validation, workflow, and integration with other systems that's the tough part. Lucky for me Microsoft now provides a platform that does all these things for me. How well it does them is what I spent a few days trying to figure out. Before I detail what I liked and didn't like about the product I will provide a general overview as well as an overview of the underlying architecture, technology, user interface, and integration points.
The purpose of MDS is to provide a platform for the creation, maintenance, and integration of high quality master data throughout an organization.
MDS comes with Enterprise and Data Center editions of SQL Server. Licensed the same way as SSIS, SSAS, and SSRS in that it can be installed free of any additional license fees on a machine that already has a paid SQL Server license. However, if installing an a server without SQL Server installed, it is necessary to pay for another license.
MDS can only be installed on a 64-bit server.
All model metadata and data (members) are stored in a SQL Server database.
Master Data Manager Web application and Web service layer built on ASP.NET and WCF respectively and are hosted by IIS. All web application methods ultimately pass through the WCF layer.
Windows service that acts as a broker between MDS and SharePoint in order to allow MDS business rules to use SharePoint 2010 workflows (build on WWF).This is optional.
Configuration Manager Web application that is similar to that of SSRS and allows for such things as enabling the web service and attaching to an MDS database.
When new entities are created or modified via Master Data Manager, calls are made to the WCF based web service that ultimately result in the creation or modification of SQL Server tables, views, etc. Unfortunately the tables are created with a “code name” such as tbl_2_10_EN. However, this shouldn’t be an issue as all data access should be done via auto-generated views, exports, and/or web services. There is also a view that maps the table code names to meaningful entity names.
There are a number of CLR-based functions that are used for merging, de-duping, and survivorship based on rules created by the data stewards and/or administrators.
Service Broker is used in scenarios where many emails are being sent or large amounts of data are being loaded. This allows for asynchronous processing that frees up the users to continue using the web application.
Database Mail, which resides in the msdb database, is used to send e-mail alerts.
Staging tables are used when importing data into MDS. This allows for a load to be completed in a single batch once it has been staged without error.
IIS hosts a default named application “MDS”. It also hosts a WCF Service called Service.svc which is located in the same directory as the application. It is highly recommended that all custom processes use the web service to interact with the database.
User Interface (Master Data Services Manager)
Allows administrative users to create data models to hold entity members (term member is used the same as in dimensional modeling to refer to a row/entity instance).
Allows users to add, edit, and delete members (entity instances).
Allows for the creation of hierarchical relationships between members.
Allows for the creation of collections of members for non-mutually exclusive groups of members.
Copies of entity data and related metadata (models) can be archived to create a separate version of the data.
Includes a business rules engine that can be used to enforce data quality as well as assist with data stewardship via workflows. Both data validation and manipulation rules as well as workflows have the ability to send e-mail alerts.
A granular security model can be used to fine tune specific user’s access to certain entities, attributes, and even members (rows).
When importing bulk data it is necessary to get that data into existing SQL Server staging tables. This can be done by making calls to out-of-the-box stored procedures and/or writing directly to the staging tables. For scheduled and programmatic imports, custom SSIS packages are recommended. Once these tables are populated you can invoke the import by using the API, web service, or the Master Data Manager user interface.
All MDS functionality can be executed via the IIS hosted, WCF-based web service named Service. It can be used to create a custom UI and/or integrate it with existing and/or new applications.
Although using the web service directly is recommended, MDS provides three .NET assemblies (Microsoft.MasterDataServices.Core, Microsoft.MasterDataServices.Deployment, Microsoft.MasterDataServices.Services). These are typically used for administrative functions such as creating and deploying update and deployment packages and cloning and deleting models.
I got started with the tool by installing it on a VMware player virtual machine with Windows Server 2008 R2. The install was relatively easy although I did have to do some post install tinkering to get IIS configured correctly to serve the web app and WCF web service. MDS has a configuration utility that feel somewhat similar to the SSRS configuration utility. It was pretty easy to use although it has an option to enable the web service yet you still have to change the web.config file to actually enable it.
The web-based user interface is on the right track but it is unbearably slow. It can take 15-30 seconds to respond to a simple request. I have seen others complain and responses that say that once the application caches some data things should speed up. I have seem some improvement but it's still way too slow. There are also no indicators letting you know that the application is doing something so many times I found myself wondering if I had even clicked something.
I also found myself trying to remember where to go to do certain tasks. I couldn't quite put my finger on it but the UI just doesn't seem intuitive and is definitely inconsistent in some areas. For example, many pages have data grids that look like they are for display only but if you click on a row you are presented with add, edit, and delete buttons. Additionally, some pages require the user to double-click on on a cell in a grid to edit the value. It took a while for me to figure this out as there was no indication that this was the expected behavior.
The System Administration area of the application is where models, entities, attributes, etc. are created and maintained.
I found it easy enough to create entities and attributes although there seems to be a lack of data types for attributes. The biggest shortfall for me being the lack of a boolean data type. The workaround to this is to create a domain-based attribute with values of yes/no or true/false. Seems kind of lame to me.
There are also the concepts of derived hierarchies versus explicit hierarchies, as well as collections. It was unclear to me when to use which feature.
Attribute groups is a nice feature that allows you to group certain types of attributes such as address type attributes which ultimately groups these attributes on different tabs for end-users to maintain.
The business rules functionality is pretty nice. It lets you define complex, row-level validation and data maintenance but doesn't allow you to access any aggregate or related values in your logic. One of the actions that can be triggered is a workflow. This workflow is a SharePoint 2010 WF based workflow. My feelings are that there's not much special here as you have to develop a customer WF component and deploy it to your SharePoint server to make it work. All that MDS provides is the ability to call this workflow. This can be done with just about any application. It falls short for me.
As much as I want a free tool to ease my workload, MDS just doesn't feel ready for prime-time although I'm on the fence as to whether I would use it in an environment that had no other tool and the ability to install it on a machine in which a SQL Server license was already paid for. It should be noted that a new version will be released with SQL Server 2012 and promises to be a substantial improvement over the current version.