A few things to keep in mind when deciding whether you want a unique index or constraint.
1) The index creation options other than FILLFACTOR that are available for a unique index are not available for a unique constraint.
2) A unique key can be referenced by a foreign key constraint but a column with a unique index cannot be referenced by a foreign key constraint.
3) A less subtle difference is related to the timing of validation. Constraints are checked before indexes and this can lead to a large multi-row insert/select or update to fail before modification. Often times indexes are validated at the end of a large modification. As a result it will take longer for a failure to occur with a rollback at the end of the modification.
SELECT [tips tricks and commentary] FROM [my experiences] WHERE subject IN ('database', 'data warehouse', 'business intelligence')
Showing posts with label Referential Integrity. Show all posts
Showing posts with label Referential Integrity. Show all posts
Wednesday, June 16, 2010
Sunday, February 28, 2010
Clustered Index Doesn't Have to be the Primary Key
Believe it or not, you don’t have to cluster your primary keys.
Most SQL Server mortals assume that their primary keys should be clustered because that is the default behavior of SQL Server Management Studio when using the designer to create a table. Unfortunately, the primary key is not always the best candidate for the one and only clustered index you're allowed on a table.
I won't go into detail here about what a clustered index is but I'll say that it is essentially your table and the order in which your table rows/records/tuples are stored is based on the column(s) used in your clustered index.
In an OLTP database where your pimary keys consist of a single integer identity column you're usually in good shape if your clustered index is also your primary key.
However, if you use composite and/or "intelligent" keys you're going to experience a number of performance problems. I won't expound on these problems here because I almost never use intelligent or composite keys in an OLTP system and don't believe they should be used in such systems.
The place I most often have a non-primary key clustered index is in an OLAP fact table. OLAP systems are more concerned about high performance reads of blocks of data as opposed to reads and writes of individual rows as is the case in transactional systems.
In a data warehouse/OLAP database date is often an attribute used in most analyses. By creating a clustered index on the most important date in a fact table (i.e. order date in an order table), you enable better I/O and therefore faster query response times for the majority of queries executed.
Some might argue that the unique surrugate key of the fact table (if you even have one) should have a clustered index. Hmmm . . . how often do you query a fact table and have a surrogate key based predicate? The answer should probably be never (although I almost never say never)!
How about writes? Fact tables are often written to based on a date. By clustering on date we typically have sequential writes to the disk. When we load today's data we end up adding data to the end of the last data page for the table because today's date comes after yesterday's date. If we were to cluster on some kind of an intelligent key we'd end up having to insert rows into pre-existing data pages which leads to much higher I/O and therefore decreased performance.
How about archiving and partitioning? In most OLAP scenarios the business requirement is to have something like 25 months worth of data available in the system at all times. It's often beneficial for performance and disk expense reasons to take old data offline. The best way to do this in a SQL Server environment is to partition your fact tables using a year/month based partition function. For partitioning to be successful it is necessary for your partition column to be part of the clustered index. If you cluster on your date column and partition on your date column as well, you're all set.
In short, if you're working in a data warehouse/OLAP environment pay careful attention to what column(s) you create your clustered indexes on. It can make or break the performance of your solution!
Most SQL Server mortals assume that their primary keys should be clustered because that is the default behavior of SQL Server Management Studio when using the designer to create a table. Unfortunately, the primary key is not always the best candidate for the one and only clustered index you're allowed on a table.
I won't go into detail here about what a clustered index is but I'll say that it is essentially your table and the order in which your table rows/records/tuples are stored is based on the column(s) used in your clustered index.
In an OLTP database where your pimary keys consist of a single integer identity column you're usually in good shape if your clustered index is also your primary key.
However, if you use composite and/or "intelligent" keys you're going to experience a number of performance problems. I won't expound on these problems here because I almost never use intelligent or composite keys in an OLTP system and don't believe they should be used in such systems.
The place I most often have a non-primary key clustered index is in an OLAP fact table. OLAP systems are more concerned about high performance reads of blocks of data as opposed to reads and writes of individual rows as is the case in transactional systems.
In a data warehouse/OLAP database date is often an attribute used in most analyses. By creating a clustered index on the most important date in a fact table (i.e. order date in an order table), you enable better I/O and therefore faster query response times for the majority of queries executed.
Some might argue that the unique surrugate key of the fact table (if you even have one) should have a clustered index. Hmmm . . . how often do you query a fact table and have a surrogate key based predicate? The answer should probably be never (although I almost never say never)!
How about writes? Fact tables are often written to based on a date. By clustering on date we typically have sequential writes to the disk. When we load today's data we end up adding data to the end of the last data page for the table because today's date comes after yesterday's date. If we were to cluster on some kind of an intelligent key we'd end up having to insert rows into pre-existing data pages which leads to much higher I/O and therefore decreased performance.
How about archiving and partitioning? In most OLAP scenarios the business requirement is to have something like 25 months worth of data available in the system at all times. It's often beneficial for performance and disk expense reasons to take old data offline. The best way to do this in a SQL Server environment is to partition your fact tables using a year/month based partition function. For partitioning to be successful it is necessary for your partition column to be part of the clustered index. If you cluster on your date column and partition on your date column as well, you're all set.
In short, if you're working in a data warehouse/OLAP environment pay careful attention to what column(s) you create your clustered indexes on. It can make or break the performance of your solution!
Wednesday, January 6, 2010
Should Foreign Key Columns be Indexed?
People seem to be confused when it comes to indexing of foreign key columns. For one, I believe some people assume that when you create a foreign key constraint (in SQL Server) an index is also created on the column(s) that make up the key/constraint. This is not the case.
That leads us to the original question . . . Should we create indexes on our foreign keys??
I believe that in most cases the answer is yes.
There are a few things to consider here:
1) Constraint checks
2) Cascading Updates and Deletes
2) Joins
1) If you delete a row in a table whose primary key or unique constraint is referenced by one or more foreign keys, SQL Server will search the foreign key tables to determine if the row can be deleted (assuming you have no cascade behavior defined). An index on the foreign key column(s) will speed this search.
2) The same is also true for update and delete cascades. If either are defined, the child/foreign key rows must be identified and updated and/or deleted. Again, the index helps find these rows more efficiently.
3) More often than not foreign key columns will be joined to primary key columns in queries. Joins are always more efficient when an index exists on both sides of the join.
If you ask me, it's a no-brainer to index your foreign key columns. How often do you have a database with foreign key relationships that don't fall under one of the 3 above scenarios??
That leads us to the original question . . . Should we create indexes on our foreign keys??
I believe that in most cases the answer is yes.
There are a few things to consider here:
1) Constraint checks
2) Cascading Updates and Deletes
2) Joins
1) If you delete a row in a table whose primary key or unique constraint is referenced by one or more foreign keys, SQL Server will search the foreign key tables to determine if the row can be deleted (assuming you have no cascade behavior defined). An index on the foreign key column(s) will speed this search.
2) The same is also true for update and delete cascades. If either are defined, the child/foreign key rows must be identified and updated and/or deleted. Again, the index helps find these rows more efficiently.
3) More often than not foreign key columns will be joined to primary key columns in queries. Joins are always more efficient when an index exists on both sides of the join.
If you ask me, it's a no-brainer to index your foreign key columns. How often do you have a database with foreign key relationships that don't fall under one of the 3 above scenarios??
Subscribe to:
Posts (Atom)