Monday, June 21, 2010

DISTINCT or GROUP BY?

GROUP BY and DISTINCT can return the same results when looking for uniqueness in a resultset. Consider the Sales table below.

Store     Product        Date

-------------------------------------
A          Monopoly    1/10/2010
B          Connect 4    1/12/2010
A          Monopoly    1/14/2010
A          Battle Ship   1/16/2010
B          Othelo         1/16/2010
 
If you wanted to know what games each store sold from 1/10/2010 through 1/16/2010 you could write either of the following queries:

SELECT DISTINCT
    Store,
    Product
FROM
    Sales

-OR-

SELECT
    Store,
    Product
FROM
    Sales
GROUP BY
    Store,
    Product

So which one?? I hate to admit it but back when I first got into SQL performance tuning I was working in an Oracle 8i environment. If I remember correctly, GROUP BY performed better than DISTINCT for one reason or another. Well in the days of SQL Server 2005 and 2008, GROUP BY and DISTINCT perform equivalently. Therefore, my rule of thumb is to use DISTINCT when no aggregations on additional, non-unique columns are necessary. If you just want unique combinations of store and game use distinct. If you want to know how many of each game were sold by store then use GROUP BY.

2 comments:

  1. Thanks good to know. Have you done some testing on tables with large amount of data or found articles that supports using distinct over group by?

    ReplyDelete
  2. GMAN - I only recommend DISTINCT over GROUP BY in situations where there is no aggregation being done, simply because it's more intuitive.

    I believe it's more intuitive because DISTINCT cannot be used in conjuction with aggregate functions so if you see a query with DISTINCT you know there are no aggregations to look for.

    As the original post states, "... SQL Server 2005 and 2008, GROUP BY and DISTINCT perform equivalently." I have confirmed this by looking at execution plans although I did not test with a variety of sizes of resultsets as I don't believe that would affect the plan (at least not related to whether you use DISTINCT or GROUP BY).

    ReplyDelete