Apparently, MS SQL 2014 has a new cardinality estimator. Goodie. Ah, what was the old one? And why is it a cardinal and not a blackbird or robin?
The Query Optimizer picks among one or more query plans to pick the ‘cheapest’ plan. Cheapness is based on CPU and I/O costs, and an important part of that is related to the number of rows of data returned at various steps of the query. The cardinality estimator determines, or rather guesses, this count as well as the distribution of values, value counts and duplicate counts. It uses statistics and heuristics to make the estimate.
If the CE overestimates the number of rows, the costs will be wrong and the right query might not be chosen, specifically
• Selection of a parallel plan when a serial plan might be more optimal.
• Inappropriate join strategy selection.
• Inefficient index navigation strategies (scan versus seek).
• Inflated memory grants.
• Wasted memory and unnecessarily throttled concurrency.
if the number of rows is underestimated
• The selection of serial plan when parallelism would have been more optimal.
• Inappropriate join strategies.
• Inefficient index selection and navigation strategies.
Kendra Little at BrentOzar.com has a query that does not optimize well for 2012 and earlier. It is designed to confuse the Optimizer and estimates it will return millions of rows. The 2014 version handles it well.
Upgraded servers or databases that are migrated or restored from an older version will not use the new CE.
Paul White (http://sqlperformance.com/2014/01/sql-plan/cardinality-estimation-for-multiple-predicates) has several nice articles about this, including getting histogram data directly from the statistics (“When a predicate makes a simple comparison between a column and a scalar value, the chances are good that the cardinality estimator will be able to derive a good quality estimate”). Multiple column predicates (where A and B) can be equally exact except that we can’t tell from the histograms how much overlap there is between a and b, or rather there are rows that satisfy only A, which satisfes only B and where both A and B are true in the row are are true.
In the old CE, we assume that the two attributes are distributed independently of each other. In this case, if selectivity is the fraction of returned values divides by the total values, over all selectivity is s(A)*s(B) for conjustion (AND) or s(A) + s(B) – (s(A)*s(B))
— uses the new CE on MS SQL 2014 – level 120 is the latest version level.
ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=120;
— uses the old CE
ALTER DATABASE AdventureWorks2012 SET COMPATIBILITY_LEVEL=110;