How to calculate statistical mode in SQL Server?

Earlier, we have seen how to calculate statistical median and mean in SQL Server. Now we will see how to calculate mode.

In mathematical statistics, the value which appears most often in the given data set is called a mode of the data set. Based upon the number of modes, a data set can have one mode (unimodal) or multiple modes (bimodal, trimodal or multimodal).

SQL Server does not have a specific function to calculate mode. However, there are several workarounds to find the mode of the given data set in SQL Server. Let us see one of the workarounds which I am using for my projects.

Create table and data set

For this illustration, to start with, let us create a table with 3 sets of data. There are 3 customers, each one has 10 values (amount) against them. We have to find the mode amount of each customer.

CREATE TABLE [dbo].[MTB_Statistics](
	[CustomerID] [int] NOT NULL,
	[Amount] [numeric](18, 2) NOT NULL
) ON [PRIMARY]
GO

INSERT INTO [dbo].[MTB_Statistics] VALUES
	(1, 100), (1, 100), (1, 100), (1, 400), (1, 500), 
	(1, 600), (1,600), (1, 800), (1, 800), (1, 1000),
	(2, 10), (2, 20), (2, 30), (2, 30), (2, 30), 
	(2, 60), (2,80), (2, 80), (2, 30), (2, 100),
	(3, 4), (3, 1), (3, 6), (3, 2), (3, 6), (3, 5), 
	(3, 5), (3, 12), (3, 7), (3, 7)
GO

SELECT * FROM [dbo].[MTB_Statistics]
GO

CustomerID  Amount
----------- ---------------------------------------
1           100.00
1           100.00
1           100.00
1           400.00
1           500.00
1           600.00
1           600.00
1           800.00
1           800.00
1           1000.00
2           10.00
2           20.00
2           30.00
2           30.00
2           30.00
2           60.00
2           80.00
2           80.00
2           30.00
2           100.00
3           4.00
3           1.00
3           6.00
3           2.00
3           6.00
3           5.00
3           5.00
3           12.00
3           7.00
3           7.00

(30 rows affected)

Calculate Mode

Step 1 (Find Frequency):

The first step is to find the frequency of the amounts paid by each customer. I have used COUNT() and GROUP BY to get the frequencies.

SELECT 
	CustomerID, 
	Amount, 
	COUNT(Amount) AS [Frequency]
	FROM [dbo].[MTB_Statistics]
	GROUP BY CustomerID, Amount
	ORDER BY CustomerID, 3 DESC
GO

If you notice, customer 1 has paid $100 three times. So, $100 is the mode for this customer. Likewise, customer 2 has paid $30 two times. Another customer 3 has paid $5 , #6 and $7 two times each. So, this customer has multiple modes.

Calculate statistical mode in SQL Server

Step 2 (Apply Rank):

The next step is to apply a ranking factor to find the highest frequency of amount for each customer. Here I have used the RANK() function to apply ranking to the result set.

SELECT 
	CustomerID, 
	Amount, 
	COUNT(Amount) AS [Frequency],
	RANK() OVER (PARTITION BY CustomerID ORDER BY COUNT(Amount) DESC) As [Rank]
	FROM [dbo].[MTB_Statistics]
	GROUP BY CustomerID, Amount
	ORDER BY CustomerID, 3 DESC
GO

In the result set, you can find the modes are ranked as 1 followed by the others. Now it is easy to get the modes alone.

Calculate statistical mode and apply ranking

Step 3 (List only the Modes):

Finally, I have converted the above select statement into a CTE to get only the 1st ranked rows, i.e. the mode alone for each customer.

WITH CTE_Statistics AS
(
	SELECT 
		CustomerID, 
		Amount, 
		COUNT(Amount) AS [Frequency],
		RANK() OVER (PARTITION BY CustomerID ORDER BY COUNT(Amount) DESC) As [Rank]
		FROM [dbo].[MTB_Statistics]
		GROUP BY CustomerID, Amount
)
SELECT 
	CustomerID, 
	Amount, 
	[Frequency],
	[Rank]
	FROM CTE_Statistics
	WHERE [Rank] = 1
GO
Get old the modes

Reference

  • Read more about statistical mode @ Wikipedia.


Leave your thoughts...

This site uses Akismet to reduce spam. Learn how your comment data is processed.