Earlier, we have seen how to calculate statistical median and mean in SQL Server. Now we will see how to calculate mode.
In mathematical statistics, the value which appears most often in the given data set is called a mode of the data set. Based upon the number of modes, a data set can have one mode (unimodal) or multiple modes (bimodal, trimodal or multimodal).
SQL Server does not have a specific function to calculate mode. However, there are several workarounds to find the mode of the given data set in SQL Server. Let us see one of the workarounds which I am using for my projects.
Create table and data set
For this illustration, to start with, let us create a table with 3 sets of data. There are 3 customers, each one has 10 values (amount) against them. We have to find the mode amount of each customer.
CREATE TABLE [dbo].[MTB_Statistics](
[CustomerID] [int] NOT NULL,
[Amount] [numeric](18, 2) NOT NULL
) ON [PRIMARY]
GO
INSERT INTO [dbo].[MTB_Statistics] VALUES
(1, 100), (1, 100), (1, 100), (1, 400), (1, 500),
(1, 600), (1,600), (1, 800), (1, 800), (1, 1000),
(2, 10), (2, 20), (2, 30), (2, 30), (2, 30),
(2, 60), (2,80), (2, 80), (2, 30), (2, 100),
(3, 4), (3, 1), (3, 6), (3, 2), (3, 6), (3, 5),
(3, 5), (3, 12), (3, 7), (3, 7)
GO
SELECT * FROM [dbo].[MTB_Statistics]
GO
CustomerID Amount
----------- ---------------------------------------
1 100.00
1 100.00
1 100.00
1 400.00
1 500.00
1 600.00
1 600.00
1 800.00
1 800.00
1 1000.00
2 10.00
2 20.00
2 30.00
2 30.00
2 30.00
2 60.00
2 80.00
2 80.00
2 30.00
2 100.00
3 4.00
3 1.00
3 6.00
3 2.00
3 6.00
3 5.00
3 5.00
3 12.00
3 7.00
3 7.00
(30 rows affected)
Calculate Mode
Step 1 (Find Frequency):
The first step is to find the frequency of the amounts paid by each customer. I have used COUNT() and GROUP BY to get the frequencies.
SELECT
CustomerID,
Amount,
COUNT(Amount) AS [Frequency]
FROM [dbo].[MTB_Statistics]
GROUP BY CustomerID, Amount
ORDER BY CustomerID, 3 DESC
GO
If you notice, customer 1 has paid $100 three times. So, $100 is the mode for this customer. Likewise, customer 2 has paid $30 two times. Another customer 3 has paid $5 , #6 and $7 two times each. So, this customer has multiple modes.
Step 2 (Apply Rank):
The next step is to apply a ranking factor to find the highest frequency of amount for each customer. Here I have used the RANK() function to apply ranking to the result set.
SELECT
CustomerID,
Amount,
COUNT(Amount) AS [Frequency],
RANK() OVER (PARTITION BY CustomerID ORDER BY COUNT(Amount) DESC) As [Rank]
FROM [dbo].[MTB_Statistics]
GROUP BY CustomerID, Amount
ORDER BY CustomerID, 3 DESC
GO
In the result set, you can find the modes are ranked as 1 followed by the others. Now it is easy to get the modes alone.
Step 3 (List only the Modes):
Finally, I have converted the above select statement into a CTE to get only the 1st ranked rows, i.e. the mode alone for each customer.
WITH CTE_Statistics AS
(
SELECT
CustomerID,
Amount,
COUNT(Amount) AS [Frequency],
RANK() OVER (PARTITION BY CustomerID ORDER BY COUNT(Amount) DESC) As [Rank]
FROM [dbo].[MTB_Statistics]
GROUP BY CustomerID, Amount
)
SELECT
CustomerID,
Amount,
[Frequency],
[Rank]
FROM CTE_Statistics
WHERE [Rank] = 1
GO
Reference
- Read more about statistical mode @ Wikipedia.