Union vs Union All – Usage and Performance In SQL Server

Today we’ll see the difference between UNION vs UNION ALL set operators and especially the performance difference between them.

UNION vs UNION ALL in SQL Server

Union

UNION is a set operator which will combine multiple select queries and returns a single result set. The single result set will have the results from all the select queries which are combined with UNION. Result set returned by the union of select queries will ignore the duplicate rows and results only the distinct rows. Here is a example of UNION. To start with create two tables with equal number of columns and of same data type.

/* Create the first table */
Create table MTB_Table_A (Employee_ID char(10), Employee_Name varchar(50))


/* Insert rows to the first table */
Insert into MTB_Table_A values ('A002', 'William')
Insert into MTB_Table_A values ('A003', 'Jacob')
Insert into MTB_Table_A values ('A004', 'Tyler')
Insert into MTB_Table_A values ('A005', 'Emma')
Insert into MTB_Table_A values ('A006', 'Ryan')
Insert into MTB_Table_A values ('A007', 'Ashley')
Insert into MTB_Table_A values ('A008', 'Brayden')
Insert into MTB_Table_A values ('A009', 'Olivia')
Insert into MTB_Table_A values ('A010', 'Michael')
Insert into MTB_Table_A values ('A011', 'Emily')

select * from MTB_Table_A

/* Create the second table */
Create table MTB_Table_B (Employee_ID char(10), Employee_Name varchar(50))

/* Insert rows to the first table */
Insert into MTB_Table_B values ('A002', 'William')
Insert into MTB_Table_B values ('B003', 'Michael')
Insert into MTB_Table_B values ('A004', 'Tyler')
Insert into MTB_Table_B values ('B005', 'Abigail')
Insert into MTB_Table_B values ('A006', 'Ryan')
Insert into MTB_Table_B values ('B007', 'Isabella')
Insert into MTB_Table_B values ('A008', 'Brayden')
Insert into MTB_Table_B values ('B009', 'Emily')
Insert into MTB_Table_B values ('A010', 'Michael')
Insert into MTB_Table_B values ('B011', 'Makayla')

select * from MTB_Table_B

Now use UNION to combine select statements for the two tables.

SELECT * FROM MTB_Table_A
UNION
SELECT * FROM MTB_Table_B

Even though each table has 10 rows, the result set of the union statement has 15 rows. This is because UNION operator returned only the distinct rows ignoring the 5 duplicate rows.

Union in SQL Server

Union All

UNION ALL is just the UNION set operation along with ALL option. The result set of select queries combined with UNION ALL will include the duplicate rows. For example if Table_A has 10 rows and Table_B has 7 rows and 5 rows in both the tables are duplicates, the UNION All statement will return 17 rows including the duplicates. Here is an example of UNION ALL.

Using the tables we have for the above example, if you run a UNION ALL statement, the result set will have all the rows from both tables including the duplicates.

SELECT * FROM MTB_Table_A
UNION ALL
SELECT * FROM MTB_Table_B

Union All in SQL Server

Performance Comparison of Union vs Union All

Using Extended Events

To compare the performance of UNION and UNION ALL, we will start with a simple test using the Extended Events and compare the time consumed by each statement. Create and start an extend event session to track and log the queries running against the database you are using. Run the UNION and UNION ALL statements and see the events captured by the Xevent session. For the simple query I’ve used above, the time taken by the UNION statement is much higher that the UNION ALL statement. See the details in the below screen shots.

UNION

Performance UNION duration

UNION ALL

Performance UNION ALL duration

Using Actual Execution Plan Tool

Now we’ll compare the statements using the Actual Execution Plan Tool. If you compare the actual execution plan for both the statements, you can notice that statement with just UNION will be the costliest. This is because of the Distinct Sort operation. Distinct sort is an expensive operation.

Performance UNION vs UNION ALL

You cannot ignore UNION just because it’s expensive than UNION ALL. Both these operators have their own usage and functionalities. If you are very confident that the result set will not have any duplicate rows then you can use UNION ALL instead of UNION. If you want only distinct rows and you are not sure whether the tables have duplicates or not, then it’s safe to go for UNION.

Union vs Union All

#

UNION

UNION ALL

1 UNION will return the distinct rows eliminating the duplicates from the tables. UNION ALL returns all the rows for all the combined tables including duplicates.
2 Considering performance, UNION is slower as it uses distinct sort operation to eliminate duplicates. UNION ALL is faster as it won’t care about duplicates and selects all the rows from the involved tables.
3 Use UNION if you need to eliminate duplicate rows from result set. Use UNION ALL if you it is OK for you to have duplicate rows in the result set.

Reference


Leave your thoughts...

This site uses Akismet to reduce spam. Learn how your comment data is processed.