In this tutorial we will cover what a pivot table is, the strengths and limitations of pivot, and a step-by-step example of how to perform Pivot in SQL. 

What are Pivot Tables?

Pivot Tables are a powerful tool for analyzing and reporting metrics on large amounts of data. Pivots use a variety of summary functions to condense data into easily digestible tables. These tables are commonly used to show either the trend of value over time or relationships between that value and multiple columns. Due to its ability to accurately display trends, data teams will use Pivot Tables for sales, marketing, business performance and a number of other business related reports. 

Pivot Example:

Consider a table containing the scores for a student's individual assignments in three classes. We want to know how the student is doing each month, so we want to calculate the student's average score in each class in each month. A pivot table (in Excel) or pivot in SQL does this by rotating (or pivoting) the data of individual scores to create a column for each class and a row for each month containing the average score for each class.

Month Class Score
2019-04 A 10
2019-04 B 5
2019-04 A 6
2019-04 C 17
2019-04 B 25
2019-05 A 4
2019-05 B 6
2019-05 C 7
2019-05 A 8
2019-05 C 20

The aggregation (or pivot) of the above data to the sum of the scores for each class for each month gives us the following table:

Month A B C
2019-04 8 15 17
2019-04 8 15 17
2019-05 6 6 13.57

Unfortunately, often, the actual date is stored instead of the month like above. In this case, we will need to extract the month from the date to build the above pivot table.

Date Class Score
2019-04-01 A 10
2019-04-01 B 5
2019-04-10 A 6
2019-04-14 C 17
2019-04-28 B 25
2019-05-05 A 4
2019-05-05 B 6
2019-05-21 C 7
2019-05-22 A 8
2019-05-28 C 20

And we need to generate the same summary table shown above.

Why use SQL?

While Excel pivot tables are powerful, they are limited to data that fits in Excel and don’t work well with other custom transformations. The benefit of running your pivot in SQL is you can analyze large data sets directly in your cloud data warehouse, and quickly iterate over the full range of data instead of exporting it to Excel for processing.

SQL: 

If your data lives in a cloud data warehouse, you’ll benefit from using SQL for your data transformation. Here’s a step-by-step guide on how to achieve this. 

While the pivot function is not part of the ANSI standard syntax, most modern databases support a PIVOT function. Snowflake, Redshift, Microsoft SQL Server and BigQuery all support the same syntax:


SELECT COLUMNS_TO_RETURN
FROM SQL_COMMAND_THAT_RETURNS_THE_DATA
   PIVOT ( AGGREGATE_FUNCTION ( COLUMN_BEING_AGGREGATED )
            FOR COLUMN THAT CONTAINS NEW COLUMN CLASSES IN ( COLUMN CLASS 1 [ , COLUMN CLASS 2 ... ] ) )
OPTIONAL ORDER BY CLAUSE

How do we set these to generate the data we are interested in? Starting with the PIVOT syntax:

SQL Template Our Case
AGGREGATE_FUNCTION Any SQL aggregate function, in our case: AVG
COLUMN BEING AGGREGATED Score
COLUMN THAT CONTAINS NEW COLUMN CLASSES Class
COLUMN CLASS 1… ‘A’, ‘B’, ‘C’

Notice the SQL Pivot command requires us to explicitly list the columns (Classes) we want the average score calculated for. 

Nowhere in this have we specified that we want to return this average by month. This is done when we set up the <SQL COMMAND THAT RETURNS THE DATA>. In most SQL SELECT statements, this statement after FROM would just be a tablename. In this case, we need to create a subquery (a SELECT statement itself) that is used to select only the data needed for the PIVOT operation. The Pivot command uses any columns returned that are not referenced in the PIVOT statement to generate and label the rows that are returned by the final query. In our case, this should be Month. This means that we should only select the columns we want included in the final result.


SELECT DATE, COLUMN, SCORE 
FROM TABLE_NAME

This code will create a row for each date in the data. To aggregate the data to month we need to change the SQL slightly:


SELECT Function_to_create_month as MONTH, CLASS, SCORE 
FROM TABLE_NAME
Optional_Where_Clause>

Unfortunately, the code to generate month from a date differs on different databases. In Snowflake, this would be:


DATE_TRUNC('month', DATE) as month

In BigQuery this would be:


DATE_TRUNC(DATE, MONTH) as month

While Microsoft SQL Server doesn’t have a similar function, so the month can be created as:


DATEFROMPARTS(YEAR(DATE), MONTH(DATE), 1) as month

In this example, if we wanted to add a filter, we could create an optional where clause in this subquery.

For the <Columns To Return>, we could return “*” as we want all the columns created by this pivot table. This is not considered SQL best practice and it is better to explicitly return the columns of interest. So while we can call (in Snowflake with minor modifications to the month definition in other flavors of SQL)


SELECT *
FROM (SELECT DATE, COLUMN, SCORE FROM TABLE_NAME)
   PIVOT ( AVG (SCORE) FOR CLASS IN ( ‘A’, ‘B’, ‘C’ ) )
ORDER BY MONTH

It would be better to explicitly name the columns to be returned (and since Snowflake will name the columns with single quotes by default, we may want to rename them anyway). To do this use:


SELECT DATE, A, B, C
FROM (SELECT DATE, COLUMN, SCORE FROM TABLE_NAME)
   PIVOT ( AVG (SCORE) FOR CLASS IN ( ‘A’, ‘B’, ‘C’ ) )
(DATE, A, B, C)
ORDER BY MONTH

Instead.

Conclusion:

By following the steps above you have combined the power of pivot tables with the power of SQL to transform data directly in your cloud data warehouse. Please feel free to reach out to a member of our team if you have any additional questions.

No-code/low-code data prep and visualization

Request Demo
Try for Free