In this tutorial we will cover what a pivot table is, the strengths and limitations of pivot, and a step-by-step example of how to perform Pivot in SQL.
Pivot Tables are a powerful tool for analyzing and reporting metrics on large amounts of data. Pivots use a variety of summary functions to condense data into easily digestible tables. These tables are commonly used to show either the trend of value over time or relationships between that value and multiple columns. Due to its ability to accurately display trends, data teams will use Pivot Tables for sales, marketing, business performance and a number of other business related reports.
Consider a table containing the scores for a student's individual assignments in three classes. We want to know how the student is doing each month, so we want to calculate the student's average score in each class in each month. A pivot table (in Excel) or pivot in SQL does this by rotating (or pivoting) the data of individual scores to create a column for each class and a row for each month containing the average score for each class.
The aggregation (or pivot) of the above data to the sum of the scores for each class for each month gives us the following table:
Unfortunately, often, the actual date is stored instead of the month like above. In this case, we will need to extract the month from the date to build the above pivot table.
And we need to generate the same summary table shown above.
While Excel pivot tables are powerful, they are limited to data that fits in Excel and don’t work well with other custom transformations. The benefit of running your pivot in SQL is you can analyze large data sets directly in your cloud data warehouse, and quickly iterate over the full range of data instead of exporting it to Excel for processing.
If your data lives in a cloud data warehouse, you’ll benefit from using SQL for your data transformation. Here’s a step-by-step guide on how to achieve this.
While the pivot function is not part of the ANSI standard syntax, most modern databases support a PIVOT function. Snowflake, Redshift, Microsoft SQL Server and BigQuery all support the same syntax:
How do we set these to generate the data we are interested in? Starting with the PIVOT syntax:
Notice the SQL Pivot command requires us to explicitly list the columns (Classes) we want the average score calculated for.
Nowhere in this have we specified that we want to return this average by month. This is done when we set up the <SQL COMMAND THAT RETURNS THE DATA>. In most SQL SELECT statements, this statement after FROM would just be a tablename. In this case, we need to create a subquery (a SELECT statement itself) that is used to select only the data needed for the PIVOT operation. The Pivot command uses any columns returned that are not referenced in the PIVOT statement to generate and label the rows that are returned by the final query. In our case, this should be Month. This means that we should only select the columns we want included in the final result.
This code will create a row for each date in the data. To aggregate the data to month we need to change the SQL slightly:
Unfortunately, the code to generate month from a date differs on different databases. In Snowflake, this would be:
In BigQuery this would be:
While Microsoft SQL Server doesn’t have a similar function, so the month can be created as:
In this example, if we wanted to add a filter, we could create an optional where clause in this subquery.
For the <Columns To Return>, we could return “*” as we want all the columns created by this pivot table. This is not considered SQL best practice and it is better to explicitly return the columns of interest. So while we can call (in Snowflake with minor modifications to the month definition in other flavors of SQL)
It would be better to explicitly name the columns to be returned (and since Snowflake will name the columns with single quotes by default, we may want to rename them anyway). To do this use:
By following the steps above you have combined the power of pivot tables with the power of SQL to transform data directly in your cloud data warehouse. Please feel free to reach out to a member of our team if you have any additional questions.