In this tutorial we will cover why it is important to remove duplicate values, an example and a step-by-step example using SQL.
In analyzing data, we often want to calculate the total of some metric or key performance indicator (KPI) as of a certain date. These running totals are simply the sum of all the prior values up to and including the current one.
In this example we will be looking at sales data for different stores at different dates. As seen in the table called SALES.
Then the running total would look like:
However, we could also calculate the running total by store as:
Running totals over all rows are easy to compute using SUM once the data is loaded into an Excel spreadsheet. Calculating the running total independently requires the use of SUMIF. However, in many problems faced in a business setting, the data exists in a data warehouse and needs to be extracted to the Excel spreadsheet before analysis can begin. At best, this is an added step and wastes time downloading the data. At worst, this can require filtering in the database as there is too much data for Excel to handle. In either case, it is easy to perform these running total calculations directly in the data warehouse and just return the result.
WINDOW functions allow SQL to use the value of a column from multiple rows in a table to generate the result for a single row. This means that a WINDOW function can be used to calculate the running totals. The basic form of a WINDOW function is
In our case, <FUNCTION> is SUM, the column we want to sum is Amount, <order list> should be just date and in the case of a running total over the entire table, the PARTITION BY is not needed. This generates the following WINDOW function:
In the case of the running total for each store separately, the <partition list> is Store and the SQL would be:
Each of these can be included in a SELECT statement to return both the original data and both running totals.
This SQL will generate the table
Running totals are commonly needed when analyzing this data. They can be easily generated using WINDOW functions in SQL. These WINDOW functions can be used to generate running totals based on multiple conditions.
By following the steps above you have combined the power of calculating running totals and SQL to transform data directly in your cloud data warehouse. Please feel free to reach out to a member of our team if you have any additional questions.