In this tutorial we will cover why it is important to remove duplicate values, an example and a step-by-step example using SQL.

## What are Running Totals and why are they important?

In analyzing data, we often want to calculate the total of some metric or key performance indicator (KPI) as of a certain date. These running totals are simply the sum of all the prior values up to and including the current one.

## Example:

In this example we will be looking at sales data for different stores at different dates. As seen in the table called SALES.

 Date Store Amount 1/12/2019 A \$48.90 1/15/2019 B \$80.15 1/20/2019 A \$57.13 1/26/2019 B \$22.37 1/28/2019 A \$20.23 2/5/2019 A \$63.83 2/14/2019 B \$42.65 2/24/2019 B \$49.94 3/4/2019 A \$44.50 3/5/2019 B \$33.09

Then the running total would look like:

 Date Store Amount 1/12/2019 A \$48.90 1/15/2019 B \$129.04 1/20/2019 A \$186.17 1/26/2019 B \$208.55 1/28/2019 A \$228.77 2/5/2019 A \$292.61 2/14/2019 B \$335.26 2/24/2019 B \$385.19 3/4/2019 A \$429.69 3/5/2019 B \$462.78

However, we could also calculate the running total by store as:

 Date Store Amount 1/12/2019 A \$48.90 1/15/2019 B \$80.15 1/20/2019 A \$106.03 1/26/2019 B \$102.52 1/28/2019 A \$126.25 2/5/2019 A \$190.08 2/14/2019 B \$145.17 2/24/2019 B \$195.11 3/4/2019 A \$234.58 3/5/2019 B \$228.20

## SQL over Excel:

Running totals over all rows are easy to compute using SUM once the data is loaded into an Excel spreadsheet. Calculating the running total independently requires the use of SUMIF. However, in many problems faced in a business setting, the data exists in a data warehouse and needs to be extracted to the Excel spreadsheet before analysis can begin. At best, this is an added step and wastes time downloading the data. At worst, this can require filtering in the database as there is too much data for Excel to handle. In either case, it is easy to perform these running total calculations directly in the data warehouse and just return the result.

## In SQL:

WINDOW functions allow SQL to use the value of a column from multiple rows in a table to generate the result for a single row. This means that a WINDOW function can be used to calculate the running totals. The basic form of a WINDOW function is

``````
() OVER ([PARTITION BY ] ORDER BY )
``````

In our case, <FUNCTION> is SUM, the column we want to sum is Amount, <order list> should be just date and in the case of a running total over the entire table, the PARTITION BY is not needed. This generates the following WINDOW function:

``````
SUM(Amount) OVER (ORDER BY Date)
``````

In the case of the running total for each store separately, the <partition list> is Store and the SQL would be:

``````
SUM(Amount) OVER (PARTITION BY Store ORDER BY Date)
``````

Each of these can be included in a SELECT statement to return both the original data and both running totals.

``````
SELECT Date, Store, Amount,
SUM(Amount) OVER (ORDER BY Date) AS running_total,
SUM(Amount) OVER (PARTITION BY Store ORDER BY Date) AS total_by_store
FROM SALES
``````

This SQL will generate the table

 Date Store Amount Running_Total Total_By_Store 1/12/2019 A \$48.90 \$48.90 \$48.90 1/15/2019 B \$80.15 \$129.04 \$80.15 1/20/2019 A \$57.13 \$186.17 \$106.03 1/26/2019 B \$22.37 \$208.55 \$102.52 1/28/2019 A \$20.23 \$228.77 \$126.25 2/5/2019 A \$63.83 \$292.61 \$190.08 2/14/2019 B \$42.65 \$335.26 \$145.17 2/24/2019 B \$49.94 \$385.19 \$195.11 3/4/2019 A \$44.50 \$429.69 \$234.58 3/5/2019 B \$33.09 \$462.78 \$228.20

## Conclusion:

Running totals are commonly needed when analyzing this data. They can be easily generated using WINDOW functions in SQL. These WINDOW functions can be used to generate running totals based on multiple conditions.

By following the steps above you have combined the power of calculating running totals and SQL to transform data directly in your cloud data warehouse. Please feel free to reach out to a member of our team if you have any additional questions.

Try for Free