In this tutorial we will cover how to join data using SQL, why it is important, and a step-by-step example.

What is Join and why is it important?

SQL Joins are one of the most common functions used in any form of SQL. Somehow, the data you need to analyze is never contained in a single table (thanks, Ralph). Joins allow the user to join different data tables together so that different types of data can be combined and better insights can be generated. 

One example of a business question that requires joining data is understanding how offering a discount is impacting both sales and profitability for your company. By joining these two datasets together the user will know how much to discount the product to maximize sales numbers and keep substantial profit for the company.

Join Example:

In this analysis, we will be using the AdventureWorks sample data and considering internet sales (FACTINTERNETSALES table). We will create a dashboard to show total sales amount, average discount and total discount amount per week.

Imagine: when you first load the internet sales data into your BI tool (Tableau, Looker or Power BI), you discover that despite the internet sales data containing a promotion key, the discount price percent and discount amount are always zero. We can fix this by joining the DIMPROMOTION table to the sales data. This is a common problem among most of the different dashboard tools.

Although dashboard tools can usually join two tables this is often not the best option when dealing with large amounts of data. It is often significantly faster to perform that join in the data warehouse and only pull the final data into the dashboard. Below we discuss how to do this directly in our SQL select statement.

SQL:

In SQL, we will join the FACTINTERNETSALES and DIMPROMOTION data on the key PROMOTIONKEY as:


SELECT *
FROM FACTINTERNETSALES AS sales
LEFT JOIN DIMPROMOTION AS promo
ON sales.PROMOTIONKEY = promo.PROMOTIONKEY

We perform a left join as we only want to include records that have sales. This means we want to keep all transactions from the sales table whether or not they have a matching promotion, but we do not include any promotions that were never applied to a sale.

Alternatively, if we were exploring promotion behavior, we might use a right join as we wouldn’t be interested in sales without a promotion, but would like to keep track of those promotions that didn’t have any sales.

This SQL is not complete as this will pull every column from the promotion table into our final result. We only care about the promotion name and discount percent for each promotion. In addition, since we have the discount percent, we can calculate the actual discount amount and actual sales amount in the same query.


SELECT sales.*
, promo.ENGLISHPROMOTIONNAME
, promo.DISCOUNTPCT
, FLOOR(sales.SALESAMOUNT*promo.DISCOUNTPCT, 2) AS ACTDISCOUNTAMOUNT
, CEIL(sales.SALESAMOUNT*(1-promo.DISCOUNTPCT), 2) AS ACTSALESAMOUNT
FROM FACTINTERNETSALES AS sales
LEFT JOIN DIMPROMOTION AS promo
ON sales.PROMOTIONKEY = promo.PROMOTIONKEY

We apply FLOOR on the discount amount as we round the discount amount down to the nearest penny. Similarly, CEIL is applied to the sales amount because of the above rounding down of the discount amount.

Conclusion: 

By following the steps above you have combined the power of Join and the power of SQL to transform data directly in your cloud data warehouse. Please feel free to reach out to a member of our team if you have any additional questions.

No-code/low-code data prep and visualization

Request Demo
Try for Free