In this tutorial, we will cover what Lead Scoring is, why it’s important, and a step-by-step example of how to build Lead Scoring Analysis in SQL. 

What is Lead Scoring?

Lead scoring is a technique that can rank prospective customers (leads) based on their value to the company. This ranking can be based on the likelihood to become a customer (or close) or the expected value as a customer. Using this ranking, the company can prioritize those prospects most likely to become customers in the near term.

There are multiple ways to implement lead scoring. First, the data can come from:

  • Just CRM
    • This method just considers the information about business
    • This is the approach we will take in this tutorial
  • CRM plus marketing and web analytics
    • This method allows for further understanding of how the lead is engaging with the company
  • CRM, marketing, web analytics, and product analytics
    • This method is appropriate for product-led growth (PLG) approaches that give the prospect the opportunity to try the product before sales gets involved.

Beyond this, there are two approaches to using the data from above. First, a set of rules can be created to identify which leads to rank highly. For example, this could involve defining the ideal customer profile (ICP) and targeting those leads that match and disqualifying the rest. Second, once enough leads have been converted to customers, a machine learning model can be built to create the lead score. In either case, the data needed is the same.

Why Lead Scoring is important?

Identifying those leads that are most likely to close in the near term allows the company to focus on those prospects with the most potential. This allows for increased sales efficiency and effectiveness. This increase in efficiency and effectiveness allows sales to grow revenue without the need for additional sales executives.

Beyond sales effectiveness, understanding those customers most likely to close informs the marketing team on what kind of customers, characteristics, etc. identify the best prospects. This information can be used to build campaigns and target the correct prospects with both inbound and outbound marketing.

Finally, by tying the marketing campaign design and targets to the lead scoring, it ensures that sales and marketing remain aligned on the best type of customer for the business. This means that marketing qualified leads (MQLs) are more likely to be good inputs into the sales process than they would be otherwise.

Data needed for Lead Scoring

In the version of lead scoring we are considering in this tutorial, we are only concerned with data about the lead from the CRM tool. This could be data stored in Salesforce, Hubspot, Dynamics, or another CRM tool.  We need to combine three pieces of data to create the lead scoring data to be used.

  • Customer contact details
    • Customer Leads to connect to opportunities
  • Customer opportunities details
  • Customer Accounts

How to create Lead Scoring in SQL?

In this section, we will show how to build the Lead Scoring data using SQL. We will assume the data comes from Salesforce. We will need to clean these datasets and join the data together. We can join the three tables together, keeping only the relevant fields from each. These can be joined on:

  • Accounts and Opportunities
    • account_ID in Opportunities to ID in Accounts
  • Leads and Opportunities
    • opportunity_ID in Leads and ID in Opportunities
  • Leads and Contacts
    • contact_ID in Leads and ID in Contacts

For the actual scoring of leads, this scoring will be done as new contacts are added to the system. The new contacts will connect to the account based on account_id and the same data can be generated in SQL.

Clean data

First, when records are deleted from tools like Salesforce, they are often not actually removed from the system but have a flag set to tell the CRM tool to ignore the record. We will want to ignore these by only including data where

Is_deleted = ‘false’

Define outcomes

We want to tag those accounts that were won or lost. However, most of the records are still in the sales process, so we will tag those with a NULL value.

    WHEN is_closed = 'true' AND stage_name = 'Closed Won' THEN 1  
    WHEN is_closed = 'true' AND stage_name = 'Closed Lost' THEN 1 
END AS closed_won


Using the information above, the final data can be easily attached as

o.economic_buyer_c, AS company_name,
a.industry, AS company_name,
    WHEN is_closed = 'true' AND stage_name = 'Closed Won' THEN 1  
    WHEN is_closed = 'true' AND stage_name = 'Closed Lost' THEN 1 
END AS closed_won
FROM opportunities AS o
LEFT JOIN accounts AS a
ON o.account_ID = a.ID
LEFT JOIN leads as l
ON o.ID = l.opportunity_ID
LEFT JOIN contacts as c
ON l.contact_ID = c.ID

We have now created the key metrics for building our understanding of lead scoring. These can be visualized with any BI tool to allow the business to view which types of customers they should focus on. This understanding can drive the development of their ICP and allow sales and marketing to focus on the right prospects. Further, as the business grows and sees more closed (and lost) deals, a predictive model can be developed to better rank prospects.


Lead scoring is necessary to allow sales to focus on the most promising leads and marketing to target those prospects that are most likely to convert. Since the data needed to understand this is stored in multiple tables, SQL can be used to merge it together and make it available for analysis.

No-code/low-code data prep and visualization