Broken Lifecycle

Data Science Failure Modes

Andrew Engel

In last week's post, we discussed how over 80% of data science projects never make it to production and how the few that do deliver a paltry ROI. We also briefly introduced how inadequate strategies and processes related to the complex data science lifecycle contribute to these problems. So what are the strategic and process gaps that cause data science projects to fail? In our experience working with hundreds of data science teams, we’ve encountered three common failure modes:

  1. Spending too little time upfront prioritizing the immediate business problems where data science can deliver incremental value to the business (bite-sized wins always win!).
  2. Investing in all stages of the data science lifecycle simultaneously, rather than prioritizing the stages where immediate investment yields incremental progress. This leads to large investments and delays the completion of useful projects, often beyond what executives expect. Remember, quick wins are necessary to build credibility within the organization.
  3. Targeting investments to optimize a specific, yet incorrect, phase of the lifecycle without first understanding where inefficiencies occur and how those inefficiencies affect the other phases.


Picking the right use cases is key. A large consumer electronics company spent a year developing a model to predict which of its online customers were most likely to make an additional purchase in the next month. Unfortunately, the model found that most customers weren’t likely to make additional purchases. The real problem, however, was that the company hadn’t factored in that third-party retailers were responsible for more than 90% of its sales. Even if the model had found likely repeat customers, the impact on sales would have been negligible, given the online store’s small share of total sales. When businesses focus on the wrong use cases, not only does the project fail to deliver value, but the business also incurs high opportunity costs by not working on projects that can.
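To make the arithmetic behind this example concrete, here is a minimal sketch of the ceiling on total sales impact. The 10% online share follows from the figure above (third-party retailers at more than 90% of sales). The 5% lift in online sales and the names `max_total_sales_lift`, `ONLINE_SHARE_CEILING`, and `HYPOTHETICAL_ONLINE_LIFT` are assumptions made purely for illustration, not numbers from the company in question.

```python
def max_total_sales_lift(online_share: float, online_lift: float) -> float:
    """Relative lift in total sales when only the online channel improves."""
    if not 0.0 <= online_share <= 1.0:
        raise ValueError("online_share must be between 0 and 1")
    return online_share * online_lift


# From the example: third-party retailers account for more than 90% of
# sales, so the online store's share is at most 10%.
ONLINE_SHARE_CEILING = 0.10

# Assumption (not from the source): a successful model lifts online
# sales by 5%.
HYPOTHETICAL_ONLINE_LIFT = 0.05

lift = max_total_sales_lift(ONLINE_SHARE_CEILING, HYPOTHETICAL_ONLINE_LIFT)
print(f"Total sales lift is at most {lift:.2%}")
```

Under these assumed numbers, a year of modeling work would move total sales by about half a percent at best. A back-of-the-envelope check like this takes minutes and can be run before a project is resourced.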

Define Business Problem

Failure modes two and three concern the lifecycle as a whole and how to prioritize efforts within it. While those are certainly problematic, the first mode is potentially the most dangerous. If data science teams do not focus on solving the right problem, then it doesn’t matter whether their processes or technology stack are correct. Getting this right is hard for both business executives and data science teams.

Data scientists new to a company usually lack both the domain understanding and the broader business context to identify and prioritize the problems they solve. Without guidance from the business, they are likely to pursue problems that are of limited value to the business or propose solutions that can’t be implemented or are irrelevant to the business.

On the other hand, business executives struggle to understand how their problems translate into data science problems that their data science team can solve. In many cases, this lack of data science experience leads to projects that are too complex, that could have been solved with simpler techniques, or that rely on data that is not available.

To avoid these issues, it’s important to have a culture and processes that bring data scientists and business stakeholders together to discuss the business problems that need to be solved. Working together, they can identify which problems are amenable to data science solutions and have relevant data. They can discuss and understand the limitations of the approach, establish and align on success metrics, determine the implementation mechanism, and estimate the ROI, risk, and project timeline.

Based on the mutually understood ROI, risk, and timeline, data science and business leadership can prioritize and resource the projects that maximize the value data science efforts will deliver. As an added bonus, collaboration between business and data science teams at the start of a project encourages continued collaboration throughout the life of the project. This ongoing collaboration lets the data science team share intermediate findings with the business, potentially improving operations even before the project is complete. In turn, the business can share insights that the data science team can use as it continues modeling.
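One way to picture ranking projects on ROI, risk, and timeline together is a simple score. The sketch below uses risk-adjusted net value per month, which is only one possible heuristic and not a method prescribed here. The `Project` class, its fields, the `prioritize` function, and every project name and number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """A candidate project with estimates agreed on by both teams."""
    name: str
    expected_value: float       # estimated benefit if the project succeeds
    cost: float                 # estimated cost to deliver
    success_probability: float  # between 0 and 1; lower means riskier
    months: float               # estimated time to completion

    def score(self) -> float:
        """Risk-adjusted net value per month (an assumed heuristic)."""
        if self.months <= 0:
            raise ValueError("months must be positive")
        net = self.expected_value * self.success_probability - self.cost
        return net / self.months


def prioritize(projects: list[Project]) -> list[Project]:
    """Return projects ordered from highest to lowest score."""
    return sorted(projects, key=lambda p: p.score(), reverse=True)


# Hypothetical candidates; all figures are made up for illustration.
candidates = [
    Project("churn alerts", 400_000, 100_000, 0.8, 3),
    Project("full platform rebuild", 2_000_000, 900_000, 0.5, 18),
    Project("pricing test", 250_000, 50_000, 0.6, 2),
]

ranked = prioritize(candidates)
for project in ranked:
    print(f"{project.name}: {project.score():,.0f} per month")
```

With these made-up inputs, the two short projects outrank the large rebuild despite its higher headline value, which is the "quick wins" argument in numeric form. The score is only as good as the estimates fed into it, so the value of the exercise lies mostly in the conversation needed to agree on those estimates.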

In light of these common failure modes, organizations can refactor their processes to prioritize the data science problems they tackle. With their focus on solving the right business problems, data scientists can then think strategically about which procedures need to be built and which technologies are required to accelerate their data science lifecycle. We will discuss our approach to this prioritization next week.
