Skip the article and find your own baseball insights: click HERE to sign up for the free Rasgo app and baseball dataset.
There has long been a debate over whether baseball franchises with more money win more games because they can afford to pay for better players. As a fully remote company, Rasgo has baseball fans whose favorite teams range from the Yankees (bandwagon) to the Houston Astros, with many more in between.
This is a heated subject among baseball and professional sports fans everywhere. One quick Google search will show you how ingrained this belief is in the sporting community. According to an article published by Bleacher Report (a leading professional sports publication):
“..the reason why teams such as the Marlins, Rockies, and Tigers never become a dynasty is because as soon as their young players become superstars, they can no longer afford them.
In 2007, the Yankees had the highest team payroll at $189 million. The Devil Rays had the lowest team payroll at $24 million.
The Yankees payroll was nearly eight times higher than that of the Devil Rays, hence the Yankees made the playoffs and the Devil Rays finished last in the AL East.”
This debate got us thinking: is it possible to show a correlation between team salaries and the number of games won? And, considering how common that belief is, can we visualize it as proof?
The first step in this experiment was finding a data set that we could trust and that had enough data to be definitive. We ended up going with Sean Lahman's baseball data archive, since it is a trusted source among industry experts. Sean is an award-winning reporter covering sports and other subjects, and he manages the Lahman Baseball Database, which holds a large amount of historical baseball data.
Back to our experiment: our lead data scientist grew up in Pennsylvania and, through a number of interactions and influences, has developed a deep dislike of the New York Yankees. Given his deep-seated dislike and their winning history, we asked him to focus on the Yankees for most of this experiment! With our hypothesis generally formed, we were able to start the analysis.
After uploading the data into our cloud database, we began transforming it. Rasgo was perfect for getting quick answers. We aggregated all the player salaries by Team and Year, and joined the result to the Team performance data.
Then we set up a Custom Filter so we could filter the result by different teams, years, etc. More on this later.
We used the heatmap to generate the SQL for us. Let’s pick Season Wins…
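For readers who want to try the same aggregate-and-join step outside of Rasgo, here is a minimal sketch in plain Python. It is not the SQL that Rasgo generated for us. The field names (`team`, `year`, `salary`, `wins`) and the sample rows are made up for illustration, and we assume an inner join, so team seasons without salary data are dropped.

```python
from collections import defaultdict


def total_payroll(salary_rows):
    """Sum player salaries into one payroll figure per (team, year)."""
    payroll = defaultdict(int)
    for row in salary_rows:
        payroll[(row["team"], row["year"])] += row["salary"]
    return dict(payroll)


def join_payroll_to_teams(payroll, team_rows):
    """Inner join: keep team seasons that have payroll data and add a 'payroll' field."""
    joined = []
    for row in team_rows:
        key = (row["team"], row["year"])
        if key in payroll:
            joined.append({**row, "payroll": payroll[key]})
    return joined


# Hypothetical rows for illustration only; these are not real salaries or win totals.
salaries = [
    {"team": "AAA", "year": 2001, "player": "p1", "salary": 100},
    {"team": "AAA", "year": 2001, "player": "p2", "salary": 250},
    {"team": "BBB", "year": 2001, "player": "p3", "salary": 80},
]
teams = [
    {"team": "AAA", "year": 2001, "wins": 90},
    {"team": "BBB", "year": 2001, "wins": 70},
    {"team": "CCC", "year": 2001, "wins": 81},  # no salary rows, so it is dropped
]

season_table = join_payroll_to_teams(total_payroll(salaries), teams)
```

Each row of `season_table` is one team season with its performance stats and total payroll side by side, which is the shape you need before comparing salary against wins, home runs, or attendance.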
Nothing obvious there. (If you’re a Red Sox fan and you squint, you might convince yourself that there’s a clear pattern.)
What about Earned Runs – that’s how you win games, right? And maybe home runs. And attendance?
Attendance seems to show a slight correlation! But that makes sense. Teams get money from attendance, and money is used to pay salaries. Home runs look interesting.
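We judged these relationships by eye from the heatmaps. If you would rather put a number on "slight correlation", one common option is the Pearson correlation coefficient. The sketch below is a swapped-in technique and not something from our Rasgo workflow; the function name `pearson` and the idea of feeding it payroll and attendance columns are our own assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)
```

A value near 1 or -1 means a strong linear relationship and a value near 0 means little or none. Passing one list of team payrolls and a matching list of attendance figures (one entry per team season) gives a single number to compare against the same calculation for wins or home runs. Keep in mind that it only measures association, so it cannot tell you whether money buys attendance or attendance pays for salaries.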
As we were performing our analysis, one of our Red Sox fans made a good point: the salary data goes back to 1985. This made us wonder: what if this trend only shows up in the past 20 years? This is where our Custom Filter comes in. On the left, our quadrant shows all of the data. On the right, we have the same quadrant pointed at our Custom Filter. We could swap out Teams, Years, and various combinations to quickly compare. What we’re showing here is the past 20 years, but we tried a lot of different inputs.
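The side-by-side idea is easy to imitate in code: run the same view once over every season and once over a filtered subset. The sketch below is a rough stand-in for that, not Rasgo's Custom Filter itself. The function `custom_filter`, its parameters, and the sample rows are all hypothetical.

```python
def custom_filter(rows, teams=None, first_year=None, last_year=None):
    """Return the rows that match every criterion given; None means no restriction."""
    selected = []
    for row in rows:
        if teams is not None and row["team"] not in teams:
            continue
        if first_year is not None and row["year"] < first_year:
            continue
        if last_year is not None and row["year"] > last_year:
            continue
        selected.append(row)
    return selected


# Hypothetical team seasons for illustration only.
rows = [
    {"team": "AAA", "year": 1985, "payroll": 10, "wins": 80},
    {"team": "AAA", "year": 2005, "payroll": 50, "wins": 95},
    {"team": "BBB", "year": 2005, "payroll": 20, "wins": 75},
    {"team": "BBB", "year": 2015, "payroll": 30, "wins": 88},
]

all_seasons = custom_filter(rows)                      # left side: everything
recent = custom_filter(rows, first_year=2000)          # right side: recent seasons only
one_team_recent = custom_filter(rows, teams={"AAA"}, first_year=2000)
```

Feeding `all_seasons` and `recent` into the same chart or summary statistic gives the left-versus-right comparison described above, and changing the arguments is the code equivalent of swapping out Teams and Years.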
After 10 minutes, our lead data scientist decided to change his opinion about the Yankees. Sure, if you dig hard enough, maybe you can find some sort of unfair advantage with the “rich” teams. But for any of our team members to hold such a strong opinion, we’ll need to see some evidence that’s clear and obvious.
Hey baseball fans - what are we missing? Any ideas on how we could have looked at the data better?
Both this data set and Rasgo are free cloud products, and anyone can use them to load and transform data directly in their own data warehouse. In this case, the data set lives in the Rasgo app for anyone to use!
We will continue to add more interesting data for you to play with and find insights in. Please feel free to send us anything special that you find!