Skip the article and find chess results now: click HERE to sign up for the free Rasgo app and chess game dataset.
When it comes to working with data, few games lend themselves to statistical analysis and machine learning as well as chess.
Most of us learn to play chess at a young age, typically against an older, more experienced player who whittles away our pieces on the way to a crushing defeat. After a few embarrassing losses, you come to understand that much of chess can be explained through strategy, probability, and statistics. When one of our leading data scientists discovered a dataset in Snowflake (a cloud data warehouse) containing the moves and results of over 2,500,000 chess games, we couldn't help ourselves.
One commonly disputed idea in the chess world is that White has an advantage because it makes the first move. After some internal discussion here at Rasgo, we found that we couldn't agree on the answer. Being a company of data enthusiasts, we decided to put it to the test.
We started with a simple hypothesis: White has an advantage over Black because White makes the first move.
To figure this out, I categorized the games by comparing the two players’ Elo ratings. According to Chess.com, the rating system was invented by Arpad Elo, a physics professor and chess master in the United States who worked with the U.S. Chess Federation to measure the skill of different chess players. Elo created the system as an improvement over the previously used Harkness system.
Wikipedia describes the Elo system as follows:
“The Elo rating system is a method for calculating the relative skill levels of players in zero-sum games”
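To make "relative skill levels" a bit more concrete, here is a minimal sketch of the standard Elo expected-score formula. It is not part of the Rasgo analysis; the function name `expected_score` is ours, and the ratings are made-up examples.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A (win = 1, draw = 0.5, loss = 0)
    under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give each side an expected score of 0.5.
print(expected_score(1500, 1500))

# A 200-point rating edge gives an expected score of roughly 0.76.
print(round(expected_score(1700, 1500), 2))
```

Note that the formula only looks at the rating gap, not at who plays White, which is why comparing evenly rated players is a reasonable way to isolate any first-move effect.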
Taking this into account, we started to formulate an experiment. In 1 million games, Black’s player had the rating advantage, and in another 1 million, White’s player did. That left 500,000 games in which the opponents were essentially equal. So what was the balance of wins in those games?
Since the data already has a Result column, all I had to do was click the chart in Rasgo. It turns out that when the opponents were evenly matched, White won more often (53% to 47%).
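For readers who would rather see the logic than click a chart, here is a rough sketch of the same kind of comparison in plain Python. Everything specific in it is an assumption on our part: the field names (`white_elo`, `black_elo`, `result`), the 50-point band used to define "evenly matched", the choice to ignore draws, and the sample rows, which are made up and not taken from the real dataset.

```python
from collections import Counter

# Hypothetical threshold: ratings within 50 points count as "evenly matched".
EQUAL_BAND = 50


def categorize(white_elo: int, black_elo: int, band: int = EQUAL_BAND) -> str:
    """Label a game by which side, if either, has the rating advantage."""
    diff = white_elo - black_elo
    if diff > band:
        return "white_favored"
    if diff < -band:
        return "black_favored"
    return "even"


def white_win_share(games, category="even"):
    """Share of decisive games won by White in one category (draws ignored)."""
    counts = Counter(
        g["result"]
        for g in games
        if categorize(g["white_elo"], g["black_elo"]) == category
    )
    decisive = counts["1-0"] + counts["0-1"]
    return counts["1-0"] / decisive if decisive else None


# Made-up rows for illustration only.
sample_games = [
    {"white_elo": 1500, "black_elo": 1510, "result": "1-0"},
    {"white_elo": 1620, "black_elo": 1600, "result": "0-1"},
    {"white_elo": 1480, "black_elo": 1500, "result": "1-0"},
    {"white_elo": 1900, "black_elo": 1500, "result": "1-0"},
    {"white_elo": 1400, "black_elo": 1450, "result": "1/2-1/2"},
]

print(white_win_share(sample_games))
```

Run against the full dataset with the real column names and your own definition of "evenly matched", the same function would give the White-versus-Black split for each category.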
Not especially astonished by our findings, we searched the internet to see whether anyone else had obtained the same results. We found numerous sources claiming that “beneath the elo rating of 1600, this advantage goes away.” A quick toggle in Rasgo showed me that this claim did not hold in our data: the difference still exists below 1600.
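The 1600 check can be sketched the same way. This is only an illustration of the idea, not what Rasgo runs under the hood: we assume "below 1600" means both players are rated under the cutoff, we ignore draws, and the field names and sample rows are invented.

```python
def white_win_share_below(games, max_elo=1600):
    """White's share of decisive games where both players are rated below max_elo."""
    white_wins = 0
    black_wins = 0
    for g in games:
        # Skip games where either player is at or above the cutoff.
        if g["white_elo"] >= max_elo or g["black_elo"] >= max_elo:
            continue
        if g["result"] == "1-0":
            white_wins += 1
        elif g["result"] == "0-1":
            black_wins += 1
    decisive = white_wins + black_wins
    return white_wins / decisive if decisive else None


# Made-up rows for illustration only.
sample_games = [
    {"white_elo": 1450, "black_elo": 1500, "result": "1-0"},
    {"white_elo": 1580, "black_elo": 1590, "result": "0-1"},
    {"white_elo": 1300, "black_elo": 1320, "result": "1-0"},
    {"white_elo": 1700, "black_elo": 1550, "result": "1-0"},
    {"white_elo": 1500, "black_elo": 1480, "result": "1/2-1/2"},
]

print(white_win_share_below(sample_games))
```

If the first-move advantage really vanished under 1600, this share would hover around one half on the real data.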
This dataset and Rasgo are both free. Rasgo is a cloud product that anyone can use to load and transform data directly in their data warehouse. In this case, the dataset lives in the Rasgo app for anyone to use!
We will continue to add more interesting data for you to explore and draw insights from. Please feel free to send us anything special that you find!