Industries throughout our economy are adopting new, aggressive analytical approaches to decision-making. Professional sports – particularly baseball – are among those leading the way.
You only have to look at the back of a baseball card from decades ago to understand that the sport has a strong foundation in statistics. Its fans debate about metrics (standards of measurement) all the time, now even more so with the rapid growth of fantasy baseball.
Coinciding with that interest is the explosion of raw data. Big Data. High-speed video equipment and Doppler radar installed in every major league park capture 120,000 bits per second. For every pitch, 20 pieces of data are recorded, from its velocity to the angle of the pitcher’s arm. As of 2015, Statcast data is also available, measuring elements of performance such as a defender’s running speed, arm strength and route efficiency, as well as pitch spin rate.
For example, Atlanta shortstop Andrelton Simmons needed just 0.11 seconds for his first step on a grounder by the New York Mets’ Travis d’Arnaud. Simmons made a 68.5 mph throw across his body from the outfield grass. Or Houston right fielder George Springer had 99.1 percent route efficiency when he jumped to steal a home run from Texas’ Leonys Martin. Springer covered 93.7 feet at up to 17.7 mph. And that’s just the start.
To say this amounts to a revolution in data availability would be an understatement.
“You can take the first 135 years of baseball data – the game accounts and what happened during the games – and put that on a 2 GB flash drive,” says Vince Gennaro, president of the Society for American Baseball Research (SABR).
With the new data that MLB is capturing, each game generates about 1 TB of data. (A terabyte is 1,000 gigabytes, or a 1 followed by twelve zeros in bytes.)
“So we’re talking about a 10-million-fold increase in data capture,” says Gennaro.
***
How do major league teams share this powerful new information? The raw data is collected league-wide by Major League Baseball and given to every team. So at the start, there is parity.
Yet there are huge variances in what teams do with the raw data. To give the measurements value, teams must analyze them. And that is a major undertaking. It takes prioritization and a commitment of resources. Breaking down the data allows teams to track player performance and determine value, which shapes strategic decisions like roster slots, lineups and who pinch hits. Defensive shifts are an obvious area where data has changed baseball.
But the full use of data isn’t a given. Teams with traditional cultures may choose against aggressive use of analytics, preferring intuition or experience when making key decisions. Organizations must invest in infrastructure and hiring to take full advantage of the data surge. They must move beyond simple description to prediction and prescription.
Major League Baseball will take in almost $10 billion in revenue this year. Certain teams are using their share to push into additional areas of analytics. For example, the Boston Red Sox and Chicago Cubs are adopting advanced neuroscience techniques to help their minor league hitters recognize pitch types and make swing decisions. The LA Dodgers and San Francisco Giants are using biometric measurements and workload metrics to help identify players at risk of injury and develop prevention programs.
When one or two games in the MLB regular season can be the difference between advancing to the postseason and staying home, Big Data analytics can provide a crucial edge.
“It’s impossible to equate winning records with more analytical capability, but the recent success of highly analytical teams – the Boston Red Sox and New England Patriots, the San Francisco Giants and 49ers, the Dallas Mavericks and San Antonio Spurs – suggests an important role. Analytics are also renowned for making small market teams like the Oakland A’s and Green Bay Packers relatively competitive,” says Thomas Davenport, co-founder of the International Institute for Analytics.
You can add the St. Louis Cardinals, the Pittsburgh Pirates and Chicago Cubs to that list as well.
***
The Los Angeles Dodgers general manager is Farhan Zaidi. Zaidi is 37 years old and has a degree in economics from MIT and a Ph.D. from Berkeley in behavioral economics. His boss, Andrew Friedman, the president of baseball operations, is 38.
If you were perusing FanGraphs this morning, you probably saw the Dodgers had two job postings: Research & Development Data Scientist and Research & Development Senior Developer.
Among the qualifications listed for the Dodgers jobs:
- A Ph.D. in Computer Science (Machine Learning), Statistics, Operations Research or related field from a top-tier university;
- Minimum of five years’ work experience in mathematical, statistical and predictive modeling (optimization, statistics, and machine learning);
- Expertise in mathematical and statistical programming (e.g. Python and R);
- Ability to communicate complex concepts to a non-technical audience.
That’s just the start, but you get the idea.
It turns out the Cincinnati Reds have a job posting up today, too. But not at FanGraphs. To find it, check out the latest issues of AARP: The Magazine, Reader’s Digest and Good Old Days Magazine. Online, search at Geocities, Prodigy Online and various AOL chat rooms.
The job description and qualifications are a little different from the Dodgers’, too:
“The ideal candidate for the Cincinnati Reds will have excellent eyesight and be able to calculate batting average in his/her head, especially average with runners in scoring position. You will need advanced abacus skills, although the organization will provide graph paper, slide rules, compass and pencils on request.
Previous experience scouting former Cardinals is a must. Familiarity with Bay-scene (are we spelling that right?) statistics preferred, which we guess means previous experience working in San Francisco, Oakland or Tampa. Candidates with strong belief in sacrifice bunts and using the organization’s best pitching arm in bullpen preferred. Specialists in batter-pitcher match-up history will work closely with Bryan Price.
Please send your resumé by Pony Express or one of those newfangled fax machines along with a 500-word essay on what the Reds front office can learn now that games are broadcast in Technicolor.”
***
This isn’t about Moneyball. At the abstract level, Moneyball was about finding market inefficiencies. At a practical level it was mostly about valuing walks – understanding the importance of not making outs and recognizing that the statistic of on-base percentage (OBP) is a better measure of value than batting average (AVG).
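To make the AVG-versus-OBP point concrete, here is a minimal sketch in Python. It uses the standard definitions of the two statistics. The two stat lines are invented for illustration and do not belong to real players, and the function names are ours.

```python
def batting_average(hits: int, at_bats: int) -> float:
    """AVG = H / AB. Walks do not appear anywhere in this formula."""
    return hits / at_bats if at_bats else 0.0


def on_base_percentage(hits: int, walks: int, hit_by_pitch: int,
                       at_bats: int, sac_flies: int) -> float:
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / chances if chances else 0.0


# Two hypothetical hitters (made-up numbers, not real players).
contact_hitter = dict(hits=150, walks=30, hit_by_pitch=0, at_bats=500, sac_flies=0)
patient_hitter = dict(hits=140, walks=90, hit_by_pitch=0, at_bats=500, sac_flies=0)

for name, line in [("contact", contact_hitter), ("patient", patient_hitter)]:
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {avg:.3f}  OBP {obp:.3f}")
```

By batting average the contact hitter looks better (.300 to .280). Once walks are counted, the patient hitter reaches base far more often (roughly .390 to .340), which is the kind of gap the Moneyball approach was built to spot.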
The challenges presented by Big Data are far more complex. Advanced information science and machine learning – qualifications listed for the Dodgers’ job – are about creating machines that find new patterns in data that our brains cannot see. We accept this formula in other areas of life. We can’t see cells, so doctors use microscopes. We can’t measure precise angles with our eyes, so we use a protractor. The computer you are reading from processes millions of pieces of information a second so you can ignore work and think about the Reds on a Friday.
It may not be what Mr. Castellini expected when he bought the Reds, but our favorite baseball team needs to get to warp drive just to keep up in the NL Central.
Moneyball was simple: The difference between AVG and OBP was adding walks. Third grade addition. This next phase of baseball knowledge requires processing millions of pieces of information and finding new meaningful relationships and strategies. Moneyball was playing with LEGOs. The challenge of Big Data is building the Matrix.
***
MLB may share the data, but organizations are on their own when it comes to making sense of it by developing their own models. The data alone is essentially meaningless.
Not only do organizations have to master the data collected by MLB each night, they also have to understand it in a way that allows them to extrapolate the lessons to situations where they don’t have it.
For example, if it turns out that a pitcher’s ability to put spin on the ball is one of the keys to success, how do organizations apply that knowledge to prospects they are scouting in high school or overseas where pitch spin data isn’t provided to them? Do they require their scouts to take machines with them that measure the spin rate?
You can bet the Los Angeles Dodgers, St. Louis Cardinals and the Chicago Cubs will.
As their fan, I want the Cincinnati Reds to squeeze every advantage they possibly can out of Big Data. Even if intuition and experience can sometimes lead to the right decision, I want my team to be considering all the information that’s out there. It’s not about whether scouting or analytics is more important; they both are. Instead, it’s about having all the facts available to make the best decisions.
Sure, it’s unreasonable to expect general managers to have advanced degrees in machine building. But it’s imperative that they be open-minded and quick to absorb new information and understand its importance, rather than resist it.
As the Reds face this crucial period of rebuilding when they have to decide which players to trade away and which ones to acquire, it’s essential that they enter negotiations armed with as much analytical power as the team across the metaphorical table. To do otherwise will lead to predictable, suboptimal results and an inevitable downward slide.
Inches matter. And so do all the other measurements.
It’s not hyperbole to say that building a competitive baseball roster is vastly different than it was just ten years ago. With the tsunami of modern analytics engulfing the sport, the image of the Reds general manager and his cast of cast-offs holding back the tide inspires little confidence.