A foundation of modern analytics in baseball is the goal of isolating the performance of an individual player from that of his teammates. That pursuit has led to new concepts and ways of evaluating performance. Revolutionary applications of technology have further advanced our ability to judge talent.
No aspect of baseball understanding has changed more in the past decade than evaluation of pitchers. In this series, Jordan Barhorst and I take a look at how measuring pitchers has evolved in the past few years and apply those new metrics to the Reds pitching staff.
We start with a look at two statistics, pitcher Wins and ERA, and their weaknesses as measurements.
The oldest method of measuring pitchers is the Win statistic.
Here’s the basic rule: A Win is awarded to one, and only one, pitcher in every game. A pitcher qualifies for the Win when he is the pitcher of record when his team takes the lead for good. A starter has to pitch at least five innings to qualify. There are a couple of rare exceptions, but that’s the gist.
The practice dates back to an era when starters almost always pitched complete games. The expanding use of relievers in recent decades has eliminated much of the direct connection between the starting pitcher and winning and losing. That disruption made assigning the Win statistic a little tricky.
Wins are an excellent way to judge how a team plays overall. But they are an extremely crude measurement of any individual player, including pitchers. A team win is the product of outscoring the opponent. A team’s offense is unrelated to its pitcher’s performance. Run prevention is only partly determined by the pitcher. Defense and other factors play a big role.
The quality of a pitcher’s supporting cast – offense, defense, bullpen – is the most important factor in his accumulating Wins. It was easier to compile pitcher Wins for the 1976 Reds than for the 2016 Reds.
Three-time Cy Young winner Max Scherzer put it succinctly, “It’s really not a good way to evaluate a pitcher. You can be on a good or bad team and that affects your win-loss record.”
Total Runs Allowed
The first step in narrowing our focus to just the contribution of the pitcher in question is to take the team’s offensive run scoring out of the equation. The simplest way to do that is to count only the Runs scored by the opposing team. The fewer runs the other team scored, the better the pitcher performed.
Even with that refinement, it’s easy to understand that many of the Runs scored against a team might not be the pitcher’s fault. The team’s fielding and other factors play a large role in runs being scored. Total Runs Allowed is still too blunt a tool for measuring a particular pitcher.
The Goal of Earned Run Average
To do better, we’ve devised a statistic known as Earned Run Average (ERA). The century-old benchmark calculates the average number of “earned” runs a pitcher gives up per nine innings pitched.
Analysts and fans have become familiar with ERA and its scale, although it moves around a bit based on the overall run environment of the time. Average ERA is about 4.00. 3.50 is really good. Below 3.00 and you’re talking about Cy Young candidates. 5.00 and above is horrible.
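The formula itself is simple: nine times earned runs, divided by innings pitched. Here’s a minimal sketch; the function name and sample numbers are my own, used only for illustration:

```python
def era(earned_runs, innings_pitched):
    """Earned Run Average: earned runs allowed per nine innings.
    Note: box-score innings like "180.1" mean 180 1/3 innings and
    must be converted to true thirds before using this function."""
    return 9 * earned_runs / innings_pitched

# A pitcher who allows 70 earned runs over 180 innings:
print(round(era(70, 180), 2))  # 3.5 -- "really good" on the scale above
```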
The desire to limit credit and blame to what the pitcher can control is the driving force behind preferring ERA to Total Runs Allowed. In fact, we now almost never see a pitcher’s evaluation include unearned runs. The entire premise of ERA is that a pitcher shouldn’t be accountable for runs scored due to his teammates’ fielding shortcomings.
ERA has become the dominant way for the average baseball fan to evaluate a pitcher. We even dice up ERA across a handful of appearances and try to spin narratives.
But ERA is a highly flawed way to evaluate a pitcher. Even samples as large as a full season are an unreliable basis for characterizing past performance, let alone predicting future performance.
The Fundamental Weakness of ERA
There are a number of reasons to question the reliability of ERA and we’ll get to the rest in a minute.
The fundamental problem with the statistic lies in the execution of its basic premise. The rationale for using ERA is solid: pitchers shouldn’t be blamed for their teammates’ fielding mistakes. But ERA uses only one measurement of defensive failure: the horribly inadequate Error statistic.
An Error takes place when the home team’s official scorer says it does. Based on those judgments, runs that are the product of Errors are deemed unearned and rightly excluded from the pitcher’s ERA.
Setting aside the tremendous vagaries of official scoring (and we shouldn’t), Errors capture a minuscule portion of fielding weakness. Defenders vary widely based on range, arm strength, glove work, reflexes, intuition, route efficiency, positioning and more. Defensive shortcomings unrelated to Errors affect a huge number of runs in every game. Yet they remain unaccounted for by ERA.
Imagine two pitchers with the same quality of pitches, throwing to identical hitters. They record the same number of strikeouts, walks and give up an equal number of home runs. Their fastballs have the same velocity, movement and spin rate, as do the rest of their pitches.
From that description, it would be a fair inference the two pitchers perform equally. But suppose Pitcher A plays with a much better defense (say, Billy Hamilton, Zack Cozart and Brandon Phillips behind him) while Pitcher B has a defense that’s below average. Pitcher A will give up fewer hits and therefore fewer runs.
As a result of the different defensive abilities, Pitcher A’s ERA could be several runs lower than Pitcher B’s. Yet they actually performed the same.
When evaluating a position player, such as shortstop, we’ve come to understand that measuring his defensive ability solely by the number of Errors is woefully incomplete. We measure his range, arm strength and other factors. A modern baseball analyst would never use Errors as the basis for judging a player’s defense.
Yet when we use ERA to evaluate a pitcher, we’re doing exactly that.
Using Errors as the basis for fielding is a fatal flaw of ERA. But it’s not the only significant weakness.
Major league baseball is becoming a sport of specialization, particularly when it comes to pitching. The number of innings covered by relief pitchers is increasing every season. Because of that trend, the number of runners on base when pitchers are pulled from a game is growing.
When Pitcher A leaves a game with runners on base, his ERA becomes dependent on the relief pitcher or pitchers who follow him that inning. If the relief pitcher gives up a double, allowing an inherited runner on first base to score, the blame, in terms of ERA, goes entirely to Pitcher A.
Again, two identical pitchers, leaving a similar number of runners on base, could end up with vastly different ERAs based on the performance of other players.
Even within a specific bullpen, the relievers available on a given night vary widely in quality. Which reliever comes in is influenced by factors such as how many runs your own offense has scored, which relievers pitched the night before and the handedness of the batter the opposing manager sends to the plate. All of those factors are out of the control of the initial pitcher.
The rate at which relievers stop inherited runners from scoring plays a huge role in a pitcher’s ERA, even though variance in that outcome is irrelevant to judging how the original pitcher actually performed. This is another way that ERA brings along the strengths and weaknesses of teammates.
Another flaw in using ERA to measure pitching performance is the way it is impacted by the sequencing of events out of the pitcher’s control. Imagine our two pitchers again.
On Monday in the first inning, Pitcher A strikes out Kyle Schwarber, gives up a single to Javy Baez, walks Kris Bryant, gives up a home run to Anthony Rizzo, gets Ian Happ to pop out and strikes out Ben Zobrist.
On Tuesday in the first inning, Pitcher B strikes out Kyle Schwarber, gives up a single to Javy Baez, walks Kris Bryant, gives up a home run to Anthony Rizzo, gets Ian Happ to pop out and strikes out Ben Zobrist.
Pitcher A and Pitcher B performed the same.
Suppose on Monday, Rizzo batted cleanup after Baez and Bryant. But on Tuesday, the Cubs manager changed his batting order and had Anthony Rizzo leading off. Pitcher A would have given up three earned runs, while Pitcher B gave up only one.
In this example, Pitcher A’s earned run total for the inning would be three times Pitcher B’s because of the sequence of events, one that was determined by Joe Maddon.
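The arithmetic in this example can be checked with a simplified inning simulator. This is a rough sketch under my own assumptions (runners advance exactly one base on a single, walks advance runners only when forced, homers clear the bases); the event codes are invented for illustration:

```python
def runs_scored(events):
    """Count runs in a simplified half-inning from a list of events:
    "K"/"OUT" = out, "1B" = single, "BB" = walk, "HR" = home run."""
    bases = [False, False, False]  # first, second, third occupied?
    runs = 0
    for ev in events:
        if ev in ("K", "OUT"):
            pass  # out recorded, runners hold
        elif ev == "1B":
            if bases[2]:
                runs += 1  # runner on third scores
            bases = [True, bases[0], bases[1]]  # everyone moves up one
        elif ev == "BB":
            if bases[0] and bases[1] and bases[2]:
                runs += 1  # bases loaded, run forced in
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        elif ev == "HR":
            runs += sum(bases) + 1  # all runners plus the batter score
            bases = [False, False, False]

    return runs

monday  = ["K", "1B", "BB", "HR", "OUT", "K"]   # Rizzo bats cleanup
tuesday = ["HR", "K", "1B", "BB", "OUT", "K"]   # Rizzo leads off
print(runs_scored(monday), runs_scored(tuesday))  # 3 1
```

Same six outcomes, reordered: three runs on Monday, one on Tuesday.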
This one is pretty straightforward. Pitcher A gives up a 370-foot fly ball to left-center and Pitcher B gives up a 370-foot fly ball to left-center.
The pitchers performed the same, right?
Not if you’re using ERA and the pitchers were in different parks. The first fly ball goes for a 3-run homer into the front row of seats at Great American Ball Park. The second is a routine out, not even reaching the warning track at Pittsburgh’s PNC Park. Pitcher A’s ERA gets dinged for 3 earned runs.
Pitcher B, for the same performance, gives up zero.
Often when we evaluate a pitcher, we mention the home ballpark or maybe the specific minor league where he pitches. We understand it matters. But then we use his ERA, a stat that assumes all the parks are the same.
Just Plain Luck
Four factors determine whether a ball hit in play becomes a hit: defense, the talent of the hitter, the talent of the pitcher and luck.
The mention of luck in this context causes certain fans to recoil. We’re reluctant to believe randomness plays a large role in the outcome of a sport we love.
We romanticize the check-swing bloop that falls in just in front of the outfielder and the dribbler that rolls a few inches out of the reach of the second baseman. We want to believe our gritty home-team player intended to hit those balls there, as if aiming like that were possible. But it isn’t. A baseball player isn’t Roger Federer drilling a forehand precisely to the baseline corner. Hitting off a professional baseball pitcher is harder. An element of luck is involved.
On the one hand, pitchers do control the number of balls put in play and runners on base with strikeouts, walks and hit batters. Pitch velocity has a slight effect on the quality of contact. Pitchers also have a bit of influence over whether a ball is hit on the ground or in the air.
But, broadly speaking, pitchers have almost no control over what happens after the ball is put in play. Research by Matt Swartz (2010) shows that whether a ball falls in for a hit is determined 13 percent by the defense and park factors, 12 percent by talent (both pitcher and batter) and 75 percent by luck.
The way we try to discount the role of luck is by rationalizing that it balances out over a short period of time, maybe even in a single game. But research by Derek Carty (2011) shows it takes eight seasons to weed out enough randomness to produce a measure of the true talent of a starting pitcher.
We’re talking about hits. The number of hits in a game is small. Given the huge random component of luck, plus the role of defense, Hits is one of the least meaningful statistics by which to judge a pitcher’s performance. (Remember this point when we talk about WHIP tomorrow.)
But ERA credits every hit to the pitcher, with no accounting for luck, good or bad.
People say that ERA measures “what actually happened” and in a certain sense that’s right. But that’s not the same thing as measuring how the pitcher “actually pitched.” Official scorers, defense, bullpens, sequencing, ballpark dimensions and luck all work to confound ERA. Those things are all “actually happening” too and are captured in the ERA statistic.
The problems with ERA are huge. The difficulties seem insurmountable.
What we need is a way to evaluate a pitcher that doesn’t depend on those other factors. We need a measurement that isolates how the pitcher actually performed. Happily, there are a few good ways to go about that. We’ll cover them in Parts 2 and 3 of this series.