A foundation of modern analytics in baseball is the goal of isolating the performance of an individual player from that of his teammates. That pursuit has led to new concepts and ways of evaluating performance. Revolutionary applications of technology have further advanced our ability to judge talent.
No aspect of baseball understanding has changed more in the past decade than evaluation of pitchers. In this series, Jordan Barhorst and I take a look at how measuring pitchers has evolved in the past few years and apply those new metrics to the Reds pitching staff.
We start with a look at two statistics, pitcher Wins and ERA, and their weaknesses as measurements.
Pitcher Wins
The oldest method of measuring pitchers is the Win statistic.
Here’s the basic rule: A Win is awarded to one, and only one, pitcher in every game. A pitcher qualifies for the Win when he is the pitcher of record when his team takes the lead for good. A starter has to pitch at least five innings to qualify. There are a couple of rare exceptions, but that’s the gist.
The practice dates back to an era when starters almost always pitched complete games. The expanding use of relievers in recent decades has eliminated much of the direct connection between the starting pitcher and winning and losing. That disruption made assigning the Win statistic a little tricky.
Wins are an excellent way to judge how a team plays overall. But they are an extremely crude measurement of any individual player, including pitchers. A team win is the product of outscoring the opponent. A team’s offense is unrelated to its pitcher’s performance. Run prevention is only partly determined by the pitcher. Defense and other factors play a big role.
The quality of a pitcher’s supporting cast – offense, defense, bullpen – is the most important factor in his accumulating Wins. It was easier to compile pitcher Wins for the 1976 Reds than for the 2016 Reds.
Three-time Cy Young winner Max Scherzer put it succinctly, “It’s really not a good way to evaluate a pitcher. You can be on a good or bad team and that affects your win-loss record.”
Total Runs Allowed
The first step in narrowing our focus to just the contribution of the pitcher in question is to take the team’s offensive run scoring out of the equation. The simplest way to do that is to count only the Runs scored by the opposing team. The fewer runs the other team scored, the better the pitcher performed.
Even with that refinement, it’s easy to see that many of the Runs scored against a team might not be the pitcher’s fault. The team’s fielding and other factors play a large role in runs being scored. Total Runs Allowed is still too blunt a tool for measuring a particular pitcher.
The Goal of Earned Run Average
To do better, we’ve devised a statistic known as Earned Run Average (ERA). The century-old benchmark is a calculation of the average number of “earned” runs the pitcher gives up over nine innings.
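The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring tool: the function names and the sample season line are made up for the example, and innings are assumed to arrive in box-score notation, where "6.1" means six innings plus one out.

```python
def innings_to_float(innings_pitched: str) -> float:
    """Convert box-score notation such as '6.1' (six innings, one out) to 6.333..."""
    whole, _, outs = innings_pitched.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def earned_run_average(earned_runs: int, innings_pitched: str) -> float:
    """ERA = 9 * earned runs / innings pitched."""
    innings = innings_to_float(innings_pitched)
    if innings == 0:
        raise ValueError("ERA is undefined with zero innings pitched")
    return 9 * earned_runs / innings


# Hypothetical season line: 80 earned runs over 180 innings.
print(round(earned_run_average(80, "180.0"), 2))  # 4.0
```

Nothing in the formula asks how the runs scored. It only asks whether the official scorer called them earned.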
Analysts and fans have become familiar with ERA and its scale, although it moves around a bit based on the overall run environment of the time. Average ERA is about 4.00. 3.50 is really good. Below 3.00 and you’re talking about Cy Young candidates. 5.00 and above is horrible.
The desire to limit credit and blame to what the pitcher can control is the driving force behind preferring ERA to Total Runs Allowed. In fact, we now almost never see a pitcher’s evaluation include unearned runs. The entire premise of ERA is that a pitcher shouldn’t be accountable for runs scored due to his teammates’ fielding shortcomings.
ERA has become the dominant way for the average baseball fan to evaluate a pitcher. We even dice up ERA across a handful of appearances and try to spin narratives.
But ERA is a highly flawed way to evaluate a pitcher. ERA computed over a small sample, and even an entire season can be a small sample, is an unreliable way to characterize the past, let alone predict future performance.
The Fundamental Weakness of ERA
There are a number of reasons to question the reliability of ERA and we’ll get to the rest in a minute.
The fundamental problem with the statistic lies in the execution of its basic premise. The rationale for using ERA is solid: pitchers shouldn’t be blamed for their teammates’ fielding mistakes. But ERA only uses one measurement of defensive failure: the horribly inadequate Error statistic.
An Error takes place when the home team’s official scorer says it does. Based on those judgments, runs that are the product of Errors are deemed unearned and rightly excluded from the pitcher’s ERA.
Setting aside the tremendous vagaries of official scoring (and we shouldn’t), Errors capture a minuscule portion of fielding weakness. Defenders vary widely based on range, arm strength, glove work, reflexes, intuition, route efficiency, positioning and more. Defensive shortcomings unrelated to Errors affect a huge number of runs in every game. Yet they remain unaccounted for by ERA.
Imagine two pitchers with the same quality of pitches, throwing to identical hitters. They record the same number of strikeouts, walks and give up an equal number of home runs. Their fastballs have the same velocity, movement and spin rate, as do the rest of their pitches.
From that description, it would be a fair inference the two pitchers perform equally. But suppose Pitcher A plays with a much better defense (say, Billy Hamilton, Zack Cozart and Brandon Phillips behind him) while Pitcher B has a defense that’s below average. Pitcher A will give up fewer hits and therefore fewer runs.
As a result of the different defensive abilities, Pitcher A’s ERA could be several runs lower than Pitcher B’s ERA. Yet they actually performed the same.
When evaluating a position player, such as shortstop, we’ve come to understand that measuring his defensive ability solely by the number of Errors is woefully incomplete. We measure his range, arm strength and other factors. A modern baseball analyst would never use Errors as the basis for judging a player’s defense.
Yet when we use ERA to evaluate a pitcher, we’re doing exactly that.
Using Errors as the basis for fielding is a fatal flaw of ERA. But it’s not the only significant weakness.
Inherited Runners
Major league baseball is becoming a sport of specialization, particularly when it comes to pitching. The number of innings covered by relief pitchers is increasing every season. Because of that trend, the number of runners on base when pitchers are pulled from a game is growing.
When Pitcher A leaves a game with runners on base, his ERA becomes dependent on the relief pitcher or pitchers who follow him that inning. If the relief pitcher gives up a double, allowing an inherited runner on first base to score, the blame, in terms of ERA, goes entirely to Pitcher A.
Again, two identical pitchers, leaving a similar number of runners on base, could end up with vastly different ERAs based on the performance of other players.
Even within a specific bullpen, large differences in quality span the pitchers who can come in. The reliever chosen on a given night is influenced by factors such as how many runs your own offense has scored, which relievers pitched the night before and the handedness of the batter the opposing manager chose to send to the plate. Those factors are all out of the control of the initial pitcher.
The rate at which relievers stop inherited runners from scoring plays a huge role in a pitcher’s ERA, even though variance in that outcome is irrelevant to judging how the original pitcher actually performed. This is another way that ERA brings along the strengths and weaknesses of teammates.
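A small sketch makes the accounting concrete. It assumes the simple rule described above (a run is charged to whichever pitcher let that runner reach base) and ignores the wrinkles in the real scoring rules. The `Runner` type and `earned_runs_charged` function are hypothetical names invented for this illustration.

```python
from collections import Counter
from typing import NamedTuple


class Runner(NamedTuple):
    name: str
    put_on_base_by: str  # the pitcher who allowed this runner to reach


def earned_runs_charged(runners_who_scored: list[Runner]) -> Counter:
    """Charge each run to the pitcher who put that runner on base."""
    return Counter(runner.put_on_base_by for runner in runners_who_scored)


inherited = Runner("runner on first", put_on_base_by="Pitcher A")

# Scenario 1: the reliever gives up a double and the inherited runner scores.
print(earned_runs_charged([inherited]))  # Counter({'Pitcher A': 1})

# Scenario 2: the reliever strands the runner.
print(earned_runs_charged([]))  # Counter()
```

Pitcher A did exactly the same thing in both scenarios. Only the reliever’s result changed, yet Pitcher A’s line is the one that moves.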
Sequencing
Another flaw in using ERA to measure pitching performance is the way it is impacted by the sequencing of events out of the pitcher’s control. Imagine our two pitchers again.
On Monday in the first inning, Pitcher A strikes out Kyle Schwarber, gives up a single to Javy Baez, walks Kris Bryant, gives up a home run to Anthony Rizzo, gets Ian Happ to pop out and strikes out Ben Zobrist.
On Tuesday in the first inning, Pitcher B gives up a home run to Anthony Rizzo, strikes out Kyle Schwarber, gives up a single to Javy Baez, walks Kris Bryant, gets Ian Happ to pop out and strikes out Ben Zobrist.
Pitcher A and Pitcher B performed the same: each produced identical results against the same six hitters.
On Monday, Rizzo batted cleanup after Baez and Bryant. But on Tuesday, the Cubs manager changed his batting order and had Rizzo leading off. Pitcher A gave up three earned runs, while Pitcher B gave up only one.
In this example, Pitcher A’s ERA would be three times higher than Pitcher B’s because of the sequence of events, one that was determined by Joe Maddon.
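The arithmetic in this example can be checked with a toy inning scorer. This is a sketch under simplifying assumptions, not a real play-by-play engine: a single moves every runner up exactly one base, a walk advances only forced runners, and the event codes ("K", "1B", "BB", "HR", "OUT") are labels chosen for this illustration.

```python
def runs_in_inning(events):
    """Score a half-inning from a list of plate-appearance outcomes."""
    bases = [False, False, False]  # first, second, third
    runs = 0
    outs = 0
    for event in events:
        if outs == 3:
            break
        if event in ("K", "OUT"):
            outs += 1
        elif event == "HR":
            runs += 1 + sum(bases)  # batter plus everyone on base
            bases = [False, False, False]
        elif event == "1B":
            runs += int(bases[2])  # runner on third scores
            bases = [True, bases[0], bases[1]]
        elif event == "BB":
            if all(bases):
                runs += 1  # bases loaded: a run is forced in
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


# Same six outcomes against the same six hitters; only the order differs.
monday = ["K", "1B", "BB", "HR", "OUT", "K"]   # Rizzo bats fourth
tuesday = ["HR", "K", "1B", "BB", "OUT", "K"]  # Rizzo leads off

print(runs_in_inning(monday))   # 3
print(runs_in_inning(tuesday))  # 1
```

The two lists contain the same outcomes. Moving the home run from fourth to first is the only thing that turns three runs into one.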
Park Factors
This one is pretty straightforward. Pitcher A gives up a 370-foot fly ball to left-center and Pitcher B gives up a 370-foot fly ball to left-center.
The pitchers performed the same, right?
Not if you’re using ERA and the pitchers were in different parks. The first fly ball goes for a 3-run homer into the front row of seats at Great American Ball Park. The second is a routine out, not even reaching the warning track at Pittsburgh’s PNC Park. Pitcher A’s ERA gets dinged for 3 earned runs.
Pitcher B, for the same performance, gives up zero.
Often when we evaluate a pitcher, we mention the home ballpark or maybe the specific minor league where he pitches. We understand it matters. But then we use his ERA, a stat that assumes all the parks are the same.
Just Plain Luck
Four factors determine whether a ball hit in play becomes a hit: defense, the talent of the hitter, the talent of the pitcher and luck.
The mention of luck in this context causes certain fans to recoil. We’re reluctant to believe randomness plays a large role in the outcome of a sport we love.
We romanticize the check-swing bloop that falls in just in front of the outfielder and the dribbler ground ball a few inches out of the reach of the second baseman. We want to believe our gritty home-team player intended to hit those balls there as if aiming like that were possible. But it isn’t. A baseball player isn’t Roger Federer drilling a forehand precisely to the baseline corner. Hitting off a professional baseball pitcher is harder. An element of luck is involved.
On the one hand, pitchers do control the number of balls put in play and runners on base with strikeouts, walks and hit batters. Pitch velocity has a slight effect on the quality of contact. Pitchers also have a bit of influence over whether a ball is hit on the ground or in the air.
But, broadly speaking, pitchers have almost no control over what happens after the ball is put in play. Research by Matt Swartz (2010) shows that whether a ball falls in for a hit is determined 13 percent by the defense and park factors, 12 percent by talent (both pitcher and batter) and 75 percent by luck.
The way we try to discount the role of luck is by rationalizing that it balances out over a short period of time, maybe even in a single game. But research by Derek Carty (2011) shows it takes eight seasons to weed out enough randomness to produce a measure of the true talent of a starting pitcher.
We’re talking about hits. The number of hits in a game is small. Given the huge random component of luck, plus the role of defense, Hits is one of the least meaningful statistics by which to judge a pitcher’s performance. (Remember this point when we talk about WHIP tomorrow.)
But ERA credits every hit to the pitcher, with no accounting for luck, good or bad.
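A quick simulation gives a feel for how much noise is involved. This is a toy model, not a reproduction of the research cited above. It assumes every pitcher has an identical true hit rate of .300 on balls in play, roughly 500 balls in play per season, and independent coin-flip outcomes. The function name and all three figures are assumptions made for illustration.

```python
import random
import statistics


def simulate_babip(true_rate, balls_in_play, n_pitchers, seed=0):
    """Observed hit rate on balls in play for pitchers with identical talent."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_pitchers):
        hits = sum(rng.random() < true_rate for _ in range(balls_in_play))
        observed.append(hits / balls_in_play)
    return observed


one_season = simulate_babip(0.300, 500, 500)
eight_seasons = simulate_babip(0.300, 4000, 500)

# Every simulated pitcher has the same talent, so any spread is pure chance.
print("one season:    lowest", min(one_season), "highest", max(one_season))
print("eight seasons: lowest", min(eight_seasons), "highest", max(eight_seasons))
print("spread (std dev):", statistics.pstdev(one_season),
      "vs", statistics.pstdev(eight_seasons))
```

Over one simulated season the gap between the luckiest and unluckiest pitcher is wide, even though no one is better than anyone else. It narrows only as the balls in play pile up over many seasons.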
Conclusion
People say that ERA measures “what actually happened” and in a certain sense that’s right. But that’s not the same thing as measuring how the pitcher “actually pitched.” Official scorers, defense, bullpens, sequencing, ballpark dimensions and luck all work to confound ERA. Those things are all “actually happening” too and are captured in the ERA statistic.
The problems with ERA are huge. The difficulties seem insurmountable.
What we need is a way to evaluate a pitcher that doesn’t depend on those other factors. We need a measurement that isolates how the pitcher actually performed. Happily, there are a few good ways to go about that. We’ll cover them in Parts 2 and 3 of this series.
Steve, nice job !!! Now, if we could only educate the NFL. The worst stat in football is the QB gets credit for the entire yardage on a pass whether the ball is thrown 1 yard or 25 yards. The receiver should get the yardage credit from the point of reception.
Steve, anything you can do to get the TV networks to remove ERA from the players’ stats and use more TRUE stats of EVERY player?
Looking forward to Part 3.
Great post Steve. Down with ERA! We should ban it from even being mentioned here.
Okay, I will throw away ERA in the same can as I did OBP.
What?? I imagine you would want to start burning books now that even mention ERA in them, no?
What’s next? Telling us how worthless we are if we don’t want to join up with the BCC’s (Bat Crap Crazies) that advocate for the removal of the Pete Rose statue??
Remove the statue.
Burn ’em all, sure.
Jokes aside, it’s just laziness if you’re using a stat that has been clearly demonstrated to be flawed over better statistics that are readily available. There’s just not a valid reason to use ERA to evaluate pitching anymore.
Steve, this is a thorough and insightful assessment of the shortcomings of ERA, especially as a predictor of future results, and to a lesser but still significant extent as an evaluator of past performance. But I do want to address one point – the luck factor. I don’t doubt the finding that whether or not a batted ball is a hit or an out depends mostly on luck (75%). But it’s critical to clarify that this statement should include a major qualifier – given two balls with the same profile. Two 275-foot fly balls might result in different outcomes – one a hit and one an out – and the difference is mostly down to luck. But the likelihood that either is a hit is still very tiny. Hit a 275-foot fly ball and you’re almost certainly making a right turn at first. So to say “when a ball is in play, the result is mostly down to luck and out of the pitcher’s control” seems a bit misleading. There’s a big reason that RLN tends to favor pitchers with a higher ground ball percentage – those rarely clear the outfield wall.
I’m not a huge fan of ERA, but I do believe that batted ball profiles need to be considered when evaluating pitchers. Even though there’s a lot of luck in what happens to many batted balls, pitchers do influence how those balls are put in play. I’m sure you and Jordan will cover that ground in full, but as excellent as this first post is I don’t want to gloss over that aspect of assessing pitchers.
The only thing that strikes me as odd is the idea that 75% of hits can be attributed to luck. Seems too random….that the talent of the pitcher/batter should be higher. I mean, the quality and location of the pitch thrown certainly has a huge influence on how the ball is struck, which in turn influences its chances of becoming a hit.
Thoughts???
Sultan – That description can be misleading. Start with two equivalent batted balls – say, two hard hit grounders, or two line drives with the same launch angle and velocity, or two 275′ fly balls. Whether one or the other of each pair is a hit or an out is mostly luck – about 75%. But the line drives will always be more likely to be hits than the fly balls (almost never) or the ground balls (sometimes). So within any one profile of batted ball, luck plays a big part in the outcome. But BETWEEN profiles luck is not always the defining factor for outcomes. Clearly a pitcher who allows a lot of line drives, or 400′ fly balls, will not fare as well as the pitcher who gives up the same number of ground balls instead. The bigger question is – how much control does any pitcher have over the profile of those batted balls. That can be difficult to evaluate…
Agree about the 75%. By that view, Ted Williams and Tony Gwynn were just consistently luckier than everyone else.
Pretty head scratching comment by David Bell to declare that Finnegan has a spot on the opening day roster. I mean, if you take the 10 shoo-ins, that means he is assured a spot and that Reed/Stephenson/Romano/Wisler/Sims are all behind him. Really???
Puzzled me too. I think his comment may have been taken a little out of context or misunderstood. It probably was meant as encouragement for Finnegan that he has a chance, or spot, in the grand scheme of selecting staff members. Can’t believe after his horrible last 2 years he would put him ahead of more deserving candidates after one decent inning in ST.
Good start Steve!
For me, at least intuitively – using a measure of quality of contact (Velocity, Launch Angle) will eliminate more of the randomness and allow for smaller sample sizes to be meaningful.
The goal is for
Put simply – for a Pitcher independent of other variables (park, fielding, luck)…
No Contact is the Best way to prevent runs (several models are using this well now)
Soft Contact is the second best and should correlate to success
Hard contact is the worst and should be inversely related to success.
Some of the models still don’t isolate quality of contact – simply looking at Strikeouts, HRs and Walks.
What I describe is obviously not available for much historical data – but I think it is the path forward.
That sounds extreme but actually that seems closer to accurate than you may think. Let’s compare Joey Votto and Billy Hamilton. Billy had 119 hits in 504 at-bats with a .309 batting average on balls in play. Joey Votto had 503 at-bats (basically identical to Billy’s 504 and perfect for comparing). Joey had 143 hits and a BABIP of .333. Doing some math this says Joey Votto put the ball in play 433 times last year. Billy Hamilton put the ball in play 385 times. Now this doesn’t seem right, does it? That Votto only put the ball in play 48 more times than Billy last year. But it’s true. Votto was less than 10% better at the plate than Billy was last year when actually hitting the ball. The real difference between the 2 is Billy had 46 walks to Joey’s 108. Billy also had only 29 extra base hits to Joey’s 42.
The article suggests that Batter skill is only worth 12.5% of hits. Joey Votto was only 9.5% better than Billy Hamilton at hitting the ball last year. Those stats in this case seem to match up well with reality.
This was meant to be replied to SultanofSwaff
Good comment!
Interesting, but something about it does not pass the eyebrow test. Have you seen Votto’s plate discipline and swing? Have you compared these to Hamilton’s? Undoubtedly you have. Only a 9.5% difference between those . . . no way.
Votto only had 23 more hits than Billy did with a basically equal 503 recorded at-bats. You would think it would be more considering just how much better Votto is.
If we evaluated “better” by only the number of hits, I would say we would think it were more. But that’s not how we evaluate better. The quality of the hits matter, too. As does the whole walking aspect of things.
Right, but the original conjecture is that player skill only accounts for 12.5% of whether a batted ball falls for a hit or not. I’m presenting an argument that this conjecture makes sense by showing that our best hitter was not statistically miles better than our worst hitter.
I agree about ERA but I don’t think sequencing should be totally eliminated. For example, consider the following home run stats from 2018:
Luis Castillo 28 homers: 16-solo, 8-two run
Zack Greinke 28 homers: 21-solo, 4- two run
Cole Hamels 29 homers: 19-solo, 6-two run
J.A. Happ 27 homers: 19-solo, 4-two run
Some pitchers are better at “limiting damage” and part of that involves sequencing.
ERA has its set of issues. So does FIP. And xFIP. And exit velocity. And launch angle. And “hard contact/soft contact”.
They’ve all got their issues. Every single one of those stats is useful. None of them come close to telling the whole story. Some of them tell a different story than what many people try to use them to tell. That last sentence is something I’ve been guilty of in the past.
I have a question about sequencing. Does the sequence of events perhaps affect a pitcher and his performance? For example, might the fact that Pitcher B allowed two men on (in a row) then affect how he pitches to the next guy? He is pitching from the stretch, and there’s a sense of pressure with a RISP. Any thoughts/insights here? I hope I’ve explained my question clearly enough.
Ethan, that’s exactly what I was thinking as I read that part: situational decisions by the pitcher and catcher, depending on how many on / how many out (pitch selection, pitching out of the windup or out of the stretch, etc.) would affect whether or not that hypothetical HR by Rizzo actually takes place.
Same goes for Steve’s assertion that “A team’s offense is unrelated to its pitcher’s performance.” It’s all situational; it’s not just “get up there and try to hit the ball.” If you’re down 6-2 in the 8th, you’re likely to play more aggressive offense than if you’re up 14-1.
That said, the fact that it IS all situational, in my opinion, only lends further credence to Steve’s examination of ERA here.
Steve, excellent job here, buddy. A while back, during a discussion of Homer Bailey’s 2018 production of Le Feu de Benne, I brought up the obvious wins-and-ERA argument, and you shook your head and told me ERA was out-of-date, and (not to be a spoiler for your series – I can’t wait for installments 2 and 3!) to look at xFIP instead. I knew there had to be more to it, and you’re laying it out in great detail here, so keep it up!
He is saying that on two identical balls hit 275 feet being hits. Not just whether or not the batter gets a hit. The luck impacts whether or not the same ball falls in between the outfielders or is caught by the center fielder. Hitting the ball 275 feet is where the skill comes in, not where it lands.
This is a good post, but I would make one additional point: ERA is, in general, a bad stat. HOWEVER, we are still really mediocre at measuring pitchers. Over a very large sample size, it sometimes happens that players significantly outperform what they “should” do. Johnny Cueto is an excellent example. When given a large enough sample, ERA gives us a window into the part of measuring pitching we still haven’t figured out how to quantify on its own.
The comparison above about Votto/Hamilton :
The difference in a hitter batting .200 vs. .300 if you have 430 ab’s is 43 hits.
At .200 you’re getting 2 hits per 10
At .300 you’re getting 3 hits per 10
The true difference in the 2 players was not just hits but the quality of the ab’s.
Day and night with these 2, even in a “bad” year for Joey.
Due to a crappy phone (that’s now a thing of the past) I wasn’t able to respond to this article at the time it came out. So I hope I can still get some answers/responses to my statements/questions.
Question 1: Would ERA be a better statistic if the runners on base at the time of the starting pitcher’s removal became the sole responsibility of the reliever? There is that “Inherited Runners” stat for relievers. So, should Inherited Runners mean inheriting the risk to a reliever’s ERA?
Now on to a secondary subject. I personally have been bothered by the fact that each ballpark is allowed to have its own dimensions. It would seem that, in the interest of fairness, MLB should mandate that all parks have the same exact OF dimensions. Now, I see that there could be a lot of potential controversy with this (i.e., the different dimensions give each park its own identity….like they’re actually living entities or beings). But, I can’t get past the “in the interest of fairness” narrative that I brought up earlier. I guess that’s my OCD flaring up, lol.
So, my 2nd question is this: Is there at least a general range that MLB “mandates” (and I use that term loosely) that the OF dimensions have to stay in for Left, Center and Right?