In Part 1 of this series, we looked at the weaknesses of using ERA as a statistic to measure pitcher performance. It turns out that a number of factors the pitcher doesn’t control have a substantial impact on his ERA. Let’s take a look at a few of the statistics that improve on ERA.
Isolating the Pitcher’s Contribution
Suppose we narrow the set of outcomes for which we hold the pitcher accountable. Pitchers have significant control over strikeouts, although catchers play a role through pitch calling and pitch framing, and umpires’ strike zones matter, too. Still, the pitcher’s own performance plays the overwhelming role in strikeouts. The same is true for walks and hit batters. Let’s give the pitcher credit and blame for those three outcomes.
For the moment, let’s also treat home runs as something the pitcher controls. The number of home runs a pitcher surrenders doesn’t depend on defense, relief pitchers, sequencing or official scorers, although it does depend on park factors. So, for now, home runs go in the bucket of things the pitcher controls.
We need a statistic that evaluates pitchers on those outcomes. Just count up home runs, walks, hit batters (HBP) and strikeouts. Those are standard box score stats. Nothing fancy. Assign each a weight that reflects what the data shows about its contribution to runs scored. For familiarity, scale the result so it reads like ERA, with 4.00 about average, 5.00 and above lousy, 3.50 good, and below 3.00 outstanding.
What we just described is Fielding Independent Pitching (FIP), one of a group of similar statistics referred to as ERA Estimators. FanGraphs has popularized FIP and uses it as the basis for its pitcher WAR calculations.
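For readers who want to see the arithmetic, here is a minimal Python sketch of FIP using the 13/3/2 weights that FanGraphs publishes for home runs, walks plus hit batters, and strikeouts. The function names are hypothetical, and the default constant of 3.10 is only an assumed placeholder: the real constant is recalculated each season so that league-wide FIP matches league-wide ERA, which is what `fip_constant` computes.

```python
def fip_constant(lg_era: float, lg_hr: int, lg_bb: int, lg_hbp: int,
                 lg_k: int, lg_ip: float) -> float:
    """Constant that puts FIP on the ERA scale for one league-season."""
    raw = (13 * lg_hr + 3 * (lg_bb + lg_hbp) - 2 * lg_k) / lg_ip
    return lg_era - raw


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """Fielding Independent Pitching with the 13/3/2 weights.

    ip is innings as a true decimal (180 1/3 innings -> 180.333...),
    not box-score notation like "180.1". The 3.10 default is an
    assumed placeholder; use fip_constant() for a real season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Home runs cost the most, walks and HBP less, strikeouts help.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


if __name__ == "__main__":
    # Hypothetical pitcher line, not real data.
    print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))
```

Notice that hits, runs and earned-run rulings never appear as inputs.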
FIP – Evaluating on Strikeouts, Walks and Home Runs
Fielding Independent Pitching measures how the pitcher actually pitched. The pitcher gave up those home runs and walks. He struck out those batters.
It’s even more important to list the factors that don’t influence FIP. It doesn’t look at the number of runs scored or whether they were earned. It doesn’t include hits in the formula. Remember from yesterday’s post that hits involve a huge component of randomness and are affected by batter skill and defense.
FIP does a better job than ERA of isolating what the pitcher controls. FIP does not depend on official scorers’ decisions, a shortstop’s range, a left fielder’s arm strength, the effectiveness of relief pitchers, the sequence of events, or whether soft fly balls fall in for hits.
Research shows a pitcher’s FIP is a better predictor of how many runs he’ll give up in the future than his ERA is. Think of FIP as what a pitcher’s ERA would be with average defense, an average bullpen and average luck.
A pitcher’s FIP is also more stable than his ERA from year to year, another indication that it better reflects actual pitching talent. Over a long enough career, a pitcher’s ERA usually converges toward his FIP: 75% of pitchers with at least 1,000 innings pitched had an ERA within 0.20 of their FIP.
FIP isn’t perfect. It doesn’t account for the small share of batted-ball outcomes that the pitcher does control. And it includes home runs, even though those are influenced by park factors. But if you’re looking for a better measure of pitching performance, it’s a good place to start.
xFIP – Evaluating on Strikeouts, Walks and Fly Balls
Let’s go back to home runs. Years of study have shown that, on average, pitchers surrender roughly one home run for every eight to ten fly balls they allow. That rate is expressed as the ratio HR/FB. For many years, league-wide HR/FB stayed near 10%. In the past three seasons, it jumped to around 12.5%.
Pitchers do have a degree of control over the number of fly balls they give up. If, as the data indicates, home runs are a reasonably consistent percentage of fly balls, the number of home runs a pitcher gives up is a function of his fly ball percentage (FB%).
Let’s say we wanted a version of FIP that normalizes home runs for luck and ballpark dimensions. One way to do that is to remove actual HR from the equation and replace it with an estimate: the pitcher’s fly balls allowed multiplied by the league-average HR/FB rate.
That statistic is called xFIP, where the “x” stands for “expected.” FIP counts how many home runs a pitcher actually gave up. xFIP estimates how many home runs he should have given up, assuming a league-average HR/FB rate rather than his own luck and ballparks. Otherwise, it works essentially the same way FIP does: pitchers are held responsible for strikeouts, walks, hit batters and fly ball percentage, and the formula is scaled to ERA. You can find xFIP at FanGraphs.
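Here is a minimal Python sketch of xFIP. It uses the same 13/3/2 weights as FIP but swaps actual home runs for expected ones (fly balls allowed times a league-average HR/FB rate), which is how FanGraphs describes the stat. The function names are hypothetical, the 10% HR/FB and 3.10 constant defaults are assumptions for illustration, and the pitcher line is made up.

```python
def _fip_core(hr: float, bb: int, hbp: int, k: int, ip: float,
              constant: float) -> float:
    """Shared FIP arithmetic with the 13/3/2 weights."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip(hr: int, bb: int, hbp: int, k: int, ip: float,
        constant: float = 3.10) -> float:
    """FIP using the home runs the pitcher actually allowed."""
    return _fip_core(hr, bb, hbp, k, ip, constant)


def xfip(fb: int, bb: int, hbp: int, k: int, ip: float,
         lg_hr_per_fb: float = 0.10, constant: float = 3.10) -> float:
    """xFIP: replace actual HR with fly balls x league-average HR/FB."""
    expected_hr = fb * lg_hr_per_fb
    return _fip_core(expected_hr, bb, hbp, k, ip, constant)


if __name__ == "__main__":
    # Hypothetical pitcher: 200 fly balls, 30 left the park (15% HR/FB).
    line = dict(bb=50, hbp=5, k=200, ip=200.0)
    print("FIP: ", round(fip(hr=30, **line), 2))
    print("xFIP:", round(xfip(fb=200, **line), 2))
```

Because this hypothetical pitcher’s 15% HR/FB is well above the assumed 10% league rate, his xFIP comes out 0.65 runs lower than his FIP (3.225 versus 3.875).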
Why is xFIP important?
Over a single season, an individual pitcher’s HR/FB rate can vary quite a bit and may stay well above or below league average for the entire year. Eventually it tends to move back toward league average. So an unusually high or low HR/FB over a given stretch may not be a good indicator of his true talent.
Studies show that xFIP is a better predictor of future pitching than FIP. Both are better than ERA.
SIERA – Adding Back Some Pitcher Skills
Let’s return to that small amount of influence pitchers have on batted balls and try to factor that into an ERA estimator.
Here is what the research shows: Pitchers with greater velocity and more strikeouts also generate more poor contact and more double plays per ground ball. Pitchers with higher walk rates give up more runs than a simple linear model would predict. Pitchers with higher ground ball rates have lower out rates than fly ball pitchers.
A formula that takes all of that into account is more complicated than the one for FIP or xFIP. But it is still based on what the pitcher controls.
This statistic is called SIERA, which stands for Skill-Interactive ERA. You can find it at FanGraphs.
SIERA assumes the pitcher has average luck, defense, sequencing and park factors, along with a league-average rate of home runs per fly ball. It treats strikeouts, walks, HBP and FB% as things under the pitcher’s control. What SIERA adds to xFIP is an attempt to model the small fraction of batted-ball outcomes that the pitcher can influence.
Studies show that SIERA is a better predictor of future pitching than xFIP, FIP and ERA.
In 2015, the folks at Baseball Prospectus (a historic and tremendous baseball site) introduced their own pitching statistic, Deserved Run Average (DRA). DRA is built on a “mixed model.” Like ERA, it accounts for all batting events, including hits, but unlike ERA it adjusts for many, many contextual factors: the stadium, temperature, quality of the opposing batter, pitching on the road, defense, pitch count, catcher framing, umpire strike zone, number of runners on base, number of outs, base runner speed and more. It’s also scaled the same as ERA.
Statcast “Expected” Stats
MLB’s TrackMan system now gives us batted-ball data, such as exit velocity and launch angle, for every batted ball. Using that data, it’s possible to develop new measures of how the pitcher performed. MLB’s Statcast Search page contains several new statistics that look at every batted ball a pitcher gives up.
Based on exit velocity and launch angle, it’s possible to estimate how many hits and extra-base hits the pitcher should have given up. Examples include expected batting average (xBA), expected slugging percentage (xSLG) and expected weighted on-base average (xwOBA). You can find them at Baseball Savant, the website operated by MLB. They evaluate pitchers, but they’re scaled like hitting stats, not like ERA.
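As a rough illustration of the idea, and explicitly not MLB’s actual model (which is more sophisticated), the sketch below groups batted balls into exit-velocity and launch-angle buckets, looks up how often balls in each bucket became hits in a reference sample, and averages those hit probabilities over a pitcher’s at-bats. The function names, the 5 mph and 5-degree bucket widths, and the rule of counting every batted ball as an at-bat are all simplifying assumptions.

```python
from collections import defaultdict


def bucket(ev: float, la: float, ev_width: float = 5, la_width: float = 5):
    """Assumed bucketing: 5 mph exit-velocity by 5-degree launch-angle cells."""
    return (int(ev // ev_width), int(la // la_width))


def hit_rate_table(reference_balls):
    """reference_balls: iterable of (exit_velo_mph, launch_angle_deg, was_hit).

    Returns {bucket: fraction of reference balls in that bucket that were hits}.
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for ev, la, was_hit in reference_balls:
        b = bucket(ev, la)
        total[b] += 1
        hits[b] += int(was_hit)
    return {b: hits[b] / total[b] for b in total}


def expected_ba(pitcher_balls, strikeouts: int, table,
                default_rate: float = 0.0) -> float:
    """Toy xBA for a pitcher.

    pitcher_balls: iterable of (exit_velo, launch_angle) he allowed.
    Strikeouts count as at-bats with zero hit probability. Every batted
    ball is treated as an at-bat (a simplification: no sac flies).
    Buckets missing from the reference table fall back to default_rate.
    """
    balls = list(pitcher_balls)
    expected_hits = sum(table.get(bucket(ev, la), default_rate)
                        for ev, la in balls)
    at_bats = len(balls) + strikeouts
    if at_bats == 0:
        raise ValueError("no at-bats to evaluate")
    return expected_hits / at_bats
```

Note that nothing about who fielded the ball, or whether it actually fell for a hit, enters the pitcher’s number; only the contact quality does.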
These new stats share similarities with FIP, xFIP and SIERA in that they assume average defense, bullpens, sequencing and, to a certain degree, luck.
But there is an important difference between the ERA Estimators and the new Statcast Expected Stats. Expected Stats give the pitcher 100% credit for the batted-ball profile he surrendered. If a pitcher allows more batted balls with damaging combinations of exit velocity and launch angle, Expected Stats attribute that entirely to pitcher performance. But we know pitchers control far less of the variation in batted-ball profiles than that.
The Statcast Expected Stats give additional insight into actual pitcher performance and eliminate much of the noise that makes ERA an unreliable measure. But assuming that pitchers have complete control over batted balls will lead you down a questionable path.
It is also possible to adjust certain statistics for park effects. The convention among baseball statisticians is to add a minus sign to the end of the name and scale the statistic so that 100 is league average. You’ll find ERA-, FIP- and xFIP-. Each point below 100 represents one percent better than average. For example, a pitcher with an FIP- of 90 is 10 percent better than average, after accounting for his ballpark.
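Here is a simplified Python sketch of how a minus stat is built. It assumes a park factor expressed so that 100 is neutral and higher numbers favor hitters, and it simply divides the pitcher’s number by that factor before comparing it to league average. The function names are hypothetical, and FanGraphs’ exact park adjustment differs in its details, so treat this as an illustration of the scaling only.

```python
def minus_stat(pitcher_value: float, league_value: float,
               park_factor: float = 100.0) -> float:
    """Park-adjusted 'minus' stat (ERA-, FIP-, xFIP-): 100 = average, lower is better.

    park_factor: assumed convention, 100 = neutral park, above 100 = hitter-friendly.
    Dividing by it credits pitchers who work in hitter-friendly parks.
    """
    if league_value <= 0 or park_factor <= 0:
        raise ValueError("league value and park factor must be positive")
    park_adjusted = pitcher_value / (park_factor / 100.0)
    return 100.0 * park_adjusted / league_value


def percent_better_than_average(minus_value: float) -> float:
    """Each point below 100 is one percent better than average."""
    return 100.0 - minus_value
```

With a hypothetical 3.60 FIP in a neutral park and a 4.00 league FIP, `minus_stat(3.60, 4.00)` gives an FIP- of 90, or 10 percent better than average. The same 3.60 in a park factor of 105 scores below 90, because the park made his job harder.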
The statistic WHIP stands for Walks plus Hits per Innings Pitched. It measures how many base runners a pitcher allows per inning. Because it’s a non-traditional baseball acronym, people often assume WHIP is a new-fangled sabermetric stat when that isn’t the case.
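The WHIP formula itself is simple: walks plus hits, divided by innings pitched. The one wrinkle worth showing is that box scores write partial innings in thirds, so “180.1” means 180 1/3 innings, not 180.1. The sketch below, with hypothetical function names, converts that notation before dividing.

```python
def innings_from_box_score(ip: str) -> float:
    """Convert box-score notation ("180.1" = 180 1/3 IP) to true innings."""
    whole, _, thirds = ip.partition(".")
    outs = int(thirds) if thirds else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + outs / 3


def whip(walks: int, hits: int, ip: str) -> float:
    """Walks plus Hits per Inning Pitched."""
    innings = innings_from_box_score(ip)
    if innings <= 0:
        raise ValueError("innings pitched must be positive")
    return (walks + hits) / innings
```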
WHIP was a term invented by the guys who came up with the first fantasy baseball league in 1979. So it’s a made-up fantasy baseball stat.
WHIP does offer a snapshot of pitcher performance. Walks are an important way to evaluate pitchers, although plenty of other statistics measure walks. The second half of the WHIP equation is hits, and charging every hit to the pitcher is a problematic and inaccurate way to measure him.
Defense, luck and hitter talent play an overwhelming role in hits. An analyst looking to reduce that variance would skip WHIP in favor of the stats described above. In that sense, WHIP is more of an anti-modern stat than a modern one.
In the first two parts of this series, we’ve looked at ERA, ERA estimators and Statcast Expected Stats as ways to measure pitching.
But there are new, granular ways to evaluate pitchers, many of which are at the cutting edge of thinking and based on brand new technology. Those metrics examine and measure the pitcher’s arsenal, individual pitches and outcomes. We’ll cover them in Part 3.