If you were watching last night’s FSO broadcast in the bottom of the 2nd inning, you saw this graphic regarding Matt Harvey’s ERA:
Thom Brennaman said: “The simple truth is he (Harvey) has gotten healthier as we have moved forward and the numbers are starting to show it.”
Let’s take a closer look at the chart and what it attempts to prove. The first bar on the left (blue) shows Harvey’s time with the Mets. The subsequent columns are the month-by-month values of Harvey’s ERA. Each month represents three (Sept) to five (June, July, August) starts. Harvey’s red-column May numbers include four starts for the Reds.
Sure, that May column is lower than June, July and August, so it doesn’t fit the narrative. But the broadcasters explained it away as being not representative because Harvey only pitched a couple innings in a few of those games. In fact, Harvey threw four innings in his first two May starts for the Reds. The reason his ERA is low for May, though, is because of a 6-inning start where he gave up one earned run. So, yeah. But we have narratives to keep stitched together, so let’s ignore that little issue, as the FSO guys asked. Here’s the same chart in a format we can repeat.
Also in service to the broader narrative, you have to look past the fact that July’s ERA was only 0.03 runs better than June’s. That’s a trivial difference, smaller than the effect of a single extra third of an inning. But again, we’ve got narratives at stake here.
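To see how small 0.03 runs of ERA really is, here’s a quick sketch with made-up, month-sized numbers (not Harvey’s actual line): even one extra scoreless third of an inning moves a monthly ERA by more than that gap.

```python
# ERA = 9 * earned_runs / innings_pitched
def era(earned_runs, innings):
    return 9 * earned_runs / innings

# Hypothetical month-sized sample: 13 earned runs over 27 innings.
before = era(13, 27)
# Same earned runs, one extra scoreless third of an inning recorded.
after = era(13, 27 + 1/3)
print(round(before - after, 3))  # 0.053 -- bigger than the 0.03 gap
```

One out’s worth of pitching swings the number by more than the entire June-to-July “improvement.”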
To be sure, if you look at July, August and September, there is, indeed, a “downward trend in the ERA department” (Jeff Brantley).
[If you know last night didn’t go too well for Harvey, you’re rightly wondering how that affects these numbers. We’ll get back to that in a minute.]
The purpose of this post is to take a look at the way arbitrary endpoints can affect support for what one is trying to prove. FSO chose endpoints based on when we turn the page of our Gregorian calendar, devised in 1582 by Pope Gregory XIII, who was not known as a big fan of baseball.
Suppose instead, we divided Harvey’s time with the Reds into four equal blocks, which is 5-6 starts per block. Here’s what the ERA graphic would have looked like.
Hmm. Not terribly consistent with “as he has gotten healthier the numbers are starting to show it.” Instead, by these endpoints, Harvey’s performance looks like a random walk, not a trend.
How about dividing Matt Harvey’s time with the Reds into two equal periods, each with 11 starts? That’s as large a sample as we can create for comparison.
Oops. Harvey’s first 11 starts were quite a bit better than his last 11 starts, based on ERA. You can see why FSO didn’t choose these data piles.
[Reminder: All we’re doing is changing the endpoints for each block of time. This is the same underlying data, with the same metric (ERA) that FSO used. They chose to divide it month-by-month, which supported what they wanted to say. But so far, every other way we’ve divided up the data has failed to back up their claim.]
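The endpoint game above can be sketched in a few lines of code. The (innings, earned runs) pairs below are hypothetical, not Harvey’s actual lines; the point is only that one fixed set of starts yields different-looking “trends” depending on where you make the cuts.

```python
# The same per-start data, grouped by different endpoints.
# These (innings, earned_runs) pairs are hypothetical, not Harvey's real lines.
starts = [(6, 1), (4, 3), (4, 2), (5, 4), (6, 2), (3, 4),
          (6, 1), (5, 3), (4, 4), (6, 2), (5, 1), (5, 5)]

def block_eras(starts, boundaries):
    """ERA of each block; boundaries are the starting indices of the blocks."""
    edges = boundaries + [len(starts)]
    eras = []
    for lo, hi in zip(edges, edges[1:]):
        innings = sum(ip for ip, er in starts[lo:hi])
        runs = sum(er for ip, er in starts[lo:hi])
        eras.append(round(9 * runs / innings, 2))
    return eras

print(block_eras(starts, [0, 4, 8]))  # three equal blocks
print(block_eras(starts, [0, 6]))     # two halves
```

With these invented numbers, three equal blocks finish on the worst stretch (4.74, 4.50, 5.40), while the two halves show “improvement” (5.14 down to 4.65). Same starts, opposite stories.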
Finally, what if we used the popular “first-half, second-half” scheme, with the All-Star Game as the dividing point?
Ugh. That’s really narrative busting. Harvey’s ERA has been far worse after the ASG. And remember, we aren’t even including his latest start.
The point should be obvious by now. The “simple truth” advanced by the guys on FSO using ERA was based entirely on the dates they chose for the data piles and those start and end points are completely arbitrary.
What about last night?
Here’s where the narrative takes a real beating. Giving up 7 earned runs in 5.1 innings won’t help one’s ERA. Even FSO’s Month-by-Month endpoints no longer support their “numbers show Harvey has gotten better” claim.
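To put a number on how hard one outing like that hits an ERA, here’s a sketch using a hypothetical season line (48 earned runs over 108 innings, a 4.00 ERA) rather than Harvey’s actual totals:

```python
# Hypothetical season line before the bad start: 48 ER over 108 IP (a 4.00 ERA).
ip, er = 108, 48
era_before = round(9 * er / ip, 2)

# Add a 7 ER, 5.1 IP outing ("5.1" in box-score notation means 5 and 1/3 innings).
ip += 5 + 1/3
er += 7
era_after = round(9 * er / ip, 2)

print(era_before, era_after)  # 4.0 4.37
```

One start moves a full season’s ERA by more than a third of a run; a single month’s column absorbs an even bigger jolt.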
Obviously then, throwing that 7-spot into the last column is going to make all the other endpoint schemes worse for the back end, too. Here’s the Four Block example:
The First-Half/Second-Half split really, really becomes contrary to the Harvey Happy Ending bedtime fairy tale.
The bottom line: Matt Harvey’s ERA with the Reds has been 4.46. The NL average for starters is 4.02. That’s substantially worse than average. Are you sure you want to offer him a big extension?
Now, what if FSO had used a metric other than ERA?
ERA is laughably inaccurate in conveying pitching performance over such short periods of time. Runs are a clunky measure of how someone pitches (one swing can equal three runs). Sequencing has a huge effect. Defensive performances can be variable. Sometimes relievers strand inherited runners, sometimes they don’t. Sometimes you pitch well and are unlucky, and vice versa. ERA is fraught with misleading implications over sample sizes even of half-seasons, let alone individual months.
Suppose we just look at a pitcher’s strikeouts, walks and home runs?
Some analysts believe those are the variables over which the pitcher has the most control. That’s the statistic called Fielding Independent Pitching, or FIP. It’s a lot less clunky, isn’t affected by sequencing, defense, luck (much) or relievers. Maybe it will show something more reliable about Matt Harvey’s performance with the Reds.
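For reference, FIP is built from only those three outcomes plus innings pitched. Here’s a minimal sketch; note the full FanGraphs version also counts hit batters, and the constant is recalculated each season so league FIP matches league ERA (3.10 below is just a typical value, not the exact one):

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Fielding Independent Pitching: built only from HR, BB, K and innings.
    The constant is set each season so league FIP matches league ERA;
    3.10 here is a typical value, not the exact one."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant

# Hypothetical half-season line: 12 HR, 30 BB, 95 K over 100 IP.
print(round(fip(12, 30, 95, 100), 2))  # 3.66
```

Notice that nothing about defense, sequencing, or inherited runners appears anywhere in the formula.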
It turns out that using FIP, Matt Harvey has pitched better if we use Month-by-Month endpoints, or Four Blocks. But looking at Two Blocks or First/Second Halves, Harvey’s FIP has been higher later. That’s with or without last night. For example, his first-half FIP is 3.97 and second-half is 4.79. The NL average is 4.10.
Finally, what if we look at just Harvey’s strikeouts and walks?
Some analysts think that’s an even better way to measure a pitcher because home runs can be random (a function of fly balls, not pitching quality per se). To look at just a pitcher’s Ks and BBs, we can use xFIP, a metric like FIP that is scaled to ERA.
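xFIP works the same way as FIP, except actual home runs are swapped out for an expected total: fly balls allowed times the league-average HR/FB rate (roughly 10–11% in most seasons). A sketch with hypothetical inputs:

```python
def xfip(fly_balls, bb, k, ip, lg_hr_fb=0.105, constant=3.10):
    """Like FIP, but actual HR are replaced by an expected total:
    fly balls allowed times a league-average HR/FB rate (~10.5% here)."""
    expected_hr = fly_balls * lg_hr_fb
    return (13 * expected_hr + 3 * bb - 2 * k) / ip + constant

# Hypothetical half-season line: 110 fly balls, 30 BB, 95 K over 100 IP.
print(round(xfip(110, 30, 95, 100), 2))  # 3.6
```

By regressing the home run rate to league average, xFIP effectively leaves strikeouts and walks as the pitcher-driven inputs.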
The xFIP data doesn’t much conform to the narrative that Matt Harvey is getting better, either. But it’s closer. The First-Half/Second-Half numbers (4.39 down to 3.99) are back in the correct direction. So are the Four Block and Two Block data. Month-by-month is still up and down, though. So the broader point remains: what you prove still depends on the endpoints you choose.
The main moral of this story is: Be skeptical of advocates using selective endpoints, especially when the sample size is small and with certain metrics.
Yes, choosing endpoints is unavoidable. We’ve all done it selectively at times. And I’m not saying FSO chose month-by-month knowing it was the only way the data could be grouped to prove their point. They probably just didn’t look past the first split they found. Those “simple truths,” though, are often simple but not true.