As of this morning, the Reds have completed 5.5% of their 2015 season. Here is how they’ve done at the plate:
It is still too early to make much use of these numbers other than as justifications for our own beliefs. For example, I find it hard to believe that Devin Mesoraco will continue to put up a -19 wRC+ after posting a 147 wRC+ last year. But at what point should we be worried? Or at what point does a player’s hitting line start to mean something about their future performance?
First question: how much stock should we put in last year’s performance? Numbers that are consistent from year to year can loosely be interpreted as consistently measuring a hitter’s underlying talent. If there is a wide variation in year-to-year performance (not due to injury) then the metric could be picking up a lot of noise and therefore won’t be very useful for predicting future performance.
In the case of hitting statistics, we might want to know: if a player has a high batting average one year, will he put up a similar batting average the next year?
Bill Petti researched the question and offers a pretty good answer. His analysis uses eight years of data and only includes hitters who posted at least 300 PA in back-to-back seasons (the full correlation table can be found here).
Here is the table from the article:
Hitter Metric | Year to Year Correlation |
Contact % | 0.90 |
SwStr % | 0.89 |
Swing % | 0.84 |
K% | 0.84 |
Z-Swing % | 0.83 |
O-Contact % | 0.81 |
Z-Contact % | 0.80 |
BB% | 0.78 |
BUH | 0.77 |
GB/FB | 0.77 |
GB% | 0.76 |
O-Swing % | 0.75 |
ISO | 0.73 |
HR/FB | 0.73 |
FB% | 0.73 |
SLG | 0.63 |
OPS | 0.63 |
OBP | 0.62 |
wOBA | 0.61 |
IFH | 0.59 |
IFFB% | 0.56 |
F-Strike % | 0.56 |
Zone % | 0.52 |
IFH% | 0.44 |
Batting Average | 0.41 |
BABIP | 0.35 |
BUH% | 0.24 |
LD% | 0.22 |
When using previous performance in hitting projections, we need metrics that correlate highly from year to year. In the Petti data, the most stable numbers are the plate discipline stats: most of them have year-to-year correlations of roughly 0.75 or higher, with contact percentage at 0.90.
Based on this table, projecting future performance from batting average is not a very wise thing to do because it changes so much from year to year (as pointed out in the linked article, this is mostly due to variation in BABIP, which is largely out of the hitter’s control).
In contrast, a player’s BB%, ISO, and contact numbers are relatively stable from one year to the next.
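To make “year-to-year correlation” concrete, here is a minimal Python sketch of the calculation. The contact rates are invented for illustration and are not taken from Petti’s data, and the function name is just a placeholder. His actual analysis covers eight seasons of hitters with at least 300 PA in back-to-back years.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical Contact% for the same five hitters in back-to-back seasons.
contact_year1 = [0.72, 0.78, 0.81, 0.85, 0.90]
contact_year2 = [0.74, 0.77, 0.83, 0.84, 0.88]

print(round(pearson(contact_year1, contact_year2), 2))
```

A value near 1 means hitters keep roughly the same ranking from one season to the next. A value near 0 means last year’s number tells you little about this year’s.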
Second question: How soon can we state that a player is having a good or bad year? Or, when does a “slow start” become more than just a slow start?
Russell Carleton has been publishing data in this area for years, often under the name Pizza Cutter. He’s produced an updated set of numbers. (Warning: That article is math-y, but it’s fun).
The bottom line (taken from Carleton’s article):
Statistic | Definition | Stabilized at | Notes |
Strikeout rate | K / PA | 60 PA | |
Walk rate | BB / PA | 120 PA | IBBs not included |
HBP rate | HBP / PA | 240 PA | |
Single rate | 1B / PA | 290 PA | |
XBH rate | (2B + 3B) / PA | 1610 PA | Estimate* |
HR rate | HR / PA | 170 PA | |
AVG | H / AB | 910 AB | Min 2000 ABs |
OBP | (H + HBP + BB) / PA | 460 PA | |
SLG | (1B + 2 * 2B + 3 * 3B + 4 * HR) / AB | 320 AB | Min 2000 ABs, Cronbach’s alpha used, Estimate* |
ISO | (2B + 2 * 3B + 3 * HR) / AB | 160 AB | Min 2000 ABs, Cronbach’s alpha used |
GB rate | GB / balls in play | 80 BIP | Min 1000 BIP, Retrosheet classifications used |
FB rate | (FB + PU) / balls in play | 80 BIP | Min 1000 BIP including HR |
LD rate | LD / balls in play | 600 BIP | Min 1000 BIP including HR, Estimate* |
HR per FB | HR / FB | 50 FBs | Min 500 FB |
BABIP | Hits / BIP | 820 BIP | Min 1000 BIP, HR not included |
From these numbers, it looks like the first two months will provide enough plate appearances to start making judgments about strikeout rates (watch out for Jay Bruce, currently at 37 plate appearances and a 29.7 K%), the power stats (ISO, HR/FB, HR rate), and walk rate (only 120 plate appearances, one of the lowest thresholds).
At the mid-season mark, slugging, hit by pitch, and singles rates start to come into focus.
These data show that some noisy stats, like batting average, take a long time to stabilize. Imprecise measures, or those that depend on many conditions, are slow to become a reflection of underlying talent.
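As a rough check on that timeline, the sketch below converts the PA-based thresholds from the table into games played. It assumes that 5.5% of a 162-game season is about 9 games, and that a regular keeps accumulating plate appearances at Jay Bruce’s current pace (37 PA in those games). Both are simplifications, and the names in the code are made up for this example.

```python
from math import ceil

SEASON_GAMES = 162
games_played = round(0.055 * SEASON_GAMES)  # 5.5% of the season, about 9 games
pa_per_game = 37 / games_played             # assumption: Bruce's pace fits any regular

# PA-based stabilization points copied from the table above.
STABILIZATION_PA = {
    "Strikeout rate": 60,
    "Walk rate": 120,
    "HR rate": 170,
    "HBP rate": 240,
    "Single rate": 290,
    "OBP": 460,
    "XBH rate": 1610,
}

def games_to_reach(threshold_pa, pace=pa_per_game):
    """Games needed to accumulate threshold_pa plate appearances at the given pace."""
    return ceil(threshold_pa / pace)

for stat, threshold in STABILIZATION_PA.items():
    print(f"{stat}: about {games_to_reach(threshold)} games")
```

At that pace, strikeout, walk, and home run rates reach their thresholds within the first two months, HBP and singles rates arrive before the 81-game midpoint, and XBH rate needs more than two full seasons.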
Now, this second chart is an imperfect fit for our purposes because it answers a question that is similar, but not identical, to the one we are asking. This chart lets us know when a metric stabilizes, but does not tell us when we can use that metric to demonstrate a difference in performance. For example, if Joey Votto has a higher strikeout rate through 60 plate appearances this season than his career rate (he doesn’t, by the way), does that mean that he is going to have a poor season?
No, these results are not refined enough because they are about measuring the stabilization of a number, not the difference between two periods of time. And this gets us to the staggeringly difficult question of how to falsify a probabilistic estimate. Furthermore, how can we incorporate a player’s past performance (when it came from a different season, perhaps a different team, perhaps a different age) into our read of their current performance? Are the two really different, or is it random variation?
Statisticians have struggled with this question for quite a long time, so no sense trying to resolve it now.
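Without resolving it, one crude way to see the “random variation” problem is to ask how often a hitter with a known true strikeout rate would post a bad-looking 60 PA stretch purely by chance. The Python sketch below treats every plate appearance as an independent trial with a fixed strikeout probability, which is a simplification. The 18% career rate and the 15 strikeouts are made-up numbers, not Votto’s actual figures.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) when X counts successes in n independent trials with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

career_k_rate = 0.18  # hypothetical career strikeout rate
pa = 60               # plate appearances so far this season
strikeouts = 15       # hypothetical: a 25% strikeout rate over those PA

chance = binom_tail(strikeouts, pa, career_k_rate)
print(f"Chance of {strikeouts}+ strikeouts in {pa} PA at the career rate: {chance:.3f}")
```

If that probability is not small, the hot or cold stretch is consistent with the player simply being who he has always been. This does nothing to settle how much weight a past season should get, which is the harder part of the question.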
Yet what we can say is this: within the first two months, players’ walk rates, strikeout rates, and power stats are instructive, but not definitive. These are important stats because the two major “skill” areas for players are controlling the strike zone (K%, BB%) and power (ISO). A slump in those stats, combined with a divergence in contact rate from the previous year (first chart), would point to a more durable decline that is likely to continue down the stretch.
Playing better is always preferable to playing poorly, but it takes a while to know when a set of outcomes becomes representative of a larger trend.