If you’ve been around this website for the past few years, chances are you have seen me write something like “Well, [Stat X] doesn’t begin to stabilize until a player reaches [X Plate Appearances].” Statements like these hint at a topic that is near and dear to me: sample size!
We all know the most common baseball caveat: small sample size. What does it really mean, though? Is it consistent among statistics? How do you know when a sample size is no longer small? I hope to shed some light on these topics.
To begin, I must disclose this is definitely not my own work. Many folks have tinkered with this stuff over the years, but the most commonly used metrics come from Russell Carleton, aka “Pizza Cutter.” You can read one of his posts here. I’ll do my best to briefly summarize and then we’ll look at some Reds stats!
Basically, by using math we can determine what sample size of plate appearances, at-bats, or balls in play is required before you can begin to trust the stat you are analyzing. For example, we all know as baseball fans that if a batter gets a hit in his first at-bat of the season, we shouldn’t consider his “true talent level” to be a 1.000 AVG. If a hitter goes 7-for-10, we know his true talent level is not a .700 AVG. However, most fans get to a point where they start to believe the numbers they are seeing. This may be different for each fan. Take Joey Votto’s slow start this year. Entering Thursday’s game with the Cubs, he had a .182 AVG in 63 plate appearances. It’s natural to wonder what’s wrong with Joey. But if I told you batting average took over 900 plate appearances to begin to stabilize, would you stop worrying? Some might. Some might ask “What is stabilizing?” Glad you asked!
Stabilization points were determined by Carleton using a method called split-half reliability. Essentially, you take a sample and cut it in half, then arrange those halves in every possible combination and run correlations between the two sets of samples. Average those out, and you get an overall correlation. So, what is stabilization? Carleton surmised that once all the sets of samples reach a mean correlation of around 0.7, they have “begun to stabilize.” Another way to say this is that the share of variance explained is about 0.5 (i.e. 0.707*0.707=0.5), which means there is as much true talent level description in the stat as there is random variation.
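For the curious, here is a rough sketch of the split-half idea in Python. It is not Carleton's actual code or data: the players are simulated, the function names and the talent spread (mean_rate, talent_sd) are made up for illustration, and it averages over a batch of random splits rather than literally every possible combination. The only point is to show the correlation between halves climbing as the number of plate appearances grows.

```python
import numpy as np


def split_half_correlation(outcomes, n_splits=200, seed=0):
    """Mean correlation between random halves of each player's sample.

    outcomes: 2-D array (players x plate appearances) of 0/1 results,
    e.g. 1 = strikeout, 0 = anything else.
    Random splits stand in for "every possible combination" (assumption).
    """
    rng = np.random.default_rng(seed)
    outcomes = np.asarray(outcomes, dtype=float)
    n_pa = outcomes.shape[1]
    half = n_pa // 2
    correlations = []
    for _ in range(n_splits):
        order = rng.permutation(n_pa)
        # Each player's rate in the first half vs. the second half.
        first = outcomes[:, order[:half]].mean(axis=1)
        second = outcomes[:, order[half:2 * half]].mean(axis=1)
        # Correlate the two sets of rates across all players.
        correlations.append(np.corrcoef(first, second)[0, 1])
    return float(np.mean(correlations))


def simulate_outcomes(n_players, n_pa, mean_rate=0.20, talent_sd=0.04, seed=1):
    """Hypothetical league: each player has a fixed 'true talent' rate."""
    rng = np.random.default_rng(seed)
    talent = np.clip(rng.normal(mean_rate, talent_sd, n_players), 0.01, 0.99)
    # Each plate appearance is a coin flip weighted by that player's talent.
    return (rng.random((n_players, n_pa)) < talent[:, None]).astype(int)


# Larger samples should produce a higher average correlation.
for n_pa in (50, 200, 800):
    r = split_half_correlation(simulate_outcomes(300, n_pa))
    print(f"{n_pa} PA: mean split-half correlation = {r:.3f}")
```

In this toy setup, the sample size at which the printed correlation crosses roughly 0.7 would be the "stabilization point" for the simulated stat. Real stats stabilize at different speeds because the spread in true talent differs from stat to stat.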
So, when do some stats begin to stabilize? Here’s a chart of a few for your viewing pleasure:
Notice the quickest things to stabilize are all related to how often you swing: swing percentage, strikeout percentage, and contact percentage. Other things take very, very long to stabilize. Batting average, for example, takes longer than a full season to determine a true talent level. For some players, we’ve already reached the stabilization points for Swing% and K%. For others, we’re close. Let’s have a look at the first three and see if we can draw any conclusions from the data:
(Note: The asterisk denotes that I also used Scott Schebler’s 2015 AAA numbers when determining his K%, and the double-asterisk denotes that I used Devin Mesoraco’s 2014 numbers in the 2015 column since he didn’t really have a 2015 to speak of.)
This chart is sorted by Swing Rate. I think if you’ve been watching the Reds this year you didn’t need a chart to tell you Brandon Phillips has been swinging at everything in sight. Now you have proof! Given that Swing% stabilizes rather quickly, we can say that Phillips has likely altered his game plan, and that his new approach and true talent level have him swinging at many more pitches than at any other time in his career.
This chart is sorted by Strikeout Rate. Zack Cozart has been dazzling so far. If he truly changed his approach and we can trust his increased contact rate and decreased strikeout rate, he could be in for a career year, even if his luck on balls-in-play takes a major turn for the worse.
As expected, since Phillips is swinging way more than ever, he’s striking out less. This may be counter-intuitive, but swinging early and often means you put the ball in play before you ever have the chance to strike out.
This chart has been sorted a third and final time, now by Contact%. This stat begins to stabilize around 100 PA, so there can still be some noise in these. But they can also be telling.
The point of this article was not to draw any huge, ground-breaking conclusions, but simply to introduce the idea of reliability and stabilization so we can discuss it more in the future!
I’d be interested to hear what you guys can glean from the above data, as well as from your eyeballs! Post your findings in the comments below!
(Note: For those of you who were expecting a third RE24 article, I apologize. I decided three articles in a row about the same topic was probably not fun for readers who didn’t like the topic.)