Let us discuss the concept of Batting Average on Balls In Play, or BABIP.  In simplest terms, a player’s BABIP is his batting average while ignoring strikeouts and home runs. Why ignore those? Because neither play results in a “ball in play” that could possibly be converted to an out by the defense. The goal of BABIP is to describe how difficult it was to get a player out when he put the ball in play.
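
For the spreadsheet crowd, the usual formula is (H - HR) / (AB - K - HR + SF); sacrifice flies are added back in because they are balls in play that do not count as at-bats. Here is a minimal sketch in Python. The function name and the stat line in the example are made up for illustration and do not belong to any real player.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies=0):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play, so BABIP is undefined")
    return (hits - home_runs) / balls_in_play


# Hypothetical stat line: 150 H, 20 HR, 550 AB, 120 K, 5 SF
print(f"{babip(150, 20, 550, 120, 5):.3f}")
```

If you don't have sacrifice-fly data handy, leaving `sac_flies` at 0 still gets you a close estimate.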

Many factors affect a player’s BABIP; a few examples are speed, propensity to hit the ball hard, and batted-ball profile (think line drives and pop-ups!).

Being fast (think Billy Hamilton!) makes it easier to leg out infield singles, thus increasing a player’s BABIP. Naturally, you’d expect someone like Billy to have a higher BABIP for grounders than an average player; he does. For his career, Billy is batting .302 when he puts the ball on the ground, while the average player in 2016 batted .239 on grounders.

Hitting the ball hard (think Giancarlo Stanton!) will also increase your BABIP. When a ball is hit hard, defenders have less time to react and reach the ball before it falls in for a hit.  Hitting the ball hard, more often, will lead to higher batting averages. This should be very intuitive.

There are many other factors that play into BABIP. Obviously, hitting line drives is preferable to hitting pop-ups (think Joey Votto!). Along the same lines, it is nice to spray the ball around enough that a team can’t easily employ a defensive shift (think Jay Bruce!).

In short, everything you do in the batter’s box accounts for your BABIP. Each player, based on all these factors, will have a unique BABIP profile. An average player will have a BABIP around .300. Each of the last 5 seasons saw league-wide BABIPs no lower than .297 and no higher than .300.

For the sake of demonstration, let us look at Joey Votto, since he happens to be a historic example of BABIP prowess. He hits the ball hard regularly, hits gobs of line drives, avoids pop-ups better than just about anyone in history, and usually sprays the ball to all parts of the field. Because of this, and despite the fact that Votto doesn’t possess great speed, he has posted a career BABIP of .359. Said another way, when Votto puts the bat on the ball (and it doesn’t leave the yard), it has turned into a hit 36% of the time. This happens to be tied for 3rd all-time. Like…in all of history. He trails only Ty Cobb and Rogers Hornsby and is tied with Rod Carew (min 4000 PA, no 1800s guys!). Folks…please appreciate the fact that you get to watch an all-time great in the batter’s box on a daily basis.

Switching gears a bit, now that we understand how BABIP is calculated and what affects it, let’s talk about the “L word:” luck. Many productive conversations on player valuation have broken down when the “L word” is invoked.

Since we know an average player with an average batted-ball profile should have a BABIP around .300, we might sometimes say a guy has been unlucky if he maintains an average batted-ball profile and has a BABIP lower than .300. Conversely, we sometimes say a guy has been lucky if he maintains an average batted-ball profile and has a BABIP north of .300.

In these terms, the word luck is used to express the mathematical idea of variance.  Inherently, we all understand the concept and accept it without the need for math. We know, as baseball fans, if a guy hits five pop-ups in a row and all five fall in for base hits, the batter’s true talent level is not a 1.000 AVG. Along the same lines, if a batter hits five screaming line drives that all happen to be hit directly towards a defender, we know that batter’s true talent level is not a .000 AVG. The answer lies somewhere in between.
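
To put a rough number on that variance, here is a back-of-the-envelope sketch. It assumes every ball in play is an independent event with the same chance of becoming a hit, which is a simplification (real batted balls are not all alike), and the function name is my own. Under that assumption, the standard error of an observed BABIP is sqrt(p * (1 - p) / n).

```python
import math


def babip_standard_error(true_babip, balls_in_play):
    """Binomial standard error of an observed BABIP (simplifying assumption)."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


# A true-talent .300 hitter over a hypothetical 400 balls in play
se = babip_standard_error(0.300, 400)
print(f"one standard error: {se:.3f}")
print(f"rough band: {0.300 - se:.3f} to {0.300 + se:.3f}")
```

Even over a few hundred balls in play, an observed BABIP can sit a couple dozen points away from true talent on chance alone.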

It is very important to understand that batted-ball profiles can vary quite a bit from player to player. It follows, then, that we should expect BABIPs to vary quite a bit from player to player as well. For example, it is simultaneously possible for a guy running a .330 BABIP to be unlucky while his teammate running a .295 BABIP is lucky.

So, how do we describe the luck factor for each player, and how do we figure out if that factor is within expectations based on the way that player is hitting the ball?

The answer, of course, is xBABIP, or expected batting average on balls in play. This is a framework used to describe, as accurately as we can, how often a player’s balls in play should be falling in for hits, based on what the player is actually doing. This isn’t hocus-pocus prediction; this is honest-to-goodness multivariate regression using historical data! Three cheers for historical data!!

Several days ago, a fantastic writer/researcher by the name of Mike Podhorzer released a version of a formula that calculates a player’s xBABIP which now also incorporates defensive shifts. The fact that some players are shifted quite frequently has been a common missing factor in many of these xBABIP equations up to this point.

As usual, these types of equations can be calculated from publicly available data using the FanGraphs splits tools. The process is outlined in the article linked above. The inputs for this equation are as follows:

– Speed Score (Spd)
– Hard-Hit Rate (Hard%)
– Line Drive Rate (LD%)
– Fly Ball Rate (FB%)
– Infield Fly Ball Rate (IFFB%)
– Ground Ball Rate (GB%)
– Pull Rate while Shifted (pullGBshift%)
– Not Shifted Balls in Play (NoShBIP)
– Shifted Balls in Play (ShBIP)
– Rate of BIP While Shifted (%BIPSh)
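
The last three inputs are related. As a small sketch, and assuming (my reading, not something confirmed here) that %BIPSh is simply the shifted share of all balls in play, it could be derived from the two counts like this. The function name and the counts in the example are hypothetical.

```python
def shifted_bip_rate(sh_bip, no_sh_bip):
    """Share of balls in play hit into a shift (assumed definition of %BIPSh)."""
    total_bip = sh_bip + no_sh_bip
    if total_bip == 0:
        raise ValueError("no balls in play")
    return sh_bip / total_bip


# Hypothetical counts: 100 shifted and 300 non-shifted balls in play
print(f"{shifted_bip_rate(100, 300):.1%}")
```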

It might seem like a lot, but following the outlined formula in the article, and using a spreadsheet to keep yourself organized, it is quite straightforward.

I decided to go ahead and calculate xBABIP for the eight projected Reds starters with the new system and the old system (which did not incorporate shifts) and then compare the two.

Mostly I did this for fun. And mostly, my idea of fun is viewed as “odd” by many. Regardless, here is a chart to gaze upon!


Looking first at the 2016 Actual BABIP column, one might conclude that Joey Votto and Jose Peraza were very, very lucky! Their BABIPs were way over .300! No way they can sustain those .360+ numbers! Well, what do the peripherals say?

Check out the xBABIP columns; they incorporate all the peripherals we care about. While not quite as high as the actuals, inflated xBABIPs show both players were doing the things that need to be done to carry an inflated BABIP. That is to say, we shouldn’t expect either player to fully regress to league-average this year. (Peraza, given his tiny sample size, however, is still a candidate for huge BABIP regression. Joey Votto is not.)

As you might expect, a player who pulls grounders at a high rate (Scott Schebler) is going to have a big drop between old and new, now that we are incorporating shift data. In 2017, we probably should not expect Schebler to maintain an above-average BABIP unless he changes his batted-ball profile.

Zack Cozart’s big delta is interesting, given that righties are not shifted a ton. I think it likely has something to do with the new equation weighting each event slightly differently.

For those who care, this new model is the first of its type (i.e., not using granular ball-in-play exit velocity data) to break an r-squared of 0.5. Again, more on that in the article linked above.
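
For anyone wondering what that r-squared figure measures: it is the share of the variation in actual BABIP that the expected values account for. Below is a minimal sketch of the usual calculation (one minus the residual sum of squares over the total sum of squares). The function name is my own, and the numbers are made up for illustration rather than taken from real players.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values, not real player data
actual_babip = [0.280, 0.300, 0.320, 0.340]
expected_babip = [0.290, 0.295, 0.315, 0.330]
print(round(r_squared(actual_babip, expected_babip), 3))
```

A value of 1.0 would mean the expected numbers match the actual ones perfectly, while 0.0 would mean they do no better than guessing the group average for everyone.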

Really, there’s nothing that groundbreaking here from an analysis perspective, but I ran these numbers and figured I’d share with everyone!  Baseball is upon us!!

8 Responses

  1. Gaffer

    Cozart makes sense, as he is largely a three-outcome player (not the usual ones). He either pops up to RF/2B, pulls it to SS ON THE GROUND, or hits a line drive. Therefore I expect a low BABIP. Also, a .300 BABIP is not a good thing for Duvall: because he strikes out so much, he needs a much higher hit rate on the rare occasions he makes contact.

  2. cfd3000

    Yes, Patrick, your idea of fun is a little unusual. But we benefit so you won’t ever hear me with the criticism! One point that jumps out at me is the lack of inputs that account for using more (or less) of the field. Except for the “pull when shifted” factor there doesn’t appear to be anything to differentiate between a dead pull hitter and one who uses the whole field. Presumably that would influence both actual and expected success. It’s that old suspect eye test but I recall Schebler generating much better numbers later in 2016 as he started to spray the ball more. That might shake some of the “luck” out of his row and suggest that he has more potential than this method assigns him. Thoughts? (And thanks for the great info).

  3. Scott Carter

    You always do a good job of breaking down the numbers, Patrick.

  4. jazzmanbbfan

    I am very much a “non-expert” and I find your explanations quite helpful. Keep ’em coming.

  5. VaRedsFan

    Great article PJ…
    A few questions about shifts. Does the middle infielder have to cross the other side of 2nd base for it to be considered a shift? Many teams “shade” a guy one way or another without a full-blown shift.
    Is infield in considered a shift? Billy has the 3rd baseman breathing down his throat for 75% of each of his AB’s, as well as the rest of the infielders playing a little more than half way just because they can’t throw him out if they play normal depth.

  6. big5ed

    Wee Willie Keeler. Hit ’em where they ain’t.

  7. big5ed

    BABIP gives me some hope on Phillip Ervin, who had baffling home/away stats last year. He had a .660 OPS in games at Pensacola and .854 on the road, even though he had a much higher walk rate at home. I calculate (not having SF info) his road BABIP at .302 and his home BABIP at .239 last year.

    It likely isn’t all BABIP to explain the poor home showing, but he is probably better than his regular stats show.

  8. Jeff

    My favorite inputs are ShBIP and NoShBIP. They should both come with exclamation points after them!