Please allow me to start out by saying this may be a fool's errand. Trying to unearth a new baseball metric is almost blasphemous to some. The stats crowd pores over data every season looking to gain an edge, and while this should be considered a work in progress, we just may have stumbled onto something big here. With the help of Brian Creagh from Tableau on Twitter and ExpandTheBoxscore.com, we workshopped an idea born from a fondness for an up-and-coming NFL staple called Air Yards, which is the total distance a football is thrown beyond the line of scrimmage to the point of reception. We believed we could create something similar for baseball, one that just might give us all the edge we seek.
It starts with Josh Hermsmeyer, who does excellent work pushing the boundaries of NFL stats at Airyards.com. To put it simply, Air Yards are predictive because they lack the noise of other metrics such as Targets. If a wide receiver is being thrown the ball at a significant rate and distance, he's likely to produce in fantasy. I wanted to apply that line of reasoning to baseball, and we came up with “Fly Index”, a lens for measuring a hitter’s fly-ball distance. If a hitter is hitting balls with both frequency and distance, they are likely to produce in fantasy, at least on the power side. This treads close to ISO’s territory, but it ignores the result of the play and measures only how far the ball was hit.
Our aim is to identify the players with the best chance of hitting home runs, similar to what barrels and exit velocity try to do. Does a large amount of fly-ball distance equate to a large number of home runs? Unsurprisingly, yes: players who hit more fly balls naturally hit more home runs. The next step was to find hitters with a large fly-ball distance who didn't have the home runs to go with it, i.e. potential buy-lows. To account for uneven plate-appearance totals, Brian compared “average first-half distance” with second-half HR% in 2018. The color on the chart below represents each player’s first-half HR rate on fly balls, with dark green the highest HR% and dark red the lowest.
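The first-half versus second-half comparison can be sketched in Python. This is a minimal sketch, not Brian's actual pipeline. It rests on several assumptions:

- a hypothetical `PlateAppearance` record, one row per plate appearance, carrying the batted-ball type and distance when the ball was put in play;
- a caller-chosen cutoff date separating the halves;
- first-half distance averaged over fly balls only, since the article doesn't say exactly which batted balls go into the average;
- HR rate on fly balls taken as home runs on fly balls divided by fly balls.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass
class PlateAppearance:
    """Hypothetical per-PA record; field names are illustrative only."""
    player: str
    game_date: date
    bb_type: Optional[str]        # e.g. "fly_ball"; None if no ball in play
    distance_ft: Optional[float]  # batted-ball distance, None if unknown
    is_hr: bool


def half_season_split(
    pas: Iterable[PlateAppearance], cutoff: date
) -> Dict[str, Dict[str, float]]:
    """Per player: average first-half fly-ball distance, first-half HR per
    fly ball, and second-half HR per PA. Games before `cutoff` are first half."""
    fb_dist: Dict[str, List[float]] = defaultdict(list)
    fb_hr: Dict[str, int] = defaultdict(int)
    pa_2h: Dict[str, int] = defaultdict(int)
    hr_2h: Dict[str, int] = defaultdict(int)

    for pa in pas:
        if pa.game_date < cutoff:
            # First half: only fly balls with a recorded distance count.
            if pa.bb_type == "fly_ball" and pa.distance_ft is not None:
                fb_dist[pa.player].append(pa.distance_ft)
                fb_hr[pa.player] += int(pa.is_hr)
        else:
            # Second half: every plate appearance counts toward HR%.
            pa_2h[pa.player] += 1
            hr_2h[pa.player] += int(pa.is_hr)

    result: Dict[str, Dict[str, float]] = {}
    for player, dists in fb_dist.items():
        n_pa = pa_2h.get(player, 0)
        if n_pa == 0:
            continue  # no second-half sample to compare against
        result[player] = {
            "avg_1h_fb_distance": sum(dists) / len(dists),
            "hr_per_fb_1h": fb_hr[player] / len(dists),
            "hr_per_pa_2h": hr_2h[player] / n_pa,
        }
    return result
```

One plausible cutoff is the date of the 2018 All-Star Game, `date(2018, 7, 17)`, though the article doesn't say where Brian split the season. Each resulting player maps to one dot on a chart like the one described: x is the average first-half distance, y is second-half HR%, and the color comes from first-half HR per fly ball.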
Some of the outliers make sense for players in ballparks that skew toward hitting or pitching. What we're looking for are red dots in the upper-right quadrant (low first-half HR rates despite hitting the ball far, and thus a potential spike in the second half) or green dots in the lower left (high first-half HR rates driven by "luck", and thus negative regression coming in the second half). We found a very interesting poster boy. Christian Yelich had an above-average first-half HR rate on fly balls of 31%, yet based on the distance he was hitting them in the first half, his second-half run wasn't crazy at all. We can pick out a few more examples that would've helped for fantasy purposes.
Players this model said were going to improve and did: Maikel Franco, Kendrys Morales, Trevor Story, Michael Conforto, and Daniel Palka.
Players this model said would come back to Earth and did: Yangervis Solarte, José Martínez, Curtis Granderson, and Kiké Hernández.
Players who got off to hot starts and this model said would continue to produce: Christian Yelich, Max Muncy, Javier Báez, David Peralta, Joey Gallo, and J.D. Martinez.
To home in on the most effective predictor, Brian calculated each player’s 85th-percentile batted-ball distance. If a player hit 100 balls into play, we rank them from shortest to longest and use the 85th as our measure. Another way to interpret this using real numbers: Joey Gallo's Fly Index was 343 feet, which means 15% of his balls in play traveled farther than 343 feet and 85% were shorter. The 85th percentile seems to work really well because the top 5-to-7% are typically home runs, so it captures the next tier of contact that isn't quite over the fence.
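The percentile step can be sketched as a small helper, `fly_index`, a hypothetical name used only for illustration. The article doesn't say which percentile convention Brian used: spreadsheet and Tableau functions often interpolate between neighboring values. This sketch assumes the nearest-rank method instead, which returns an actual batted ball and matches the "85th of 100 balls" reading.

```python
from typing import Sequence


def fly_index(distances_ft: Sequence[float], percentile: int = 85) -> float:
    """Nearest-rank percentile of a player's batted-ball distances.

    With 100 balls in play and percentile=85, the balls are sorted from
    shortest to longest and the 85th one is returned, so 15% of the balls
    travelled farther.
    """
    if not distances_ft:
        raise ValueError("need at least one batted ball")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(distances_ft)
    n = len(ordered)
    # Integer ceiling of percentile * n / 100 avoids floating-point rounding.
    rank = (percentile * n + 99) // 100
    return ordered[rank - 1]
```

For example, `fly_index(list(range(1, 101)))` returns 85, and exactly 15 of those 100 values (86 through 100) are longer. This mirrors the Gallo interpretation above.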
To explore the validity of Fly Index, Brian looked at the correlation between a few pairs of variables in the graphs below:
1. The relationship between a batter's 2017 barrel% and 2018 HR per PA, which had an R-squared of 0.477. (R-squared is the proportion of the variance in one variable that is explained by the other.)
2. The relationship between a batter's 2017 HR total and 2018 HR per PA, which had an R-squared of 0.467.
3. Our Fly Index metric (85th-percentile hit distance) in 2017 vs. 2018 HR per PA, which had an R-squared of 0.533. In other words, 2017 Fly Index explains 53% of the variance in 2018 HR per PA, more than either of the other two 2017 measures.
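Each R-squared above summarizes a one-predictor relationship. For a simple least-squares line, R-squared equals the square of the Pearson correlation. The sketch below computes it directly. It assumes the article's values come from simple linear trend lines, which the text implies but doesn't state.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R-squared of a least-squares line fitting y on x (one predictor).

    For a single predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables need nonzero variance")
    return (sxy * sxy) / (sxx * syy)
```

To reproduce item 3, `x` would be each qualifying player's 2017 Fly Index and `y` his 2018 HR per PA. Read the other way, an R-squared of 0.533 corresponds to a correlation of about 0.73.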
The good news: our metric stacks up really well in comparison. In fact, it has a higher correlation with next-season HR per PA than either barrel% or the number of prior HRs.
The bad news: the metric deteriorates somewhat as the plate-appearance threshold drops. Once the minimum is lowered to 300 PAs in both seasons, barrel% becomes the better predictor. We're still in the ballpark at these lower thresholds, though, so Fly Index may still surface breakout candidates that barrel% doesn't.
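The threshold comparison can be sketched by filtering to players who clear a minimum PA count in both seasons. The sketch then scores each 2017 predictor against 2018 HR per PA. The `PlayerSeasons` record and `compare_predictors` function are hypothetical names for illustration, and R-squared is again assumed to come from a simple one-predictor linear fit.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class PlayerSeasons:
    """Hypothetical joined record for one player across 2017 and 2018."""
    name: str
    pa_2017: int
    pa_2018: int
    fly_index_2017: float
    barrel_pct_2017: float
    hr_per_pa_2018: float


def _r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    # Squared Pearson correlation = R-squared of a one-predictor linear fit.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables need nonzero variance")
    return (sxy * sxy) / (sxx * syy)


def compare_predictors(
    players: List[PlayerSeasons], min_pa: int
) -> Tuple[int, Dict[str, float]]:
    """Pool size and R-squared of each 2017 predictor vs. 2018 HR/PA,
    keeping only players with at least `min_pa` PAs in BOTH seasons."""
    pool = [p for p in players if p.pa_2017 >= min_pa and p.pa_2018 >= min_pa]
    if len(pool) < 3:
        raise ValueError("too few qualifying players to compare")
    target = [p.hr_per_pa_2018 for p in pool]
    return len(pool), {
        "fly_index": _r_squared([p.fly_index_2017 for p in pool], target),
        "barrel_pct": _r_squared([p.barrel_pct_2017 for p in pool], target),
    }
```

Calling `compare_predictors` at several values of `min_pa` (for example, 300 and higher) shows where one predictor overtakes the other. Lower thresholds admit more players with small, noisy samples.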
Here are players from 2018 with a high Fly Index that look likely to see an uptick in HR rate: Teoscar Hernández, Daniel Descalso, Justin Smoak, Paul DeJong, Albert Pujols, Tommy Pham, and Max Kepler. Consider buying them as potential power values in 2019.
Here are the Fly Index leaderboards at minimums of 75 and 200 plate appearances. A simple way to read these lists is to highlight players near the top who have a low HR%; these are players who could see a surge in home run production in 2019. Some top names include Tyler O’Neill, Alex Bregman, Teoscar Hernández, Daniel Descalso, and Jake Lamb.
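The "near the top with low HR%" reading can be expressed as a filter. This is a minimal sketch with hypothetical names. It defines "low HR%" as below the median HR% of qualifying players and "near the top" as the first `top_n` rows by Fly Index. Both cutoffs are assumptions, not thresholds from the article.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class LeaderboardRow:
    """Hypothetical leaderboard row; hr_pct is home runs per plate appearance."""
    name: str
    pa: int
    fly_index_ft: float
    hr_pct: float


def buy_low_candidates(
    rows: List[LeaderboardRow], min_pa: int = 200, top_n: int = 25
) -> List[str]:
    """Names in the top `top_n` by Fly Index whose HR% is below the median
    HR% of all players meeting the PA minimum (a hypothetical 'low' cutoff)."""
    eligible = [r for r in rows if r.pa >= min_pa]
    if not eligible:
        return []
    board = sorted(eligible, key=lambda r: r.fly_index_ft, reverse=True)[:top_n]
    hr_cutoff = median([r.hr_pct for r in eligible])
    return [r.name for r in board if r.hr_pct < hr_cutoff]
```

Running it with `min_pa=75` and again with `min_pa=200` would mirror the two leaderboards. The smaller minimum surfaces more names but with noisier samples.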
Our hope is to predict some breakouts for the 2019 season, as well as possible in-season regression candidates, by monitoring Fly Index as the season goes. We’ll be following this post up with periodic updates on the metric. Hopefully some of these finds produce power for your fake teams. We look forward to a healthy discussion on this new predictive front.
**One note on Brian’s process: the HR rates for 2018 may be off by fractions of a percentage point, because he reverse-engineered the count of plate appearances from the result of each play.
For example, a hitter may come to the plate with two outs, and a runner is then caught stealing to end the inning on an 0-2 pitch that was called a ball. Technically no plate appearance is registered, but one may be counted here. This shouldn't have any real impact on the results, but we wanted to add the caveat that the numbers aren't 100% perfect.
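The caveat can be illustrated with a sketch of counting plate appearances from play results. This is not Brian's code. It assumes a hypothetical feed of `(batter, result)` rows in which `result` is set only when a pitch ends a play, and the result labels are made up for illustration. A naive count credits the inning-ending caught stealing to the batter at the plate. The optional flag shows one possible correction.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical result labels for outs made on the bases rather than at the plate.
BASERUNNING_OUTS = {"caught_stealing", "pickoff"}


def count_plate_appearances(
    plays: Iterable[Tuple[str, Optional[str]]],
    skip_baserunning_outs: bool = False,
) -> Counter:
    """Count PAs per batter from (batter, result) rows.

    `result` is None for pitches that did not end a play. The naive count
    treats every non-None result as a completed PA, so an inning-ending
    caught stealing is credited to the batter at the plate. Setting
    skip_baserunning_outs=True drops those rows.
    """
    counts: Counter = Counter()
    for batter, result in plays:
        if result is None:
            continue  # pitch did not end a play
        if skip_baserunning_outs and result in BASERUNNING_OUTS:
            continue  # out made on the bases, not a completed PA
        counts[batter] += 1
    return counts
```

In the scenario described above, the naive count gives the hitter one extra PA. That extra PA nudges his HR rate down by a fraction of a percentage point.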