Fangraphs and Baseball Savant are two of the titans of baseball data and research. Obviously, most normal people don’t know the first thing about either site. However, fantasy baseball analysts and players alike browse both religiously. The seemingly infinite amount of data presented gets the heart rate elevated with excitement for us addicts.
Some people take stats at face value to form their opinions on players. Others combine stats into metrics that they feel matter. Being in the latter group gets quite time-consuming. After spending hours upon hours researching, comparing, contrasting, and creating data, this junkie has created countless “metrics” and statistics for personal use.
While personal research is fun, few of these creations ever seem to stick to the real-life numbers that matter. However, through multiple trials, multiple data points, and a basic knowledge of pitching, a metric was discovered that tracks ERA very closely, with an R² of 0.9357. In this article, that metric will be discussed in detail, along with results from 2018 to 2020.
Amplified Run Average (ARA)
Amplified Run Average, or ARA for short, attempts to emulate Earned Run Average (ERA) using six different variables. These variables are common, everyday numbers seen at the forefront of a player’s profile. One of them is altered to better adjust to ERA, but nonetheless, they are simple numbers.
Below, the variables will be explained, along with how they combine into the final result. That final result tracks ERA closely, with a strong R² value of 0.9357. For readers unfamiliar with R², it runs on a scale of zero to one. Zero means there is no linear relationship between the two things being compared. One means a perfect linear relationship, whether positive or negative. Perhaps an easier way to look at it is as a percentage: ARA accounts for 93.57% of the variation in ERA across the 2018-2020 seasons. Without further ado, here are the variables used in ARA.
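For anyone who wants to check an R² value on their own numbers, here is a minimal Python sketch. It assumes a simple straight-line fit, where R² equals the squared Pearson correlation. The `metric` and `era` lists are made-up values for illustration only, not data from this article.

```python
from statistics import mean

def r_squared(x, y):
    """R^2 of a straight-line least-squares fit of y on x.

    For a single predictor this equals the squared Pearson correlation,
    so it lands between 0 and 1 whether the slope is positive or negative.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Made-up values for illustration only (not data from the article).
metric = [150.0, 160.0, 170.0, 180.0, 190.0]
era = [4.9, 4.0, 3.7, 2.9, 2.6]

print(round(r_squared(metric, era), 4))
```

A higher score paired with a lower ERA gives a negative slope, but R² is still positive because the correlation gets squared.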
The first statistic used was strikeout rate (K%). There perhaps isn’t a more dominant number a pitcher can show than his strikeout rate. It shows the pitcher’s ability to fool a hitter consistently and effectively. Effectively, because a strikeout directly results in an out, which helps lower ERA. Here is the starting point, along with its correlation to ERA:
A 0.2437 R² value is decent for such a simple number as strikeout rate. We are off to a good start. Now it is time to include its counterpart to get one of the best metrics we use in today’s game.
Walk rate is an extremely important metric in that it is highly correlated to a pitcher’s ability to control his pitches. The better a pitcher is at controlling his pitches, the more likely he is to locate a pitch where he wants it. However, walk rate by itself has almost no correlation to ERA, as can be seen below:
Subtracting walk rate from strikeout rate gives us K-BB%, which is considered a metric in its own right. K-BB% shows a key trait of a pitcher: the ability not only to fool a hitter, but to do so while showing control. This is the first spike in correlation to ERA that we see. K-BB% shows why it is such a highly regarded metric:
With the core statistic set, another metric quickly came to mind when it came to suppressing runs. That was groundball rate. While it has little direct correlation to ERA on its own, it still has supreme value for the final formula. Since we already have K-BB%, we can look at the formula now as (K-BB)+GB%:
For those doubting the inclusion of groundball rate, not only does the above graphic show a strengthened correlation, but the final result is strongly supported by groundball rate. Despite groundball rate alone having a small direct correlation to ERA, when it is removed from the final formula for ARA, the R² drops from 0.9357 to 0.9011. Groundball rate has meaning in ARA.
Now we get into the three variables that are directly tied to runs allowed and hits allowed. The first of that bunch is home-run-to-fly-ball ratio. It is a self-explanatory metric, as it is simply the percentage of fly balls that go for home runs. As we all know, home runs lead to runs, which directly affect ERA. Simply subtracting HR/FB rate from our current formula (K-BB+GB%) continues to strengthen the correlation, now up to 0.4748.
This metric was included to help support fly ball pitchers who are skilled at suppressing home runs. This includes Justin Verlander, Trevor Bauer, and Max Scherzer, among others. HR/FB rate is known to fluctuate from season to season and is a large reason why ERA fluctuates season to season. For this article, we will stick to it at face value since we are simply showing the ERA correlation. However, in another article to come, HR/FB rate will be adjusted. Those changes to HR/FB rate (along with the other variables to come) will show an adjusted ARA score. This adjusted ARA score will show an expected outcome given past performance, and will hopefully offer some predictive value for future seasons.
Weighted-On-Base-Average on contact (wOBAcon) measures not only hits allowed by a pitcher when the ball is put in play, but it weighs the types of hits differently. A single is worth less than a double, a double less than a triple, and a triple less than a home run. This is one of the more underrated stats in all of baseball. If a pitcher is giving up singles instead of home runs, he deserves more credit, even though both are hits. All hits are not created equal.
wOBAcon at its core is already fairly strongly connected to ERA, with an R² value of 0.5767 between 2018 and 2020. With it worked into the current ARA formula, our correlation continues to grow:
You probably haven’t seen this statistic, erLOB%, because it doesn’t exist. It is an alteration of left-on-base percentage (LOB%), which you can find at the top of a pitcher’s Fangraphs page. The only difference is that instead of using runs allowed in the formula, it uses earned runs allowed. Obviously, adding any metric that includes earned runs will strengthen the correlation to ERA. Using standard LOB% still strengthens the correlation significantly, as it is strongly related to ERA. However, when using earned runs instead of runs, the correlation jumps.
Also of note, erLOB% is the only variable that has a coefficient. Multiplying it by two takes the R² from 0.8791 to the end result, 0.9357.
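Since erLOB% is not listed anywhere, here is a rough Python sketch of how it could be computed. It assumes the standard Fangraphs LOB% formula, (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR), with earned runs swapped in for runs as described above. The function names and the season line are hypothetical.

```python
def lob_pct(h, bb, hbp, runs, hr):
    """Standard LOB%: (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    baserunners = h + bb + hbp
    return (baserunners - runs) / (baserunners - 1.4 * hr)

def er_lob_pct(h, bb, hbp, earned_runs, hr):
    """erLOB% as described in the article: LOB% with earned runs in place of runs."""
    return lob_pct(h, bb, hbp, earned_runs, hr)

# Hypothetical season line, for illustration only.
h, bb, hbp, runs, earned_runs, hr = 160, 50, 5, 75, 68, 20

print(f"LOB%:   {lob_pct(h, bb, hbp, runs, hr):.3f}")
print(f"erLOB%: {er_lob_pct(h, bb, hbp, earned_runs, hr):.3f}")
```

Because earned runs can never exceed total runs, erLOB% will always come out equal to or higher than regular LOB% for the same pitcher.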
Final Formula & Converting to ERA Scale
Here is the final formula:
ARA Score = (K%-BB%+GB%-(HR/FB)-wOBAcon+(2*erLOB%))*100
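As a quick sketch, the formula translates to Python as follows. One assumption here: every rate is entered as a decimal (0.25 for 25%), which matches how wOBAcon is normally written. The function name and the sample pitcher line are hypothetical.

```python
def ara_score(k_pct, bb_pct, gb_pct, hr_fb, woba_con, er_lob_pct):
    """ARA Score = (K% - BB% + GB% - HR/FB - wOBAcon + 2 * erLOB%) * 100.

    Assumes all inputs are decimals, e.g. 0.25 for a 25% strikeout rate.
    """
    return (k_pct - bb_pct + gb_pct - hr_fb - woba_con + 2 * er_lob_pct) * 100

# Hypothetical pitcher line, for illustration only.
score = ara_score(
    k_pct=0.25,
    bb_pct=0.07,
    gb_pct=0.45,
    hr_fb=0.12,
    woba_con=0.36,
    er_lob_pct=0.75,
)
print(round(score, 1))  # 165.0
```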
Surprisingly, the formula is not overcomplicated. None of the variables had to be adjusted outside of erLOB%. They were taken at face value, and they produced a strong result. Among 234 samples, the tightness of the circles around the trendline shows this is a near-perfect correlation. Success!
With the correlation shown, the ARA score needs to be converted so we see it on an ERA scale. That is simply done by plugging the score into the trendline equation seen at the top right of the correlation graph above and solving for ERA. That formula is:
ARA = (ARA Score – 234.43) / -18.036
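Here is the same conversion as a small Python sketch, using the slope and intercept from the formula above. The function names are hypothetical, and the 165.0 input is just an example score, not a real pitcher.

```python
SLOPE = -18.036      # trendline slope from the formula above
INTERCEPT = 234.43   # trendline intercept from the formula above

def score_to_ara(score):
    """Convert a raw ARA Score to the ERA scale."""
    return (score - INTERCEPT) / SLOPE

def ara_to_score(ara):
    """Inverse: the raw ARA Score that corresponds to a given ERA-scale ARA."""
    return SLOPE * ara + INTERCEPT

# Example score for illustration only.
print(round(score_to_ara(165.0), 2))  # 3.85
```

Because the slope is negative, a higher raw score always converts to a lower (better) ARA.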
The final ARA metric is complete. Now that the number has been reached, how about some actual results? Without further ado, here are the best ARAs since 2018, along with ERA and the difference between the two. A positive number in the “ERA-ARA” column suggests the pitcher was as good as or better than his ERA. A negative number indicates the pitcher’s ERA was better than his ARA, meaning he outperformed his underlying numbers.
|Pitcher|Season|ERA|ARA|ERA-ARA|
|---|---|---|---|---|
|Hyun Jin Ryu|2019|2.32|2.17|0.15|
|Hyun Jin Ryu|2020|2.69|2.41|0.28|
|Lance McCullers Jr.|2020|3.93|3.94|-0.01|
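To make the sign of the last column concrete, here is a short Python sketch that recomputes it from the rows in the table above. It assumes the two decimal columns are ERA followed by ARA, which is consistent with the listed differences. The helper names are hypothetical.

```python
# Rows copied from the table above: (pitcher, season, ERA, ARA).
rows = [
    ("Hyun Jin Ryu", 2019, 2.32, 2.17),
    ("Hyun Jin Ryu", 2020, 2.69, 2.41),
    ("Lance McCullers Jr.", 2020, 3.93, 3.94),
]

def era_minus_ara(era, ara):
    """Difference shown in the last column, rounded to two decimals."""
    return round(era - ara, 2)

def read_gap(diff):
    """Plain-language reading of the ERA-ARA sign."""
    if diff > 0:
        return "ARA below ERA: the components look better than the ERA"
    if diff < 0:
        return "ARA above ERA: the ERA looks better than the components"
    return "ARA matches ERA"

for name, season, era, ara in rows:
    diff = era_minus_ara(era, ara)
    print(f"{name} {season}: {diff:+.2f} ({read_gap(diff)})")
```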
How Interpreting ARA Can Help
Plenty of pundits out there are likely wondering why this matters. Great that it’s strongly connected to ERA, but is it predictive, or simply descriptive? After all, that’s what we want, a predictive metric. Bluntly put, it isn’t predictive, precisely because it is so closely correlated to ERA. Typically, when a pitcher’s ERA goes up and down, so does the ARA. And as we all know, outside of the elite pitchers, a pitcher’s ERA fluctuates from season to season. Pitchers try new pitch mixes, gain or lose velocity, try different arm angles, release points, etc. If you are looking for the perfect predictive metric for ERA, you won’t find it. But ARA can be useful if you understand it at its core.
Examining the Volatile Variables
Examining each of the six variables and figuring out the percentage (on average) each makes up of the entire formula is a good starting point. Here are those league average numbers for the six aforementioned variables, and the final percentage each makes up of the final formula:
erLOB% is the dominant statistic in the formula, making up (on average) 56% of the entire score. It already carried the largest share before being doubled, and doubling it only increased its weight. Understanding that erLOB% fluctuates from season to season is key, and this is one of the primary reasons ARA will fluctuate. Using a population of 68 starting pitchers who pitched over 150 innings in 2018 and 2019, and over 50 innings in 2020, the standard deviation of erLOB% across the population was 6.3%. In other words, among these 68 pitchers, erLOB% could be expected to fluctuate about 6.3 percentage points above or below the season prior. A quick conversion between ERA and erLOB% puts that at roughly a full point higher or lower on the ERA scale.
This knowledge will be used in an article to come. erLOB% will be adjusted to league average for small-sample pitchers, and to a pitcher’s own average for large samples, in order to come up with an adjusted ARA metric. But for now, we need to understand that pitchers with a high erLOB% will almost assuredly see a rise in their ERA the following season, and vice versa. There is an exception for good pitchers who have high strikeout rates, as they can continuously strand runners at a higher rate. Below you can see erLOB% since 2018:
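For a feel of how much an erLOB% swing moves the needle, here is a rough Python sketch that runs a swing through the ARA formula alone, holding the other five inputs fixed. This is not necessarily the same conversion quoted above: it only uses the doubled erLOB% weight and the -18.036 trendline slope from the ARA formulas, and the function name is hypothetical.

```python
ER_LOB_WEIGHT = 2    # erLOB% coefficient in the ARA Score formula
SLOPE = -18.036      # trendline slope used to put ARA Score on the ERA scale

def ara_change_from_er_lob(delta_er_lob):
    """Change in ERA-scale ARA for a change in erLOB% (as a decimal),
    with the other five inputs held fixed."""
    delta_score = ER_LOB_WEIGHT * delta_er_lob * 100
    return delta_score / SLOPE

# One standard deviation (6.3 percentage points) in each direction.
print(round(ara_change_from_er_lob(0.063), 2))   # -0.7
print(round(ara_change_from_er_lob(-0.063), 2))  # 0.7
```

Through the ARA formula by itself, a 6.3-point swing works out to about 0.7 runs. That is a bit smaller than the direct ERA-to-erLOB% conversion, since in real life a change in strand rate tends to come along with changes in the other inputs too.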
|Pitcher|Season|erLOB%|
|---|---|---|
|Hyun Jin Ryu|2019|85.9|
|Hyun Jin Ryu|2020|83.3|
|Lance McCullers Jr.|2020|72.6|
HR/FB & wOBAcon
Home run per fly ball is a better-known statistic, so less time needs to be spent on it. HR/FB is another statistic that fluctuates and has near-identical standard deviation numbers to erLOB%. With that being the case, HR/FB rate can be looked at the same way as erLOB%. If a pitcher hasn’t shown the tendency to suppress home runs, and yet was below an 8% HR/FB rate on the season, then we know he could be a prime candidate for ERA regression, and vice versa. All of this is fairly common knowledge in the fantasy industry.
wOBAcon sits on a slightly tighter scale, so it doesn’t technically fluctuate as much as the aforementioned two. However, it still needs to be adjusted for potential positive or negative regression. Use the same logic applied to erLOB% and HR/FB rate and find the potential outliers.
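As a rough illustration of that logic, here is a tiny Python sketch that flags the case described above. The 8% cutoff comes from the paragraph above. The `career_baseline` of 12% is a hypothetical stand-in for “hasn’t shown the tendency to suppress home runs,” and the function name is made up as well.

```python
def hr_fb_regression_candidate(season_hr_fb, career_hr_fb,
                               season_cutoff=0.08, career_baseline=0.12):
    """Flag a pitcher whose season HR/FB was under the cutoff even though
    his career rate shows no history of suppressing home runs.

    season_cutoff is the 8% mark from the article; career_baseline is a
    hypothetical threshold chosen only for this example.
    """
    return season_hr_fb < season_cutoff and career_hr_fb >= career_baseline

# Hypothetical pitchers, for illustration only.
print(hr_fb_regression_candidate(0.07, 0.14))  # True: low season, no track record
print(hr_fb_regression_candidate(0.07, 0.09))  # False: has suppressed homers before
print(hr_fb_regression_candidate(0.13, 0.14))  # False: season rate was not low
```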
As mentioned previously, adjusting these variables will be a major topic of the next article to come. Using an adjusted ARA score will hopefully give us more predictability than the ARA at face value.
K%, BB%, and GB%
These three are more skill than luck. While these numbers can change when pitchers make significant changes, there isn’t much logic in trying to adjust these numbers. Take these at face value and give the pitchers credit where credit is due.
In sum, ARA is not meant to be a brand new metric used on the front page of websites. It should be used as another way of understanding ERA through the six metrics talked about throughout the article. While analysts and players alike may already use these numbers every day in their research, the combination of the six gives us an undeniable correlation to ERA.
Understanding that ARA is nearly directly correlated to ERA is vital for this article. With this understanding, we can go further into pitchers who were perhaps lucky or unlucky using erLOB%, HR/FB, and wOBAcon. Ultimately, an adjusted ARA score will be attempted in a succeeding article, with hopes that it will more accurately project whether or not a pitcher performs better or worse with less luck (and potentially more skill) factored in.