Throughout the first two parts of this series, we’ve gone on a statistical journey to see which of the many baseball projection systems are the best predictors for fantasy baseball purposes. While we saw that Steamer’s projections generally perform much better than others, we also saw that averaging multiple projections into a combined projection works quite well too. But, perhaps there could be an even more accurate method of combining projections.
It was unspoken but understood that just averaging a few projections into one is a fairly simplistic method and that there likely could be even better projections created by providing certain weights to each projection system. Giving Steamer equal weight to the Marcel projections is likely not the best idea for hitter projections due to the evidence showing Steamer to be much more accurate. Last week, keeping that sort of logic in mind, I added the ability to brew up custom projections in the cheatsheets by putting different weights on the projections of your choice. At that time, I didn’t address the obvious question of what would be the best weights to use for a customized projection.
To answer that question, I ran a linear regression analysis to determine the relationship between actual production in 2011 and 2012 and the projection systems available in those years. Using the results, I was able to determine some weights that would have worked well over that time. We can’t be certain that weights that worked well in those two years will also be the best for this year, but they represent an improvement over the simple combined projection system regardless.
I ran the analysis to determine the approximate weights that would give the best accuracy in each of the ten stats associated with 5×5 roto leagues (in addition to my home-brewed WERTH roto values). First, I ran the analysis only for the 2012 projections and statistics, as the MORPS projections weren’t around in 2011 and I wanted to include their work in this research. After looking at 2012 only, I ran the analysis with 2011 and 2012 combined for a larger sample size (though I excluded Saves, since that stat was missing from some of the projections I had).
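For readers who want to try this themselves, here is a minimal sketch of how a regression like the one described above could produce weights for a single stat. It is not my exact procedure: it assumes an ordinary least-squares fit with no intercept, with the coefficients rescaled to sum to 1. The system names `system_a` and `system_b` and all of the numbers are made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: projections may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def regression_weights(projections, actual):
    """Least-squares coefficients (no intercept), rescaled to sum to 1.

    projections: dict of system name -> projected values, one per player
    actual: actual values for the same players, in the same order
    """
    names = list(projections)
    cols = [projections[name] for name in names]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, actual)) for ci in cols]
    coefs = solve_linear_system(xtx, xty)
    total = sum(coefs)
    return {name: c / total for name, c in zip(names, coefs)}


# Synthetic home run data: "actual" is built as exactly 70% system_a and
# 30% system_b, so the regression should recover roughly those weights.
system_a = [30, 22, 15, 8, 25]
system_b = [26, 25, 12, 10, 20]
actual_hr = [0.7 * a + 0.3 * b for a, b in zip(system_a, system_b)]

weights = regression_weights(
    {"system_a": system_a, "system_b": system_b}, actual_hr
)
print({name: round(w, 3) for name, w in weights.items()})
```

With real data the coefficients won’t be this clean, and a system can even come out with a negative coefficient, in which case you would probably drop it and refit.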
As expected, some projection systems are better than others at specific stats, so the weighting is slightly different when trying to predict Stolen Bases versus RBIs. You can see all of the weights that resulted from my analysis in this sheet here if you’re curious. This left the question of how to derive one overall weighting from all of those separate weights. Previously, I had done research here on which stats were most important to target for 5×5 roto owners, and AVG and HR came out as the most significant because they have a direct effect on the other stats. For pitchers, WHIP and Strikeouts were the most important. With that in mind, I looked at the weights for those stats and tried to find a good balance.
For hitters, I found that using 45% Steamer, 25% ZiPS, 20% Fans and 10% CAIRO works quite well. This is interesting because four entirely different methodologies come together in a way that accentuates the strengths of each. Looking at the 2012 results in particular, that weighting would have improved greatly upon the previous method of averaging four projections into a Combined projection. In my previous analysis, the Combined projections had an average z-score rank of 1.3 for correlation and RMSE. This new custom-brewed projection would increase that to 1.5 (and lower Steamer to 1.1, creating a bigger gap between first and second). So, yes, this system would theoretically have performed significantly better than Steamer alone.
On the pitching side of things, the weights are a bit different. I found that a mix of 66% Steamer, 22% Marcel and 12% Fans created a strong system. Once again, these are three different approaches to projections. Marcel shows good results with the rate-based stats of ERA and WHIP while struggling with the counting stats. Steamer shows strengths in all areas, and the Fans seem to do best with the counting-based stats. Put them all together and you have a nice home-brewed system. In the analysis that included simple Combined projections, we saw that the Combined projection struggled to keep up with Steamer. This custom projection increases the average z-score rank from 0.76 to 1.11 (lowering Steamer from 1.30 to 1.17) when all players are included, which still leaves it slightly behind Steamer. When only the Top 200 ADP or Top 300 players are included, the custom system surpasses Steamer.
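If you’d like to apply the hitter and pitcher weights above outside of the cheatsheets, a rough sketch follows. The `blend` function and the sample numbers are made up for illustration. Rescaling the weights when a player is missing from one of the systems is my assumption here, not necessarily how the cheatsheets handle it.

```python
# Weights taken from the text above.
HITTER_WEIGHTS = {"Steamer": 0.45, "ZiPS": 0.25, "Fans": 0.20, "CAIRO": 0.10}
PITCHER_WEIGHTS = {"Steamer": 0.66, "Marcel": 0.22, "Fans": 0.12}


def blend(player_projections, weights):
    """Weighted average of one stat for one player.

    player_projections: dict of system name -> projected value
    weights: dict of system name -> weight
    Systems missing for this player are skipped and the remaining
    weights are rescaled (an assumption made for this sketch).
    """
    available = {s: w for s, w in weights.items() if s in player_projections}
    if not available:
        raise ValueError("no weighted system has a projection for this player")
    total = sum(available.values())
    return sum(player_projections[s] * w for s, w in available.items()) / total


# Made-up home run projections for a hitter covered by all four systems.
hr = blend({"Steamer": 30, "ZiPS": 28, "Fans": 34, "CAIRO": 26}, HITTER_WEIGHTS)
print(f"{hr:.1f}")  # 29.9

# Made-up strikeout projections for a pitcher with no Fans projection.
so = blend({"Steamer": 200, "Marcel": 180}, PITCHER_WEIGHTS)
print(f"{so:.1f}")  # 195.0
```

A simple average of the four hitter numbers would give 29.5, so the weighting pulls the blend toward the systems that have earned more trust.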
However, this raises the question of whether a projection system that weights each stat individually would perform the absolute best. The short answer is “yeah, pretty much.” There was a small gain in overall accuracy for the hitter projections when using different weights for each stat. It wasn’t a huge jump, but it was an improvement regardless. The bigger gain was for the pitcher projections. With so many different types of predictions involved (counting stats like Strikeouts, rate stats like ERA and an opportunity-based stat like Saves), I’m not surprised that a completely customized projection would be best.
For a comparison of the custom projections and Steamer, check out the graphs below, which show the Root Mean Square Error of each against actual production.
[Graph: pitcher projections. Root Mean Square Error (RMSE) for each stat after standardization, comparing Steamer, a projection using one set of weights for all stats, and a projection weighting each stat separately.]
[Graph: the same comparison as above, but for hitters.]
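For anyone wanting to reproduce this kind of RMSE comparison, here is a small sketch. It assumes “standardized” means converting both projected and actual values to z-scores using the mean and standard deviation of the actual values. That is one reasonable reading rather than a statement of exactly how the graphs were built, and the numbers are made up.

```python
import math
from statistics import mean, pstdev


def standardized_rmse(projected, actual):
    """RMSE between projected and actual values after z-scoring both
    against the mean and standard deviation of the actual values."""
    if len(projected) != len(actual):
        raise ValueError("projected and actual must have the same length")
    mu = mean(actual)
    sigma = pstdev(actual)
    if sigma == 0:
        raise ValueError("actual values have no spread to standardize by")
    z_proj = [(p - mu) / sigma for p in projected]
    z_act = [(a - mu) / sigma for a in actual]
    squared_errors = [(p - a) ** 2 for p, a in zip(z_proj, z_act)]
    return math.sqrt(mean(squared_errors))


# Made-up stolen base totals for three players and two projection systems.
actual_sb = [10, 20, 30]
close_system = [12, 18, 33]
loose_system = [20, 20, 20]

print(round(standardized_rmse(close_system, actual_sb), 3))
print(round(standardized_rmse(loose_system, actual_sb), 3))
```

Standardizing first puts every stat on the same scale, so an error in AVG can be compared with an error in HR. Lower is better.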
The takeaway from all of this is that giving specific weights to the projections allows for increased accuracy over just averaging out a bunch of projections (which is still surprisingly effective). If you want a standard weight to use across the board, my recommendations are the ones described above: for hitters, 45% Steamer, 25% ZiPS, 20% Fans and 10% CAIRO; for pitchers, 66% Steamer, 22% Marcel and 12% Fans.
I am also going to include the completely customized projection as an option in the cheatsheets (I’ll announce that update on Twitter when it’s available) for those who are interested in having different weights for each stat, since the results above suggest that approach can be even more effective.