Throughout the first two parts of this series, we’ve gone on a statistical journey to see which of the many baseball projection systems are the best predictors for fantasy baseball purposes. While we saw that Steamer’s projections generally perform much better than the others, we also saw that averaging multiple projections into a combined projection works quite well too. But perhaps there is an even more accurate way to combine projections.
It went unsaid, but simply averaging a few projections into one is a fairly simplistic method, and assigning specific weights to each projection system would likely produce even better projections. For hitter projections, giving Steamer the same weight as Marcel is probably not the best idea, given the evidence that Steamer is much more accurate. With that logic in mind, last week I added the ability to brew up custom projections in the cheatsheets by putting different weights on the projections of your choice. At the time, I didn’t address the obvious question: what are the best weights to use for a customized projection?
To answer that question, I used linear regression to model the relationship between actual production in 2011 and 2012 and the projection systems available for those years. From the results, I derived weights that would have worked well over that period. We can’t be certain that they’ll also be the best weights for this year, but they represent an improvement over the simple combined projection either way.
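The post doesn’t spell out the exact regression setup, so here is a minimal sketch of one common way to turn a regression into blend weights, assuming an ordinary least-squares fit with no intercept, actual stats as the target and each system’s projection as a predictor. The clipping of negative coefficients and rescaling to percentages are my assumptions, and the function names and toy numbers are hypothetical, not taken from the analysis.

```python
def solve_linear_system(a, b):
    """Solve a small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix: each row carries its right-hand side along.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row up.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def regression_weights(projections, actual):
    """Least-squares blend weights (no intercept) for predicting actual stats.

    projections: system name -> list of projected values, one per player
    actual: list of actual values in the same player order
    Negative coefficients are clipped to zero and the rest rescaled to sum to 1
    (an assumption, so the result reads as blend percentages).
    """
    names = list(projections)
    cols = [projections[name] for name in names]
    k = len(names)
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * y for p, y in zip(cols[i], actual)) for i in range(k)]
    raw = solve_linear_system(xtx, xty)
    clipped = [max(w, 0.0) for w in raw]
    total = sum(clipped)
    if total == 0:
        raise ValueError("every coefficient was non-positive")
    return {name: w / total for name, w in zip(names, clipped)}


# Made-up numbers for five players, built so that actual = 0.7 * Steamer + 0.3 * ZiPS.
steamer = [20, 15, 30, 10, 25]
zips = [18, 22, 25, 14, 20]
actual = [0.7 * s + 0.3 * z for s, z in zip(steamer, zips)]
weights = regression_weights({"Steamer": steamer, "ZiPS": zips}, actual)
print(weights)  # approximately {'Steamer': 0.7, 'ZiPS': 0.3}
```

For real work, a library routine such as NumPy’s least-squares solver or a non-negative least-squares fit would be the more robust choice. The hand-rolled solver just keeps the sketch dependency-free.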
I ran the analysis to find approximate weights that would maximize accuracy for each of the ten stats in 5×5 roto leagues (as well as my home-brewed WERTH roto values). First, I ran the analysis on the 2012 projections and statistics alone, because the MORPS projections weren’t around in 2011 and I wanted to include them in this research. After that, I ran the analysis on 2011 and 2012 combined for a larger sample size (Saves were missing from some of the projections I had, so I excluded that stat).
As expected, some projection systems are better than others at specific stats, so the weighting differs slightly when predicting Stolen Bases versus RBIs. You can see all of the weights from my analysis in this sheet if you’re curious. That left the question of how to turn all of those separate weights into one overall weighting. Previously, I had researched which stats 5×5 roto owners should target most, and AVG and HR came out as the most significant for hitters because they directly affect the other stats. For pitchers, WHIP and Strikeouts were the most important. With that in mind, I focused on the weights for those stats and tried to find a good balance.
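The balance described above was a judgment call, not a formula. Still, one hypothetical way to make it explicit is to average the per-stat weights, giving each stat a hand-picked importance factor. The function name, the importance values and the per-stat weights below are all made up for illustration and are not the weights from the sheet.

```python
def overall_weights(per_stat_weights, importance):
    """Collapse per-stat system weights into one weighting.

    per_stat_weights: stat -> {system: weight}, each inner dict summing to 1
    importance: stat -> relative importance (hypothetical values chosen by hand)
    """
    total_importance = sum(importance[stat] for stat in per_stat_weights)
    combined = {}
    for stat, weights in per_stat_weights.items():
        # Each stat contributes in proportion to its importance share.
        share = importance[stat] / total_importance
        for system, w in weights.items():
            combined[system] = combined.get(system, 0.0) + share * w
    return combined


# Hypothetical per-stat weights, with AVG counted three times as heavily as HR.
per_stat = {
    "AVG": {"Steamer": 0.6, "ZiPS": 0.4},
    "HR": {"Steamer": 0.4, "ZiPS": 0.6},
}
overall = overall_weights(per_stat, {"AVG": 3, "HR": 1})
print(overall)  # approximately {'Steamer': 0.55, 'ZiPS': 0.45}
```

If every per-stat weighting sums to 1, the combined weighting does too, so it can be plugged straight into a blend.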
For hitters, I found that 45% Steamer, 25% ZiPS, 20% Fans and 10% CAIRO works quite well. This is interesting because four entirely different methodologies come together in a way that accentuates the strengths of each. In the 2012 results in particular, this weighting would have improved greatly on the previous method of averaging four projections into a Combined projection. In my previous analysis, the Combined projection had an average z-score rank of 1.3 across correlation and RMSE. This new custom-brewed projection would raise that to 1.5 (and lower Steamer to 1.1, widening the gap between first and second). So, yes, this system would theoretically perform significantly better than Steamer alone.
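Applying the hitter weights is just a weighted average of each system’s projection for a given stat. Here is a minimal sketch. The rule of skipping a system that doesn’t project a player and rescaling the remaining weights is my assumption about how to handle gaps; it isn’t necessarily what the cheatsheets do. The HR numbers are made up.

```python
HITTER_WEIGHTS = {"Steamer": 0.45, "ZiPS": 0.25, "Fans": 0.20, "CAIRO": 0.10}


def blend_projection(projections, weights):
    """Weighted average of one player's projections for a single stat.

    Systems with no projection for this player are skipped, and the remaining
    weights are rescaled to sum to 1 (an assumption about how gaps are handled).
    """
    available = {system: w for system, w in weights.items() if system in projections}
    total = sum(available.values())
    if total == 0:
        raise ValueError("none of the weighted systems projects this player")
    return sum(projections[system] * w for system, w in available.items()) / total


# Made-up HR projections for a single hitter.
hr = {"Steamer": 30, "ZiPS": 26, "Fans": 34, "CAIRO": 28}
print(round(blend_projection(hr, HITTER_WEIGHTS), 1))  # 29.6
```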
On the pitching side, the weights are a bit different. I found that a mix of 66% Steamer, 22% Marcel and 12% Fans created a strong system. Once again, these are three different approaches to projection. Marcel does well with the rate stats, ERA and WHIP, but struggles with the counting stats; the Fans seem to do best with the counting stats; and Steamer shows strength in all areas. Put them together and you have a nice home-brewed system. In the analysis that included the simple Combined projection, we saw that it struggled to keep up with Steamer. With all players included, this custom projection raises the average z-score rank from the Combined projection’s 0.76 to 1.11 (and lowers Steamer from 1.30 to 1.17). When only the Top 200 or Top 300 players by ADP are included, the system surpasses Steamer.
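The exact recipe for the “average z-score rank” isn’t given here. The sketch below assumes one plausible reading: each system’s correlation and RMSE are standardized against the other systems in the pool (with RMSE sign-flipped, since lower is better), and the two z-scores are averaged. The population standard deviation and the metric values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize each system's metric against the other systems in the pool."""
    data = list(values.values())
    mu, sd = mean(data), pstdev(data)
    return {system: (v - mu) / sd for system, v in values.items()}


def average_z(correlation, rmse):
    """Average of the correlation z-score and the (sign-flipped) RMSE z-score."""
    z_corr = z_scores(correlation)
    # Lower RMSE is better, so negate it before standardizing.
    z_rmse = z_scores({system: -v for system, v in rmse.items()})
    return {system: (z_corr[system] + z_rmse[system]) / 2 for system in correlation}


# Made-up metrics for three systems.
scores = average_z(
    {"Steamer": 0.60, "Combined": 0.55, "Marcel": 0.45},
    {"Steamer": 1.0, "Combined": 1.2, "Marcel": 1.6},
)
print(scores)
```

Because the scores are relative to the pool, adding a strong new system pulls everyone else’s score down. That would be consistent with Steamer dropping from 1.30 to 1.17 once the custom projection joins the comparison.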
However, this raises the question of whether a projection system that weights each stat individually would perform best. The short answer is “yeah, pretty much.” For the hitter projections, using different weights for each stat produced a small gain in overall accuracy. It wasn’t a huge jump, but it was an improvement. The bigger gain came with the pitcher projections. With so many different types of predictions (counting stats like Strikeouts, rate stats like ERA and an opportunity-based stat like Saves), I’m not surprised that a completely customized projection works best.
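A completely customized projection simply swaps the single weighting for a separate weighting per stat. Here is a minimal sketch. The per-stat weights are hypothetical and only echo the pattern described above (Fans stronger on counting stats, Marcel on rate stats). They are not the actual fitted weights.

```python
def blend_player(player_projections, weights_by_stat):
    """Build a fully customized projection with separate system weights per stat.

    player_projections: system -> {stat: value} for one player
    weights_by_stat: stat -> {system: weight}
    """
    blended = {}
    for stat, weights in weights_by_stat.items():
        weighted_sum = 0.0
        weight_total = 0.0
        for system, w in weights.items():
            value = player_projections.get(system, {}).get(stat)
            if value is None:
                continue  # this system has no projection for the stat
            weighted_sum += w * value
            weight_total += w
        if weight_total > 0:
            blended[stat] = weighted_sum / weight_total
    return blended


# Hypothetical weights: lean on the Fans for strikeouts and on Marcel for ERA.
weights = {
    "K": {"Steamer": 0.5, "Fans": 0.5},
    "ERA": {"Steamer": 0.5, "Marcel": 0.5},
}
pitcher = {
    "Steamer": {"K": 200, "ERA": 3.50},
    "Marcel": {"K": 170, "ERA": 3.30},
    "Fans": {"K": 210, "ERA": 3.20},
}
print(blend_player(pitcher, weights))
```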
For a comparison of the custom projections and Steamer, check out the graphs below, which show the Root Mean Square Error of each against actual production.
Same comparison as above, but for hitters.
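For reference, the RMSE in these graphs is the square root of the average squared gap between projected and actual values for the same players, so larger misses are penalized more heavily than small ones. Here is a minimal sketch with made-up HR totals.

```python
from math import sqrt


def rmse(projected, actual):
    """Root mean square error between projected and actual values for the same players."""
    if not actual or len(projected) != len(actual):
        raise ValueError("need equal-length, non-empty lists")
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return sqrt(sum(squared) / len(squared))


# Made-up HR totals for three players.
print(round(rmse([30, 22, 15], [27, 25, 15]), 3))  # 2.449
```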
The takeaway is that assigning specific weights to the projections increases accuracy over just averaging a bunch of projections (which is still surprisingly effective). If you want one standard set of weights to use across the board, these are my recommendations:
Hitters
Steamer: 45%
ZiPS: 25%
Fans: 20%
CAIRO: 10%
Pitchers
Steamer: 66%
Marcel: 22%
Fans: 12%
I am also going to add the completely customized projection as an option in the cheatsheets for those who want different weights for each stat, so we can see how effective it is as well. I’ll announce that update on Twitter when it’s available.
Brian Jenner
03/16/2013 at 11:46 PM
Are you using the same playing time projections for every system, or just the raw stats? ZiPS and Marcel do not project playing time, while Steamer and the Fans obviously do. I think ZiPS would be a much better contributor for the hitting stats if you first standardized all the systems to the same PA projection. It probably won't help as much with team-dependent stats like R and RBI, but ZiPS is great for AVG, SB and HR, provided the PA adjustment is made. Ditto on the pitching side with standard IP.
I typically use Steamer for R and RBI and a ZiPS/Steamer mix for HR, AVG and SB. On the pitching side, I use Steamer for everything except starting pitcher ERA, where I combine Steamer with my own formula. I also standardize each system to the Plate Appearance/Innings Pitched value from the FanGraphs Fans projections. This has worked out very well in the past.
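The standardization Brian describes can be sketched as rescaling a system’s counting stats to a common PA total, such as the Fans’ PA projection, while leaving rate stats like AVG alone. The function name, the choice of counting stats and the sample line are hypothetical illustrations, not Brian’s actual spreadsheet.

```python
def standardize_to_pa(projection, target_pa, counting_stats=("R", "HR", "RBI", "SB")):
    """Rescale one system's counting stats to a common plate-appearance total.

    Rate stats such as AVG are left alone, since they don't depend on volume.
    """
    if projection["PA"] <= 0:
        raise ValueError("projection needs a positive PA total")
    factor = target_pa / projection["PA"]
    scaled = dict(projection)
    for stat in counting_stats:
        scaled[stat] = projection[stat] * factor
    scaled["PA"] = target_pa
    return scaled


# Made-up ZiPS line rescaled to a hypothetical 650-PA Fans projection.
zips = {"PA": 500, "R": 70, "HR": 20, "RBI": 75, "SB": 10, "AVG": 0.280}
print(standardize_to_pa(zips, target_pa=650))
```

The same idea carries over to pitchers by scaling counting stats such as Strikeouts to a common IP total.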
Luke
03/16/2013 at 11:51 PM
As you suspected, I used the raw projections that were released to the public for this analysis. I definitely agree that playing time plays a large role in the accuracy of the projections, and I plan to do more research in that regard. It will be interesting to see how these results change once playing time is neutralized. The playing time portion of the FanGraphs Fans projection is likely their strongest point.
I like your method, though. I can definitely see why that would work well. More research pending on this…
MP
03/17/2013 at 4:31 AM
Brian, is there public research showing ZiPS' effectiveness at predicting AVG, HR and SB rates? Or is that something you've discovered in your own work? Thanks.
j
03/20/2013 at 2:16 PM
Building on what Brian said above: since some projection systems (like ZiPS) don't project playing time, I project PAs separately. If you treat counting stats as rate stats (HR/PA, RBI/PA, etc.), you can find the best HR rate and multiply it by the improved PA projection to get a better estimate. This way, you don't discard useful data just because it isn't scaled to playing time.
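The rate-based approach in this comment can be sketched as follows: turn each system’s counting stat into a per-PA rate, blend the rates, then multiply by a separate PA projection. The weights, the PA figure and the function name here are hypothetical. Blending rates with fixed weights is one reasonable reading of “find the best HR rate,” not necessarily the commenter’s exact method.

```python
def blended_counting_stat(projections, weights, stat, pa_projection):
    """Blend per-PA rates for one counting stat, then apply a separate PA projection.

    projections: system -> {"PA": ..., stat: ...}
    weights: system -> weight (hypothetical values)
    """
    rate_sum = 0.0
    weight_total = 0.0
    for system, w in weights.items():
        proj = projections.get(system)
        if not proj or stat not in proj or proj.get("PA", 0) <= 0:
            continue  # skip systems that can't yield a rate for this stat
        rate_sum += w * proj[stat] / proj["PA"]
        weight_total += w
    if weight_total == 0:
        raise ValueError("no usable projections for " + stat)
    return rate_sum / weight_total * pa_projection


projections = {
    "ZiPS": {"PA": 500, "HR": 20},     # 0.040 HR/PA
    "Steamer": {"PA": 600, "HR": 27},  # 0.045 HR/PA
}
print(blended_counting_stat(projections, {"ZiPS": 0.5, "Steamer": 0.5}, "HR", pa_projection=640))
```

With these made-up inputs, the blended rate is 0.0425 HR/PA, or about 27.2 HR over 640 PA.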