Fantasy projections are a vitally important part of your draft preparation every year. If you use bad data then you’re going to have a bad draft. It’s as simple as that. Within my Excel cheatsheets, you’ll find ten different projections to choose from, which gives you a lot of power but may also leave you wondering which projection sources are most accurate for fantasy baseball. I analyze the accuracy of the projection systems each year to help us know which ones are most trustworthy.
The projection systems that I am most interested in using and analyzing are the ones that are data-driven and rely on years of trends and advanced baseball data. I think there’s something to be gained by taking human subjectivity and bias out of the projection process. Projections from CBS or ESPN are helpful but I think there are flaws in a human-driven process like theirs.
I’ve been running this site since 2010 and I’ve analyzed the existing projection systems almost every season since that time. Some projection systems have already come and gone (rest in peace, CHONE Projections) while other projection systems have gotten more popular and accepted in the industry. In most years that I have done this, I have found that the Steamer projections finish at or near the top for hitters and pitchers. Beyond that, I have typically found that combining and averaging multiple projection sources is often very accurate as well (my Special Blend projections and Fangraphs’ Depth Chart projections being examples of those).
As we are on the brink of the 2017 season starting, let’s take a look back to one year ago when we didn’t know what the 2016 season would hold. We had a number of projections in our hands not knowing which ones would serve us best. I’m going to analyze how those systems ended up performing in comparison to last year’s actual results so we can get a sense of which projections are most useful for fantasy baseball.
The Projection Systems
First, the competitors! There are nine different projection sources included in this year’s analysis and here’s a description of the methodology for each in as simple terms as I can come up with:
- Steamer (3rd in hitters last year, 1st in pitchers) – Steamer uses five years of player data and treats each stat differently for regression purposes so each component within their projections basically uses a different projection system. They’re consistently a top performer in these tests.
- CAIRO (6th in hitters last year, 7th in pitchers) – From Replacement Level Yankees Weblog, it weights three years of stats but then also regresses the results not only for age but regresses differently depending on which position the player plays. In addition, certain statistics are regressed differently than others.
- ZiPS (5th in hitters last year, 6th in pitchers) – Developed by Dan Szymborski, this system takes the past four years of stats for each player, weights the more recent years heavily but then takes those results to look for comparable historical players to determine the aging regression trend to apply.
- Clay Davenport (7th in hitters last year, 5th in pitchers) – This method also looks for comparable players like ZiPS but then applies playing time weights and redistributes the stats afterwards based on team projections and to mimic last year’s league totals.
- MORPS (8th in hitters & pitchers last year) – Like CAIRO, MORPS takes four years of player data for weighting and then does regression dependent on player position but also player league (AL or NL) then applies playing time projections based on current depth charts.
- Fangraphs Fans (4th in hitters & pitchers last year) – This is a crowdsourced projection where users of Fangraphs can project a player and this averages all of those projections.
- Fangraphs Depth Chart (1st in hitters last year, 2nd in pitchers) – This is the first of our combination projections. First, it combines ZiPS and Steamer projections and then playing time is done by Fangraphs staff as opposed to letting the projections predict playing time.
- Mr. Cheatsheet’s Special Blend (2nd in hitters last year, 3rd in pitchers) – Like the Fangraphs Depth Chart, this combines other projections, but each stat uses a different weighting system for the combination. Unlike Fangraphs Depth Chart, this lets the projection systems predict playing time.
- Steamer 600 (not used last year) – This is a modified version of the Steamer projection where it eliminates playing time projection and assumes every projected hitter will have the same PA (600) and every pitcher will have the same IP.
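To make the idea of a combination projection concrete, here is a minimal Python sketch of blending several systems with a different set of weights for each stat. The function name, the two-system setup and every number in it are made up for illustration; these are not the actual Special Blend or Depth Chart weights.

```python
def blend_projections(projections, weights):
    """Blend one player's projections using per-stat, per-system weights.

    projections: {system name: {stat: projected value}}
    weights:     {stat: {system name: weight}}  (hypothetical weights)
    """
    blended = {}
    for stat, stat_weights in weights.items():
        weighted_sum = 0.0
        total_weight = 0.0
        for system, weight in stat_weights.items():
            # Skip systems that have no projection for this stat.
            if system in projections and stat in projections[system]:
                weighted_sum += weight * projections[system][stat]
                total_weight += weight
        if total_weight > 0:
            # Renormalize so the weights actually used sum to 1.
            blended[stat] = weighted_sum / total_weight
    return blended


# Made-up projections for a single hitter.
projections = {
    "Steamer": {"HR": 30, "SB": 10},
    "ZiPS": {"HR": 26, "SB": 14},
}
# Made-up weights: lean on one system for HR, split evenly for SB.
weights = {
    "HR": {"Steamer": 0.75, "ZiPS": 0.25},
    "SB": {"Steamer": 0.5, "ZiPS": 0.5},
}
blended = blend_projections(projections, weights)
print(blended)
```

Renormalizing by the weights actually used means a system that skips a stat simply drops out of that stat's blend instead of dragging the number toward zero.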
The Method
There are a few different statistical methods that can be used to measure accuracy in a study like this. Over the years, I’ve settled on Mean Absolute Error (MAE). I prefer this over RMSE (Root Mean Squared Error) because RMSE penalizes large errors too heavily for our purposes here.
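A quick sketch of why the choice matters, using two made-up sets of errors (these are not real projection errors). Both sets have the same total miss, but one spreads it evenly and the other puts it all on a single player:

```python
import math


def mae(errors):
    """Mean Absolute Error: the average size of the misses."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root Mean Squared Error: squares each miss before averaging."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


steady_misses = [1.0, 1.0, 1.0, 1.0]  # off by a little on everyone
one_big_miss = [0.0, 0.0, 0.0, 4.0]   # perfect on three, way off on one

print(mae(steady_misses), rmse(steady_misses))  # 1.0 1.0
print(mae(one_big_miss), rmse(one_big_miss))    # 1.0 2.0
```

MAE treats the two cases the same, while RMSE scores the single big miss as twice as bad.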
When it comes to doing this test, I’m not looking to see if the projections accurately match the real world results. Sounds crazy for a study of projection accuracy, I know. What I’m looking to see is if the projection accurately predicted who would be above and below average in fantasy baseball. This is important because Steamer may think the average fantasy hitter will hit 20 HRs so a player projected in their system to hit 25 HRs is nicely above average. In the real world, we may find that the average fantasy hitter hit 15 HRs so someone actually hitting 25 HRs would have been way above average. So, what I test here is how many standard deviations above/below average the player’s projection was for each fantasy stat and see how that matched up to real world results at the end of the year.
To accomplish that, I standardize all of the projections for each statistic to get z-scores. I then use the Mean Absolute Error to compare the results. MAE averages out the difference between the projected z-score and the z-score of the actual 2016 results for our projected players. I do this with pitchers and hitters, taking into account playing time adjustments for rate stats like AVG, ERA and WHIP.
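Here is a rough Python sketch of that z-score comparison for a single counting stat. It assumes the population standard deviation and leaves out the playing time adjustment for rate stats, so treat it as an illustration of the idea rather than the exact calculation behind the results. The player numbers are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: how many standard deviations from the average."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def z_score_mae(projected, actual):
    """MAE between projected z-scores and actual z-scores, player by player."""
    proj_z = z_scores(projected)
    act_z = z_scores(actual)
    return mean([abs(p - a) for p, a in zip(proj_z, act_z)])


# Made-up HR totals for three hitters.
projected_hr = [25, 20, 15]
actual_hr = [20, 15, 10]      # everyone hit 5 fewer than projected
reversed_hr = [10, 15, 20]    # the order of the players flipped

print(z_score_mae(projected_hr, actual_hr))
print(z_score_mae(projected_hr, reversed_hr))
```

In the first comparison every raw projection is off by 5 HR, yet the z-score error is zero because the system had each player's standing relative to the average exactly right. The second comparison gets punished because it had the best and worst hitters backwards.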
I run the analysis for all of the players who were typically being drafted in fantasy drafts last preseason, as long as they actually accumulated enough PA or IP to be analyzed. When all is said and done, I see which projection system had the best MAE results across the board and crown a winner (no actual crown given though).
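One simple way to turn per-category results into an overall ranking is to average each system's MAE across the categories and sort from lowest to highest. That averaging step is an assumption for this sketch, and the system names and MAE values below are placeholders, not results from this analysis.

```python
def rank_systems(mae_by_system):
    """Rank systems by their average MAE across categories (lower is better).

    mae_by_system: {system name: {category: MAE}}
    """
    averages = {
        system: sum(category_mae.values()) / len(category_mae)
        for system, category_mae in mae_by_system.items()
    }
    return sorted(averages.items(), key=lambda item: item[1])


# Placeholder values purely for illustration.
mae_by_system = {
    "System A": {"HR": 0.50, "SB": 0.70},
    "System B": {"HR": 0.60, "SB": 0.50},
}
ranking = rank_systems(mae_by_system)
for system, average_mae in ranking:
    print(system, round(average_mae, 3))
```

Here System B comes out on top overall even though System A was better at HR, which is the kind of trade-off a single "across the board" number hides.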
The Hitter Results
We’re interested in fantasy baseball here so I am analyzing the main 5×5 stats of HR, R, RBI, SB and AVG to see how the projection systems fared in projecting those last season.
Steamer has been the best at projecting hitters in most years that I’ve done this. They were still the best standalone system in my previous year’s test, but the projections that combine a variety of sources (Depth Charts and Special Blend) fared better, as you might expect. For that reason, I was a bit surprised with the 2016 results:
Despite the Special Blend and Depth Chart projections utilizing Steamer’s projections in their own weightings, Steamer itself outperformed them on its own. As someone who has been analyzing these for years, I gotta say that is damn impressive. My own Special Blend projections did well but were unfortunately dinged in the Runs and SB categories, likely because of weighting Clay’s projections too highly for Runs and the Fans too highly for SBs. Adjustments were made for 2017 based off these findings.
Outside of that, ZiPS had a very good year and was especially good when it came to projecting SBs. I’m actually pretty surprised that the Fangraphs Fan projections were as horrible as they were. Those projections have a history of doing fairly well. I also threw in the Steamer 600 projections to show that accurate playing time projections can really make a difference.
The Pitcher Results
For the pitchers, I also looked at the main 5×5 roto categories but I did leave out Saves. Not every projection system even has a projection for Saves and, honestly, they’re very random as a team’s manager has a lot of control over who gets Saves and who does not. Projection systems aren’t designed to predict a manager’s crazy brain. And, also, Saves are a dumb stat and suck and I hate that we rely on them so much in fantasy. But, hey, let’s move on.
I praised Steamer’s work in projecting hitters already but they’ve historically been even better at projecting pitchers. Oftentimes, their projections far surpass the competition when it comes to pitching projections. And, yes, Steamer has a history of even outperforming the combination projections of Special Blend and Depth Charts. The 2016 results paint a similar picture:
It’s actually very close with the top three here and then a pretty big drop-off after that. The Special Blend projections just slightly under-performed in their weightings for ERA and WHIP while the Depth Charts missed the mark with their Wins projections (likely because of a poor showing from ZiPS in their combination). But, man, Steamer is tough to beat even when combining and weighting projections. Kudos to their team for their work on that.
Conclusions
It goes without saying that last season was an epic showing for the Steamer projections. They took on combination systems that included their own work and they crushed those systems. They deserve the Gold, Silver and Bronze medals for their work last year.
For me personally, these results gave me a chance to go back to the drawing board with my Special Blend projections. I had another year of data now to use in my weighting analysis and I thought about new ways to go about the weighting (using xStats and previous year stats to a certain degree).
Next year will be another year but it’s good that we see familiar faces at the top of these rankings each year. Steamer and my Special Blend projections are consistently in the top tier which gives you confidence to keep using either of them in your own cheatsheet rankings.
J
03/29/2017 at 2:25 PM
How would you expand out the exercise to include paid sources like Prospectus, Baseball HQ, and Mastersball? How deep is the player pool you use to come up with the average player by category?
Luke Gloeckner
03/29/2017 at 2:35 PM
I generally stay away from including paid sources because not everyone has access to them (myself included). I’ve included PECOTA or Pod in the past and they still haven’t beaten Steamer from my experience. If someone is really curious, they’d have to send me those projections from last year and I could do an addendum to the article. As far as the player pool, after taking out players who didn’t meet cutoffs and only including the players who had projections across the board last preseason, there were 217 hitters and 125 pitchers to analyze.
The Hospitaller
03/29/2017 at 2:41 PM
Great article–it’s these reviews that give validity to the greatness of these cheatsheets!
Luke Gloeckner
03/29/2017 at 3:14 PM
Haha, thanks!
DS
03/29/2017 at 6:33 PM
Thanks for this. A Marcel-free analysis of forecast systems. That decision alone puts you in the top 5% of fantasy baseball minds!
steven
03/30/2017 at 9:43 PM
Hi Luke – would prorating Steamer projections with DepthChart PT projections improve them? In other words, did the DepthChart PT projections provide any value?