One of the aspects of baseball that is the hardest to quantify and evaluate is fielding ability. Most
events in baseball, such as hitting events, are discrete, which makes them easy to tabulate and model
probabilistically. The central difficulty with fielding is that we are trying to evaluate players on a
continuous playing surface where we must take into account not just whether a successful play was made, but
whether a successful play was possible. The much-maligned error statistic is a subjective
attempt at discretising this phenomenon: players are assigned an error if the official scorer deems that
their unsuccessful play should have been successful. However, tabulating errors isn't a good measure of
ability without a corresponding measure that credits a player for making a play that most players wouldn't
have.
Recent techniques such as Ultimate Zone Rating or the Plus-Minus system from The
Fielding Bible are based on the tabulation of both positive and negative
fielding events. These statistics are more accurate measures of fielding ability. However, despite being obvious improvements on previous methods, both of these approaches are
still based on dividing the baseball field into discrete zones and vectors, and tabulating events within
each zone. Ideally, the baseball field could be treated as the continuous playing surface that it actually
is, instead of a set of zones or vectors. Our approach, SAFE (Spatial Aggregate Fielding Evaluation), does this: instead of tabulating fielding events within discrete zones, we fit
continuous probability distributions to each fielder based on their past fielding events.
Our raw data is from Baseball Info Solutions (BIS). For each grounder ball-in-play (g-bip), we have the (x,y) coordinates in the field where the g-bip was fielded, a "velocity" classification (ranging from 1 to 5) for the g-bip, and the number of outs made on the play. We define any play where at least one out was made as "successful". Our evaluation procedure consists of the following steps:
1. Estimating starting locations for each position
Our BIS data does not provide a key piece of information for each g-bip: the location of each fielder before the ball was hit. We estimate the starting location for each position as the (x,y) location in the field where that position has the highest overall probability of making a successful play. For each grounder, we then convert the g-bip coordinates into the angle at which the grounder was hit off the bat. An angle of 0 degrees corresponds to the 3rd base line, while an angle of 90 degrees corresponds to the 1st base line.
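The coordinate-to-angle conversion can be sketched as follows. The coordinate convention is an assumption, since the BIS coordinate system is not described here: home plate sits at the origin, the 1st base line runs along the positive x-axis, and the 3rd base line runs along the positive y-axis. The function name bip_angle is hypothetical.

```python
import math


def bip_angle(x, y):
    """Angle in degrees of a batted ball, measured from the 3rd base line.

    Assumed convention (not taken from the BIS data): home plate is at
    (0, 0), the 1st base line is the positive x-axis, and the 3rd base
    line is the positive y-axis.
    """
    if x == 0 and y == 0:
        raise ValueError("ball location coincides with home plate")
    # atan2 measures from the x-axis (1st base line); flip it so that
    # 0 degrees is the 3rd base line and 90 degrees is the 1st base line.
    return 90.0 - math.degrees(math.atan2(y, x))
```

Under a different coordinate system, only the translation to home plate and the orientation of the axes would need to change.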
2. Fitting smooth models for the average fielder at each position
We model the probability of a successful play on a grounder as a smooth function of the angle between the fielder's starting location and the path of the g-bip. We fit a separate function for each velocity category, and also allow a different function for fielders moving to the left or to the right. These models are fitted using the data from all infielders, and so represent the ability of an aggregate fielder at each position. In the figure below, we show the probability model at each position for successful fielding of grounders with an intermediate velocity.
We see that each position has a distinct probability model. Note that pitchers seem to have a much larger range than the other infield positions only because they are much closer to home plate and therefore do not have to travel as much distance to cover the same range of angles from home plate.
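The functional form of the smooth model is not stated here, so the sketch below assumes a logistic curve purely for illustration. It shows one way to keep a separate curve for each velocity category and each direction of movement. All names, the sign convention for direction, and the parameter values are hypothetical, not fitted values.

```python
import math


def success_probability(angle_gap, intercept, slope):
    """Probability of a successful play, decreasing smoothly in the absolute
    angle (degrees) between the fielder's starting location and the ball's path.

    The logistic form is an assumption made for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-(intercept - slope * abs(angle_gap))))


# One (intercept, slope) pair per (velocity category, direction).
# These numbers are made up for the example.
params = {
    (3, "left"): (3.0, 0.30),
    (3, "right"): (3.0, 0.35),
}


def predict(angle_gap, velocity, params):
    # Assumed convention: a negative gap means the fielder moves to the left.
    direction = "left" if angle_gap < 0 else "right"
    intercept, slope = params[(velocity, direction)]
    return success_probability(angle_gap, intercept, slope)
```

Because each direction has its own slope, the curve can fall off faster on one side of the fielder than on the other.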
3. Fitting player-specific models and calculating differences
We calculate the same probability models using only the data for each individual fielder and allowing different parameters for each player. Since we have different models for each individual player, we can quantify the difference between players by comparing their individual probabilities of making an out relative to the aggregate probability of making an out. As an example, the figure below illustrates the comparison on grounders between the aggregate model for the SS position and the individual models for the best and worst shortstops.
4. Weighted sum of player-specific differences
For each possible angle, we can calculate the difference D between a particular fielder's probability of success and the aggregate probability of success. A rough measure of fielder ability is the sum over all possible angles of the difference (individual player - aggregate) in probability of making a successful play. This sum is carried out by simple numerical integration. However, since not all angles occur with equal frequency, our SAFE measures are actually calculated as a frequency-weighted sum, so that more frequent angles are more important. In addition, our sum is also weighted by the average run consequence of each angle, which allows us to take into account the different consequences of grounders to different areas, e.g., a missed grounder down the first base line leads to more bases than a missed grounder to the shortstop. The figure below shows the different weights that go into our aggregation.
Thus, for an individual player, their SAFE statistic can be interpreted as their expected runs cost/saved relative to the average fielder. A good fielder will have a large positive SAFE, which means a high number of runs saved, whereas a bad fielder will have a large negative SAFE, which means a high number of runs cost.
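The weighted sum in step 4 can be sketched with a trapezoid rule over a grid of angles. This is a minimal illustration, not the exact integration scheme: the function name, the grid, and the idea of passing frequency and run consequence as per-angle lists are all assumptions. Any further scaling, such as by the number of grounders a player faced, is not specified here and is left out.

```python
def safe_from_curves(angles, p_player, p_aggregate, frequency, run_value):
    """Trapezoid-rule approximation of the weighted sum of
    (player - aggregate) success probabilities over a grid of angles.

    angles       -- increasing grid of angles in degrees
    p_player     -- player's success probability at each angle
    p_aggregate  -- aggregate success probability at each angle
    frequency    -- relative frequency (density) of grounders at each angle
    run_value    -- average run consequence of a grounder at each angle
    """
    # Weighted difference D * frequency * run consequence at each grid point.
    weighted = [
        (pp - pa) * f * r
        for pp, pa, f, r in zip(p_player, p_aggregate, frequency, run_value)
    ]
    total = 0.0
    for i in range(1, len(angles)):
        width = angles[i] - angles[i - 1]
        total += 0.5 * (weighted[i] + weighted[i - 1]) * width
    return total
```

A player whose curve lies above the aggregate curve at every angle gets a positive value, and a player whose curve matches the aggregate gets zero.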
Our raw data is from Baseball Info Solutions, which was also used for The
Fielding Bible. For each ball-in-play hit into the air (a-bip), we have the (x,y) coordinates in the field where the a-bip was fielded, a "velocity" classification (ranging from 1-5) for the a-bip, as well as the number of outs made on the play. We defined any play where one out or more was made as "successful". Note that our balls-in-play into the air are subdivided into three different types: fly balls, liners, and pop ups. The following evaluation procedure is performed for each a-bip type separately:
1. Estimating starting locations for each position
Our BIS data does not provide a key piece of information for each a-bip: the location of each fielder before the ball was hit. We estimate the starting location for each fielder as the (x,y) location in the field where each position has the highest overall probability of making a successful play.
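How the highest-probability location is found is not spelled out here. One simple possibility, shown purely as an assumption, is to bin the plays at a position into square grid cells and take the centre of the cell with the highest empirical success rate. The function name, cell size, and minimum-play threshold are hypothetical.

```python
from collections import defaultdict


def estimate_start_location(plays, cell_size=10.0, min_plays=5):
    """Return the centre of the grid cell with the highest success rate.

    plays      -- iterable of (x, y, success) for one position, success in {0, 1}
    cell_size  -- side length of each square grid cell (hypothetical default)
    min_plays  -- ignore cells with fewer plays than this (hypothetical default)
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [successes, attempts]
    for x, y, success in plays:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell][0] += success
        counts[cell][1] += 1

    best_cell, best_rate = None, -1.0
    for cell, (made, attempts) in counts.items():
        if attempts < min_plays:
            continue  # too few plays to trust the rate
        rate = made / attempts
        if rate > best_rate:
            best_cell, best_rate = cell, rate

    if best_cell is None:
        raise ValueError("no grid cell has enough plays")
    cx, cy = best_cell
    return ((cx + 0.5) * cell_size, (cy + 0.5) * cell_size)
```

The minimum-play threshold keeps a cell with only a handful of lucky plays from being chosen as the starting location.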
2. Fitting smooth models for the average fielder at each position
We model the probability of a successful play on an a-bip as a smooth function of the distance between the fielder starting location and the a-bip coordinates. We fit a separate function for each velocity category, and also allow a different function for fielders moving to the left or to the right. These models are fitted using the data from all fielders at a position, and so represent the ability of an aggregate fielder at each position.
3. Fitting player-specific models and calculating differences
We calculate the same probability models using only the data for each individual fielder and allowing different parameters for each player. Since we have different models for each individual player, we can quantify the difference between players by comparing their individual probabilities of making an out relative to the aggregate probability of making an out. As an example, the figure below illustrates the comparison on fly balls between the aggregate model for the CF position and the individual model for Darin Erstad in 2002.
4. Weighted sum of player-specific differences
For each possible (x,y) coordinate, we can calculate the difference D between a particular fielder's probability of success and the aggregate probability of success. A rough measure of fielder ability is the sum over all possible (x,y) coordinates of the difference (individual player - aggregate) in probability of making a successful play. This sum is carried out by simple numerical integration. However, since not all a-bip coordinates occur with equal frequency, our SAFE measures are actually calculated as a frequency-weighted sum, so that more frequent (x,y) coordinates are more important. In addition, our sum is also weighted by the average run consequence of each (x,y) coordinate, which allows us to take into account the different consequences of a-bips to different areas, e.g., a missed a-bip into the outfield power alley has a higher consequence than a missed pop-up in the shallow outfield. The figure below shows the major differences in consequences of a-bips to different areas of the field.
Thus, for an individual player, their SAFE statistic can be interpreted as their expected runs cost/saved relative to the average fielder. A good fielder will have a large positive SAFE, which means a high number of runs saved, whereas a bad fielder will have a large negative SAFE, which means a high number of runs cost.
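For balls in the air the sum runs over two dimensions rather than one. A minimal sketch, assuming the field is cut into equal-sized grid cells and each quantity is evaluated at the cell centre, is a plain Riemann sum. The function name and the input layout are hypothetical.

```python
def safe_on_grid(cells, cell_area):
    """Riemann-sum approximation of the weighted difference over (x, y) cells.

    cells      -- iterable of (p_player, p_aggregate, frequency, run_value),
                  one tuple per grid cell, evaluated at the cell centre
    cell_area  -- area of each (equal-sized) grid cell
    """
    return sum(
        (p_player - p_aggregate) * frequency * run_value * cell_area
        for p_player, p_aggregate, frequency, run_value in cells
    )
```

Cells where the player matches the aggregate contribute nothing, so the total is driven by the locations where the player differs from the average fielder.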
We have described our methodology for calculating SAFE for grounders as well as balls hit into the air (which include liners and fly balls). For each player in each season (2002-2008), the SAFE values within each ball-in-play type
are added up over all appropriate ball-in-play types. For infielders, the combined SAFE values consist predominantly of grounder balls-in-play (g-bip) but also include infield flies and liners. For outfielders, the combined SAFE values are aggregated across all ball-in-the-air types (fly balls and liners). These combined SAFE values are available in raw form at the link below:
My student James Piette wrote his Ph.D. thesis on "Evaluating Fielding Ability in Baseball Players Over Time". Instead of treating each player-season as independent, he combines information over time within a player. Three new models are compared: a constant-over-time model, a moving average age model and an autoregressive age model. You can download a copy of his thesis here.
The SAFE values for these new time-series models are available at the link below:
The SAFE estimate columns vary between models, as the first two models are season-specific (i.e. columns refer to season) and the last two models are age-specific (i.e. columns refer to age). There are also additional pairs of columns: for the constant-over-time model, columns related to the player-specific estimates are included, and for the autoregressive age model, columns related to the initial state estimates are included. In general, the SAFE estimate columns can be read as follows: mean refers to the posterior mean, int1 refers to the lower bound of the posterior 95% interval, and int2 refers to the upper bound.
Below, we give the SAFE values for each infielder, averaged over the 2002-2008 seasons. Positive values indicate runs saved whereas negative values indicate runs cost. Within each position, fielders are ranked from best to worst. Only fielders for which we have enough data (at least 1000 BIP faced) are included. These averages are weighted by the number of BIP faced by the player in each year, but keep in mind that some of the values below may be based on only one or two years' worth of data. As mentioned above, the full year-by-year data is available here.
Below, we give the SAFE values for each outfielder, averaged over the 2002-2008 seasons. Positive values indicate runs saved whereas negative values indicate runs cost. Within each position, fielders are ranked from best to worst. Only fielders for which we have enough data (at least 1000 BIP faced) are included. These averages are weighted by the number of BIP faced by the player in each year, but keep in mind that some of the values below may be based on only one or two years' worth of data. As mentioned above, the full year-by-year data is available here.
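The BIP-weighted averaging described above can be sketched as follows. The function name and input layout are hypothetical, and the sketch assumes that the 1000-BIP threshold applies to a player's total over all seasons, which is one possible reading of the inclusion rule.

```python
def weighted_average_safe(seasons, min_bip=1000):
    """BIP-weighted average of a player's yearly SAFE values.

    seasons  -- list of (safe_value, bip_faced), one entry per season
    min_bip  -- minimum BIP faced to be included (assumed here to apply
                to the total over all seasons)
    Returns None when the player does not have enough data.
    """
    total_bip = sum(bip for _safe, bip in seasons)
    if total_bip < min_bip:
        return None
    return sum(safe * bip for safe, bip in seasons) / total_bip
```

Weighting by BIP faced means a season with few chances moves the average less than a full season does.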