What is the best statistic for measuring defense?

There are a lot of publicly available defensive metrics floating around the baseball world, making it a confusing ecosystem for fans who don't enjoy putting their hands (or at least their eyes and brains) in the guts of a mathematical formula.

DRS? UZR? FRAA? OAA? How are we supposed to know what to use? Should we average them? Or should we maybe just give up?

The answer is that, in the sabermetric world, we test.

Baseball Prospectus and their fantastic analysis team have yet again done the hard work for the baseball world, with an article evaluating all the major defensive metrics, head to head. The article, Measuring Defensive Accuracy in Baseball, by Jonathan Judge and Sean O'Rourke, is free for all, although you must have a Baseball Prospectus account to read it.

Spoiler — the analysis's results can be easily distilled as follows (note: for the purposes of their analysis, pitcher and catcher defense was excluded, because not all systems currently measure it):

  • In the outfield, Baseball Prospectus's Fielding Runs Above Average (FRAA) is the "best" defensive metric.
  • In the infield, MLB's Outs Above Average (OAA), which is based on Statcast and still evolving, is the "best" defensive metric.
  • Overall, FRAA performed the best, followed by STATS Inc.'s private statistic Runs Effectively Defended (RED) and Sports Info Solutions' publicly available statistic Defensive Runs Saved (DRS).

Scare quotes are mine, because more interesting than the straight ranking results are the details of the test itself, the philosophical questions it touches on about what's worth evaluating, and the way it hints at the complex nature of baseball.

I love this type of analysis. Detailed metric testing is the best, once you get into the weeds. Let’s step through a few of those weeds.

Of Course FRAA Won

I don't just mean this because FRAA is a BP metric and BP was the one doing the test, like in a conspiracy way. It's because the test and the FRAA metric are based on some of the same assumptions.

The test that the BP team put each metric through is very clever. Basically, it assumes that teams with good fielders turn more balls in play into outs. Therefore, a team full of players rated highly by a good defensive metric should turn more balls in play into outs. If the team-level data and the player-level data match up well, the metric is successful. If they don't, the metric is less successful.
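
Here's a minimal sketch of that idea in Python. It's purely my own toy reconstruction, not BP's actual methodology, and the function name, team codes, and numbers are all invented: sum a metric's player-level ratings to the team level, then check how well those sums track each team's rate of converting balls in play into outs.

```python
import numpy as np

def team_level_check(player_ratings, team_bip, team_outs):
    """Toy team-level validation: correlate summed player ratings with
    each team's out-conversion rate on balls in play. A higher correlation
    means the metric's player-level ratings line up better with what the
    team actually did."""
    teams = sorted(player_ratings)
    summed_ratings = np.array([sum(player_ratings[t]) for t in teams])
    out_rate = np.array([team_outs[t] / team_bip[t] for t in teams])
    return np.corrcoef(summed_ratings, out_rate)[0, 1]

# Hypothetical three-team example (ratings in runs, counts in balls in play).
ratings = {"NYM": [4.0, -2.5, 1.0], "ATL": [7.5, 3.0, 2.0], "MIA": [-5.0, -1.0, 0.5]}
bip = {"NYM": 4100, "ATL": 4050, "MIA": 4200}
outs = {"NYM": 2870, "ATL": 2930, "MIA": 2820}
print(team_level_check(ratings, bip, outs))
```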

FRAA is actually the simplest metric in the test. Basically, it’s counting how many outs a defender completes. The makers of FRAA are fully aware that there’s a lot more complexity to any given defensive play than made/not made, but they count on the size of their sample to even that out over time. It’s not a bad bet.
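
To make that concrete, here's a deliberately oversimplified "count the outs" rating in the spirit of FRAA. This is my own guess at the general shape, not BP's published formula, and the shortstop numbers are invented: credit the fielder with plays made above what a league-average fielder would convert on the same number of opportunities.

```python
def plays_above_average(plays_made, opportunities, league_out_rate):
    """Toy FRAA-flavored rating: plays made minus the plays a league-average
    fielder would be expected to make on the same opportunities. No
    positioning or batted-ball data involved; sample size does the work."""
    expected_plays = opportunities * league_out_rate
    return plays_made - expected_plays

# Hypothetical shortstop: 130 plays made on 300 opportunities, against a
# league rate of 0.40 plays per opportunity at the position.
print(plays_above_average(130, 300, 0.40))  # 10.0 plays above average
```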

When you have a test that's evaluating metrics against the total count of outs completed, and you have a metric that's strictly based on counting outs, those two are going to match up well.


But Wait, Isn’t the Point To Get Outs?

The point of baseball defense is to get outs, for sure. The point of baseball defense metrics is more up for debate.

Think about situations where the FRAA approach might fail:

  1. Sample size. Say you only get to see one play, a grounder to shortstop. If the ball is hit straight at the shortstop and he makes the play, it tells you a very tiny amount about his fielding ability. If the ball is hit 15 feet to his right and he makes the play, it tells you something more. Obviously, you can arrive at a predictive metric in a smaller sample if you use that extra signal, so positioning and batted ball data give an opportunity to improve on FRAA. But that extra data comes in a set that is flawed and/or difficult to use. How do you incorporate it successfully? And in what sample does it help? In what sample does it hurt?
  2. Positioning. This second situation is more absolute and concrete, and shows a stark difference between FRAA and OAA. Whether or not a particular play is made isn't just based on an individual's quickness, speed, and hands; it also has to do with where he was positioned at the start of the play. And some of the credit or blame for positioning definitely belongs with the player himself, but much of it should go to the team — namely the manager, bench coach, fielding coordinator, and front office wonks. So should the goal of a defensive metric be to evaluate defensive effectiveness in a player's given team context, or should it try to rate players in a context-neutral way? FRAA includes most of the team context (by ignoring positioning and batted ball data), while OAA strips most of the team context out (by using detailed positioning and batted ball data to evaluate the difficulty of each play based on the flight of the ball and the starting position of the fielder). The other metrics fall somewhere in between (I think). There's a rough sketch of this difference right after the list.
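
Here's that rough sketch, using an invented catch_probability input rather than anything from Statcast. In a made/not-made count, every converted play is worth the same; in a context-aware rating, credit scales with how hard the play was given the ball's flight and the fielder's starting spot.

```python
def count_based_credit(play_made):
    """FRAA-style view: all that matters is whether the out was recorded."""
    return 1.0 if play_made else 0.0

def context_neutral_credit(play_made, catch_probability):
    """OAA-style view, roughly: credit is the outcome minus the estimated
    probability of the play. catch_probability is a made-up input here,
    standing in for a model of ball flight and fielder starting position."""
    return (1.0 if play_made else 0.0) - catch_probability

# A grounder hit right at the shortstop vs. one 15 feet to his right.
print(count_based_credit(True))            # 1.0 either way
print(context_neutral_credit(True, 0.95))  # 0.05: routine play, little credit
print(context_neutral_credit(True, 0.40))  # 0.60: tough play, big credit
```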

A Split Decision, aka The Interesting Part

If every team positioned their fielders exactly the same way, I'm pretty sure FRAA would crush OAA in a test of this structure (and there might not be a good reason to use any of the other metrics in large samples). But they don't, and it didn't, and the way that it didn't is fascinating.

We already noted that FRAA "won" the outfield test. Well, OAA finished dead last there. On the infield, though, it was flipped, with OAA "winning" and FRAA finishing last.

That’s interesting!

In the outfield, the metric that folds positioning into the result (FRAA) matched up better with overall team quality of results, while in the infield, the metric that tries to isolate player skill from positioning (OAA) matched up better. OAA strips out positioning and isolates the player's skill, post-crack-of-the-bat.

I’m darned if I know quite why it worked out that way, but I’m pretty certain that it’s telling us something about the nature of infield and outfield defense and shifting, if we’re clever enough to puzzle out what.


My hypothesis is that in the outfield teams basically position their fielders the same. The right fielder is in right, the left fielder is in left, and the center fielder is somewhere in between them. Then the outfielders shade a bit one way or another depending on batter spray chart, and everybody knows those spray charts. FRAA is making an apples to apples comparison, and the fielders with more range and who are shaded 10 feet better are eating the apples.

In the infield, there is much more variation in how teams approach positioning.

When a defense shifts three fielders to the first base half of the field, some teams might walk their third baseman over past second base, leaving the shortstop at short. Others might walk the shortstop past second instead and leave the third baseman on his side. Depending on which team they're on, those shortstops and third basemen are playing legitimately different positions, and in this scenario it would seem that FRAA is comparing apples to oranges without knowing it.

This result of BP’s test suggests to me that teams have a decent handle on the skills of their infielders, and are dividing up the field effectively based upon those skills.

People analyzing other sports already talk about systems of “total soccer,” and “positionless basketball.” It may be time to think about infield defense the same way.

Caveat: Is First Base an Equal Infield Position?

Defensive metric skeptics have long argued that playing first base is an entirely different skillset than playing other infield positions, and that it’s wrong to think of it as “the place we put the guy who can’t really field.”

These skeptics balk loudly at the idea of a defensive spectrum that implies you can move an average second baseman over to first and realize ~10 extra runs saved in defensive production.

It pains me to say this, but these skeptics and their straw men . . . they might have a point. First base IS different, at least in this test.

The way this shows up in the test is that both FRAA and UZR actually fared worse than the baseline control for first base, which assumed that every first baseman is average. Which is to say, if you used fancy FRAA or UZR to rank all the first basemen and then used that to predict which teams had the better overall infield defense, you'd do worse than if you didn't rank the first basemen at all. It appears there is no correlation between a stat's evaluation of first basemen and actual team defensive results, at least in 2019.

That’s crazy and interesting!
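
Here's a toy version of that baseline comparison, with numbers I made up to mirror the described result rather than anything from BP's study: compare prediction error when you use a metric's first-base ratings versus when you just assume every first baseman is average.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual team results."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Hypothetical team infield-defense results (runs), predicted two ways.
actual_results  = [12.0, -3.0, 5.0, -8.0]
with_1b_ratings = [15.0, -9.0, 1.0, -1.0]   # include the metric's 1B ratings
assume_avg_1b   = [11.0, -4.0, 6.0, -6.0]   # treat every first baseman as average

# If assuming average first basemen lowers the error, the metric's first-base
# ratings are adding noise rather than signal, which is the result described above.
print(rmse(with_1b_ratings, actual_results))  # larger error
print(rmse(assume_avg_1b, actual_results))    # smaller error
```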

I see three possibilities, which can exist in some combination to produce that result:

  1. First base defense is based on a different skillset than “the ability to field balls and make outs.”
  2. There is very little difference in defensive ability among major league first basemen.
  3. There is very little impact on team defense from the actions of the first basemen.

Option 1 is what the defensive metric skeptics will want to seize on. Options 2 and 3 could hypothetically still confirm the existence of a traditional sabermetric defensive spectrum.


Option 1 reveals what the eye test should already tell you: first base defense is based on a different skillset than "the ability to field balls and make outs," involving things like receiving throws on balls in play, or the ability to feed a pitcher covering first. It's simply different.

Option 2 comes into play when you investigate the variation in defensive skillsets around the infield. One potential data point is the "assume everyone to be average" baseline: the spread between the metrics in BP's analysis was a lot closer at second base than it was at shortstop and third. But because second base, like first, is also a position where teams often try to hide fielders with fewer physical tools (especially arm), this suggests to me that option 3 is at work, at least to some degree.

Option 3 reveals that, while first base might not be as easy as Billy Beane tried to tell Scott Hatteberg, it has far less value in the grand scheme of infield defense than the defensive ability of the players at the “more difficult” positions.

Conclusion

If you hoped to read a nice, tidy summation of the Baseball Prospectus research that would distill for you what defensive metrics you should look at, and what you should ignore . . . I’m sorry. This wasn’t that.

Good analysis, like what Judge, O’Rourke, and the whole team over there at BP do, is rarely tidy. It gives a quantifiably precise answer to a question narrowly asked. That narrowness means that we get from them a high quality answer, but one that’s tricky to apply to the more commonly asked questions like “Who are the good fielders in baseball?”

Judge and O'Rourke limited their discussion to their concrete results, but being that it's not my research, I feel free to speculate wildly, irresponsibly. So here are the actionable takeaways I got from the article:

  • Now, as much as ever, is a great time to subscribe to Baseball Prospectus. They’re doing real work.
  • FRAA seems like the thing to use to evaluate outfielders within their team defensive system in a given year. I am curious to see the year-over-year predictive ability of outfield FRAA (and also the other metrics), and whether it is lower for outfielders who change teams, but that’s for another time.
  • I've never paid a ton of attention to infield OAA, having more confidence in the outfield version. I probably should pay attention on the infield.
  • There’s little reason to use UZR these days, at least in its original form. It throws out the plays that are shifted, and teams shift all the time. While that can tell us something, that something is now better told by other metrics.
  • The discrepancy between FRAA's comparative success with outfielders and its comparative failure with infielders is wild, and while I'm tempted to throw it in the trash bin for individual infielders, along with UZR, I wonder if it could be improved by reconsidering the labels we put on those infielders. That would be a fun project.
  • The traditional defensive spectrum? It might work in some situations, but it’s got issues.
  • Come back, baseball.