Why Do We Need Another Player Evaluation Method?

by Jim Furtado

[ Webmaster's Note: The following article appears in The 1999 Big Bad Baseball Annual. ]

Recently, someone asked me this question. Why have I spent a good chunk of my limited free time researching and designing player analysis formulas? Without getting into my childhood and my psyche (my wife covers these areas sufficiently), the answer is simple–I don’t feel that the available methods sufficiently answer my questions.

Now, don't get me wrong. I don't want to imply that all the available methods are useless, because they can be useful. However, they all fall short of the mark; they all lack ....... something.

I'm not alone in thinking this way, either. Just last year, in this very space, Jay Walker and Brock Hanke pointed out many of the problems with today's player evaluation systems. In particular, they wrote a very insightful examination of Bill James' Runs Created (RC) system.

Their critique was very interesting. Although I hadn't noticed the RC/27 outs issue, I had noticed RC's inaccuracies at the extremes. Because of this rather large problem I, too, had adopted Paul Johnson's Estimated Runs Produced (ERP) as my run model of choice.

What Jay and Brock missed in their examination was that ERP was essentially a linear formula dressed up in a convoluted package. Of course, this wasn't a big surprise. Many other analysts, including Bill James, had also missed it. As a matter of fact, I missed it myself for a long time. I only experienced my epiphany, regarding the matter, about a year before. Although I can't say the revelation changed my life, it did cause me to delve a little deeper into player statistics.

You see, when I made the realization, I was quite perplexed. Bill James long ago disparaged Pete Palmer's Linear Weight System (LWTS) because, in his words, "runs scoring is not a linear activity." I thought, "How could James, on the one hand, laud Johnson's linear formula as more accurate than his own, while knocking Palmer's linear formula?" These conflicting opinions couldn't both be true. The contradiction prodded me to study the matter myself.

During my investigation, I contacted Don to see if the guys at BBBA might be interested in checking out what I was up to. Luckily, they were. The ensuing collaboration produced something we are all proud of.

Read over the next few pages to find out more.

Logic and Methods in Baseball Analysis - Revisited

In the 1984 Baseball Abstract, an article entitled “Logic and Methods in Baseball Analysis” appeared. In this article, Bill James not only demonstrated the Runs Created methodology, but also explained the philosophy behind it. Since the new methods presented in this chapter are as much a shift in philosophy at BBBA as a change in formulas, it's appropriate that I frame my part of the discussion similarly.

Bill James began the essay with the following statement, “Ultimately, what we are trying to do is to find answers to questions, baseball questions.” Although it’s best that I update his questions a little, his sentiment in the article remains true. We are trying to find answers to baseball questions. Who is the best player? Who is the Most Valuable Player? Should the Red Sox re-sign Mo Vaughn or let him go? Who should the Mets play at catcher? Baseball fans, including myself, ask these questions all the time. Consequently, we’re constantly looking for the answers.

These questions can be answered many different ways. They can be answered subjectively or objectively; they can be answered based on feeling or scientific method. There is no right way or wrong way to find answers to our baseball questions, as long as we end up at truthful answers.

Bill James also used a road building analogy in the original essay on this subject. He stated, “The development of knowledge in any field waits upon two things: the development of methods and the availability of a compelling logic. Methods are the roads that one travels on in searching for the truth, and like all roads they can be constructed and abandoned as needed. The creation of a new method is basically a mental construction project; if you need the road, if you know where you want to go, then there is no doubt about one’s ability to develop a method to get there.”

As is often the case, Bill James was on the mark with this idea. Once we properly frame the question, it is possible to develop methods that provide suitable answers. Certainly, our methods don't always provide perfect results, but as long as we constantly refine our questions and our methodology, we get closer and closer to our desired destination—the truth.

Of course, not everyone accepts the truth. Whether something is true or not is often very hard to determine. People disagree about “the truth” all the time. History is littered with examples, sometimes horrific, of the different ways “truth” has been defined or discussed. Fortunately, “the truth” baseball fans search for does not require bloodshed for resolution. Thankfully, the search for truth in baseball can be undertaken in more civilized fashion.

Like James, the way that I often search for baseball truth is sabermetrics. Sabermetrics, coined by Bill James, is defined as “the search for objective knowledge about baseball”. Sabermetrics is a scientific process that tries to provide answers to questions based on fact and evidence, not feeling and intuition.

Since I’m a skeptic, sabermetrics greatly appeals to my sensibilities. However, although I definitely consider myself a fervid proponent of the discipline, I try to always keep in mind its limitations, namely, some components of play can’t be quantified.

First, we can not properly account for the human element. Physical talent alone does not explain performance. If it did, only the best athletes would be the best players. “Tools” guys would be the only superstars. That’s certainly not the case. Many players perform better than their athletic talent would indicate. Intelligence, diligence, and tenacity are some of the qualities that spark players to perform above their given talent.

Second, many results cannot be appropriately apportioned among players. For example, say I tell you that a catcher allowed 30 stolen bases and threw out 6 other guys. Say I also tell you the only games he caught were with a particular pitcher on the mound. Can you tell me who was the weak link? You can’t. Now, I know this is a rather extreme example, but many facets of play can’t be broken down.

These limitations do not mandate that all objective analysis be discarded. They just require that questions and answers be framed in such a way that unmeasurable facets of play are either isolated from the question at hand, or, at the least, are recognized as shortcomings of the process. As long as these thoughts are kept in mind, sabermetric methods can be used in the quest for baseball truth.

Getting back to my friend, there is an abundance of systems which try to answer baseball questions. He’s certainly right in this regard. There are plenty of methods vying for the attention of baseball fans. Some of the formulas work very well; some don’t work too well; and some don’t work at all. Unfortunately, it’s not always readily apparent which is which. This is especially true for the statistically faint of heart.

I always hate to generalize but I think I’m on safe ground with the following statement: All baseball fans enjoy baseball stats. Now before you rush to correct me, let me add this hedge: To different degrees, all baseball fans enjoy baseball stats.

Every fan checks out the statistics they are most comfortable with. For some, the batting line on the television satiates the desire for information. Others (like you; otherwise, you wouldn’t have purchased this book and wouldn’t be reading this article) have a deeper thirst that can only be quenched by more in-depth analysis.

The differing statistical acumen leads to misconceptions. I can’t begin to count the times I’ve heard a batter described as the “best batter in baseball” because he had the highest batting average. Certainly, batting average can be useful to determine a player’s ability, but only when the question is properly framed. Unfortunately, in most cases, batting average is used to answer overly broad questions.

Batting average is not the only statistic that’s used and abused. Others also get misused. At times, even intelligent baseball analysts choose the wrong statistics to answer the question at hand. I’m sure, at one time or another, I have done it myself.

These analytical faux pas aren’t the end of the world. People can enjoy baseball without being statistically adept. However, if the goal is a deeper understanding of the game with the help of statistical analysis, it’s important that analysts carefully select and design methods to answer succinct questions. It’s also important that those who have a deeper understanding pass along that knowledge. Hopefully, I can help do my part here.

At the Feet of the Masters

Back in the early 1970s, when I was a lad who spent most of my time playing baseball in the backyard with my brother, my father bought me a tabletop baseball game. I can’t recall the exact name, but I do remember it had Sports Illustrated somewhere in the title.

Anyway, I spent hours and hours and hours playing that game. I mostly played with the Red Sox, but I remember playing other teams as well. I bring this up because this was the first time I ever really concerned myself with things like lineup building, the value of the bunt, pitching rotations, etc.

As my interest in baseball (and the tabletop game) grew, my father and I spent more and more time discussing baseball strategy. He passed on his considerable baseball knowledge, advised me to read books about the subject and recommended that I listen intently to radio and television broadcasters. Being a devoted son, I minded my father’s words and consumed every baseball strategy tidbit that came my way. What I essentially learned, from all this, was the way of “The Book”. As you know, “The Book” isn’t really a book. It’s just what everybody calls baseball’s conventional wisdom.

As you might expect, it wasn’t long before I could recite “The Book” backwards and forwards. I knew that Steve Garvey was the best fielding first baseman in baseball. Rod Carew was the best hitter. Tony Perez was a great clutch hitter. Pete Rose was the second best player. (Yaz was the best; I’m from Red Sox country.) Bunting was a great play. Pitching was 75% of the game. Et cetera, et cetera, et cetera. The years passed and my confidence in my own baseball knowledge grew. That was until I discovered books by two knowledgeable gentlemen: Bill James and Pete Palmer.

In 1986, I walked into a bookstore and, as is my habit, I purchased some interesting looking baseball books. The two books, The Hidden Game of Baseball (HGB) and Bill James’ Baseball Abstract (BJBA), were quite disconcerting. The ideas presented in the books clashed with the things I had learned to be true. The bunt wasn’t really that good an idea. The Red Sox didn’t necessarily have the best hitters. Pete Rose wasn’t the best player in baseball. Clutch hitters didn’t exist. Steve Garvey couldn’t field. Batting average was overrated. Et cetera, et cetera, et cetera.

Don’t get me wrong, I wasn’t driven to the streets in turmoil. I didn’t feel compelled to scream, “It’s a maaad house! A maaaaaaad house!” like Charlton Heston did in the original Planet of the Apes movie. To be honest, at the time, baseball no longer consumed most of my waking thoughts. As often happens to young men, my primary focus had shifted to females. During this period of my life, baseball devolved into a secondary, but still substantial, interest.

Nevertheless, even with baseball's temporarily diminished importance, the books did have a profound effect. I started re-thinking everything I thought I knew about baseball. Of course, this was a very good thing. If it wasn't for these books, I wouldn’t be writing this essay.

Unfortunately, I don’t have the space to properly credit everyone who has contributed to our current baseball knowledge. So what I’ll do is give a very, very, very brief overview of the three most influential sabermetricians. Furthermore, I’ll limit the scope of the overview to player evaluation systems.

I mention all this for one important reason: to pay homage to the people that laid the groundwork for the current state of sabermetrics. To go forward, we must look back.

Branch Rickey / Allan Roth

I’m sure you noticed two names here. No, they are not the same person. I include them together because it’s impossible to separate their contributions. You’ve heard of Branch Rickey; you probably haven’t heard of Allan Roth. That’s unfortunate because Allan Roth probably was the first sabermetrician.

As far back as 1940, Mr. Roth was tabulating statistics that are now commonly known as situational stats. He, at one time or another, examined such things as batting by count, by situation, and by lineup position. Additionally, he wrote many articles, including an in-depth examination of Hal Newhouser’s 1945 season and an examination of Bob Feller and strikeouts which appeared in The Sporting News in 1946.

Beginning in 1947, Mr. Roth began working with Branch Rickey. Their association culminated with the publishing of an article “Goodby to Some Old Baseball Ideas” in LIFE magazine August 2, 1954. Although the article lists Branch Rickey as its author, Roth appears to be the person responsible for its content.

In the article, Branch Rickey proposed a “device for measuring baseball which has compelled me to put different values on some of my oldest and most cherished theories. It reveals some new and startling truths about the nature of the game. It is a means of gauging with a high degree of accuracy important factors which contribute to winning and losing baseball games.” The formula which he described as “a simple, additive equation” was the first system developed to compare run scoring difference to team wins. Its effect was revolutionary, though not instantly.

Rickey’s thoughts remain controversial, “If the baseball world is to accept this new system of analyzing the game—and eventually it will—it must first give up preconceived ideas. I had to. The formula outrages certain standards that experienced baseball people have sworn by all their lives. Runs batted in? A misleading figure. Strikeouts? I always rated them highly as a determining force in pitching. I do now. But new facts convince me that I have overrated their importance in so far as game importance is concerned. Even batting average must be reexamined.” Current sabermetricians are still trying to gain the widespread acceptance of these ideas.

As Palmer noted in HGB, “The first of the efforts to pare offensive statistics to their essence, runs, and then reconstruct them for individuals so as to reflect their run-producing ability, were Rickey’s and Roth’s.” The truth of that statement is why the Rickey / Roth pairing warrants attention.

Pete Palmer

I can’t give Pete Palmer’s work the full credit it deserves since I can’t possibly do so in such a short article. Besides his work in the HGB, Mr. Palmer also is the data wizard behind Total Baseball (TB). Along with John Thorn (his co-author for many projects), Pete Palmer is responsible for enlightening baseball fans, not only concerning sabermetrics, but baseball history in general. Mr. Palmer’s work in verifying baseball records ensures that players, past and present, are properly credited for their performances.

Mr. Palmer has made many contributions to player evaluations. Although On Base Percentage is now commonly applied, it wasn’t always so. Due directly to his championing of the stat, it is now available everywhere.

His work with park effects, including the development of park adjustments, was groundbreaking. Today’s sabermetricians, including myself, often use the park effect system he developed for the HGB and TB.

His Linear Weights System (LWTS) of player analysis is the most widely used player evaluation method. In particular, the defensive assessment portion of LWTS is the standard for historical comparisons.

I could go on and on. This is just the tip of the Palmer statistical iceberg. If you want to find out more about Pete Palmer’s handiwork, I suggest you borrow HGB from your local library, purchase a copy of TB, and/or visit the TB Internet web site at http://www.totalbaseball.com.

Bill James

Quite simply, Bill James is the “Father of Sabermetrics”. Without him, who knows where baseball analysis would be today. Even though Pete Palmer’s work was also groundbreaking, it was Mr. James’ popularization of the genre that made the publication of sabermetric books possible.

Mr. James’ style of analysis prodded a multitude of baseball analysts to dig deeper into data and think profoundly about various baseball questions. As with Pete Palmer’s work, I could not possibly give Bill James’ accomplishments proper credit here. Additionally, since BBBA is a direct descendant, you should be properly familiar with a good deal of his analysis. Instead, I’ll re-visit a few elements from his aforementioned “Logic and Methods in Baseball Analysis” essay, since many of the points remain true today, fourteen years later.

General Principles of Sabermetrics

In the essay from the 1984 Bill James Baseball Abstract, James defined several baseball axioms and sabermetric principles. I'll take the liberty of consolidating and slightly updating his list. This is done with great trepidation. I run the risk of offending the sensibilities of sabermetricians everywhere. To some, fiddling with anything done by Mr. James amounts to sacrilege. Hopefully, these changes won’t be met with the same contempt as Dr. Frankenstein's medical breakthroughs.

Jim Furtado’s Revised Principles of Sabermetrics

1. The questions define the methods.

This one is pretty simple, really. In spite of that, it’s constantly being violated. I don’t know how many times I’ve seen people use the wrong method to answer a particular question. Using batting average to answer the “Who’s the best hitter?” question is the most prevalent example. Another example is using pitcher wins to answer the “Who’s the best pitcher?” question. Don’t get me wrong, these stats can help us answer these questions, but, used as the sole criterion, they aren’t useful.

This error isn't confined to statistical novices. People with substantial backgrounds in mathematics sometimes make the same error. Even prospective sabermetricians occasionally make this mistake. I recently came across such a case. This guy, who shall remain nameless, looked around at the available methods and decided the results didn’t match his expectations. He then designed a system that produced results that matched his own subjective sensibilities.

In this instance, the designer didn’t think defensive skill was properly credited. As a result, he created a method that placed a high value on players with strong defensive reputations. Unfortunately, he started with the answer and came up with the question, rather than the other way around. Imagine his disappointment when his “objective” creation met with criticism. Too bad he didn’t understand what “objective” means, or he could have saved himself a lot of time and effort.

2. Baseball is a team game; therefore, players should be evaluated in a team context.

In the original essay, James said, "A ballplayer's purpose in playing baseball is to do things which create wins for his team." He was certainly correct because, as he also said, a player is not in the game to rack up personal accomplishments but to help his team.

Many analysts don’t factor in this important truism. Ignoring it is a mistake. The reason is simple: baseball is a complex game where player performances are heavily intertwined. The third baseman waves at a hard grounder; the pitcher is charged with a hit. The pitcher delivers slowly to the plate; a “SB Allowed” is added to the catcher’s defensive record. The first baseman makes a great scoop; the shortstop makes the play. The outcome of games hinges on this interaction between players.

We can’t possibly design an evaluation system that accounts for every interaction. Even the best simulations can’t account for every possible permutation. However, rather than throw up our hands in defeat, we do the best we can with the available data. As long as we continue to incorporate as much information as we can, we inch our way toward our destination.

3. All offense and all defense occurs within a context of outs. (James)

In my mind, the important point to understand about this statement is that the opportunity for the team is not the same as the opportunity for players. For the team, opportunity is tied to outs: the team gets 27 of them. For players, opportunity is tied to plate appearances, times on the bases, and innings in the field. Certainly there is an interaction between the two. If a team's players consume outs at a high rate, the individual players get fewer opportunities to positively impact the team. What's also true is that this interaction isn't always readily apparent. Take the guy who gets on base in the sixth inning. Which players benefit? Directly, the other hitters who come up after him in the same inning. Indirectly, the player who gets an additional at bat in the ninth.

4. There are two essential elements of an offense: its ability to create opportunities and its ability to take advantage of those opportunities. (James modified)

In James' original description, getting on base (OBA) defined opportunity, while slugging defined the ability to take advantage of those opportunities. This is close, but slightly off the mark. A player lowers the number of opportunities for his team not only when he is at the plate, but also when he is on the bases. This must be accounted for to accurately assess a player's ability to create additional opportunities for his team.

Slugging Percentage defined the team's ability to take advantage of opportunities in James' original principle. Again, this is close, but slightly off the mark. Every positive offensive play helps a team. A batter walks, steals second, advances to third on a wild pitch, and then scores on a soft grounder to the second baseman. The players in this example took advantage of the opportunity without a hit. The run they produced is just as valuable as a run driven in by a double.

5. There is a predictable relationship between the number of runs a team scores, the number they allow, and the number of games that they will win. (James)

This ties in with another of James' original axioms: "Wins result from runs scored. Losses result from runs allowed." The bigger the difference between the runs you score and runs you give up, the higher your winning percentage will be. This topic has been covered many times, so you should already be familiar with it.
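For readers who want to see the relationship in action, here is a minimal sketch. It assumes the familiar exponent-2 form of Bill James' Pythagorean formula, which the paragraph above alludes to but does not spell out; the function names and the 750/700 team are made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed):
    """Estimated winning percentage, assuming the exponent-2 Pythagorean form."""
    rs_sq = runs_scored ** 2
    ra_sq = runs_allowed ** 2
    return rs_sq / (rs_sq + ra_sq)


def expected_wins(runs_scored, runs_allowed, games=162):
    """Scale the estimated winning percentage to a full schedule."""
    return pythagorean_win_pct(runs_scored, runs_allowed) * games


# Hypothetical team, not a real club: 750 runs scored, 700 allowed.
print(round(pythagorean_win_pct(750, 700), 3))
print(round(expected_wins(750, 700), 1))
```

A team that scores exactly as many runs as it allows comes out at .500, and the estimate climbs as the run difference grows, which is all the principle claims.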

6. Value is context driven; ability is context neutral.

The genesis for this item is the whole MVP debate. Every season people argue about who the Most Valuable Player is. The biggest point of contention is usually whether the player was on a contender or not. Another sore point is whether situationally dependent stats, such as RBI and Runs, should be factored in. To me, people meld together two different questions: “Who is the Most Valuable Player?” and “Who is the best player?” Although sometimes the answer to both questions is the same, they ask two distinctly different things. While the MVP question is about value, the “best player” question is about ability. Value is context driven; ability is context neutral.

If I want to know who’s most valuable, context is important because the value of a player’s contribution changes with context. A home run has more value with the bases loaded than with the bases empty. A steal of second base has more value with nobody out than with two outs (you have more opportunities to drive the player in with fewer than two outs). A run in a 0-0 game has more value than a run in an 11-0 game.

With the changing value of events, it's very important to establish the context of the player's action to properly valuate his performance. Say a player batted with lots of runners on base and drove in an average number of those runners. What would the result be? You guessed it, a lot of RBIs. Say another player batted with relatively few runners on base, but drove in a high percentage of them. What would the result be? Again, a lot of RBIs. Which player's performance has more value? Well, that depends on the context of the production. We need to factor in team context to properly determine the value.

Wins are the currency of value for teams. Therefore, the team with the most wins has the most value. Since team wins are produced by the combined accomplishments of its players, the players' combined value equals the team’s value. This means that two players who make exactly the same contribution do not necessarily have the same value. This greatly complicates answering the whole MVP question.

To illustrate, here’s a word problem for you. Two gentlemen (let’s call them Player A and Player B) play on two nearly identical teams. Each team scored 800 runs. Each team possessed fielders of exactly the same quality. The difference between the two ball clubs is the quality of the pitching staffs. Team A’s pitchers allowed 650 runs, while Team B’s pitchers allowed 700 runs. All this resulted in 96 wins for Team A and 91 wins for Team B. If I tell you both players generated 80 runs for their respective teams, does this mean they were equally valuable players?

Considering this is a baseball book rather than a math test, I’ll just give you the answer: no, Player A was more valuable. Why? Because in the context he operated in, his runs were more valuable–they bought more wins. Using a modified version of Pete Palmer’s runs to wins formula, I determine it cost 9.97 runs per win in Team A’s context [(10/3)*SQRT((800+650)/162)] and 10.14 runs per win in Team B’s context [(10/3)*SQRT((800+700)/162)]. Player A’s 80 runs purchased 8.02 wins while Player B’s 80 runs purchased 7.89 wins. Therefore, Player A was more valuable.
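If you'd rather check the arithmetic than take my word for it, the word problem can be reproduced in a few lines. This is only a sketch of the modified runs-to-wins formula as quoted above; the function names are mine, and the 162-game default is an assumption taken from the bracketed formulas.

```python
from math import sqrt


def runs_per_win(runs_scored, runs_allowed, games=162):
    """Modified Palmer estimate: (10/3) * sqrt(total runs per game)."""
    return (10 / 3) * sqrt((runs_scored + runs_allowed) / games)


def wins_purchased(player_runs, runs_scored, runs_allowed, games=162):
    """Convert a player's runs into wins within his team's run context."""
    return player_runs / runs_per_win(runs_scored, runs_allowed, games)


# Team A: 800 scored, 650 allowed. Team B: 800 scored, 700 allowed.
print(round(runs_per_win(800, 650), 2))         # 9.97
print(round(runs_per_win(800, 700), 2))         # 10.14
print(round(wins_purchased(80, 800, 650), 2))   # 8.02
print(round(wins_purchased(80, 800, 700), 2))   # 7.89
```

The same 80 runs buy slightly more wins in the lower-scoring environment, which is the whole point of the example.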

Ability, on the other hand, is context neutral. The ability to hit a ball 500 feet is the same whether a player is at Coors Field or at the Astrodome. The fact that the same ball might travel 540 feet at Coors is irrelevant. The change in conditions causes the ball to travel 40 feet farther, not a change in ability. (A non-baseball example of the same concept is weight. Take a 200 pound item on Earth and weigh it on the Moon. What does it weigh? About 33 pounds. The item doesn’t change, the conditions do. The change in conditions, gravity in this example, accounts for the difference.)

Since the best player is the player with the most ability, ability is what we should measure to answer the "best player" question. To measure ability, we must first filter out context. Once that's done we can directly compare players in a neutral context–we can compare their ability.

7. An accurate measure of performance is always to be preferred to a less accurate measure. (James)

This one shouldn't be needed, but it is. There are a lot of stats that aim to evaluate players. Many of them are developed using sound theoretical principles. I've included a few examples of them below. But how do we decide which one is best? Compare how well they correlate with actual run scoring.
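To make "compare how well they correlate" concrete, here is a rough sketch using a plain Pearson correlation coefficient. The run totals are placeholder numbers invented for illustration, not real team data, and the names estimate_a and estimate_b stand for two hypothetical run estimators.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder numbers, NOT real team data.
actual_runs = [700, 750, 800, 850, 900]
estimate_a = [710, 745, 805, 840, 905]   # hypothetical estimator A
estimate_b = [760, 720, 830, 800, 880]   # hypothetical estimator B

r_a = pearson_r(actual_runs, estimate_a)
r_b = pearson_r(actual_runs, estimate_b)
print("A tracks actual runs better" if r_a > r_b else "B tracks actual runs better")
```

With real data you would run every team season through each formula and prefer the one whose estimates line up most closely with the runs actually scored.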

8. All else being equal, opt for simplicity.

If two measures do essentially the same thing, choose the easier one to calculate.

O.K., that’s enough with the philosophical stuff. Before Jay and I get into our reasons for creating a new evaluation system, I think it would be very useful to examine the design of some current statistics. I’ll make the examination both rather broad and brief, since a complete examination would take an entire book.

Looking back to go forward

Knowledge is not created in a vacuum, but, instead, expands on what comes before. With that in mind, it's probably a good idea to know a little about what other evaluation methods are out there. Here are a few of the better efforts. I include references, in case you want to check out what the inventor had to say at the statistic’s unveiling.

Rate Stats

It’s the base that counts

Some analysts look at the game and come to the conclusion that bases gained are what’s important. The more bases a player gains relative to the number of outs he consumes, the better he is. Although in the last twenty years many people have worked on developing such systems, the concept was applied to Major League Baseball as far back as the turn of the century. At that time, a man named Travis Hoke began compiling a base grounded system for Branch Rickey and the St. Louis Browns. Although the system stopped being used after two seasons due to the overwhelming burden of computation placed upon Mr. Hoke, it stands as the first time such a system was developed. (Read all about Mr. Hoke's system in Esquire magazine’s October 1935 issue or at my web site at http://www.baseballstuff.com/btf/pages/essays/rickey/hoke.htm.)

Although I guess on some level the whole concept of evaluating players by comparing bases is true, I don't think it's the best way to go. The goal for an offense is to score runs, not to accumulate bases. Although base stats correlate with runs, the correlation is not as high as the correlation for the stats expressed in estimated runs.

Another problem with these statistics is that they are expressed as rate stats. The problem with rate stats is they don’t tell you the aggregate value of a player’s performance. Additionally, the answer you end up with isn’t really in a quantity that can be directly converted to wins (like runs). To express these stats in runs, a conversion step is required. Check out the “A Question of Accuracy” essay to see how they compare in this regard.

Anyway, any number of people have designed base counting statistics. Here's a sampling:

Thomas Boswell’s Total Average (TA)

TA=(((1B + (2Bx2) + (3Bx3) + (HRx4)) + HP + BB + SB) - CS) / ((AB - H) + CS + DP)

Originally from Inside Sports, January 1981. Seasonal lists also appear each preseason in the same publication. Basically, it has total bases plus walks, hit batsmen, and net stolen bases in the numerator and outs (without SF, SH) in the denominator.
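As a quick illustration, the TA formula as written above can be computed like this. The stat line is hypothetical (not a real player) and the function and argument names are my own.

```python
def total_average(singles, doubles, triples, hr, hp, bb, sb, cs, ab, h, dp):
    """Total Average as given above: bases gained divided by outs made."""
    bases = singles + 2 * doubles + 3 * triples + 4 * hr + hp + bb + sb - cs
    outs = (ab - h) + cs + dp
    return bases / outs


# Hypothetical stat line, not a real player.
ta = total_average(singles=120, doubles=35, triples=5, hr=20, hp=5,
                   bb=70, sb=10, cs=5, ab=600, h=180, dp=15)
print(round(ta, 3))
```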

Barry Codell’s Base-Out Percentage (BOP)

BOP=(TB+BB+HBP+SB+SH+SF)/(AB-H+CS+GIDP+SH+SF)

Find it in the 1979 Baseball Research Journal. In the numerator, all the bases gained by the batter; in the denominator, all the outs accrued.
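Here is the same kind of sketch for BOP, following the formula exactly as printed above. The stat line is again hypothetical and the names are mine.

```python
def base_out_percentage(tb, bb, hbp, sb, sh, sf, ab, h, cs, gidp):
    """Base-Out Percentage as given above: bases gained over outs accrued."""
    bases = tb + bb + hbp + sb + sh + sf
    outs = (ab - h) + cs + gidp + sh + sf
    return bases / outs


# Hypothetical stat line, not a real player.
bop = base_out_percentage(tb=285, bb=70, hbp=5, sb=10, sh=2, sf=6,
                          ab=600, h=180, cs=5, gidp=15)
print(round(bop, 3))
```

Note that sacrifices count on both sides of the ledger here: a base for the team, an out for the batter.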

Mark Pankin's Offensive Performance Average (OPA)

OPA=[ 1B + 2*2B + 2.5*3B + 3.5*HR + 0.8*(BB+HP) + 0.5*SB ] / (AB+BB+HP)

Mark Pankin is a pretty inventive baseball analyst. OPA is one of his attempts to quantify offense. Although OPA was a good swing at player evaluation, the most interesting stuff he's worked on is his Markov chains and baseball line-up studies. Unfortunately for you, I cannot possibly give a brief overview about such complex subjects. If you're interested in learning more about it, you can read about Mark's Markov Models/Batting Order Optimization on the Internet at http://www.retrosheet.org/mdp/baseball.htm.

To read more about OPA read the July-August 1978 issue of Operations Research or visit my web site at http://www.baseballstuff.com/btf/pages/essays/other/opa.htm.
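A short sketch of the OPA calculation, using the weights from the formula above. The stat line is hypothetical and the function name is my own; treat it as an illustration of the arithmetic rather than Mark's own code.

```python
def offensive_performance_average(singles, doubles, triples, hr, bb, hp, sb, ab):
    """OPA as given above: weighted offensive events per plate appearance."""
    weighted = (singles + 2 * doubles + 2.5 * triples + 3.5 * hr
                + 0.8 * (bb + hp) + 0.5 * sb)
    plate_appearances = ab + bb + hp
    return weighted / plate_appearances


# Hypothetical stat line, not a real player.
opa = offensive_performance_average(singles=120, doubles=35, triples=5, hr=20,
                                    bb=70, hp=5, sb=10, ab=600)
print(round(opa, 3))
```

Unlike the straight base counters, the weights here are not whole numbers: a triple is worth 2.5 and a home run 3.5, while a walk gets only 0.8.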

Bill Gilbert’s Bases per Plate Appearance (BPA)

BPA=(TB + BB + HB + SB - CS - GIDP) / (AB + BB + HB + SF)

BPA=Bases per Plate Appearance

Find it on the Internet at http://www.stathead.com/bbeng/gilbert/measoffperf.htm.
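And a sketch for BPA, straight from the formula above, with the same caveats: hypothetical stat line, names of my own choosing.

```python
def bases_per_plate_appearance(tb, bb, hb, sb, cs, gidp, ab, sf):
    """BPA as given above: net bases gained per plate appearance."""
    net_bases = tb + bb + hb + sb - cs - gidp
    plate_appearances = ab + bb + hb + sf
    return net_bases / plate_appearances


# Hypothetical stat line, not a real player.
bpa = bases_per_plate_appearance(tb=285, bb=70, hb=5, sb=10, cs=5,
                                 gidp=15, ab=600, sf=6)
print(round(bpa, 3))
```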

Clay Davenport’s Equivalent Average (EQA)

EQA=1/3 x (H + TB + BB + SB) / (AB + CS + (1/3 x SB) + (2/3 x BB))

Base of the Baseball Prospectus player evaluation system. Here’s what the designer has to say about EQA, “It is, admittedly, an ugly function. This is partly because it is empirical; it is based not so much on theory but on what actually works. Another reason is due to its evolution: I added more information to the denominator to improve the correlation with run scoring. Placing stolen bases in the denominator is exceptionally odd, but it works to keep the ratings of player from non-caught stealing years in check.”

I agree that it's an odd construct. EQA is designed to mirror the batting average range. That, I suppose, is what the 1/3 part is for. Other than that, it’s easy to see that it’s not really based on theory. Although it does work, it doesn't work as well as they purport. But this isn’t the place for that discussion.

Find it either in the Baseball Prospectus books or at http://www.baseballprospectus.com.

James Tuttle’s Base Production (BP)

I can’t really give you a description or formula because this statistic's design keeps changing. I correspond with BP’s creator, James Tuttle, and promised him I’d include a mention of his work in this article. However, this isn’t a “Hi mom!” mention. James is an intelligent guy who has some good thoughts about the process. Unfortunately he’s a little unsure about what he wants his brainchild to measure. I include the reference here as a heads-up for the future.

You’ll have to do a little detective work on the Internet to find information about this one. Go to http://www.dejanews.com and do a search using the following keywords: “James Tuttle”, BP, baseball, “base production”.

Get them on and drive them in

The next few stats are based on the premise that production should be measured by some combination of On Base Average (OBA) and Slugging average (SLG). All the ones listed work pretty well on the team level. Unfortunately, they don't work as well on the player level. The reason is simple: a player's slugging does not interact with his own ability to get on base. When the leadoff hitter gets on base with a walk, it's not his slugging percentage that matters, it's the slugging percentage of the following hitters.

The limitations of the construct show up most when the formula is applied to individual players. Players like Frank Thomas, who usually have a high OBA and a high SLG, get overrated; players like Rey Ordonez, who usually have a low OBA and a low SLG, get underrated (if that's possible). Like I said previously, this limitation was the primary reason I began investigating alternatives. It's also the reason Bill James revised his own RC formula.

Rickey

Rickey=[ (H + BB + HP) / (AB + BB + HP) + 3/4 * (TB - H) / AB + R / (H + BB + HP)]- [ H / AB + (BB + HB) / (AB + BB + HP) + ER / (AB + BB + HP) - 1/8 * SO / (AB + BB + HP) - F ]=G

Here’s a brief Rickey description, “It is a means of gauging with a high degree of accuracy important factors which contribute to winning and losing baseball games. It is most disconcerting and at the same time the most constructive thing to come into baseball in my memory.” He also stated, “The formula is designed principally to gauge and analyze performance on a team basis. But certain elements in it provide a yardstick for measuring individual talent.”

Although I include the full version of the formula here, I only evaluate the offensive component (the top half) in the accuracy essay. This part of the formula can be broken into three components:

On Base Average (H+BB+HP)/(AB+BB+HP)
¾ times Isolated Power (TB-H)/AB
Scoring efficiency R/(H+BB+HP)
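The three offensive components listed above can be sketched as a short function. This is only an illustration of the top half of the formula as printed here; the team totals are hypothetical, chosen just to exercise the arithmetic.

```python
def rickey_offense(h, bb, hp, ab, tb, r):
    """Return the three offensive pieces of the Rickey formula and their sum."""
    on_base = (h + bb + hp) / (ab + bb + hp)       # On Base Average
    power = 0.75 * (tb - h) / ab                   # 3/4 times Isolated Power
    scoring_efficiency = r / (h + bb + hp)         # runs per baserunner
    total = on_base + power + scoring_efficiency
    return on_base, power, scoring_efficiency, total


# Hypothetical team totals, invented for this example.
oba_part, power_part, efficiency_part, offense = rickey_offense(
    h=1400, bb=500, hp=30, ab=5500, tb=2200, r=700
)
print(round(oba_part, 3), round(power_part, 3), round(efficiency_part, 3))
print(round(offense, 3))
```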

This formula was the forebear of many of the equations over the last thirty years.

Read all about it in Life magazine’s August 2, 1954 issue or at my web site at http://www.baseballstuff.com/btf/pages/essays/rickey/goodby_to_old_idea.htm.

Earnshaw Cook’s Scoring Index (DX)

DX=[(H+BB+HBP)/(AB+BB+HBP)] * [(TB+SB-CS)/(AB+BB+HBP)]

One of the ancestors of Bill James’ Runs Created. I'm not sure if James was aware of this or not when he created RC but if you compare them you'll see some similarities. Even though the reference I provide is from 1971, Mr. Cook published the first OBA*SLG model sometime in the early 1960's.

Earnshaw Cook's fertile mind can be examined by reading his two baseball books: Percentage Baseball (1966) and Percentage Baseball and the Computer (1971).

Dick Cramer’s Batter Run Average (BRA or OTS)

BRA=OBA*SLG

A simple yet accurate measure for evaluating team offense. Like the other methods based on this concept, it doesn't work particularly well for individual players.

The original BRA article appeared in the 1972 Baseball Research Journal.

Pete Palmer’s On Base Plus Slugging (OPS)

OPS=OBA + SLG

The Cliff Notes version of player evaluation. Add together OBA + SLG and you get a good approximation of player value. Of course, you only get a rate that doesn't include many facets of play. However, what you lose in accuracy is made up in simplicity. Nobody can calculate Extrapolated Wins in their head. Just about everybody can add together OBA + SLG easily. I use this stat quite often myself. If we could only get the networks to do the same, we'd actually get an accurate assessment of players' offensive skills over the television. Maybe, just maybe, we might even get Ray Knight to shut up once in a while.

Do yourself a favor and read The Hidden Game of Baseball (1984) and Total Baseball I-IV.

Power of the Run

The greatest advances in sabermetrics took place in the early 1980's through the work done by Bill James and Pete Palmer. I realize my praise of their work may be getting monotonous, but I make no apology for it. No matter how many times I mention them in this essay, I could not possibly give them enough credit.

By far the greatest contribution James and Palmer made to our understanding of the game is their work illustrating the relationship between runs and wins. Although the idea seems simple to grasp, it's still not commonly accepted by mainstream baseball. That's unfortunate because James and Palmer clearly established the validity of the concept.

Even though Bill James and Pete Palmer agree on the value of estimated runs generated, they disagree on the process to get there. James based his RC on the OBA*SLG model, while Palmer based his Linear Runs on linear mathematics.

Which do I prefer? Keep reading.

Rate stat extensions

Essentially, the next few formulas are descendants of one of the rate stats that I noted above. Both RC formulas are descendants of the OBA*SLG model. EqR is an extension of EqA. They each inherit the limitations of their progenitors. They each try to work around their defects with a twist. This twist adds an additional level of complexity to the model without completely fixing the problem.

James’ Runs Created (RC)

Basic RC=(H + W) * TB / (AB + W)

RC-T1=(H + HB - GIDP - CS) * [ TB + ((W + HB - IBB)*.26) + ((SB+SH+SF)*.52)] / (AB + W + HB + SH + SF)

RC-HDG23=A * B / C

A=(H + HB - GIDP - CS)
B=[ TB + ((W + HB - IBB)*.29) + (SB*.64) + ((SH+SF)*.53)-(.03*K) ]
C=(AB + W + HB + SH + SF)

I included only three versions of the formula: 1) the original Basic version, 2) the classic Tech-1 version, and 3) the new Historical Data Group-23 version (which is used for 1955-1987). I could have included all 6,313,659 versions, but I doubt you would have paid $100 for this book. OK, I am being sarcastic. There aren't 6,313,659 versions; it just seems like it. In reality, there are just 24 new versions. The 14 old versions from the Historical Abstract are no longer needed.

Unfortunately, there is still another step needed to generate the new RC numbers. What you still need to do is apply what I call the fudge factor. I hope you have a sweet tooth.

To calculate individual player RC, you have to place the player stats into "a theoretical team context consisting of players with average skills and eight times as many plate appearances as the subject". I warn you, it won't be easy.

Take the above A,B,C factors and place them into the following equation.

[ (A + (2.4*C)) * (B + (3*C)) ] / (9 * C) - (0.9 * C)

What does all this do for you? Well, it helps to minimize the errors caused by the inappropriate design of RC, namely, that the formula is designed to work on a team basis, not an individual player basis.
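To make the adjustment concrete, here is a minimal sketch that compares the plain A * B / C product with the team-context version, ((A + 2.4*C) * (B + 3*C)) / (9*C) - 0.9*C. The A, B, and C inputs are hypothetical values picked for illustration; they are not taken from any real player's record.

```python
def rc_basic(a, b, c):
    """Plain Runs Created product: A * B / C."""
    return a * b / c


def rc_team_context(a, b, c):
    """Runs Created with the player placed among eight average teammates.

    The 2.4*C and 3*C terms stand in for the teammates' A and B factors;
    subtracting 0.9*C removes the runs those teammates create on their own.
    """
    return ((a + 2.4 * c) * (b + 3 * c)) / (9 * c) - 0.9 * c


# Hypothetical A, B, C factors (assumptions for the demo).
average_hitter = (180, 225, 600)   # A/C = .300 and B/C = .375
strong_hitter = (250, 350, 600)

for label, factors in (("average", average_hitter), ("strong", strong_hitter)):
    print(label, round(rc_basic(*factors), 1), round(rc_team_context(*factors), 1))
```

With these inputs the two versions agree for the hitter whose A/C and B/C match the implied teammates (.300 and .375), while the strong hitter's team-context figure comes out lower than his plain A * B / C figure.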

You may be wondering, "Does this work?" Well, yes and no. Yes, it does cut down on the error; no, it doesn't get rid of the error. But this isn't really the place for this discussion. If you want to read more about James' changes to RC, read “Deciphering the New Runs Created” where Jay, Don, and I tackle the discussion head on and in detail.

Dick Cramer’s RC (Runs Contributed)

R=[(H+BB+HBP) + (0.5 * SB) - CS - (1.5*GIDP)] * SLG

In the 1987 The Great American Baseball Stat Book, Dick Cramer presented his own RC formula. It's a twist on the OBA*SLG model. Unfortunately, the design flaws still are apparent. Mr. Cramer tried to get around the flaws by introducing a new twist. "Runs Contributed, computed as the difference in the number of runs the overall league would have scored if [the] player's batting totals had first been subtracted from the league totals."

Clay Davenport’s Equivalent Runs (EQR)

EqR=(2 x RawEqA/LgEqA - 1) x [R/(AB+BB)]_league x (AB+BB)_individual

Above I presented Davenport's EQA. I also stated that rate stats need an additional step to be expressed in runs. This is Davenport's step. I don't particularly care for the construct. The introduction of actual Runs into the equation is faulty. If the goal is to predict runs from the various events, you should not use actual Runs as part of the process. The problem is called circular reasoning–which essentially means using the answer to get the answer. Let me illustrate with an example.

Let's say the league EqA is .250 and the league scores 1000 runs in 6000 plate appearances. Say in 600 PA, the batter also produces an EqA of .250. How many EqR would he generate?

Since his EqA is the same as the league's, we can calculate it like this: 600/6000*1000=100. What this really means is that if you have a player with the same EqA as the league, all you have to do is figure what percentage of the league's PA he had, then multiply it by the league's runs. If a player has 10% of the league's PA, he gets credited with 10% of the league's runs. That doesn't feel quite right to me. Calculating player runs in this manner artificially increases the accuracy reading of EqA. In my Accuracy essay, I calculate EqR using Clay's method and also using the method I use to calculate runs for all the other rate stats. You can decide for yourself which one is correct.
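The worked example above can be reproduced with a few lines of code. This is a sketch of the EqR formula as printed in this article, using AB+BB as plate appearances; the function and parameter names are mine.

```python
def equivalent_runs(raw_eqa, lg_eqa, lg_runs, lg_pa, player_pa):
    """EqR = (2 * RawEqA / LgEqA - 1) * (league R per PA) * (player PA)."""
    return (2 * raw_eqa / lg_eqa - 1) * (lg_runs / lg_pa) * player_pa


# The example from the text: a league-average EqA in 600 of the league's 6000 PA.
average_batter = equivalent_runs(0.250, 0.250, lg_runs=1000, lg_pa=6000, player_pa=600)
print(round(average_batter, 1))

# Same playing time with a .300 EqA, to show how the league run total scales the result.
better_batter = equivalent_runs(0.300, 0.250, lg_runs=1000, lg_pa=6000, player_pa=600)
print(round(better_batter, 1))
```

Because league runs per plate appearance is a direct multiplier, every player's EqR moves up or down with the league's actual run total, which is the circularity described above.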

Follow the linear brick road

People have tried to place a value on the various events since the year 1 A.B. (After Baseball). As time went on, the attempts evolved. Finally the lightbulb went on and analysts realized that the value of events should be evaluated by how they relate to run scoring. Sabermetricians then employed different means to try and come up with the most accurate numbers. You'll soon be introduced to the result: Extrapolated Runs.

Lindsey Additive Formula

.41*1B+.82*2B+1.06*3B+1.42*HR

According to HGB and my own research, in 1963 an analyst named George Lindsey became "the first to assign run values to the various offensive events which lead to runs. He based these values on recorded play-by-play data and basic probability theory." Although the values determined by Lindsey aren't quite right, as you'll see, they are in the ball park. All in all, though, an excellent effort and discovery.

From "An Investigation of Strategies in Baseball" Operations Research, 1963.

Pete Palmer's Batting Runs (BR)

Runs=1B*.47 + 2B*.78 + 3B*1.09 + HR*1.40 + (BB+HB)*.33 -(AB-H)*?? - OOB*.50

OOB=H + BB + HB - Left On Base - R - CS

??=Out coefficient, determined so that the sum of all the terms (for the league) equals zero.
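A minimal sketch of how that out coefficient could be solved for, under two loud assumptions: the OOB term is left out for simplicity, and the league totals are invented for illustration. The idea is just to pick the out value that makes the league's Batting Runs sum to zero.

```python
# Event weights from the Batting Runs formula above (OOB term omitted in this sketch).
WEIGHTS = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40, "BB_HB": 0.33}


def positive_runs(stats):
    """Runs credited for hits, walks, and hit batsmen."""
    return sum(weight * stats[event] for event, weight in WEIGHTS.items())


def out_coefficient(league):
    """Out value that makes the league's Batting Runs total zero."""
    return positive_runs(league) / (league["AB"] - league["H"])


def batting_runs(stats, out_coef):
    """Batting Runs for a player or league, relative to the league-average baseline."""
    return positive_runs(stats) - out_coef * (stats["AB"] - stats["H"])


# Hypothetical league totals, made up for this example.
league = {"1B": 25000, "2B": 6500, "3B": 900, "HR": 3800,
          "BB_HB": 14000, "AB": 140000, "H": 36200}

coef = out_coefficient(league)
print(round(coef, 3))
print(round(batting_runs(league, coef), 6))
```

Since the coefficient is recomputed from each league's totals, the same batting line can be worth a different number of Batting Runs in different seasons.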

Although I discuss Palmer's work below and in my Extrapolated Runs essay, I'd like to say a few things here about the foundation of his system.

First, using the out value to establish the baseline was a mistake. This decision restricts the usefulness of the formula. Instead of being expressed directly in runs, another step is required. That's not to say that this lessens the accuracy of the measure; it doesn't. However, the decision does limit BR use to the task it was designed for.

Second, the inclusion of the OOB only clouds the accuracy issue. Since OOB can't be used in the evaluation of individual players, it shouldn't be included. I didn't fully understand this idea myself until this fact was pointed out to me by one of the Internet's best baseball minds, David Grabiner. David was gracious enough to provide feedback on the design of XR.

During the process, I thought about including a similar term in my own formula. However, one of David's comments helped change my mind: "If you are trying to use a statistic to evaluate individual players, its reliability depends only on the reliability of that part of the team statistic which can be obtained from the individual statistics." If sabermetricians wish to claim accuracy for a measure, they must keep this idea in mind.

Steve Mann’s RPA

RPA=(BB*.25+HBP*.29+1B*.51+2B*.82+3B*1.38+HR*2.63+SB*.15-CS*.28)*.52 + (BB+HBP+AB) * .008 + 3*((BB+HBP+AB)/6200) * 1000 * (OBA-.330)

First version appeared in 1977. This updated version appeared in Baseball Superstats 1989 by Mann and Malling.

Paul Johnson’s Estimated Runs Produced (ERP)

ERP=(2*(TB+BB+HP)+H+SB- (.605*(AB+CS+GIDP-H)))*.16
or
ERP=(1B*.48) + (2B*.80) + (3B*1.12) + (HR*1.44) + ((HP+BB)*.32)+(SB*.16) - (OUTS*.10)

OUTS=AB+CS+GIDP-H
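The two ERP forms above are the same linear formula written two ways, which a short sketch can confirm. The batting line is hypothetical. Expanding the compact form gives an out weight of .605 * .16 = .0968, which the second form rounds to .10, so the two differ slightly.

```python
def erp_compact(s):
    """ERP in its compact form: (2*(TB+BB+HP) + H + SB - .605*OUTS) * .16."""
    hits = s["1B"] + s["2B"] + s["3B"] + s["HR"]
    total_bases = s["1B"] + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]
    outs = s["AB"] + s["CS"] + s["GIDP"] - hits
    return (2 * (total_bases + s["BB"] + s["HP"]) + hits + s["SB"] - 0.605 * outs) * 0.16


def erp_linear(s, out_weight=0.10):
    """ERP as a sum of per-event weights; out_weight defaults to the rounded .10."""
    hits = s["1B"] + s["2B"] + s["3B"] + s["HR"]
    outs = s["AB"] + s["CS"] + s["GIDP"] - hits
    return (0.48 * s["1B"] + 0.80 * s["2B"] + 1.12 * s["3B"] + 1.44 * s["HR"]
            + 0.32 * (s["HP"] + s["BB"]) + 0.16 * s["SB"] - out_weight * outs)


# Hypothetical batting line, invented for this example.
line = {"1B": 100, "2B": 30, "3B": 5, "HR": 25, "BB": 60, "HP": 5,
        "SB": 10, "CS": 5, "GIDP": 12, "AB": 550}

print(round(erp_compact(line), 2))
print(round(erp_linear(line, out_weight=0.605 * 0.16), 2))  # exact expansion
print(round(erp_linear(line), 2))                           # rounded out weight
```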

New Estimated Runs Produced (NERP)=(TB/3.15) + ((BB-IBB+HBP-CS-GDP)/3) + (H/4) + (SB/5) - (AB/11.75)
or
NERP=TB*.318 + ((BB-IBB+HBP-CS-GDP)*.333) + (H*.25) + (SB*.200) - (AB*.085)

The original version first appeared in the 1985 Bill James Baseball Abstract. It was also used in last year's version of BBBA.

The revised version appeared in STATS 1991 Baseball Scoreboard. I didn't notice this when I originally read the book. Thankfully, around the time I started writing this article, a kindly visitor to my web site named David Smyth passed along the updated formula.

Runs through Regression Analysis

Paraphrasing one of my math texts, I define regression as the process of estimating the value of a dependent variable, Runs, from one or more independent variables, 1B, 2B, 3B, etc.

Of course, the mathematical principles and methods required to generate the estimates are a lot more complex than can be described in one sentence. But rather than spend a bunch of space trying to describe the process, I explain it simply as the application of mathematical methods to generate the best possible run values for each offensive event.

Quite a few people have used the process to generate estimates. Although the values do fluctuate based on the events and seasons used in the studies, you should notice that the numbers don't really vary much.
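For readers who want to see the mechanics, here is a rough sketch of the idea using ordinary least squares in plain Python. Everything in it is an assumption for demonstration: the "team seasons" are synthetic, generated from made-up weights plus random noise, and only four event types are used. It is not any of the published studies' methods, just the general technique of recovering event weights from team totals.

```python
import random


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(rows, targets):
    """Ordinary least squares (no intercept) via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up "true" run values for 1B, HR, BB, and outs (assumptions for the demo).
TRUE_WEIGHTS = [0.50, 1.40, 0.33, -0.10]

rng = random.Random(0)
rows, runs = [], []
for _ in range(200):  # 200 synthetic team seasons
    events = [rng.randint(900, 1100), rng.randint(100, 220),
              rng.randint(400, 650), rng.randint(3900, 4200)]
    noise = rng.gauss(0, 15)
    rows.append(events)
    runs.append(sum(w * e for w, e in zip(TRUE_WEIGHTS, events)) + noise)

estimated = least_squares(rows, runs)
for name, value in zip(("1B", "HR", "BB", "OUT"), estimated):
    print(name, round(value, 3))
```

The estimates land near the weights used to generate the data but not exactly on them, which mirrors the small fluctuations among the published regression formulas.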

Jay Bennett and John Flueck’s (ERP)

Expected Run Production=(.499 x 1B) + (.728 x 2B) + (1.265 x 3B) + (1.449 x HR) + (.353 x BB) + (.362 x HBP) + (.126 x SB) + (.394 x SF) - (.395 x GIDP) - (.085 x OUT) - 67

OUT=AB-HIT-GIDP

Data from 1969-1976 seasons (192 teams).

From The American Statistician. February, 1983.

Alidad Tash

RS=1B*.507 + 2B*.658 + 3B*.906 + HR*1.447 + (BB+HBP)*.314 + SB*.145 - CS*.234 + LG*7.26 + SF*.594 + OUTS*-.098 + GIDP*-.402 + SH*-.024 + SO*-.013

OUTS=AB - (H+SH+SF+CS+GIDP)

Data from 1973-1997 seasons (652 teams).

From “Win and Run Prediction in Major League Baseball” at http://www.baseballstuff.com/btf/pages/essays/other/tash.htm.

Jefferson Glapski

R=1B*.50451 + 2B*.71574 + 3B*0.99951 + HR*1.4228 + BB*.32052 - SO*.087327 + SB*.20291 - CS*0.15383 - OUTS*0.098929 - LG*8.1139

OUTS=AB - H - SO

LG=league (1 if National League, 0 if American League)

Data from 1974-1990 seasons (460 teams).

Full text of study can be found at http://home.ican.net/~jng/sabr001.htm.

John Jarvis

OUTS*-0.0971 + 1B*0.4222 + 2B*0.6369 + 3B*0.8816 + HR*1.49 + (BB+HP)*0.3098 + IBB*-0.0211 + SB*0.0737 + ROH*-0.0902 + GIDP*-0.3669 + ER_BF*0.6662 + ER_RA*0.4061 + RAO*0.0637 + RSO*0.7442 + K*-0.0916

ROH=Runners Out on Hit (either base runners or the batter trying unsuccessfully for an additional base)

RAO=Runner Advance on Out (includes the traditional sacrifice)

RSO=Runs Scored on an Out

ER_BF and ER_RA are the number of times the batter is safe on first and the number of times a runner was able to advance on an error.

The OUT category includes outs that are not counted in other categories (CS, ROH, second out on a GDP, K).

Using play-by-play data, John included many new parameters in his equation. Consider this a look into the future. At some point, I'd like to incorporate some of John's ideas into a future super-advanced version of XR.

Data from the 1967 AL and from both leagues in 1980-1986 and 1992-1996 (360 teams).

From "A Survey of Baseball Player Performance Evaluation Measures" at http://pacer1.usca.sc.edu/~jfj/runs_survey.html

The Whole Enchilada

Offensive Wins Above Replacement (OWAR)

The system previously used in this book to evaluate players.

Palmer’s Linear Weights System (LWTS)

The most popular player evaluation system. Due to its inclusion in Total Baseball and its all encompassing design, this method is a big hit with knowledgeable baseball fans. LWTS factors in batting, pitching, and fielding while adjusting for park, position, and era. Although the primary values are based on Palmer's Batting Runs, he also devised a method for incorporating defense and pitching. The end result is that anyone can pull out the book and compare Barry Bonds' 1996 performance with Ty Cobb's 1917 season. (They both accounted for 8.5 wins above average.) Anyone can also find out who the most valuable players of all time are. (No surprise, Babe Ruth is at the top with 107.7 wins. Nap Lajoie comes up a distant second with 94.2 wins. Sorry, you'll have to buy or borrow the book for the rest of the list.) All in all, the LWTS method is an intelligent look at baseball history.

Unfortunately, although I like LWTS and applaud the effort of Pete Palmer, I disagree with a few of his design decisions. First, the baseline is too high. From TB, "The Linear Weights measure of runs contributed beyond those of a league-average batter or team, such league average defined as zero."

Average is the baseline. That's too high. Average players have value. I won't get into the whole replacement level concept here, but average players help teams win pennants. No team can have a superstar at every position. But if a team can put together a few stars with a bunch of average players, it's in the hunt for a title. On the other hand, put those same stars on a team with a bunch of stiffs and that team is an also-ran.

Although this isn't a very complicated concept, so many people still don't get it. To those people I ask this question: Who was more valuable during the 1986 season, Ray Knight or yours truly?

Ray played fairly well in '86. He wasn't a superstar or anything, but he put together a solid year, hitting .298 with a .351 OBA and a .424 SLG in 486 AB. What does LWTS have to say about him? Ray Knight's Total Player Rating (TPR) for 1986 is -0.2 wins. This makes him pretty much an average player.

How did I fare in 1986? About average, I guess, since I produced 0.0 wins. Now, before you start leafing through your copy of TB looking for my batting line, I'll tell you why I produced 0.0 wins: I didn't play in the majors that year. As a matter of fact, I never have played professional ball. Nevertheless, that didn't stop me from having a higher TPR in 1986 than Ray Knight did.

Of course, my argument is faulty. LWTS is designed to compare players who actually played. If we lowered the baseline, I'd still have a higher rating than a lot of players who, in reality, are a heck of a lot better than I am. So instead of comparing Ray to myself, let's compare Ray to someone who actually played: Steve Kemp. What's Kemp's TPR? +0.1 wins. Since +0.1 > -0.2, a comparison of TPRs would indicate Kemp had a better season. Let's take a look at their numbers:

Name	AVG	OBA	SLG	AB	R	H	2B	3B	HR	RBI
Ray Knight	.298	.351	.424	486	51	145	24	2	11	76
Steve Kemp	.188	.350	.375	16	1	3	0	0	1	1
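As a quick arithmetic check, the AVG and SLG columns can be recomputed from the counting stats in the table. This sketch uses only the table's own numbers; OBA is left out because the table does not list walks or hit-by-pitches.

```python
def batting_average(h, ab):
    """Hits per at-bat."""
    return h / ab


def slugging(h, doubles, triples, hr, ab):
    """Total bases per at-bat; each extra-base hit adds bases beyond the single."""
    total_bases = h + doubles + 2 * triples + 3 * hr
    return total_bases / ab


# Counting stats copied from the table above.
knight = {"ab": 486, "h": 145, "doubles": 24, "triples": 2, "hr": 11}
kemp = {"ab": 16, "h": 3, "doubles": 0, "triples": 0, "hr": 1}

for name, s in (("Ray Knight", knight), ("Steve Kemp", kemp)):
    avg = batting_average(s["h"], s["ab"])
    slg = slugging(s["h"], s["doubles"], s["triples"], s["hr"], s["ab"])
    print(f"{name}: AVG {avg:.4f}, SLG {slg:.4f}, AB {s['ab']}")
```

The rates agree with the table, and the AB column shows what the rates alone hide: Knight's line covers 486 at-bats and Kemp's covers 16.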

I think you'll agree with me that Knight was the more productive player in 1986. This points out the fatal flaw of using average as the baseline. It greatly undervalues the contributions of average players. That's why Bill James invented the concept of the replacement player. That's also why Palmer's baseline is too high.

Another design decision I disagree with is the decision to incorporate positional adjustments in the batting component of the TPR. As Bill James said, players do not come to the plate as first basemen, second basemen, or catchers; they come to the plate as hitters. All the runs they generate go into one pool. By comparing players position-by-position, what happens is the runs a shortstop produces end up looking better than the runs a first baseman produces. That's just not true. All the runs are equally valuable.

This problem is compounded further when defensive runs are added to Batting Runs. Skill position players, especially those who compete in eras with a low level of offense at their position, end up getting extra credit for their position. First, they get the benefit of being compared to a lower class of hitter. Second, they get extra fielding runs for playing a more active position. This doesn't make sense to me.

Keith Woolner's Value Over Replacement Player (VORP)/ David M. Tate's Marginal Lineup Value (MLV)

Here are Keith's own brief descriptions of VORP and MLV: "VORP measures how many runs a player would contribute to a league average team compared to a replacement level player at the same position who was given the same percentage of team plate appearances as the original player had. VORP, in its most advanced form, uses MLV to estimate the change in team run scoring attributable to the player's performance.

Marginal Lineup Value corrects the shortcomings of using RC for individual players by estimating how many additional runs an average team scores if you replace an average batter with the individual. This eliminates the 'batting yourself in' flaw in individual RC, and correctly estimates the impact of the extra team plate appearances a player creates with a high OBP."

Although I don't agree with Keith's inclusion of offensive positional adjustments, I think his work deserves attention for some other ideas. First, his method of calculating replacement value is interesting. Second, his use of David M. Tate's MLV incorporates an element of play which has not been properly factored into player evaluation formulas: if a player avoids making an out, other players (and the team) get more opportunities. David Tate was the first person to try to factor in this component of run scoring. Unfortunately, although the MLV idea is a step in the right direction, I don't think it goes far enough.

A full description of VORP, MLV, and replacement value is available at the Stathead Baseball Engineering web site at http://www.stathead.com.

Lurking in the shadows

Not every sabermetrician's work can be fully enjoyed by the public. Many people with sabermetric knowledge are either current or past employees of professional baseball. Among others, the list includes Eddie Epstein, Mike Gimbel, and Craig Wright. Unfortunately, since their employers consider it in their best interests to keep the stuff proprietary, we all miss out.
