Friday, November 4, 2011

The DoT Matrix -- Use The Force, Luke

"This is the weapon of a Jeopardy Knight . . . an elegant weapon for a more civilized age."

In February 2011, IBM challenged Jeopardy greats Ken Jennings and Brad Rutter to a match against a supercomputer they named Watson.

To the surprise of many carbon-based lifeforms, Watson won easily. But those of us who have been on the show -- especially those who were part of the practice games Watson played -- know the truth.

Watson had a huge advantage on the buzzer.

On Jeopardy, a player cannot ring in until Alex finishes reading the clue and a coordinator offstage flips a switch powering the podium lights. Hit your button too soon, and your podium is darkened for the first quarter of a second after the others are turned on. Wait too long, and the other player beats you. Finding that moment, that sweet spot, over and over again is not terribly unlike what a Jedi does swatting away battle droid blaster shots.

But with Watson, buzzing early was physically impossible. The mechanism that allowed It to ring in was powered by the same button that activated the lights for Ken and Brad. In other words, they could ring in early, but It couldn't.

That buzzer advantage was more than enough to make up for the huge knowledge deficit The Machine faced.

You see, some Jeopardy players are teeming with knowledge. Others are teeming with midichlorians.

Which brings us to this year's Tournament of Champions.

In the previous post, we figured out a way to estimate how much knowledge a player had during his regular season run. But knowing a lot of answers isn't enough; you have to be able to get in on the buzzer to show what you know. Buzzer proficiency, then, can be calculated by comparing the number of correct responses a player actually gave with the number of answers he presumably knew (not including Daily Doubles).

The formula is as follows:

Buzzer Proficiency = Correct Responses Given / ((Answers Known per 60 clues / 60) * Total clues seen)
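Here's how that formula might look as a Python sketch. The numbers in the example are made up for illustration; they aren't any actual player's stats.

```python
def buzzer_proficiency(correct_given, known_per_60, clues_seen):
    # Estimated number of answers the player knew across all clues seen
    answers_known = (known_per_60 / 60) * clues_seen
    # Share of known answers actually converted at the podium
    return correct_given / answers_known

# Hypothetical player: knows 40 answers per 60 clues, sees 180 clues,
# and rings in with a correct response 90 times
print(round(buzzer_proficiency(90, 40, 180), 3))  # 0.75
```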

So the greatest Buzzer Jedi in this year's TOC is . . . drum roll . . .

Kara Spack. The only woman from the regular season to make the field.

When Kara knew an answer, she got in on the buzzer an amazing 72% of the time. By comparison, at his best Ken Jennings got in two times out of three. (Of course, he also knew 54 answers per game, so there's that.)

What's amazing about that is the theory floating around among Jeopardy analysts that the reason 12 of the 13 contestants in this year's TOC are men is that Jeopardy is not so much a test of knowledge as a video game -- which, in America, gives a cultural advantage to males ages 18-34. As Watson proved, what separates the merely smart from Jeopardy titans is proficiency on the buzzer. Kara's numbers may necessitate a re-thinking of the Video Game Theory as an explanation of the gender gap in TOC contestants.

Of course, for all we know, in addition to her Jeopardy prowess Kara may really be the greatest World of Warcraft player ever to walk the Earth. Or The Force may just be strong with this one where it isn't with others.

Either way, Kara looks to be a Force all her own in this year's Tournament.

The rest of the field in Buzzer Proficiency:

Kara 72.2%
Buddy 69.1%
Paul 67.9%
Tom K 62.0%
Roger 57.9%
Joon 57.2%
Jay 54.6%
Christopher 51.5%
John 49.7%
Mark 48.9%
Tom N 48.3%
Justin 45.0%
Brian 41.8%

Notice that nobody gets to the TOC with merely average buzzer ability. Also notice that some players can actually make up for an information gap by being good with the buzzer (a la Bob Harris). But the players most to be feared going forward are those who are both exceptional in their knowledge base and able to wield their Weapon with skill and precision.

Before long I half expect Roger to start shooting lightning out of his fingers.

NEXT: Degree of Difficulty -- What the BCS can teach us about Jeopardy tournaments

Thursday, November 3, 2011

The DoT Matrix -- What Makes for a Good Player?

Note: The Tournament of Champions started last night. There was a mix-up with the DVD, so I haven't seen the episode yet, but from what I've read the numbers I'm about to throw at you stood up pretty well for one game. We'll see how they do against actual tournament results.

The Jeopardy Tournament of Champions is a two-week tournament during which the thirteen players with the most wins during the regular season return to Culver City to crown a season-long champion. The winner takes home $250,000 and -- perhaps more importantly -- glory.

Predicting a winner has always proven difficult, since so much of the outcome depends on factors that are essentially random (or at least beyond the player's control). Player match-ups, which players see which boards, how they're feeling that day, other players' luck -- all factor into determining a winner.

But while there are obstacles to prediction, there are numbers available to help us determine going in which players will need to catch the most breaks to win, and which ones have the tools to make a deep run (mostly) regardless of luck.

Aside from random chance, there are four factors that determine Jeopardy success. In no particular order, you have to 1) know stuff, 2) be good at buzzing, 3) be able to handle difficult material, and 4) know how to wager.

Smarter people than me have hashed out most of what you'd want to know about #4, and I suspect most TOC-level players have spent at least some time playing with the J-Archive wagering calculator. So I'll just look at the first three in the next few posts.

Factor #1 -- Know Stuff

How much does a given Jeopardy player know? We've already established back in Post #1 the baseline level for an average player: 35.56 clues per 60 revealed. But TOC players are anything but average. You don't get to the TOC unless you excel in one or more of the Four Factors. Luck is a big part of the game, but it can't turn a lump of coal into Watson.

So to find out how much a given Jeopardy champ knows, we ask the same questions we asked in Establishing a Baseline. How many Triple Stumpers were there in games involving this champ? And how does that compare with games involving three "average" players? In essence, we want to know how much better the group did as a result of having this player on the "team."

At this point, I have to point out that coming up with useful numbers requires a much larger sample size than just one game. For the purposes of the DoT Matrix, I am only assessing players who saw at least five games worth of material and played against at least ten opponents. That means the DoT Matrix cannot tell us who was the best player to lose to Ken Jennings (much as I might like to know where I rank in that list).

First we use the average knowledge level to determine what percentage of clues an average player can't answer. Our sample gives us a number ("team average," or TA) around 40%. Then we make the same calculation for games involving a given champion. For instance, a 5+-day champ might reduce the "team" average from 40% to 30% (let's call this CA, for "champ average"). We then figure out how much credit to give to the champion for the improvement in team performance, as follows:

Answers known per 60 clues faced = (1 - (CA ^ 3 / TA ^ 2) ) * 60

In our example, a champ that posts a CA of 30% in an environment that typically yields a TA of 40% probably knows 50 answers per 60 clues.
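A minimal sketch of that calculation, with CA and TA expressed as fractions:

```python
def answers_known_per_60(champ_avg, team_avg):
    # CA^3 approximates the triple-stumper rate in the champ's games;
    # dividing by TA^2 factors out the two (assumed average) opponents,
    # leaving the champ's own implied miss rate.
    return (1 - champ_avg ** 3 / team_avg ** 2) * 60

# The worked example: CA = 30% in an environment where TA = 40%
print(round(answers_known_per_60(0.30, 0.40), 2))  # 49.88 -- about 50
```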

So how does this year's field stack up? Here are the rankings based solely on knowledge displayed in regular season play:

Roger 51.85
Joon 50.28
Mark 50.24
Tom K 49.26
Justin 47.19
Jay 44.84
John 43.92
Christopher 43.01
Tom N 41.98
Brian 41.19
Kara 33.36
Buddy 32.31
Paul 31.97

Notice that Erin and Charles aren't in these rankings because they only played 4 games, and both were tournament champions facing specially written tournament clues. Also, curiously, the TOC alternate Sara would have been fifth in these rankings (48.97) if she were in the field.

But as G.I. Joe taught us, knowing is only half the battle.

NEXT: The Force is Strong with This One -- Assessing the Battle of the Buzzers

Wednesday, November 2, 2011

The DoT Matrix -- What's a Right Answer Worth?

The short answer: it depends on how good you are with the buzzer.

So, for our purposes, let's make another assumption.
  • Unless one of the players on the stage is destined for 4 or more wins, assume that your buzzer skills are up against two average players. Also, assume that your buzzer skills are average.

Now then, once we have calculated how likely an average player is to attempt a buzz on a given level of clue (see previous post), we can calculate the clam rate, which we'll call CR. Against two players, one of three things will happen -- both will try to buzz, one will buzz and the other won't, or both will clam.

If both clam, your Buzz Chance is 100%. If one hits the buzzer, your chances are 50%. If both go for it, your chances drop to 33 1/3 %.

So to calculate your chances of buzzing in on a given clue against 2 average players, the formula is:

Buzz Chance = CR ^ 2 + (CR * (1 - CR)) + ((1 - CR) ^ 2) / 3

To determine the value of knowing an answer, then, figure out how often you would buzz in successfully and multiply by the value of the clue. (A $200 clue you can buzz in on 60% of the time would be worth $120.)
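Both steps together in a short Python sketch, checked against the $200 Single Jeopardy row of the table further down:

```python
def buzz_chance(clam_rate):
    """Chance of winning the buzz against two average opponents."""
    cr = clam_rate
    return (cr ** 2                # both clam: you're alone, 100%
            + cr * (1 - cr)        # exactly one buzzes: 2*CR*(1-CR) times 1/2
            + (1 - cr) ** 2 / 3)   # both buzz: three-way race, 1/3

def dot_value(face_value, clam_rate):
    """Expected value of a clue you know, given that row's clam rate."""
    return face_value * buzz_chance(clam_rate)

# Single Jeopardy $200 row (CR = 27%) comes out near 90
print(round(dot_value(200, 0.27)))  # 90
```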

Note that a wrong answer would only cost you the expected value, rather than the full value, because somebody might beat your wrong answer to the buzzer with a right one, saving you both the points and the shame. *cough* It's his wedding *cough*

So, abiding by standard Coryat rules (only credit what you know by the time Alex finishes reading, non-guesses are worth 0, Daily Doubles are a special case), you can compare your score to the on-screen players by assigning the following values (rounded to the nearest 10):

SINGLE JEOPARDY

Clue Value    CR     DoT Matrix Value
200           27%    90
400           32%    190
600           36%    300
800           39%    410
1000          48%    570

DOUBLE JEOPARDY

Clue Value    CR     DoT Matrix Value
400           29%    180
800           40%    440
1200          45%    670
1600          46%    930
2000          54%    1,220

What To Do About Daily Doubles

When I use these numbers at home, I round to the nearest hundred. That way I can pause the DVD and figure out Daily Double wagers based on what I think I would have at that point of the game. There are two rules for determining whether I get to make a DD wager: 1) I had to answer the previous clue correctly, and 2) I have to win a coin flip (since I figure my chances of being in control of the board when a DD is revealed are right around 50-50 if I knew the previous response). If I'm playing a DD, I make a wager based on my score. If not, I ignore the clue.
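For what it's worth, those two house rules fit in a few lines of Python. This is a toy model of my living-room procedure, nothing official:

```python
import random

def plays_daily_double(knew_previous, rng=random):
    # Rule 1: must have answered the previous clue correctly.
    # Rule 2: win a coin flip, since board control when the DD
    # appears is figured to be roughly 50-50.
    return knew_previous and rng.random() < 0.5

# Over many trials, a player who always knew the previous clue
# should get to play the DD about half the time.
random.seed(42)
hits = sum(plays_daily_double(True) for _ in range(10_000))
print(hits / 10_000)
```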

Note that some folks recommend that those preparing to go on the show ignore Daily Doubles altogether (or at least treat them like a regular clue). I tend to agree -- if your purpose in keeping score and tracking data is to prepare for The Show itself. But if you've already been on The Show, or if you're just playing along for fun and don't anticipate using this data later, having realistic numbers on which to base DD wagers can add something to watching the game. Plus, even if you are preparing for The Show, it can give you a sense of what Daily Double wagering feels like.

And yes, I know, some DD wagers are scenario-based and not score-based. When those situations arise, I calculate the difference between my DoT Matrix score and the score of the player on TV, mentally adjust the other two players equally up (or down) accordingly, and decide on a wager.

NEXT: Assessing the field for the Tournament of Champions

Tuesday, November 1, 2011

The DoT Matrix -- Establishing a Baseline

So I've gone and done it.

A while back Ken Jennings posited in an interview that "Most Jeopardy players know most of the answers." So I decided to put that to the test.

Assume with me, if you will, that:
  • Some games are harder than others, but over the course of a five game stretch, the difficulty level of the clues will pretty much average out.
  • Some players are better than others, but the average skill level of a group of challengers is pretty consistent from one week to the next.
If we grant those assumptions, we can begin to construct a profile for an "average" Jeopardy player by analyzing how often clues go unanswered by all three.

If the probability of a given player not knowing the response to a clue is X, the probability of all 3 in a given game not knowing the correct response is X ^ 3.

So step one in building the DoT Matrix was to find out if the Cube Root Hypothesis holds. I randomly selected 45 games from the last 3 seasons in which none of the players would eventually win 4 or more games. I then counted "Triple Stumpers" -- unanswered clues -- in each row of the game board.

I did not consider Daily Doubles, since their placement is essentially random and since only one player has an opportunity to respond to them.

The Triple Stumper Rate, or TSR, is our X ^ 3. So the cube root of the TSR is the probability that an average Jeopardy player will not attempt a buzz. 1 - X, then, is our Buzz Rate.

The problem we have at this point is the fact that lack of knowledge is not evenly distributed through the Jeopardy player pool. There are some clues so badly written that nobody is going to get them, and there are other clues in subjects so obscure that very few Jeopardy players ever consider learning them for the show.

I wrote one such category for an SHC a while back about Lawn Mowers. Never again.

But how far do those distribution irregularities skew the players' performance? Quite a bit, as it turns out. I tested the Cube Root Hypothesis against performance in Final Jeopardy -- the only clue in the game that everybody gets a chance to respond to. If the Cube Root Hypothesis is correct, the correct response rate in Final Jeopardy should be roughly equal to the cube root of TSR.

Spoiler alert: It's not. In fact, the Cube Root Hypothesis underestimates the knowledge level of an average Jeopardy player by about 20%. Which means that if the Cube Root Hypothesis says that a typical Jeopardy player should only know 30 responses per game, he probably knows closer to 36.

So adjusting for CRH error, we find that the typical Jeopardy player will have a correct answer and attempt to buzz 35.56 times on every 60 clues he sees.
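The whole baseline pipeline fits in a few lines. One caveat: the TSR value below is back-solved from the 35.56 figure, since the raw count from my 45-game sample isn't reproduced here.

```python
def baseline_knowledge(tsr, crh_error=0.20):
    # Cube root of the Triple Stumper Rate gives an average
    # player's probability of not attempting a buzz (X).
    miss_rate = tsr ** (1 / 3)
    # Adjust the implied know rate upward ~20% for the Cube Root
    # Hypothesis's underestimate, scaled to 60 clues.
    return (1 - miss_rate) * (1 + crh_error) * 60

# A TSR of about 13% reproduces the baseline of ~35.56 per 60 clues
print(round(baseline_knowledge(0.1296), 2))  # 35.56
```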

So if you've ever wondered how you compare to the players you see every night on your TV, there you go.

NEXT: Using Buzz Probabilities and Coryat Rules to calculate a score for yourself as you play along at home.

Tuesday, April 12, 2011

On this Day In History

1606 -- James I combines the flags of England and Scotland into what later came to be called the "Union Jack." In 1707 it would become the official flag of Great Britain.

1633 -- The Inquisition begins against Galileo.

1772 -- Henry Clay is born.

1861 -- Confederate forces bombard Fort Sumter. If their aim had been better, there might have been no baseball.

1945 -- FDR complains of a "terrible headache" before lapsing into a coma and dying.

1947 -- Tom Clancy, David Letterman, and Dan Lauria (the dad from The Wonder Years) are born.

1961 -- Yuri Gagarin becomes the first human being to orbit the Earth.

1981 -- STS-1 Columbia, the first ever space shuttle launch.

1992 -- Euro Disney opens.

And in 1974, four days after Hank Aaron broke Babe Ruth's home run record, I came along.

Sunday, March 13, 2011

Bracket Guess, 2011

Here's my guess. I'm not actually very good at this compared to others on the Internet, but it's fun to speculate.

1 -- Ohio State
16 -- Alabama State/UNC-Asheville
8 -- Villanova
9 -- Old Dominion
4 -- Connecticut
13 -- Princeton
5 -- Arizona
12 -- Clemson/Colorado
2 -- Texas
15 -- Long Island
7 -- Xavier
10 -- Gonzaga
3 -- BYU
14 -- Wofford
6 -- Cincinnati
11 -- Georgia

1 -- Kansas
16 -- Texas-San Antonio/Arkansas-Little Rock
8 -- Washington
9 -- Marquette
4 -- Wisconsin
13 -- Indiana State
5 -- St. John's
12 -- Memphis
2 -- San Diego State
15 -- St. Peter's
7 -- Utah State
10 -- George Mason
3 -- Syracuse
14 -- Bucknell
6 -- Vanderbilt
11 -- Michigan State

1 -- Pittsburgh
16 -- Hampton
8 -- Missouri
9 -- Michigan
4 -- Loser of Florida/Kentucky
13 -- Oakland
5 -- West Virginia
12 -- Virginia/Penn State
2 -- North Carolina
15 -- Northern Colorado
7 -- UNLV
10 -- Butler
3 -- Purdue
14 -- Akron
6 -- Texas A&M
11 -- Richmond

1 -- Duke
16 -- Boston U
8 -- Temple
9 -- Tennessee
4 -- Louisville
13 -- Belmont
5 -- Kansas State
12 -- St. Mary's
2 -- Notre Dame
15 -- UC-Santa Barbara
7 -- UCLA
10 -- Florida State
3 -- Florida/Kentucky winner
14 -- Morehead State
6 -- Georgetown
11 -- Illinois