Monday, January 9, 2012

What Makes An Easy Question Easy? (DAGGRE.org)

For most of last year I have had the privilege of working with the DAGGRE Team on the Intelligence Advanced Research Projects Activity's (IARPA's) Aggregative Contingent Estimation (ACE) Project.  While all the real scientists have been busy exploring research questions involving Bayesian networks and combinatorial markets, this old soldier has been focusing on more mundane things like "What makes an easy question easy?"

(Note: If you have not had a chance to check out the DAGGRE.org site and its mind-numbingly cool companion blog, Future Perfect, you should. Three reasons: First, it is pretty interesting research that could impact the future of the intel community. Second, you can actually participate in it. Third (and maybe most important to many of the readers of this blog), the DAGGRE team has gone the extra mile to make sure your personal data is secure while participating (check out the FAQ page for all the details).)

As I explained in this post, having some way to evaluate and even rank intelligence requirements according to difficulty is important. Analysts are supposed to be accurate, but if you aren't also evaluating the difficulty of the underlying question, two equally accurate analysts could be miles apart in terms of overall quality. It would be kind of like saying a little leaguer who hits .400 is as good as a major leaguer hitting .400.

While that distinction is easy to see in baseball, it is much more difficult in intelligence analysis.  Questions come in all shapes and sizes and vary in an enormous number of ways.  There is also a psychological, subjective aspect to it:  Questions that seem tough to some analysts may seem very easy to others.  On its face, it appears difficult if not impossible to come up with a system that can reliably evaluate and categorize questions by difficulty level.

Which is why I want to try.

And I may be making some progress.  I think I have figured out how to spot an "easy" question.  DAGGRE, you see, is a predictive market.  This means that people assign probabilities to the outcomes of questions.  Imagine, for example, I asked if Sarkozy would still be the president of France on 1 JUN 2012 (He is running for re-election in April and May).  Now imagine that you thought the odds of Sarkozy's re-election were 80%.  You could establish your position in the market at that "price" and others would be free to do the same (The Iowa Electronic Markets do this for the US election, by the way).

The market would reward people who were right on 1 JUN and would heavily reward those who were right when lots of others were wrong.  Studies have shown that, on average, these types of markets are pretty good at making these kinds of estimates.

Now, imagine if I asked you to estimate the chances that Sarkozy would still be president of France at 1700 tomorrow.  Sarkozy is not sick (at least I hope he is not) and there are no direct, immediate threats to his presidency.  There is no reason to expect that he would not still be president tomorrow.  By the same token, successfully predicting that he will still be in office tomorrow is no sign of great analytic ability.  The question is too easy.

Generalizing this pattern, I think it is worth exploring the idea that "easy" questions are those that start and end their run on a predictive market close to either 0% or 100% probability, do not vary much during the course of that run and, finally, resolve in accordance with their probabilities (i.e. they happen if close to 100% and don't happen if close to 0%.  See the picture that accompanies this post for an idea of what such patterns might look like).  Furthermore, I think that these kinds of questions will see much less trading activity than other ("not easy") questions.
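
For readers who think better in code, here is a minimal sketch of how that "easy" pattern might be checked once a question has resolved. The names (`QuestionRun`, `is_easy`) and the two 10% thresholds are assumptions made up for illustration; the right cutoffs for "close to" and "do not vary much" are exactly the kind of thing that would need to be worked out against real data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QuestionRun:
    """One resolved question: its market probabilities over time and its outcome."""
    prices: Sequence[float]  # market probability of "yes", between 0.0 and 1.0
    outcome: bool            # True if the event actually happened


def is_easy(run: QuestionRun, edge: float = 0.10, max_swing: float = 0.10) -> bool:
    """Return True if the run matches the 'easy' pattern.

    edge and max_swing are illustrative thresholds, not values from the research.
    """
    prices = run.prices
    if not prices:
        return False

    first, last = prices[0], prices[-1]

    # 1. Starts and ends close to 100% or close to 0%.
    near_one = first >= 1 - edge and last >= 1 - edge
    near_zero = first <= edge and last <= edge
    if not (near_one or near_zero):
        return False

    # 2. Does not vary much during the run.
    if max(prices) - min(prices) > max_swing:
        return False

    # 3. Resolves in accordance with the probabilities.
    return run.outcome if near_one else not run.outcome


# The "still president tomorrow" type of question: high, flat, and it happens.
print(is_easy(QuestionRun([0.97, 0.98, 0.96, 0.99], outcome=True)))
```

Note that the sketch leaves out trading activity, since I am treating that as something to test against the definition rather than as part of it.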

Of course, the problem with this definition is that it only identifies easy questions after the fact, after the question has been resolved.  My hope, however, is that by examining the set of questions that we already know are easy (at least under this definition), we might be able to see other patterns that will allow us to identify easy questions when they are asked rather than only after they are answered.

Our (I say "our" because I am working on this with one of our superb grad students, Brian Manning) first attempt to get at these patterns will be a simple one -- question length.  We hypothesize that, on average, questions that match the "easy" pattern I described above will be shorter than other questions.  When you think about it, it makes some sense.  After all, "What time is it?" seems like an easier question to answer than "What time is it in Nigeria?" 
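
A rough sketch of the comparison we have in mind might look like the following. It assumes length is measured in words (characters would work just as well) and that each question has already been labeled easy or not; the function names and the two-question sample are purely illustrative, not our data.

```python
from statistics import mean


def word_count(question: str) -> int:
    """Length of a question measured in whitespace-separated words."""
    return len(question.split())


def mean_length_by_group(questions):
    """Average word count for easy and for other questions.

    questions is an iterable of (text, is_easy) pairs. Returns a tuple
    (mean_easy, mean_other); a group with no questions yields None.
    """
    questions = list(questions)
    easy = [word_count(text) for text, flag in questions if flag]
    other = [word_count(text) for text, flag in questions if not flag]
    return (
        mean(easy) if easy else None,
        mean(other) if other else None,
    )


# Toy labels for illustration only.
sample = [
    ("What time is it?", True),
    ("What time is it in Nigeria?", False),
]
print(mean_length_by_group(sample))
```

With a real set of resolved questions, the interesting part is whether the gap between the two averages is bigger than chance alone would produce, which takes a proper statistical test rather than a glance at two means.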

Brian has found some research that says that, subjectively, people don't perceive longer questions as necessarily more difficult.  The difference, of course, is that we have a definition of "easy" that is based on objective criteria.  Still, I think it best to start with the easiest possible measurement and then go from there.  Not sure where I will end up or if this will be a dead end but I will keep you posted...