The Deconstructing Sundance Story

It all began innocently enough. A motley group of techie folks whose company just happened to be located in Park City, Utah — the home of the Sundance Film Festival — wondered whether the same statistical modeling technology they used to help classify spam could also be used to predict the winners of the 2006 Festival.

The technology, known as a Bayesian statistical classifier, uses information about what has happened before in order to predict the future. In the case of spam this is easy to understand. For most people, the word "Viagra" in a message is a strong indicator that it is a spam message, whereas your company's name is a good indicator that it is not. When an unclassified message arrives you can add up all its indicators and make a prediction about whether it will be spam or not.
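The "add up all its indicators" step can be sketched in a few lines of Python. This is only an illustration of the general naive Bayes idea, not the actual spam filter: the word probabilities, the 50/50 prior, and the names `WORD_PROBS` and `spam_probability` are all made up for the example, and "acme" stands in for your company's name.

```python
import math

# Hypothetical values: (P(word | spam), P(word | not spam)).
# Real numbers would be estimated from previously classified mail.
WORD_PROBS = {
    "viagra": (0.30, 0.001),  # strong spam indicator
    "acme": (0.01, 0.20),     # stand-in for "your company's name"
}
PRIOR_SPAM = 0.5  # assumed prior: half of all mail is spam


def spam_probability(words, word_probs=WORD_PROBS, prior_spam=PRIOR_SPAM):
    """Combine per-word indicators into one spam probability.

    Works in log-odds space, so "adding up the indicators" is literal:
    each known word adds log(P(word | spam) / P(word | not spam)).
    """
    log_odds = math.log(prior_spam / (1 - prior_spam))
    for word in {w.lower() for w in words}:  # count each word once
        if word in word_probs:
            p_spam, p_ham = word_probs[word]
            log_odds += math.log(p_spam / p_ham)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))
```

A message containing "Viagra" lands close to 1, one containing only the company name lands close to 0, and a message with no known words stays at the prior.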

Sundance is really no different. To create inputs for the system, we gathered the last 15 years of Film Guides published by the Festival. The Guides contain a short review of each film as well as other information about it: the cast, the crew, whether it was shot on film or video, how long it runs, etc. We entered all of these features, more than 250 for each film in competition over the last 15 years. We then used data from the Internet Movie Database to classify movies based on how successful they had been. Features that appeared in the descriptions of good movies became positive indicators, and features that appeared in bad movies became negative indicators. With that we had all we needed to predict the best future films of the Festival.
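To make the idea of positive and negative indicators concrete, here is a minimal sketch of how they could be learned from labeled films. It is a simplified stand-in, not the model described on this page: the feature names, the four-film history, and the functions `train_indicators` and `score` are hypothetical, and the add-one smoothing is an assumption chosen so that rare features do not produce infinite weights.

```python
import math
from collections import Counter


def train_indicators(films):
    """Learn one weight per feature from (features, is_good) pairs.

    The weight is log(P(feature | good) / P(feature | bad)), estimated
    with add-one smoothing. Positive means the feature shows up more
    often in good films, negative means more often in bad ones.
    """
    good_counts, bad_counts = Counter(), Counter()
    n_good = n_bad = 0
    for features, is_good in films:
        if is_good:
            n_good += 1
            good_counts.update(set(features))
        else:
            n_bad += 1
            bad_counts.update(set(features))

    indicators = {}
    for feature in set(good_counts) | set(bad_counts):
        p_good = (good_counts[feature] + 1) / (n_good + 2)
        p_bad = (bad_counts[feature] + 1) / (n_bad + 2)
        indicators[feature] = math.log(p_good / p_bad)
    return indicators


def score(features, indicators):
    """Sum the weights of a film's known features; unknown ones add 0."""
    return sum(indicators.get(f, 0.0) for f in set(features))


# Hypothetical training data: (features from the guide, was it successful?)
history = [
    ({"shot_on_film", "runtime_under_100"}, True),
    ({"shot_on_film", "first_time_director"}, True),
    ({"shot_on_video", "first_time_director"}, False),
    ({"shot_on_video", "runtime_under_100"}, False),
]
indicators = train_indicators(history)
new_film_score = score({"shot_on_film", "ensemble_cast"}, indicators)
```

In this toy history, "shot_on_film" ends up as a positive indicator, "shot_on_video" as a negative one, and the features that appear equally often in good and bad films get a weight of zero. A fuller classifier would also account for the prior and for features that are absent.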

Bayesian classifiers work well when the author of whatever is being classified knows where the message truly belongs but is trying to hide it from the audience. Spammers write their messages to make them seem legitimate, but there are often telltale signs that can be recognized by statistical classification. Similarly, Sundance reviewers write their reviews to make every movie in the competition seem good, but do you really believe the Festival's programmers don't have a sense of what films are going to be commercially successful? Our algorithm simply recognizes patterns hidden in the descriptions and uses them to make predictions.

The proof is in the results. In 2006, we accurately predicted the films that would win the four biggest awards of the Festival before a single movie was screened. How 'bout them apples!

To learn more, check out some of the press coverage we received last year (Wall Street Journal, L.A. Times,

What 2009 Holds

This year we are back with a more sophisticated Bayesian algorithm, even more factors recorded for every film, and a whole room full of speedy computers to crank through the data. Even with our track record, please remember that this site is all in fun. Just like Quentin Tarantino directing Four Rooms, past success is no guarantee of future results. You shouldn't make multimillion-dollar decisions with a studio's money based on our analysis.

That said, if you buy one of the movies that we've predicted to be a winner, and you need a little extra support justifying your decision, our engineers would be happy to attend your acquisition party in order to talk about the incontrovertible proof of box office success the math provides. To invite us to your Sundance party, just click here.

Where Else Might This Work?

Shortly after it became clear our predictions for the 2006 Film Festival were scary-accurate, we began to get calls from all sorts of people wondering what else we could predict. Reporters, Hollywood producers, day traders, and venture capitalists across the country wondered whether we could predict phenomena from weather to the stock market. The short answer: maybe. For systems like the weather, it would be difficult to gather enough data to make accurate predictions. On the other hand, it might be interesting to look at something like the auditor's footnotes in SEC filings in order to predict which company is likely to become the next Enron.

Have a project that you think we may be able to help you with? Don't hesitate to contact us.

Deconstructing Sundance | Abusing Statistics Since 2006