Wang's statistics blog forecasts 2012 election
Since the 2004 presidential election, the Princeton Election Consortium, a blog run by molecular biology and neuroscience professor Sam Wang, has given political junkies a place to visualize and interpret massive amounts of polling data under one streamlined, statistically consistent roof.
Wang’s interest in the 2004 election campaign led him to look at the evidence more carefully. He recalled that the odds at the time seemed to favor a victory by Democratic Sen. John Kerry, since Kerry needed to win only one more swing state: Florida or Ohio. After discussing the election with a colleague who agreed with this reasoning, Wang decided to investigate the data more formally.
“I thought about doing a more formal version of that using a computer program,” Wang said. “The problem with polling data in the news was that the polls came out one at a time, and that is what scientists regarded as a very high-noise situation. I thought it must be possible to do better.”
Wang began writing a program that let him analyze data from many polls at once. He posted his results, coded by hand, on his homemade webpage and in several political forums. The results soon went viral.
“This was a time when few people were doing this, and I got thousands and thousands of hits, and my email inbox became flooded,” Wang said. “It was then I realized I had something.”
The 2004 electoral cycle was the first presidential election in which so much polling data was available, and soon Wang found himself one of a “handful of geeks for whom this became an activity.” The activity became popular enough for Wang and several other bloggers to be profiled in the Wall Street Journal, in an article Wang said “did a good job of capturing the funny community of people who had a similar idea at more or less the same time.”
By focusing on what Wang called the “meta-analysis of polls,” he was able to provide a daily snapshot of the predicted Electoral College vote. He kept adding to the site, which attracted more than a million hits over the course of the 2004 campaign.
The site’s calculation matched the actual electoral outcome exactly.
In 2008, Wang was joined by Andrew Ferguson ’08, who automated the entire data collection and analysis process. As a student in the Department of Operations Research and Financial Engineering, Ferguson had learned in the classroom how to process this kind of data, which gave him a grounding in handling polling errors and combining estimates from multiple polls.
“Not every polling site had the same statistically, mathematically sound way of data analysis, and our spin on that was to try to be very faithful to the methods and describe what we do differently,” Ferguson explained. “Statistics is complicated — it’s hard to understand how simple examples apply to day-to-day encounters. It’s an educational service in some ways.”
Because the process is now automated, Wang’s day-to-day work on the site mostly involves writing; before, he had to check which polls had available data and process them manually in Matlab. The site is now called the Princeton Election Consortium and features posts from members of the University community across disciplines.
“When I fire up the site, I look at the results now like everyone else,” Wang said. “I purely come up with new ways of looking at the data and writing a little essay each time.”
Interest in the Princeton Election Consortium has not faded over the last two presidential elections. Instead, the blog’s readership has broadened as methods for analyzing big data have become a more prominent topic.
“I get feedback and commentary from leading political blogs and everything from pension analysts in Mexico to string theorists at Rutgers to crystallographers,” Wang said. “Few of these people I have ever met, but the best comments can come from people whose voices I’ve never heard.”
Computer science and public affairs professor Edward Felten said that this kind of work was a good example of how big data can provide a perspective on current politics.
“A lot of work on big data is about very detailed questions of direct interest to only a small community, but here you have a use that goes to the heart of a main topic of interest,” Felten said. “The ability of a Princeton professor to gather and process data to draw conclusions from it is something that would not have been possible in a previous generation.”
This year, Wang and Ferguson have implemented new methods that they hope will offer insight into the likely outcome of the election rather than just snapshots of the current race. These methods are a first step toward predicting what will happen in November. For now, though, both said they were satisfied with maintaining the blog at its current pace.
“We can’t learn anything about our methods until the election actually happens,” Ferguson noted. “We haven’t added too many things. We did a good job in 2008, and I try to not mess with success.”