Cambridge Analytica is in the headlines – accused of scraping the data of 50 million Facebook users, under the guise of academic research, and then misappropriating that data for its lucrative political work, including on behalf of the presidential campaign of Donald Trump. The revelations have sparked a public apology from Facebook CEO Mark Zuckerberg and the suspension of a key Cambridge Analytica executive.
In the days after the 2016 presidential election, Rolling Stone conducted a lengthy interview with Matthew Oczkowski, then Head of Product at Cambridge Analytica, who embedded with the Trump campaign’s digital operation in San Antonio. Oczkowski detailed how Cambridge believed it had helped Trump beat Hillary Clinton, emphasizing that the data his firm collected gave the campaign confidence that there was a hidden wave of Trump supporters, who did not show up in traditional modelling of the electorate.
Despite the Clinton campaign’s own prodigious data operation, this tide of unlikely, rural and small town voters appeared to blindside the Democrat’s campaign until the final days, when it was too late to meaningfully respond. “To their credit it’s a really hard thing to see,” Oczkowski told Rolling Stone. “If you have voters with really spotty voting records, there’s no data to incline you that they’re going to show up.”
Rolling Stone’s post-mortem on the 2016 election was quickly shelved – the urge to look backwards was overtaken by the demands of covering a chaotic new president who threatened our democracy. We are publishing the interview with Cambridge Analytica now to add to the public record. There are no smoking guns here, rather a meaty conversation that provides depth and context for how the controversial firm says it used data, polling and voter profiling to help Donald Trump secure the presidency.
Rolling Stone: What was Cambridge Analytica’s role in the campaign?
Cambridge Analytica: We moved down to San Antonio, a team of us. There were six people on our data science team – 12 people from Cambridge total. The reason we were hired, the reason we pitched the campaign, was to attempt to quantify “the Trump effect”: How Donald Trump and his supporters are different from generic Republicans, which is a really interesting problem to solve.
So how would a Trump voter differ from a typical Republican? One, Trump supporters are often more white, more rural, and younger than traditional Republicans. And the issues they care about are also quite different. If you look at the profile of a typical Republican, the three main issues they care about are jobs and the economy; national security; and big government or over-regulation. For a Trump supporter, the top issues we see are law-and-order; immigration; trade and wages. The profile of these people – the makeup – is quite different.
Our worry was that if typical pundits fell into the trap of assuming this was going to be a Romney-Obama-like election from 2012, a lot of predictions would be way off. We’ve been studying this “disenfranchised” voter for a long time. A lot of people were skeptical that they existed, or that they would come out of the woodwork. A lot of the movement that people saw in the last month of the election was this effect taking place. These are people who are 0-in-4 or 1-in-4 voters – which means that they have very spotty voting records: They have voted one, or zero times in the last four general elections.
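The "0-in-4" and "1-in-4" labels count how many of the last four general elections a person voted in. A minimal sketch of that bookkeeping follows. The function names, the list-of-flags input, and the example voter are hypothetical illustrations, not Cambridge Analytica's actual code or data.

```python
from typing import Sequence


def turnout_label(voted_last_four: Sequence[bool]) -> str:
    """Return a label such as '1-in-4' from four general-election flags."""
    if len(voted_last_four) != 4:
        raise ValueError("expected exactly four general-election flags")
    # True counts as 1, so the sum is the number of elections voted in.
    return f"{sum(voted_last_four)}-in-4"


def is_low_propensity(voted_last_four: Sequence[bool]) -> bool:
    """True for the 0-in-4 and 1-in-4 voters described in the interview."""
    return sum(voted_last_four) <= 1


# Hypothetical voter who turned out in one of the last four general elections.
history = [False, False, True, False]
print(turnout_label(history), is_low_propensity(history))
```

A likely-voter screen built on vote history alone would tend to drop exactly these records, which is the blind spot described above.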
When did you first see data that this wave of unlikely voters was real, and rideable?
A question I get a lot that ties into this is: Why was public polling so wrong? Even early on with us being on the campaign, we were doing a lot of polling for the campaign too. We were conducting 1,500 surveys a week, in every battleground state – which is a ton of polling – rolling every day for three months.
We started to notice that how you conduct polling, and the sample you poll from, is massively important in understanding the electorate. Usually, you pull a voter file of people you think are likely voters. You research them, and then you come back with an understanding of the electorate. The problem was, most of the time, to come up with the set of likely voters, pollsters use vote history: They’ll take a sample from 2012, who voted there, make some weights and extrapolations, get their sample and go forward.
We knew that this race was going to be far different from 2012, just in terms of the types of people who stood out – mostly from [the difference the campaign was seeing in] donor profiles, people who engaged with the campaign. But we really didn’t have any conclusive data to back that up until absentee-ballots and early-vote returns started coming in.
What was the date on that?
Our big understanding of shifts all happened about a month leading into the election – so early October then leading into two weeks prior. That’s when we really started to do a lot of investigation. Our real confirmation came from these absentee ballot and early vote returns.
If you were to look at data as it came in, you would have noticed three major trends. One: African American votes were way down. Two: Hispanic votes were marginally up. But three: 65+ rural voters were really high from the returns we saw. This meant one of two things. Either it was an enthusiasm question — where folks who normally vote on election day were so excited for Trump they were voting early — or this was indicative of a larger trend that would carry into election day, which I don’t think a lot of people suspected.
If you were to look at those three scenarios, your political intuition would say that looks pretty good for Republicans. The decrease in African American and increase in Hispanic [voting] will cancel each other out. Then the 65+ vote was only going to help Republicans in rural areas.
How did you react to this data?
What we did then is say: We have to re-look at this electorate. So we re-weighted all of our polling and re-weighted all of our modeling. And we saw really good results in the Rust Belt: a 1- to 3-percentage-point increase in almost every state – so Michigan, Pennsylvania, Ohio, Wisconsin, Indiana and Iowa. States like Pennsylvania got almost a three percent boost. States like Michigan got closer to two. Wisconsin closer to one, depending on how you look at it. From there we were able to make a lot of conclusions and focus budgeting.
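Re-weighting a poll toward a different assumed electorate can be sketched with simple post-stratification: each respondent is weighted by the ratio of a group's assumed share of the electorate to its share of the sample. This is a generic textbook technique shown under stated assumptions, not the firm's actual method. The group names, the sample, and the target shares below are all made up for illustration.

```python
from collections import Counter


def reweighted_support(respondents, target_share):
    """Weighted share of supporters after post-stratification.

    respondents: list of (group, supports_candidate) tuples
    target_share: group -> assumed share of the electorate (sums to 1)
    """
    n = len(respondents)
    counts = Counter(group for group, _ in respondents)
    # Weight = assumed electorate share / share actually observed in the sample.
    weight = {g: target_share[g] / (counts[g] / n) for g in counts}
    total = sum(weight[g] for g, _ in respondents)
    backing = sum(weight[g] for g, supports in respondents if supports)
    return backing / total


# Made-up sample: 20 rural respondents (14 supporters), 80 others (36 supporters).
sample = (
    [("rural", True)] * 14 + [("rural", False)] * 6
    + [("other", True)] * 36 + [("other", False)] * 44
)
raw = sum(supports for _, supports in sample) / len(sample)
# Assumption: rural voters will be 30% of the electorate, not the sampled 20%.
adjusted = reweighted_support(sample, {"rural": 0.30, "other": 0.70})
print(f"raw={raw:.3f} adjusted={adjusted:.3f}")
```

In this invented example the topline moves from 50 percent to about 52.5 percent purely because the assumed electorate changed, which is the kind of shift re-weighting can produce when a group is under-represented in the sample.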
In the two weeks leading into the election, we felt good about Pennsylvania and Ohio. We pretty much knew those were in the bag, which makes the map very dynamic at that point. Florida became the linchpin for the election very quickly.
So Pennsylvania was looking golden, in your model, and that made the race dynamic – that was your breach of Clinton’s firewall. But you needed then to have Florida to make the math work?
It really came down to understanding Florida and turning that out. If Trump won Florida, our models had him with a probability of victory over 50 percent. We actually built a tool that’s called an Optimized Path to 270 Calculator, which simulated about 100,000 outcomes of the election and optimized based on polling and internal modeling data we had – what the easiest path to 270 was. And we were able to provide simulations back to the campaign: here’s your best way to 270 based on what we’re seeing right now. That was used by [Trump digital director] Brad [Parscale] and others to make budgeting decisions.
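The interview does not describe how the calculator worked internally. A rough sketch of the general idea, a Monte Carlo simulation of state outcomes, might look like the following. The electoral-vote counts are the real 2016 allocations, but the win probabilities, the `safe_votes` baseline, the function name, and the assumption that states are independent are all hypothetical placeholders (real state outcomes are correlated), and the sketch does not attempt the optimization step.

```python
import random


def win_probability(states, safe_votes, n_sims=100_000, seed=0, forced_wins=()):
    """Estimate the chance of reaching 270 by simulating state outcomes.

    states: name -> (electoral_votes, win_probability)
    safe_votes: electoral votes assumed to be already secured
    forced_wins: states treated as won in every simulation
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    wins = 0
    for _ in range(n_sims):
        total = safe_votes
        for name, (votes, p) in states.items():
            if name in forced_wins or rng.random() < p:
                total += votes
        if total >= 270:
            wins += 1
    return wins / n_sims


# Electoral votes are the 2016 allocations; probabilities are placeholders.
battlegrounds = {
    "Florida": (29, 0.50),
    "Pennsylvania": (20, 0.55),
    "Ohio": (18, 0.65),
    "Michigan": (16, 0.40),
    "Wisconsin": (10, 0.40),
}
SAFE_VOTES = 215  # hypothetical baseline, not a figure from the interview

overall = win_probability(battlegrounds, SAFE_VOTES)
print(f"overall: {overall:.3f}")

# Crude "which state matters most" view: force each state and compare.
for state in battlegrounds:
    p = win_probability(battlegrounds, SAFE_VOTES, n_sims=20_000,
                        forced_wins=(state,))
    print(f"if {state} is won: {p:.3f}")
```

Forcing one state at a time gives a conditional estimate in the spirit of "if Trump won Florida", though the numbers printed here reflect only the placeholder inputs.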
You mentioned 1,500 surveys? Help me understand the analytics behind what you guys do. Give me some texture about the tools and data you were working with.
We had three teams that worked on the Trump campaign. We had a research team: people who did polling. We had a data science team. And we also did digital marketing for the campaign, advertising.
We used, typically, three different streams of data to do data science on a model of politics. The first source is voter data – data we get from the RNC, political data on people’s vote history and also ID work from door knocks and phone calls: what issues you said you cared about in the past.
Cambridge also owns a lot of commercial data, which is demographics, geographics, interests, purchase history – that commercial set that really makes up an individual. The third thing we used was internal, first-party data – that is donors, people who buy stuff at the store, attendees, website signups, petition signatures, things like that.
In collaboration with the RNC, the campaign, Brad and all those guys, we merged that data together. The problem is that data is static. The only way to make it dynamic is to do research and polling.
It was a very circular flow, the way the process would work: We would go into the field over the course of a week. We would collect 1,500 surveys per state, in all 17 battleground states.
These are online surveys?
We used mixed method. A combination of online surveys, auto dial to landlines, and then live calls to cell. We tested that methodology. It’s proven it’s the most representative and accurate way to get survey responses. We’re polling every day, 200 every day, which rolls to 1,500.
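A rolling sample of this kind can be sketched as a fixed-length window of daily batches, where each new day pushes out the oldest one. The seven-day window is an assumption (the interview gives only the daily and rolled-up figures; 7 days at 200 per day pools to 1,400, close to the 1,500 quoted), and the class name and placeholder responses are hypothetical.

```python
from collections import deque


class RollingSample:
    """Keep only the most recent `window_days` daily batches of responses."""

    def __init__(self, window_days=7):
        # deque with maxlen drops the oldest batch automatically.
        self.days = deque(maxlen=window_days)

    def add_day(self, responses):
        self.days.append(list(responses))

    def pooled(self):
        """All responses currently inside the window, oldest day first."""
        return [r for day in self.days for r in day]


rolling = RollingSample(window_days=7)
for day in range(10):
    # Placeholder responses: 200 (day, respondent) pairs per day.
    rolling.add_day((day, i) for i in range(200))

print(len(rolling.pooled()))  # 1400: only the last 7 days remain
```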
That survey data gets delivered to our data science team, real time, via API — an automatic connection that feeds the polling work into our database. We get our polling in, we then use that to enhance our internal data and modeling. It allows us to re-run our models every week. Every Wednesday we’d update who we thought a persuadable voter was, who we thought a GOTV [Get Out The Vote] target was. All of our issues modeling – the issues we thought individuals cared about.
And we delivered those “audiences” directly to the digital marketing team and other media organizations within the campaign. So we’d build an audience and go to the creative team and say this is the set of people in Florida we want to target. Hispanics in Southern Florida. Here’s the issues they care about, here is the type of tonal message they like to hear or see, here’s the geography we have to focus on.
And then they would go create an advertisement, based on our suggestions, to those people. And then we’d go in, do research polling again to prove that it worked, and then run this whole process over, and over again, on a weekly basis.
Did all this data give you confidence going into election day?
We were never like: Oh sure, we know we’re going to win the election. A lot had to break his way.
But your data gave you a fundamentally different view from the way Clinton saw the race. Did you have a sense that they were flying blind, that they weren’t seeing the data, or weren’t interpreting the data the way you were?
What they didn’t see was this very increased [turnout among the] disenfranchised rural voter. To their credit it’s a really hard thing to see. If you have voters with spotty voting records, there’s no data to incline you that they’re going to show up. Until you actually start looking at early returns.
It was an interesting chess match for six months with the Clinton campaign. In the beginning, leading into the first debate, we had board control. Trump was visiting places trying to expand the map in Minnesota, New Mexico. After the first debate, when poll numbers started to dip, Hillary gained back board control. She was trying to push into Arizona and Georgia, other Republican areas. After the third debate – and particularly after the FBI opened up her investigation again – the game board control came back in Trump’s favor.
What we do in politics is both an art and a science. The art comes into play when you have to use political intuition. If I’m the Clinton campaign and my polling is telling me Trump is only down two points in Wisconsin, political intuition would say a Republican hasn’t won that since 1984. That can’t be right. I wonder if they let the data truly speak to them, if they were seeing the same thing and it wasn’t just too much art being put into the mix.
It seems like the Clinton campaign [eventually] caught on – but it was a little bit too late. They had to divert resources. They started going heavy into Michigan. They were into Virginia again. Virginia was very close. Having to spread that map so fast, so late – they couldn’t catch up.
How did you ultimately win in Florida?
The problem we saw with Florida was the major Hispanic population. Twenty-five percent of our persuadable voters were Hispanic. Much more than in any of those Rust Belt states. So the fear was that if Hillary Clinton took a large enough clip of the Hispanic vote – more than is historically normal – she’d have a real fighting chance of winning Florida.
What happened is she didn’t win as much of the Hispanics as she thought. Our assumption is that a lot of these older Hispanic voters voted Trump – these folks don’t like illegal immigration, particularly in the Cuban community. And then Trump ran up the score in the rural counties of Florida. We knew by 8:30 central time that there was no way she could make up the votes in Florida – even though Miami-Dade, West Palm Beach, Broward County were still yet to come in. He had run up the score so much in the Panhandle, some of these rural communities, outperforming Romney by 10 to 30 percent in some of these cities. The math just didn’t make sense for her at that point.