The Future

Read the full Kogan email: Researcher says Facebook is scapegoating him

The man who made the quiz at the center of the Facebook controversy accuses media outlets of grossly exaggerating the usefulness of his data.

In a March 18 email obtained by The Outline, Aleksandr Kogan, the scientist who provided Cambridge Analytica with millions of Facebook users’ data back in 2014, described the “predicted personality scores” he gave to the company as “actually not useful for micro-targeting.” Defending himself to his colleagues, Kogan wrote that he has tried hard to communicate that “micro-targeting is not a use-case for personality predictions from social media,” but that media outlets keep suggesting it is. By his account, even with the vast trove of data he collected, Facebook users could not have been targeted individually with ads. That doesn’t rule out the possibility that Cambridge Analytica improved its algorithms with other sources, without Kogan, or that the data was used to inform a much bigger machine.

Though parts of Kogan’s email have been quoted by various outlets, it has never been published in full. CNN and Bloomberg both gave an overview of Kogan’s claims regarding his usage of Facebook data and touched on his allegations of inaccuracy.

Kogan wrote that “the predictions we gave SCL were 6 times more likely to get all 5 of a person’s personality traits wrong as it was to get them all correct.” David Carroll, an associate professor at Parsons School of Design who is suing Cambridge Analytica for the data it has on him, elaborated on this to The Outline in a phone interview. “The way that the psychometrics work is the more data you put into it, the more accurate it gets,” he said. “There's an ‘R value,’ which is the accuracy value. It's basically like, are the predictions [the model is making] above average in accuracy.”

According to Kogan’s email, the R value for the predicted personality scores he shared with the SCL Group (Cambridge Analytica’s parent company) was low, only “.3.” That figure is a correlation coefficient rather than a percentage: a correlation of 0.3 means the predictions account for only about 9 percent of the variation in people’s actual scores. “The correlational accuracy between predicted and actual scores was around r = 0.3 for all big five traits,” wrote Kogan. “Models with that sort of accuracy tend to predict most people to be near the average—when in doubt, the average is the most sensible prediction.”
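To make the arithmetic behind r = 0.3 concrete, here is a rough sketch in Python. It is not Kogan's method. It assumes personality scores are standardized (mean 0, variance 1) and normally distributed, and it models the "spread out the scores" step his email describes as rescaling the predictions to the same variance as the real scores. Those assumptions, and all function names, are illustrative only.

```python
import math
import random


def variance_explained(r):
    """Share of variance in actual scores accounted for by the predictions."""
    return r * r


def mse_shrunken(r):
    """Mean squared error of the best linear prediction (pulled toward the average)."""
    return 1.0 - r * r


def mse_spread_out(r):
    """Mean squared error after stretching predictions to unit variance (assumed model)."""
    return 2.0 - 2.0 * r


def simulate(r, n=100_000, seed=0):
    """Monte Carlo check of the closed forms above, using standardized normal scores."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(1.0 - r * r)
    err_average = err_shrunken = err_spread = 0.0
    for _ in range(n):
        signal = rng.gauss(0.0, 1.0)                          # model output, standardized
        actual = r * signal + noise_scale * rng.gauss(0.0, 1.0)  # true score, correlation r
        err_average += actual ** 2                 # guess "everyone is average" (0)
        err_shrunken += (actual - r * signal) ** 2  # prediction hugging the average
        err_spread += (actual - signal) ** 2        # prediction spread to full range
    return err_average / n, err_shrunken / n, err_spread / n


if __name__ == "__main__":
    r = 0.3
    print("variance explained:", round(variance_explained(r), 2))
    print("error, shrunken prediction:", round(mse_shrunken(r), 2))
    print("error, spread-out prediction:", round(mse_spread_out(r), 2))
    print("simulated (average, shrunken, spread):", simulate(r))
```

Under these assumptions, guessing the average for everyone has a mean squared error of 1. The cautious prediction improves on that only slightly, to 0.91, and the spread-out version comes in at 1.4. That is consistent with the email's claim that spreading the scores makes the error bigger than assuming everyone is average.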

In other words, even with data on at least 30 million people, the predictive model shared with Cambridge Analytica was, by Kogan’s account, barely more informative than guessing that everyone is average. That is not to say Cambridge Analytica couldn’t have increased the model’s accuracy over time, since Kogan provided the company with the data back in 2014.

“My hypothesis is that they continually enriched the model throughout 2015 during the primaries,” said Carroll. “The Carson campaign and especially the Cruz campaign were sucking up vast amounts of data in various invasive ways and increasing the accuracy of the model — getting this value higher and higher and higher… If you look back on the Cruz campaign’s practices they were the most invasive in history, there's reporting that shows that the Ted Cruz mobile app would automatically upload your address book contacts to the mothership, so those got into the hands of Cambridge Analytica.”

If Cambridge Analytica was able to increase the accuracy of this model by more than 20 percent, the extent to which the company could predict and influence the electorate would be frightening. These worries are only compounded by the fact that this model exists separately from the data collected from Facebook and elsewhere. Even if Cambridge Analytica deleted all information obtained from outside sources, as it told Facebook it did, this prediction model would still exist.

Kogan did not respond immediately to a request for comment, but we will update this post if we hear back. Here is the complete copy of the email Kogan sent to his colleagues at the University of Cambridge’s psychology department on March 18:

Hi Everyone,

I know there’s been a lot of concern raised over the news stories published today, and though I’ve already been in contact with Mark and the University’s PR team about it this morning, I wanted to also write to everyone else to give some clarity on the allegations (and what is and isn’t true).

It’s been honestly a surreal week: I’ve been asked quite seriously by reporters from the NY Times and the Guardian if I am a Russian spy. I really tried to explain that one seems just silly. If I am Russian spy, I am the world’s dumbest spy—I did, after all, change my last name to the James Bond villains when I got married. That’s really leaning in, I guess! Nonetheless, there is an infographic in the Guardian with an arrow from the Russian government pointing at me. For those that don’t know that story, we chose Spectre as a derivation from Spectrum—we wanted to choose something that was related to light since we were both scientists and religious, and light is a strong theme in both. We ran into Spectre because my dad’s surgeon was named Jason Spectre when he was sick, and we thought it was a really cool name and hit the light theme perfectly.

I’ve also seriously been asked if the FBI has reached out, if the two congressional committees in the United States have reached out, and if Parliament or any authorities in the UK have reached out. No one has—I suspect they realize I’m actually not a spy. Though if anyone does, I’d be more than happy to testify and speak candidly about the project.

Anyways, before I detail the project, I wanted to share some of the internal actions I have taken within the University about this project for the last few years. The first news of the project was reported in December 2015. The day of the article, I setup a meeting with Trevor, explained to him everything that happened, and then worked with the university PR team on the best response. Since then, multiple articles have been published with effectively the same claims. I’ve done my best to provide the department and the research office a description of reality, and the evidence that I had to back it up.

This new set of articles makes a few new claims—that Paul highlighted—that particularly warrant some quick comment. First, my post at St Petersburg State University (SPSU), this is mostly an honorary role. I have visited the University I believe 3 times total in the two years I’ve had it. As many of you know, I was born in the former Soviet Union and immigrated to NYC when I was 7 years old. But I do love visiting St Petersburg since it’s an absolutely beautiful city. Before I took the role, I asked Trevor and then subsequently the research office for clearance. I was given the green light. The grant that they cite was to a set of colleagues at SPSU. I was named on grant to help its chances of getting funded. In terms of the work, I did quite honestly very little on it—I had a couple of meetings with them to discuss their methods and results. I’m not even an author on any of the papers published by the SPSU team. The team also visited my lab at Cambridge once for a mixer with my PhD students. On my 3 trips to St Petersburg, all were no longer than a week, and it was mostly me giving a workshop on Regressions in R, and a few talks on how social media data CANNOT be used effectively to make individual-level predictions.

Second, the claim that Facebook shut our app down and then I got it reactivated by telling them it was for academic research is a fabrication. If my memory is right, we hit the API rate limit, and once we paused collection for a minute, were able to continue. There was no exchange with Facebook about it, and, as I detail below, we never claimed during the project that it was for academic research. In fact, we did our absolute best not to have the project have any entanglements with the University.

The genesis of the project occurred 4 years ago when one of our department’s PhD students introduced me to SCL. They were interested at the time in general survey consulting. I later introduced SCL to David and Michal from the Psychometrics Centre. Over the course of a few months, SCL updated its interests and decided that they wanted to buy the myPersonality dataset from David. He eventually declined on the grounds that when he collected the data, he told his users it was going to be for academic research. So then a new proposal was made that we would collect Facebook data and use David and Michal’s personality API to make predictions about people’s personalities. The initial plan was that I would form a company, called GSR, and do the data collection, and then David and Michal would make the predictions. Eventually, SCL asked for David and Michal were removed from the project because of a disagreement on monetary compensation, and GSR took over the role of both data collector and predictor.

I originally created my Facebook app in 2013 to be used for academic research, which my lab used for a number of studies. At that time, the app description did indeed state it was for academic research and I mentioned the research was for my lab at the University of Cambridge.

Later, in 2014, once I formed GSR, but before starting the project, I moved the app into GSR, and (a) changed the name of the app to GSRApp, (b) changed the logo, (c) changed the description, and (d) changed the terms and conditions. In fact, the only aspect that was the same about the app was its app ID number. In the GSRApp, we made clear the app was for commercial use—we never mentioned academic research nor the University of Cambridge. The project was in fact done entirely through GSR—the university played no role. In the Terms of Service of the GSRApp, we clearly stated that the users were granting us the right to use the data in broad scope, including selling and licensing the data. These changes were all made on the Facebook app platform and thus they had full ability to review the nature of the app and raise issues. Facebook at no point raised any concerns at all about any of these changes. Thus, we operated with the understanding that Facebook was fine with the updated nature of the app. We also did get assurances from SCL that their lawyers thought what we were doing was perfectly legal and within the frame of the Facebook ToS. Sadly, I did not get my own legal council at the time—a big mistake in retrospect!—since we weren’t getting paid for the work (our compensation was the data), and we all know that Cambridge academics aren’t exactly exorbitantly paid.

The app collected data from about 250,000 users. We recruited them through Qualtrics, had them complete a number of surveys, and authorize the GSR app. Through the app, we collected public demographic details about each user (name, location, age, gender), and their page likes (e.g., the lady gaga page). We collected the same data about their friends whose security settings allowed for their friends to share their data through apps. Each user who authorized the app was presented with both a list of the exact data we would be collecting, and also a Terms of Service detailing the commercial nature of the project and the rights they gave us as far as the data. Facebook themselves have been on the record saying that the collection was through legitimate means—though they have chosen to not talk about how I changed the app’s ToS, name, and description for the project, and instead have only talked about the app’s initial version and also its third version (which came after the SCL project) and was the personality test.

We eventually provided SCL with 30 million people’s predicted personality scores (all of them Americans). The predictions themselves are actually not useful for micro-targeting. The correlational accuracy between predicted and actual scores was around r = 0.3 for all big five traits. Models with that sort of accuracy tend to predict most people to be near the average—when in doubt, the average is the most sensible prediction. So to correct for this, you can assign people into percentiles or spread out the scores to look more realistic (which we did). But this inflates the inaccuracy to the point that the error is BIGGER than simply assuming everyone is average. In fact, from our subsequent research on the topic, we found out that the predictions we gave SCL were 6 times more likely to get all 5 of a person’s personality traits wrong as it was to get them all correct. In short, even if the data was used by a campaign for micro-targeting, it could realistically only hurt their efforts. I’ve tried quite hard to explain this to reporters—even giving a very long paper going over proof after proof illustrating this. But almost none have actually reported it. They have chosen instead to run with the story of the scores being highly accurate, and thus, influential.

After the first article broke in 2015, we worked with Facebook to try to correct any issues they believed occurred and we deleted all of the data.

The experience has also been quite frustrating in terms of reporting. Almost every story I have seen has gotten big and small details wrong. They almost all get the big take home message wrong too—that micro-targeting is not a use-case for personality predictions from social media. And for when a reporter can’t explicitly state something, they will state facts that are highly suggestive. The one that really got me was waking up to the Guardian having an infographic with an arrow from the Russia government pointing to me. Sure, this is true, but probably sensible then to have arrows from the US, UK, Canadian, and Chinese governments also pointing to me since they have all also funded my research at one point or another.

I ask you keep this letter in confidence. It’s been a trying day, but I respect all of you as colleagues and believe it’s important for the department to have the facts, so I wanted to write to all of you as quickly as I could.