David Carroll was fighting Cambridge Analytica before it was cool

Paris Martineau, Mar. 21, 2018, 05:22 PM EST

Cambridge Analytica claims to know you better than your closest friends. David Carroll, an associate professor at Parsons School of Design, wants them to prove it: He is undertaking a case that may earn real data protection for users around the globe, including Americans, thanks to European laws.

Last year, Carroll got his hands on his official Cambridge Analytica voter file from the 2016 U.S. election. He soon realized it was missing essentially all of the most important information. What’s more, Carroll saw something fishy: the file he requested was signed not by a head of Cambridge Analytica, but by a much higher authority at a U.K. government contractor, Strategic Communication Laboratories, now known as SCL Group.

While Cambridge Analytica’s status as a de facto offshoot of SCL Group is now well known, the connection wasn’t as clear in early 2017. SCL Group’s roots place Cambridge Analytica’s activities under European jurisdiction, which means European data privacy standards likely apply. So Carroll decided to sue for both his complete voter data file and every American’s. And thanks to the EU’s new data protection laws, which take effect in May, his attempt to force transparency upon one of the world’s shadiest companies might actually work.

I gave Carroll a call to talk about Cambridge Analytica, data protection, and his potentially landmark case.

So you requested your data profile from Cambridge Analytica back in 2017 and decided to sue the company on the grounds that what they actually ended up giving you was an incomplete profile. How did you end up determining there was missing information?

I had been very suspicious of the company as a kind of psyops contractor, a dark-arts black-ops unit. When the chief operating officer, Julian Wheatland, signed my data protection disclosure letter, it caught my eye, because I was down this rabbit hole and I was like, “Why the hell is this guy signing my letter?”

There were other forensic clues too, like the email address that I got the disclosure form from. They didn’t do a good job of hiding this in the beginning, but now that we [have] all of these revelations from Christopher Wylie and Channel 4 and the work of The Guardian and the Times, [we can tell] that Cambridge Analytica is kind of a fake company. It was literally made, potentially, for Bannon, as described by Wylie, and those legal documents do indicate that. As the Times said in its coverage, Cambridge Analytica LLC is a shell company (a place to hide the Mercer money), but it’s actually strongly affiliated with SCL [Group, formerly known as Strategic Communication Laboratories] in different ways, and ultimately it’s all part of SCL Group. Parliament asked [CEO Alexander] Nix if data can be shared between Cambridge Analytica and SCL Group, and he replied that data can be shared from Cambridge Analytica to SCL Group, but SCL Group can’t share its data down to Cambridge Analytica. That’s because SCL Group has List X security clearance as a military contractor.

Jesus.

Yeah, it’s horrifying. I knew the general background of this, so when [Wheatland] signed the letter I was like, “What the hell is going on here?” Once I posted it on Twitter and got British scholars and academics and legal minds checking it out, they were like, “This is not legal.”

Right, because Wheatland was a chairman at SCL at the time, not Cambridge Analytica. So the fact that he was signing your Cambridge Analytica data protection disclosure was obviously pretty fishy.

So that led me to find Ravi Naik, our solicitor, and that’s when I started the process of pursuing this.

What specifically led your team to believe that the data Cambridge Analytica had given you was incomplete?

There are two big reasons why we know it’s not complete: one is very obvious and public, and one is very sophisticated. The first one is obvious if you look at how Cambridge Analytica advertises itself to potential clients: it says that it has 4,000 to 5,000 data points per voter. If you look at the data that I got, it’s a tiny fraction of that. It’s maybe a baker’s dozen?

The much more sophisticated answer is embedded in the two expert reports attached to the claim as it was filed: both independently concluded that there’s lots of evidence the file can’t possibly be complete.

How did they come to that conclusion?

If it’s complete data, you should be able to look at the data and arrive at the model without any additional data. My name, zip code, age, party affiliation (which is the known stuff in the voter file): they can’t generate a political profile from that. [The generation of a political profile] is too nuanced; in particular, both reports call out the very high ranking of gun rights on my file. The experts were also able to look at a couple of other people who had gotten their files and were willing to submit them to the legal team. There needs to be something else to explain how they got the model.

One of our experts is David Stillwell, one of the original founders of the whole psychometrics model at Cambridge University. Kosinski, Stillwell, and Kogan are the original creators of the techniques. (The story, of course, is that Kosinski and Stillwell could not agree to SCL's terms, so they did not participate.) Kosinski left Cambridge for Stanford; Stillwell and Kogan stayed at Cambridge, but Kogan went on to form GSR [Global Science Research]. Having Stillwell explain this is really useful for our case, because he is a super credible assessor: he knows what he's talking about.

So, you’ve filed this lawsuit in Europe. Is there any sort of legal precedent for a case like this regarding the exploitation of user privacy on such a massive scale?

In Europe they have a model that we don’t even have in the U.S.; it’s a completely foreign concept to us. Europe basically has three categories of legal definitions [when it comes to personal data protection]. You have ‘subjects,’ which are people whose data is processed. You have ‘processors,’ the people who process data. And you have ‘controllers,’ the organizations that actually, well, control it, sort of the master custodians of it. In the U.K., every data controller has to register with the Information Commissioner, which is their regulator. The Information Commissioner is an agency we don’t even have in the United States, where people can complain if they feel their information or their privacy has been abused. Like, imagine if, when your identity is stolen, you could actually do something about it. [laughs]

Wow. What a world that would be.

I know! The Google Spain case really established the idea in Europe that, because Google processed data in Spain, it had to deal with Spanish authorities, so where data is processed is significant to the European structure. When I requested my data [from Cambridge Analytica] in January and received it in March, that was not only proof that the data was processed in the U.K., but it also gave me the legal status of a subject, which gave me the ability to complain to the Information Commissioner’s Office (ICO). They took my complaint extremely seriously and added it to their internal investigation. The legal team is also fully confident that every U.S. citizen actually has rights in the U.K. The question is, can I prove it in case law?

Why do you think America is so far behind Europe when it comes to these sort of protections?

Two reasons, primarily. One is just our sort of pro-business, pro-free-market attitude — especially when the Republican Party is essentially the ruling party in the United States — and the fact that these are American companies. This is Silicon Valley. This is the proper pride and joy of our innovative sector. Lawmakers and regulators are captured by the industry, and lawmakers are completely deferential to these homegrown businesses. There's no appetite to challenge their power because this is America.

The second reason is more cultural. Europeans have a different cultural idea, I believe, about what privacy means and signifies. They strongly associate privacy with dignity. When you ask Americans what privacy means they would say things like, the ability to keep a secret, which is actually a very unsophisticated understanding of what it really is. Europeans tend not to even use the word privacy; they understand it as a fairly meaningless term, especially from a legal perspective. And that's why they prefer the term data protection.

What is the effect of this change in terminology?

Data protection is a legal concept that not only suggests property rights and civil rights, but is also something you can prove or disprove: whether or not your data was protected. It’s really hard to define privacy, let alone determine whether it was injured. From a legal perspective they’re way ahead of us, and we need to catch up fast.

Perhaps this is a bit of a cynical question, but do you see a realistic scenario where the U.S. would actually be able to catch up with these laws?

My most optimistic view is that it is inevitable through market dynamics. In May, when the EU’s GDPR [General Data Protection Regulation] is enforced, I believe it will economically set the standard for the entire data industry, because it will be too expensive to build one internet that is GDPR-compliant and another internet that’s not.

May isn’t that far off. Have you seen any examples of this play out in the real world?

At Davos, Sheryl Sandberg announced that Facebook was adopting the GDPR as its global standard for all 2 billion users, and that they would be redesigning their privacy settings by launching what they call the Global Privacy Centre, which hasn’t been released yet, but I’m extremely interested to see what it will look like. This is their attempt to be GDPR-compliant, because they understand that it’s impossible for them to determine who is an EU citizen and who is not. So they might as well assume they have to conduct business as if only EU citizens were using the service.

The issue I’ve always noticed is that technology generally moves too fast to properly regulate. By the time U.S. lawmakers are able to get something on the books, it’s old news.

The Europeans and the British are really good at this stuff. Brussels is the Silicon Valley of regulation, and there are some beautiful things in the GDPR that are very forward-thinking and try to forecast future issues. For example, when we get to the point of designer humans, there are provisions to protect DNA information in a data context.

There are rules requiring that data collection disclosures be written in language that children understand. [Europeans] believe in and like their government. They pay taxes, they like the services they get, and they like having a governed society.

Wow, that seems like a fantasyland in comparison to America.

America is so anti-government and so pro-business in comparison, and so the knee-jerk attitude is, ‘Oh, government couldn’t possibly do a good job of this!’ But that’s such an American idea. Across the pond they actually think that if we put our minds to it, we can do this. And they did. I mean, it remains to be seen, but the fact that they have forced Facebook to adopt it shows that the only thing that’s going to check the power of Facebook and Google is the EU. It’s the only thing that has the muscle and the cultural values to do it.

Obviously these policies are relatively new, but have you seen some concrete examples of them being effective when it comes to actually checking these seemingly uncheckable tech giants?

So far, the fines that have been levied against [these companies] have not been large enough. One of the key things the GDPR does is increase fines to a level that will be more than the cost of doing business. So for Facebook and Google, we’re talking about multiple billions of dollars.

The other thing is the Europeans are not deferential to Silicon Valley because it is not their business. If you watched when Facebook and Google and Twitter got dragged to Congress, the American lawmakers would always — especially the Republicans — open their remarks and spend half of the time gushing over the companies: “Oh you’re so innovative! You’re the engine of our economy! I’m so proud that you’re an American company!”

By comparison, when the [U.K.] Parliament came to Washington, D.C. and had the same companies on the front line, they had it out! They were asking tough questions, and they were not at all like, ‘Oh, you’re so amazing.’ I mean, they said a couple of things, but there was no deference. We need regulators in other countries because you get this nationalist business loyalty, and U.S. lawmakers just can’t stomach the idea of inhibiting an American business.

Okay, last question: With all of this in mind, what do you think of the Cambridge Analytica revelations that have recently come to light?

The controversy over whether or not they deleted the Facebook data that they collected with the GSR app is a red herring. Did they delete the models, the algorithms, the software, the intellectual property that they derived from having the data in the first place? That’s the only question that matters.

You have to create a ground truth by putting data in, but then you can get rid of the data because you don’t need it anymore. It was just used to create the model. So somebody could ask you, “Hey did you guys delete the data?” “Yeah, we deleted the data, here, we’ll delete it right in front of you. Here, boom, it’s gone.” But they are not deleting the software that they built from it which actually can exist independently from the data. And so we’re asking the wrong questions, as usual.

This interview has been condensed and edited for clarity.

This professor is trying to force transparency upon one of the world’s shadiest companies.

The Outline