For a few hours this morning, if you searched “Geary Danley,” the name some erroneously identified as that of the shooter who killed at least 58 people in Las Vegas Sunday night, Google would present you with not one but two 4chan threads.
The 4chan threads didn’t show up in the normal search results but in a highlighted box at the top of the page titled “Top stories.” They were, predictably, filled with bizarre conspiracy theories about the shooter’s political views.
The company tried to downplay the event in a statement, saying the reason 4chan appeared at the very top of its search results, highlighted with a photo and set aside in a box, was the fault of an algorithm. “Unfortunately, early this morning we were briefly serving an inaccurate 4chan website in our Search results for a small number of queries,” Google said. “Within hours, the 4chan story was algorithmically replaced by relevant results. This should not have appeared for any queries, and we’ll continue to make algorithmic improvements to prevent this from happening in the future.”
In an email, Google explained the algorithm’s logic. In this case, the algorithm weighted “freshness” too heavily over “authoritativeness.” There were not many results for the name, and therefore the algorithm lowered its standards for its Top stories module, which includes content from both news sites and the broader web (the 4chan result specifically came from the web and did not appear in Google News).
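To make the failure mode concrete, here is a toy sketch of the logic Google described. This is purely illustrative; Google’s actual ranking system is not public, and the function, weights, and thresholds below are invented for the example.

```python
# Hypothetical illustration only: not Google's actual ranking code.
# A toy "top stories" scorer that weights freshness against authoritativeness
# and, as in the behavior Google described, lowers its quality bar when there
# are few results for a query.

def top_story(results, query_volume, w_fresh=0.7, w_auth=0.3):
    """Pick the highest-scoring result, or None if nothing clears the bar.

    Each result is a dict with 'freshness' and 'authority' scores in [0, 1].
    """
    # With little existing content for a query, the (hypothetical) quality
    # threshold drops -- which is how a fresh, low-authority source can slip in.
    threshold = 0.5 if query_volume > 100 else 0.2

    best = max(
        results,
        key=lambda r: w_fresh * r["freshness"] + w_auth * r["authority"],
        default=None,
    )
    if best is None:
        return None
    score = w_fresh * best["freshness"] + w_auth * best["authority"]
    return best if score >= threshold else None

# A fresh but low-authority post clears the lowered bar on a rare query,
# yet would fail the normal bar on a common one.
fresh_thread = {"name": "4chan thread", "freshness": 0.5, "authority": 0.05}
print(top_story([fresh_thread], query_volume=3))
print(top_story([fresh_thread], query_volume=10_000))
```

The point of the sketch: nothing here is “faulty.” The code runs exactly as written; the bad outcome is a design choice about what the threshold does under low volume.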
Blaming the algorithm has gotten pretty common. In April, the popular photo filter app Faceapp released a feature that was supposed to make users look “hot” but blatantly gave them white features like lighter skin and rounded eyes. The company behind the app called it “an unfortunate side-effect” of the algorithm and “not intended behavior.” After ProPublica reported that Facebook allowed advertisers to target “Jew haters,” the New York Times chalked it up to a “faulty algorithm.” In all three cases — the 4chan Google result, the racist Faceapp filter, and the Jew hater ad targeting — the algorithm was not faulty. A truly faulty algorithm would be one that does not compile or gets stuck in an infinite loop. These algorithms are executing; they are doing what they were designed to do. The problem is that they are not designed to exclude misinformation or account for bias.
If I were designing an algorithm that was going to scrape the web and highlight stories at the top of Google, I might blacklist some sites to make sure it’s not littered with bullshit. And the first site I would probably pick would be 4chan, an entirely anonymous, user-generated, virtually unmoderated forum with a track record of astroturfing and spreading hate. The problem is not that an inaccurate 4chan post appeared in the top stories module, as Google’s statement says, it’s that a 4chan post even appeared at all.
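The low bar the author describes, a blacklist in front of the module, could look something like this. The domain list and function names are illustrative assumptions, not any real product’s configuration:

```python
# A sketch of a domain blacklist applied to candidate links before they can
# reach a "top stories"-style module. Illustrative only; the domains and
# helper names are invented for this example.
from urllib.parse import urlparse

BLACKLISTED_DOMAINS = {"4chan.org"}

def allowed(url):
    """Return False if the URL's host is a blacklisted domain or a subdomain of one."""
    host = urlparse(url).netloc.lower()
    return not any(host == d or host.endswith("." + d) for d in BLACKLISTED_DOMAINS)

candidates = [
    "https://boards.4chan.org/pol/thread/12345",
    "https://www.reuters.com/article/us-lasvegas-shooting",
]
print([u for u in candidates if allowed(u)])
# Only the Reuters link survives the filter.
```

A dozen lines of standard-library code, which is the author’s point: excluding an anonymous, unmoderated forum from a curated surface is a design decision, not a hard problem.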
The algorithm merely executes.
Meanwhile, the company’s statement cast responsibility on an algorithm as if it were an autonomous force. And while algorithms like the ones that govern Google’s search engine have gotten sophisticated and complicated, Google still has full control over them.
This shortcoming is often on display at Google and elsewhere when there is a traumatic breaking news event. Over on Facebook’s “trending” section, the algorithmically compiled stories for the shooting included an article from Sputnik, a Russian propaganda outlet, incorrectly saying that the FBI had connected the shooter to ISIS (in fact, the FBI said the opposite). Facebook may not have intended for its algorithm to surface false information from Russian state-owned news outlets, but that doesn’t mean it is the algorithm’s fault. The algorithm merely executes. No one other than Facebook decides what it does.
“Right now, FB's trending topic page for the Las Vegas shooting features two (2) posts from a Russian propaganda outlet. pic.twitter.com/jDR1V0zzPy” (Kevin Roose, @kevinroose, October 2, 2017)
The only reasonable conclusion at this point is that tech companies like Google and Facebook do not care about fixing this. Based on Google’s statements, it does not appear that the company plans to prevent 4chan from popping up in its Top stories module in the future. Instead, it defers to the vagaries of its algorithms, as if doing anything proactive would be interfering with their sacred work. “There are trillions of searches on Google every year. In fact, 15 percent of searches we see every day are new. Before the 4chan story broke, there wasn’t much surfacing about [geary danley], and so we weren’t showing a Top Stories section for this set of queries. So when the fresh 4chan story broke, it triggered Top Stories which unfortunately led to this inaccurate result,” the company said in an email. The wording from Google here is strange, as 4chan has no news stories, only threads populated with the images and musings of 4chan users.
The passive language used here is all the more disingenuous given that Google made a conscious choice in recent years to start endorsing certain search results over others. Traditionally, the company acted like a reference; you asked for resources, it provided them, and you worked out the answers yourself. If I search for “flat earth” on Google, I will immediately find many communities and resources for flat earth truthers. That is Search working as intended, and there is an implicit understanding among people who use it that some of the results may not be credible. But when Google decides to highlight stories at the top of its page, it is attaching an aura of credibility to them. Special categories including the Top stories module, featured snippets, and “Knowledge Graph” excerpts attempt to save you time by directing you to a more curated result, or by providing a direct answer without the need to dive into search results at all. When these curated answers are wrong, Google often points to low search volume, which means too little data for the algorithm to come up with a good result.
Making sure 4chan is not included in this module seems like the lowest possible bar you could set for Google, and yet the company failed to clear it. (Occasionally, Google even doubles down when asked why it is pushing bad information in front of users: When I asked Google in March why its “Top stories” module was serving an article from Breitbart written by a climate change denier, the company told me the feature was working as intended.)
It’s not about the algorithm. It’s not that the algorithm was supposed to do one thing and went off and did a bad thing instead. Google’s business lives and dies by these things we call algorithms; getting this stuff right is its one job.