The Wayback Machine’s archives could be deleted with the click of a button

Paris Martineau May 23, 2018 04:22PM EST

The internet is forever, except when it’s not. Sure, that one terrible tweet you made back in 2015 will probably outlive us all in the form of screenshots, memes, and increasingly terrible dunks, but most things online will quickly fade into the ether. Usually this is a natural occurrence, but occasionally it’s forced. Since the internet is far too expansive and ever-changing for any one person to keep track of, we generally just don’t, instead choosing to outsource our collective memory to the fleet of helpful web-crawlers and archival services that already trawl the web’s depths. But even that system, it turns out, has a shaky foundation.

The Wayback Machine is a 279-billion-strong (and growing) collection of preserved web pages maintained by the non-profit Internet Archive. Far more than just a digital archive of blogs past, the Wayback Machine is one of the few remaining means by which users can reconstruct the way the internet once was, and hold online content creators accountable for their past actions. If a particular version of a website has been captured by the Wayback Machine, no amount of updates or changes to the page can overwrite the existence of the first version. No one can sneakily edit the content of a snapshotted article, post, message, or what have you, and then turn around and claim that it’d actually been that way the whole time. The whole point of the Wayback Machine is supposed to be that it captures things as they are, as if frozen in amber, a true lawful neutral through and through.

However, as of late, it’s become increasingly apparent that this whole system has a gaping flaw: The model relies upon the good graces (or, perhaps more realistically, the ignorance) of those managing the sites that get archived, who can technically force the service to nuke the entirety of their site’s archives with the click of a button. As reported by Motherboard, certain pages have vanished from the site’s digital archives after gaining mainstream attention:

The company in question is FlexiSpy, a Thailand-based firm which offers desktop and mobile malware. The spyware can intercept phone calls, remotely turn on a device’s microphone and camera, steal emails and social media messages, as well as track a target’s GPS location. Previously, pages from FlexiSpy’s website saved to the Wayback Machine showed a customer survey, with over 50 percent of respondents saying they were interested in a spy phone product because they believe their partner may be cheating. That particular graphic was mentioned in a recent New York Times piece on the consumer spyware market…

Now, those pages are no longer on the Wayback Machine. Instead, when trying to view seemingly any page from FlexiSpy’s domain on the archiving service, the page reads “This URL has been excluded from the Wayback Machine.”

A site can stop the Wayback Machine’s webcrawlers from archiving its pages by including a “robots.txt” text file, which signals to bots of all kinds that they should skip over it. However, these sorts of exclusions are typically indicated by a robots tag or a specific error message. The exclusion message uncovered by Motherboard is noticeably different, leaving the exact reason why FlexiSpy’s site was purged from the Wayback Machine shrouded in mystery.
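For readers curious what a robots.txt exclusion looks like in practice, here is a minimal sketch using Python’s standard-library robots.txt parser. The file contents, the example.com URLs, and the `is_allowed` helper are hypothetical. The `ia_archiver` user-agent name is an assumption too: it is the name commonly associated with the Internet Archive’s crawler, not something taken from the reporting above. The sketch only shows how a crawler reads the file, not how the Wayback Machine decides what to hide.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler from the whole site,
# and blocks every other crawler from /private/ only.
# "ia_archiver" is an assumed user-agent name, used here for illustration.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "SomeOtherBot"):
        for url in ("https://example.com/post.html",
                    "https://example.com/private/notes.html"):
            print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Nothing in the file forces a crawler to comply. It is a request that well-behaved bots choose to honor, which is exactly why the arrangement depends on good graces.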

A similarly strange series of events unfolded only a month prior. In April, Mediaite used screenshots from the Wayback Machine to uncover a number of caustically homophobic posts which appeared to have been written by Joy Ann Reid on her old blog, The Reid Report. Reid denied ever having written them, and instead insisted that their existence had to be the work of “hackers” who had tampered with the Wayback Machine results (a claim which the Wayback Machine itself has denied). As ridiculous as it may sound, it’s pretty much impossible to independently confirm or deny, as a robots.txt request was placed on The Reid Report blog back in December, which automatically triggered the exclusion of the entire site from the Wayback Machine archives.

It’s difficult to reckon with the (occasionally understandably) spotty nature of the Wayback Machine when it is one of the few remaining guardians of digital history. Other archival services exist, sure, but not on this scale, and the current fake news crisis raises the record-keeping stakes even higher. High-profile removals like these are a sobering reminder of just how fragile the internet’s collective memory really is. If someone wants to wipe their digital history from the records, they can. And there’s really nothing we can do to get it back.

The Outline