In January, the Central Intelligence Agency made millions of pages of documents publicly accessible through the internet.
These documents contained details of UFO investigations, the files of Henry Kissinger, and much more. They had already been declassified, but they were incredibly hard to access. Interested parties had to visit an understaffed library in suburban Maryland and submit to being videotaped in order to use CREST, the CIA Records Search Tool, to crawl through 25 years of electronic records. It took years of effort by information activists to get them online.
The records were first requested under the Freedom of Information Act in 2011. The CIA rejected that request, claiming, incorrectly, that the records were classified. Then in June 2014, the agency accepted the request but claimed it would take 28 years to get it all onto 1,200 CDs. In 2015, FOIA publisher Muckrock sued for the release of the records. After some legal prodding, the timeline was reduced to six years and a $108,000 bill.
In February 2016, Michael Best, a writer and former software engineer based in D.C., started a Kickstarter campaign to manually extract the data by printing it out and re-scanning it. He raised about $15,000, more than his $10,000 goal, and got to work with a MacBook Air and a Fujitsu scanner. But in late October, the CIA suddenly reversed course and put all the records into its Electronic Reading Room, noting that the “CIA recognized that such visits were inconvenient and presented an obstacle to many researchers.” The whole saga, and the CIA’s contradictory statements, are explained in this excellent Muckrock blog post.
Now that CREST is online, Best is shifting his focus. The CIA tool has minimal search filters and incomplete optical character recognition, he said, which makes it difficult to get relevant documents by searching keywords.
In order to make the documents easier to search, Best and a team from Data.World, a social media company that connects people interested in open data, are downloading them and adding metadata so that the info can be meticulously indexed. This more searchable version of the database will be hosted by the Internet Archive.
Best hopes that, once the documents are copied to the Internet Archive, they and their text will turn up in ordinary Google searches, so that anyone can find and access the information organically. The copies also guard against any after-the-fact censorship the CIA might attempt.
The release of these documents has already borne some really weird fruit, including a recipe for German disappearing ink; details of Project Stargate, the effort to use psychic powers against the Soviets; and, of course, alleged UFO photos.
As of this writing, Best and the team have analyzed and indexed 688,000 of the 930,000 documents located on the CIA’s website, with the CREST archive accounting for about 775,000 of those.
“The public release of the CREST database is obviously a victory for the public, and will hopefully encourage other agencies to do similar things. Unfortunately, it’s not the end of the road,” he said.
Best sees the release of hidden government documents as a way to reveal the truth of national and international affairs. “Letting everyone access the raw documents makes it easier to kill baseless conspiracy theories. People can check the raw documents and see what they actually say,” he said. “What makes it into government reports isn’t always the whole truth, or even the truth at all.”