rants.org » PRISM: The Problem with Collect-Then-Select.

[Note: This post now uses the phrase “collect-then-select”, instead of “collect-then-analyze”, which wasn’t quite as accurate. Other than that, and adding the references at the end, I’ve made no changes. There is a redirection in place from the old URL.]

One notion that keeps surfacing in the ongoing PRISM ^[1] leak is that intelligence services have started collecting vast amounts of data just to store it for potential later use under a specific warrant. In other words, they want to have it all easily at hand for when they're actually investigating someone and need to discover that person's contacts, social network, travel patterns, consumer habits, etc.

For the actual investigation, so the claim goes, they'll obtain warrants as needed, even if the initial collection was unwarranted. In other words, the collection phase can skate by without a warrant: even though they have the data, they haven't actually looked at it yet, so no one's rights are being violated. Then later, when they do look at it, they make sure they have a warrant.

This sounds sane, or at least like a good-faith attempt to abide by some kind of legal framework while still getting the job done… until you think about it:

A low-level systems administrator just leaked thousands of top-secret documents. How can they guarantee that your data is safe, even if it’s supposedly just being stored and not analyzed?

This point is understandably hard for intelligence services to acknowledge. No one wants to think about their system’s failure modes. But if you’re collecting and storing private data about millions of citizens, failure modes become not merely important, but a dominant consideration.

Legal protections are designed with failure modes in mind. We cannot guarantee that our systems operate as designed; at best we can hope they do. This is why "collect then select" is a problem. It's not because the data is hurting anyone by sitting idly in a storage facility, unexamined by humans or machines. It's because you can't be sure it's really idle. If a conscience-stricken 29-year-old can leak thousands of top-secret documents to a journalist, a more mercenary employee (or perhaps just one whose family is being threatened by some very interested party) can access your data and make it available to someone else. This risk is inherent in the centralized collection and storage of the data. By collecting it, the intelligence services have created another route of vulnerability for private information about you. I'm sure they're doing their best to protect it, but in the long run, their best probably won't be enough.

Anyway, as Moxie Marlinspike eloquently argues, we should all have something to hide ^[2].

References:

I’ve seen the “collect-then-select” notion described in many places. The three I was able to dig up after the fact are all from the New York Times:

Disclosures on N.S.A. Surveillance Put Awkward Light on Previous Denials ^[3]:

“Right now we have a situation where the executive branch is getting a billion records a day, and we’re told they will not query that data except pursuant to very clear standards,” Mr. Sherman said. “But we don’t have the courts making sure that those standards are always followed.”

N.S.A. Chief Says Phone Record Logs Halted Terror Threats ^[4]:

Analysts can look at the domestic calling data only if there is a reason to suspect it is “actually related to Al Qaeda or to Iran,” she said, adding: “The vast majority of the records in the database are never accessed and are deleted after a period of five years. To look at or use the content of a call, a court warrant must be obtained.”

ACLU Files Lawsuit Seeking to Stop the Collection of Domestic Phone Logs ^[5]:

Timothy Edgar, a former civil liberties official on intelligence matters in the Bush and Obama administrations who worked on building safeguards into the phone log program, said the notion underlying the limits was that people’s privacy is not invaded by having their records collected, but only when a human examines them.

That same article goes on to make another important point about why collect-then-select is problematic:

Moreover, while use of the database is now limited to terrorism, history has shown that new government powers granted for one purpose often end up applied to others. An expanded search warrant authority justified by the Sept. 11 attacks, for example, was used far more often in routine investigations ^[6] like suspected drug, fraud and tax offenses.