NSA stores metadata of millions of web users for up to a year, secret files show (theguardian.com)
211 points by simonbrown on Sept 30, 2013 | 46 comments



> "The Obama administration has repeatedly stated that the NSA keeps only the content of messages and communications of people it is intentionally targeting – but internal documents reveal the agency retains vast amounts of metadata."

The administration and NSA can't defend themselves because they don't know how much more evidence Snowden took with him to reveal their consistent lies. Or who else might release other evidence of other illegal acts, or behavior that would lead to losing funding, votes, etc.

Whistleblowers seem now the strongest defense we have against overreaching and unaccountable centralized power violating the Constitution. Other people in Snowden's position must read Hacker News.

If you're out there, think about acting on your conscience as Snowden and others did. Your position is stronger than ever. You'll have support from the EFF, ACLU, Guardian, and most of this community.


I would probably reach out to the EFF first as a sanity check before taking such a step.


Does the NSA or another clandestine organization monitor the EFF's communications to watch for exactly this?


If they do, then it should be expressly illegal to include communication from USG watchdog groups, even accidentally. I know it's already illegal, since they technically need warrants, but seeing as that hasn't stopped them from doing so anyway under absurd technicalities, it seems that we need a black list (never ever monitor these people and orgs) to match the white list (people of interest we are actively monitoring). Everything in between should continue to be covered under current warrant requirements. Overriding the black list should require both a warrant and the signatures of the president of the US and a representative or senator from the state where that person or organization is domiciled.


"If you're out there, think about acting on your conscience as Snowden and others did. Your position is stronger than ever. You'll have support from the EFF, ACLU, Guardian, and most of this community."

I think that's an awful lot to ask of somebody, considering what happened to Snowden: he was charged with some very serious crimes and will probably have to remain in exile indefinitely. If the government ever catches him, or if the Russians decide they don't want him around anymore, he may have to spend the rest of his life in prison. The resources of the EFF, ACLU and The Guardian are tiny compared to what the U.S. government can throw at him (the government has essentially unlimited resources). Someone who is as brave and selfless as Snowden is indeed rare, and I think it's unlikely that we'll see his actions repeated in the foreseeable future.


I'm pretty sure I remember that one of the original leaks showed that they keep metadata on Americans roughly indefinitely, and that they can keep full data (i.e. not metadata) for up to 7 days in order to determine whether or not the individual or their actions can be considered a threat. The reason I remember this is that I immediately thought that building a google-like index for each individual, and storing that index indefinitely, would be equivalent to what was stated in the leak. In fact, given all the subsequent leaks, I'm pretty sure that's exactly what they are doing.


The facts in the quote above are not inconsistent. Metadata != content. People so easily confused are easily manipulated.


The only way these abuses are going to stop is if the government employees and contractors who ordered and/or architected these systems are prosecuted and convicted.

When is that going to happen, Eric Holder? Barack Obama?

Unless it does, good luck, Democrats, hiring the best data guys for next election.


Isn't step one President Obama ordering these programs to be stopped? It's hard to see how he's going to prosecute anyone for carrying out programs his own administration either started or allowed to continue.


> Isn't step one President Obama ordering these programs to be stopped? It's hard to see how he's going to prosecute anyone for carrying out programs his own administration either started or allowed to continue.

Yeah, President Obama is the one driving all of this. How can anyone even imagine that he would stop it? And the next president (probably Hillary) will drive it even harder. 9/11 pushed the US past a point of no return. If you read some recent US military history you can get a well-integrated picture of what's really going on. The Office of the President essentially has its own military, SOCOM, separate from the US Armed Forces and the US Legislature. The President by his hand alone can order SOCOM, in secret, to strike at any time and even to assassinate American citizens. That's an incredible power. It's not the only way the Office of the President has become more powerful in the last decade, though. The number of "signing statements" used by Obama to bypass Congress is mind-blowing. The only way the power of the Office of the President will be reduced in the future, and things like the NSA domestic surveillance activities stopped, is through a civil war that ends in a new constitution and completely restructured government. Does anyone really think that members of Congress aren't being blackmailed with information the domestic surveillance activities produced? Look at the recent monitoring of press phones to see that the capabilities are turned against our representatives. Look to the recent insider trading investigation to see the kinds of things they'd be blackmailed over. They're powerless.


I think it's impressive that you think the figurehead has the power.

I promise you that if you think Congress is being intimidated by the intelligence community, then it's at least as likely that the president himself is as well.

Here's the likely scenario: the President has his first real sitdown with the intelligence community after inauguration, and they hand him a manila folder. Inside the folder is the file they've kept on that new president for decades. Every phone call, email, text, app download or piece of mail he ever sent is in there. It's all there, every blackmail piece, every high school sweetheart, every illicit lover, everything. The blackmail isn't overt, but the nature of the asymmetric relationship is established and the truism that information is power is reaffirmed.

I simply cannot believe that the figurehead-of-the-decade is the puppeteer and not the puppet.


Not just blackmail material on the president, but on every family member, friend, colleague, and political ally. There's no reason not to suspect that this has been the norm for decades.


> The only way the power of the Office of the President will be reduced in the future, and things like the NSA domestic surveillance activities stopped, is through a civil war that ends in a new constitution and completely restructured government.

I'm inclined to agree. My knowledge of political history is rough at best, but I have a hard time recalling a large-scale reduction of a government's power through legislation and prosecution alone.


It's possible that laws have to be passed to prohibit it, in which case the ban on ex post facto laws prevents current and past practices from being prosecuted.


Prosecute them for what?


er, isn't it good data guys who are creating the technology responsible for these abuses?


> The only way these abuses are going to stop is if the government employees and contractors who ordered and/or architected these systems are prosecuted and convicted.

A wink and a nod, and nothing will happen even if they did something illegal. But all these shenanigans were OK'd by Obama's DOJ and handpicked rubber-stamp judges. So technically the NSA et al. did nothing illegal; judges approved all of it.


I was re-watching the original "Bourne series". I noticed that in episode 2 of the trilogy, a journalist reveals information about a secret CIA program. It's all fiction, but the journalist works for a UK-based newspaper called "The Guardian". After being disappointed by The Economist once more, because of its unethical and insane stance towards Syria, I consider the Guardian the one and only trustworthy newspaper in the Euro-Anglo-Saxon (or, one might say, 'western') world. I also like the New Yorker and The Atlantic, but they are not newspapers in the strict sense.


I reject calling it metadata. It's actual data.


Correct. Those should simply be referred to as "records of who electronically contacted whom and what, including when and how." Then it doesn't sound so unimportant.

"Metadata" is a real-life Newspeak word, just like Orwell writes, chosen so that "a thought diverging should be literally unthinkable, at least so far as thought is dependent on words."

http://www.newspeakdictionary.com/ns-prin.html


"Metadata" as commonly used in this context would conceivably include the URL of a web request, right? So that would include all of your search history and all web pages you've ever visited, for instance.


Yes, we should say "records of who electronically contacted whom and what, including when, how and from where, including the search history and visited web pages" and never use "metadata" for that. It's almost scary to me to type it in that exact form.


Certainly there's a difference, in say, NetFlow data containing just IPs and protocols/ports and the full contents of a message? Similarly, storing the request line of an HTTP request is far different than the entire request header and response. Same for email headers, EXIF data, etc.

I think it's rather reasonable for someone collecting information to delineate and make it clear they aren't keeping the full contents. Sure, it's also part of the spin, but it's hardly making up a new term.
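
To make the request-line distinction concrete, here is a minimal sketch that splits a raw HTTP/1.1 request into its request line and its headers. The request itself is entirely made up (hypothetical host, query, user agent and cookie), and this says nothing about what any agency actually stores; it only shows what each layer contains.

```python
# Hypothetical HTTP/1.1 request; every value here is invented for illustration.
raw_request = (
    "GET /search?q=example+query HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Cookie: session=abc123\r\n"
    "\r\n"
)

# The head ends at the first blank line; anything after it would be the body.
head, _, body = raw_request.partition("\r\n\r\n")

# The first line of the head is the request line; the rest are header fields.
request_line, *header_lines = head.split("\r\n")
method, target, version = request_line.split(" ")
headers = dict(line.split(": ", 1) for line in header_lines)

print("request line only:", request_line)
print("adds with headers:", sorted(headers))
```

The request line alone carries the method, path and query string; the host, user agent and cookie only show up once the headers are kept as well.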


It is different from "everything" but it's still "the complete records of who electronically contacted whom and what, including when, how and from where, including the search words and visited web pages."


That's very true. The content they're collecting in their databases (e.g., what URLs were accessed by person X) would be the actual data. The metadata is data about that data, e.g., who authorized its collection, or what other agencies it was shared with, or the schema that describes the database fields.

https://en.wikipedia.org/wiki/Metadata


If you can make it to DC on October 26th (anniversary of the Patriot Act), join the ACLU, EFF and 100+ other groups at the Rally Against Mass Surveillance: https://rally.stopwatching.us

If you can't make it, help us raise money for buses and transportation to bring as many people to the rally as possible: http://igg.me/at/stopwatchingus


"The net effect is that NSA analysts look at 0.00004% of the world's traffic in conducting their mission – that's less than one part in a million."

Weird how direct quotes from leaked documents sound less bad than the articles about them.


Did you read the article?

"However, critics were skeptical of the reassurances, because large quantities of internet data is represented by music and video sharing, or large file transfers – content which is easy to identify and dismiss without entering it into systems. Therefore, the NSA could be picking up a much larger percentage of internet traffic that contains communications and browsing activity.

Journalism professor and internet commentator Jeff Jarvis noted: "[By] very rough, beer-soaked-napkin numbers, the NSA's 1.6% of net traffic would be half of the communication on the net. That's one helluva lot of 'touching'."


I read the article and rejected its assertions as speculative horseshit, yes. What of it? The original document cites a figure (1862 petabytes) and two percentages (1.6% and 0.00004%) of that figure. The math is not hard.
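
For what it's worth, here is that arithmetic as a minimal sketch. It takes the 1862 PB figure and both percentages exactly as quoted above, and assumes decimal petabytes (10^15 bytes); the variable names are mine, not the document's.

```python
from fractions import Fraction

PB = 10**15  # assumption: decimal petabytes; the unit convention is not stated above

total_bytes = 1862 * PB                    # total traffic figure as quoted above
touched_share = Fraction(16, 1000)         # 1.6%
reviewed_share = Fraction(4, 10_000_000)   # 0.00004%

# Exact rational arithmetic avoids floating-point noise at these magnitudes.
touched_bytes = total_bytes * touched_share
reviewed_bytes = total_bytes * reviewed_share

print(f"touched:  {float(touched_bytes / PB):.3f} PB")
print(f"reviewed: {float(reviewed_bytes / 10**9):.1f} GB")
print(f"reviewed share is 1 part in {1 / reviewed_share}")
```

Under those assumptions, 1.6% comes to about 29.8 PB and 0.00004% to about 745 GB, i.e. one part in 2.5 million of the quoted total.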


The assertions in the article are not "speculative horseshit". In 2011, Netflix+Youtube+BitTorrent alone were over 50% of internet traffic [1], and all three have likely grown since then. The NSA has no interest in the vast majority of this traffic.

The exact numbers are speculative because nobody has data accurate enough to determine them with any real certainty, but the point is that the NSA is misleading the public about the extent of its surveillance by orders of magnitude.

[1] http://techcrunch.com/2011/05/17/netflix-largest-internet-tr...


> nobody has data accurate enough to determine them with any real certainty

You mean, other than, y'know...a classified document by the NSA describing what proportion of estimated total internet traffic they look at?


Total internet traffic is not the total volume of communications. What you're saying is like saying that your county's prosecutor has a 0.25% conviction rate because 1% of residents are charged with crimes and 25% are found guilty. Measurements are useless if you measure the wrong thing.
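
A minimal sketch of the analogy, with a made-up county of 100,000 residents (only the 1% and 25% figures come from the comment above):

```python
residents = 100_000                     # hypothetical county population
charged = residents * 1 // 100          # 1% of residents are charged
convicted = charged * 25 // 100         # 25% of those charged are found guilty

# Same numerator, two different denominators.
wrong_rate = convicted / residents      # convictions over all residents
conviction_rate = convicted / charged   # convictions over people actually charged

print(f"per resident: {wrong_rate:.2%}")
print(f"per charge:   {conviction_rate:.2%}")
```

The count of convictions is the same in both lines; only the denominator changes, and that is what turns 25% into 0.25%.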


what makes you think the number mentioned in the classified document is, or intended to be, accurate?


The issue here is that only a tiny percentage of the information transmitted is of interest to someone doing surveillance.

No one cares about the data of the YouTube videos you watch, the content and images and JavaScript and CSS and what-have-you of the web pages you visit, the content of the files you download etc. Examples of what would be of interest, and which comprise only a tiny portion of all web traffic, are:

* All URLs you request (will include your search history)

* The contents and headers (and maybe attachments) of your e-mails

* Your online IM conversations

* _Conceivably_ the video from your video calls, but this is too much data to store

* Arguably any data transmitted through a text form online

Again, the interesting data is only a tiny portion of all web traffic, because the rest is transmitted repeatedly and can be recreated on demand. Much of the interesting stuff is, in fact, in the "metadata".


> The issue here is that only a tiny percentage of the information transmitted is of interest to someone doing surveillance.

Yes, exactly! And the document in question even states what that percentage is.


I hate to point out the obvious but valuable NSA-type traffic is text and voice...I doubt they care 'bout youtube and netflix streaming traffic. I'm sure in 'absolute terms' that statement is true but it is a lie of omission.

I'd be willing to bet 0.00004% is probably close to 100% of all the metadata that crosses a router used by a US-based ISP or an international fiber line owned by a US company. :P


> but valuable NSA-type traffic is text and voice...I doubt they care 'bout youtube and netflix streaming traffic.

You don't think they monitor YouTube for terrorist video?

You don't think they monitor other video sharing sites for video of terrorist organisations releasing propaganda? That seems like useful intelligence, and they'd be foolish to ignore it.


What if they just ran their calculations something like this?

Suppose a 10MB youtube video is watched 100x. That's 1GB traffic. The NSA could record the video (10MB), then simply record who watched it (maybe a few dozen MB in http headers, etc -- call it 40MB)

They could report they only monitored 50MB of 1,000 MB, or 5%.

If the video was watched 1,000x, they're "monitoring" around .5%. 10k views, they're monitoring .05%.

Pretty easy to see how they can report a true number that also doesn't reveal the extent of their wiretapping.
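
The same back-of-the-envelope numbers as a minimal sketch. Like the figures above, it holds the header overhead fixed at 40MB no matter how many views there are, which is a simplification; the function name and defaults are just for illustration.

```python
def monitored_share(views, video_mb=10, header_mb=40):
    """Fraction of total traffic counted as monitored when the video is stored once."""
    total_mb = video_mb * views          # every view transfers the full video
    monitored_mb = video_mb + header_mb  # one stored copy plus the header records
    return monitored_mb / total_mb

for views in (100, 1_000, 10_000):
    print(f"{views:>6} views: {monitored_share(views):.2%}")
```

That gives 5%, 0.5% and 0.05% for 100, 1,000 and 10,000 views, matching the figures above.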


Only a year, are we sure about that?


Many of the documents Edward Snowden released are some years old. Only one year seems strange in light of the new storage facilities being built. The capacity there will almost certainly be orders of magnitude greater than that. I found William Binney's testimony at this event highly enlightening: http://youtu.be/qBp-1Br_OEs?t=53m24s


In his MIT talk he mentioned that the facility will be able to hold up to 100 years' worth of information. I am not sure if it was for Americans or just the sum total of everyone.


He mentions that (conservative) estimate in the talk linked above as well (the question starts at 1:25:16) and apparently that's for the World's communications.


Maybe it's only one year when there is no chance that the data becomes useful in the future... which would mean that it isn't one year.


    "...but is permitted to keep US communications where it 
    is not technically possible to remove them..."
That's a pretty big loophole that incents them to design future systems where it isn't "possible" to delete the data, where "possible" is likely to be interpreted as a gargantuan task, or where data that should be deleted is so intertwined with data that shouldn't be that they say they can't delete one without corrupting the other. Many if not most NoSQL data stores are likely to be argued as impossible to delete from because of the denormalization of data. Requiring them to only develop future systems where data is 100% guaranteed deletable should be a legal requirement. (Fortunately this same requirement would also make indiscriminate dragnet big data analysis prohibitively expensive and slow.)


Even Hacker News is kind of over these leaks, and this worries me immensely. We live in a kind of digital prison right now; let's hope there is a way out.


I don't think it's that people are "over" them, it's just that we already know.

Everything you do online, on your phone, with your credit card, etc. is [possibly] being stored and analyzed.

The shock has worn off by the time we hear more.



