It's not safe to assume the NSA doesn't log DDG searches. Look at the PRISM logo - it's a beam splitter. Read the slide, look at the "Upstream" portion.
They're logging all your URLs and headers. How much are you willing to bet they can't decrypt https? I don't understand why all the hubbub _is focused solely_ on direct server access (the bottom half of the slide), when "Upstream" access is just as big a concern.
EDIT: rephrased my concern about direct vs upstream
"How much are you willing to bet they can't decrypt https?"
I'd bet quite a bit, though not "my life", that they do not have a generalized "read everything" ability for all forms of SSL. They may have what cryptographers would call "a crack", but that's a low bar, and doesn't prove they have a practical attack.
However, DDG is currently using 128-bit RC4, which is very weak. [1] I wouldn't care to bet anything that the NSA doesn't have an RC4 cipher crack that is practical to run on wide swathes of traffic.
RC4 is very popular, which I believe is because some people claimed it was a defense against the BEAST attack. I researched this for work, and I couldn't find anyone I trusted who said that was a good mitigation. The people I trusted merely observed that RC4 was not vulnerable to BEAST, but never said you should switch to it; only secondary sources ever suggested that. My conclusion was that there was a reason the primary sources never suggested it: in response to a theoretical break of the rest of SSL, the correct move was not to switch to a cipher with far more practical attacks already known than what BEAST demonstrated.

But now it's even sillier; BEAST has been entirely or almost entirely mitigated in browsers (there's no server-side defense against BEAST, but there is a client-side one, and browsers now have it). As far as I can tell, RC4 should be abandoned and we should resume using stronger ciphers for SSL. Anyone still concerned about BEAST should update their browser.
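If you want to check what a server negotiates yourself, here's a minimal sketch using Python's standard ssl module (the hostname is just the obvious example here; the permissive cipher list is so the server's preference, not the client's restrictions, decides the outcome):

    import socket, ssl

    # Ask for a permissive cipher list so the server's preference wins.
    ctx = ssl.create_default_context()
    ctx.set_ciphers("ALL")

    with socket.create_connection(("duckduckgo.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="duckduckgo.com") as tls:
            # Prints (cipher name, protocol version, secret bits),
            # e.g. ('RC4-SHA', 'TLSv1', 128) for the weak case above.
            print(tls.cipher())

Note that a modern OpenSSL build may refuse RC4 even with "ALL", so treat this as a sketch, not a definitive probe.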
Not without someone noticing. Some sites have pinned certs in Chrome, which would stop this, and even without that you would expect some knowledgeable techie at Facebook or Github or something to be using their home laptop and say, "Wait a sec, this isn't my company's public cert!"
Not having seen any blog posts screaming, "OMG, my site is being hijacked wholesale," I can only assume that the NSA isn't doing this (or has managed to squelch by legal order every single person privy to the real cert at MITM'ed sites, which is absurd and would raise the question: why not obtain the private key from those people in the same way?).
Do they need to MITM? If they have a copy of the private key, can't they just use it to decrypt the data .. even old data for which they've only just acquired the key?
Having the root CA's private key doesn't give them access to the end entity's private keys. When you ask a CA for a cert, you only provide them with your public key (in the form of a CSR) for them to sign. The CSR does not contain the private key.
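A quick way to convince yourself of this is to generate a CSR and look at what's in it. A minimal sketch with the third-party cryptography package (the common name is a placeholder): the private key signs the request, but only the public half is included.

    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([
            x509.NameAttribute(NameOID.COMMON_NAME, "example.com"),
        ]))
        .sign(key, hashes.SHA256())  # private key signs, but isn't embedded
    )
    # The PEM below contains the subject, the public key, and the
    # signature -- nothing in it lets a CA (or the NSA) decrypt traffic.
    print(csr.public_bytes(serialization.Encoding.PEM).decode())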
True, but they would have to do this for every single web server they would want to collect information from. Not impossible, but it'd be a lot of work.
They have to set up impersonating SSL certs for every connection they want to MITM. While there'd clearly be value in them inserting or subverting network hops between "the great unwashed" and gmail/facebook/aim servers, there's very little chance the NSA has access to hops along the path between my (Australian) adsl connection and my vps (located in Australia).
For internal (or routed-through) US traffic - while Verizon's lack of interest in protecting customer data is probably shared by the major backbone providers - I _strongly_ doubt even the NSA has enough gear hanging off the backbones to actively MITM any significant proportion of the firehose that'd represent. Even the AT&T "secret room" probably doesn't house enough gear to create fake (signed) certs and MITM every SSL connection for millions or more simultaneous users browsing every https site under the sun.
Having said that, I'd bet good money they _do_ target specific SSL traffic - has anyone checked the SSL connections to TOR entry and exit points recently? That'd be one spectacularly obvious path to try "speculative MITM attacks".
Heh, I was wondering if that might get noticed. :)
On the one hand, don't take my word for it; I also have not found anyone I trust who has verified my explanation directly. On the other hand, I did do my best to read the primary sources very carefully, both for what they say and what they don't say, and I was confident enough to implement more conventionally strong ciphers on the services I'm responsible for, so my money is where my metaphorical mouth is.
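For what it's worth, here's a sketch of the kind of server-side change I mean, using Python's ssl module (the cert/key paths are hypothetical, and the cipher string is one plausible choice rather than a canonical recommendation):

    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")  # hypothetical paths
    # Prefer ephemeral-keyed AES suites and explicitly drop RC4 and
    # other weak options.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+AES:DHE+AES:!RC4:!aNULL:!MD5")

The same idea applies whatever your web server is; the cipher string syntax is OpenSSL's.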
If they can decrypt some SSL (maybe low-bit), it is likely a very intense process that requires vast hardware resources - so even if they can do it, it is probably not being done for all traffic, though it could be applied to some traffic.
That doesn't necessarily mean they don't have the ability to decrypt; it just means they have a process in place, so in case shit ever hits the fan (like it just did) they can come back and say, "What's the big deal? We have a process in place."
That's not universally true - there's a remarkable amount of TLS/SSL-encrypted email in transit, either via the STARTTLS ESMTP command or SSL over port 465 (and 993/995 for IMAP and POP3).
I don't think there's a way to guarantee your mail always travels over TLS/SSL secured connections, but I suspect more of it does than you think.
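For the curious, this is roughly what opportunistic TLS looks like from the sending side, sketched with Python's standard smtplib (server name and addresses are placeholders). The weakness is that STARTTLS is advertised over the unencrypted channel, so an active MITM can strip it - unless the client fails closed, as here:

    import smtplib

    with smtplib.SMTP("mail.example.com", 587) as s:
        s.ehlo()
        # Raises an exception if the server (or a MITM stripping the
        # capability) doesn't advertise STARTTLS -- fail closed.
        s.starttls()
        s.ehlo()  # re-identify over the now-encrypted channel
        s.sendmail("me@example.com", ["you@example.com"],
                   "Subject: hi\r\n\r\nhello")

And of course this only protects the hop to that server; onward relays make their own choices.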
There's a straightforward way to make sure your email is always encrypted in transit: encrypt it before you send. No promises about making sure your email can always be read by the recipient, though...
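A sketch of that, using the third-party python-gnupg wrapper (it assumes gpg is installed and the recipient's public key is already in your keyring; the address is a placeholder):

    import gnupg

    gpg = gnupg.GPG()
    encrypted = gpg.encrypt("meet me at the usual place",
                            "alice@example.com")
    assert encrypted.ok, encrypted.status
    # The ASCII-armored ciphertext can travel through any mail provider;
    # only the holder of Alice's private key can read it.
    print(str(encrypted))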
And here's the problem: email needs to be readable by the recipient, so until a significant portion of email recipients can handle encrypted mail, the NSA doesn't need to attack my encrypted email storage - enough of my correspondence ends up in cleartext in gmail/hotmail/yahoo et al. anyway.
This is a hard one to solve. GPGMail seems to get broken with every Mac Mail.app release. Vast numbers of people rely on webmail - which'd need server-side or in-browser GPG decryption. My Mom's not going to use command-line gpg tools. How the hell do we bootstrap our way up to ubiquitous encrypted email?
Hm, should be easy enough to have some browser plugin that lets you select a text/data field and recipient list field and encrypt it with the appropriate key; and to do something similar for recognition and decryption of fields.
I think there are complications though - you need to be very sure that rogue javascript can't dig around in the plugin and extract your private key. I'm not sure how securely sandboxed plugins can be.
What's the normal procedure for making a call whose output depends on a file that must be kept secret? Is there a typical OS API pattern that's seen in the various programs like ssh, scp, and so on?
I think one of the problems is that software like GPG and OpenSSL goes to a lot of trouble to make sure private keys don't hang around in memory any longer than absolutely required - minimising the risk of the OS preempting the executing code and writing the key out to swap (or of malicious code slurping it up out of RAM). The bare-metal hoop-jumping required to get that right might not be possible in the context of a browser plugin.
More concretely, they don't have to rely on decrypting https. An approved NSA or FISA order to DDG will give the government an en clair "wiretap" on your DDG searches for up to a year. They may not be able to get the searches you did before the wiretap began, but that's all.
There's no question that they're monitoring upstream traffic. In fact they may still be doing the old ECHELON trick in which the US eavesdrops on non-Americans, the rest of the world spies on Americans (among others) - and then everyone swaps the data received.
But in the light of the PRISM documents it's even more likely than it was before that the NSA doesn't have the ability to decrypt HTTPS, or at the minimum that the US considers it too important to risk giving it away by using it on routine Top Secret signals intelligence. (And/or maybe too resource-intensive to use for that.)

The strongest evidence for this is that we haven't heard anything about such a capacity yet from Snowden, Greenwald et al., who all have the full PRISM deck (along with other documents) in their possession and would surely tell us about it if they knew of it.

So either 1) the PRISM slides do mention the ability to decrypt SSL or SSH streams but Snowden and the journalists haven't picked up on it (not impossible given the apparent incompetence they displayed over "direct access"), 2) it's too sensitive to mention in a self-aggrandising Top Secret overview of upstream and "direct collection" Internet signals intelligence, which probably means it's not in use (or at least not in regular use) for upstream collection, or 3) they really don't have it.
A supporting reason to think that they don't have it, or hardly ever use it, is the apparent emphasis on "direct collection" in the PowerPoint. Why go to the hassle of dancing the frenemy minuet with Google and other fairly-anti-surveillance Silicon Valley firms when you can just get what you want from upstream collection at the apparently more-accommodating telcos? This isn't conclusive because even if you could understand all the traffic into and out of someone's Facebook account you'd still like to be able to see the internal state of the account, in particular so that you'd know what they'd been doing before the upstream surveillance began. But I think it's at least as likely that the whole new focus on direct collection is a workaround for the fact that, thanks to SSL and SSH, upstream collection just isn't what it used to be back in the days of ECHELON.
As the slide said, You Should Use Both: direct collection to give you access to US-company servers, probably bypassing the HTTPS problem, and upstream access to give you data, probably only unencrypted data (email!), that passes through the US without going to a US-company server.
(If you want an exotic alternative theory, you could speculate that the PRISM document is a fake, a limited hangout http://en.wikipedia.org/wiki/Limited_hangout by the US spooks, maybe precisely to direct attention away from their ability to decrypt HTTPS streams. But this now seems unlikely, for example because DNI Clapper would surely have to have approved a managed release of a set of documents that both gave away the Verizon metadata surveillance and so also implicated him in perjury.)
I can't remember which interview it was - whether on Democracy Now or in his MIT lecture video - but Bill Binney stated that the NSA does in fact decrypt HTTPS.
If Bill Binney said that, and if he is right, I'd assume the most likely explanation is that NSA can push over some low-security SSL connections of the type jerf describes above https://news.ycombinator.com/item?id=5877362 , but has to rely on "direct access" to get around most or all high-quality (but still widely-used) SSL encryption. (Or, again, that it also has the capacity to break high-grade HTTPS connections, but it's holding that back for really important occasions.)
Given the history of the gov/NSA being effective crypto gods, my money is on them being ahead at decrypting SSL and HTTPS - and even if it's not real-time, they store streams from target endpoints regularly for slower offline decryption.
I wonder why so many people believe this. Many simple and weak ciphers have been around for decades and - although they are considered to be very insecure by cryptographers - certainly can't be decrypted in real-time (!) on this scale (!).
This has been talked about many times now. All compromising a CA lets them do is create believable certificates so they can man-in-the-middle connections, and they can't be doing that for a large number of connections because it's resource-intensive and detectable.
They still don't have the private keys of the sites if they break into the CA.
I find this quite plausible: with or without the knowledge of Page, Zuckerberg et al., the NSA might very well have the private keys of these companies. I would not be surprised if the CEOs of these companies chose to be ignorant of the NSA's methods so as not to have to lie to the public, shareholders and Congress.
Also, given that the world's best engineers work at either high-tech companies or the NSA, there will be some who have switched between these industries, giving the NSA/CIA a head start in getting any information these companies hold through old-fashioned spy tactics.
What about Yacy? (http://yacy.net) I am not sure whether queries inside a peer network can be decrypted as easily as HTTP requests (I am not a networking specialist though; it is just an opinion).
I ceased using Google search except as a last resort when this story broke, and I had no idea what I had been missing out on with DDG: excellent keyboard navigation. Also, DDG's results compared to a year ago are night and day. It also seems to listen to my keywords better than Google did - Google's ignoring them had become a growing annoyance. If you haven't, you really should try out DDG for a week.
This impression must partially be due to how you write your queries, and to the quality of the natural language parsing in DDG/Bing (your query leans heavily on that). I searched "ruby element in array", Rails not being a programming language, and I get the Ruby Array docs as the first result (which is what I would want) and the StackOverflow result third. On Google, the StackOverflow item is first and the Array docs are third for the same query.
In this case you and I think that our respective search engines are providing the better result.
If you ever find yourself wanting to fall back on Google's results, just throw !sp at the front of a DDG search for StartPage's proxied Google service (or !g if you absolutely must go to Google). DDG has many, many shortcuts for directly searching StackOverflow, GitHub, Amazon, Wikipedia, etc.
This is most likely because Google has more user behavior data than DDG. If enough people use DDG and click on the StackExchange link for that query (or similar queries), DDG will be able to get it to the top.
On the other hand, doesn't DDG just use the Bing API, while only Blekko crawls the web? Or am I getting my search engines mixed up?
Personally I don't, but it should be noted that there are other indexes besides the ones you mention. That said, Samuru and Procog are pretty interesting.
For programming questions, I almost always append site:stackoverflow.com to my search term... generally the answer is on SO, but I prefer the search engine's results over SO's search... on both DDG and Google.
He said "I prefer the search engine's results over SO's search..."
DDG's bang notation uses SO's search rather than the search engine's results.
Note that DDG's bang notation is also redundant with Chrome and most other browsers that let you set your own search engine keywords. Chrome's is especially nice, just type the first few letters of the site's URL and hit Tab and you're in a site specific search. It works automatically for any site implementing OpenSearch (or you can add it yourself by right-clicking the search box), and you don't have to memorize any keywords, so that's two advantages over DDG right there.
That was discussed here some time ago and many people suggested explicitly omitting SO and StackExchange sites when learning new languages/technologies. The argument was that while the answers on SO generally get the job done, they rarely provide the context, depth and coverage the docs give. I happen to agree - I only hit SO when I don't know what exactly I need to search the docs for. It's absolutely great for this. For serious learning not so much.
Concise, but wrong: it keyed off of "find", so it produced a top hit of items related to that method (which in Rails means: find in the database).
Now you could argue that I should have omitted "find", and there was a time when that was how search engines were used, but the fact that Google got the semantics right is why it is first-rate.
You still very well may be leaving a trail behind; it's just likely that such a trail won't be correlated among multiple different domains you visit (from ad tracking bugs of a single company embedded into multiple websites you browse).
I second the observation that the results have improved considerably over the last year. There are still holes where I have to resort to using the !g option and use google (non-English searches would be the most prominent example) but I've set all my default search options to DDG and haven't regretted it!
That makes two of us. While lots of people say "well Google won't miss you", I think we still have the duty to do what's right.
The USA was founded as the most free, exemplary democracy in history, and what we saw in this scandal is precisely the opposite of what this Nation earned so Google could exist. Then Google betrayed us all and has been in bed with the government in the most sinister way possible since 2009.
Honestly? I'd rather give someone else a chance. Then if DuckDuckGo spies on us in the future, I'll switch again.
This is great news for DuckDuckGo and I'm all for it; however, DuckDuckGo isn't completely private, right? From what I see, DuckDuckGo obtains much of its data from 3rd parties, such as Bing/Microsoft and Yahoo. http://help.duckduckgo.com/customer/portal/articles/216399-s...
"While our indexes are getting bigger, we do not expect to be wholly independent from third-parties. Bing and Google each spend hundreds of millions of dollars a year crawling and indexing the deep Web. It costs so much that even big companies like Yahoo and Ask are giving up general crawling and indexing. Therefore, it seems silly to compete on crawling and, besides, we do not have the money to do so. Instead, we've focused on building a better search engine by concentrating on what we think are long-term value-adds -- having way more instant answers, way less spam, real privacy and a better overall search experience."
While we do use 3rd parties to fulfill some of our organic results we always make those calls from our machines. We never pass along IP addresses. This means that while our other sources might see your queries, they are not tied with PII (personally identifiable information).
I think you're not quite understanding how it works. They retrieve the data from e.g. Bing behind the scenes and the 3rd parties have no way to connect a query to an individual user.
That said, maybe there'd be an issue if you enter personally identifiable information as a query. But who does that?
Duckduckgo redirects all requests to HTTPS and uses only HTTPS; it is highly unlikely the NSA or anyone would be able to decrypt that traffic, unless of course they force DDG to divulge their SSL private key. Which I suppose is plausible.
"Why would anyone believe DDG isn't already penetrated?" should be read as, "Why would anyone believe that DDG isn't already penetrated and that their own computers are not already penetrated?" Your trust chain has to start somewhere.
Why do you trust any binaries you've got? Where did your first-use/bootstrapping compiler come from?
And even if you wrote your own OS and compiler from the ground up - who wrote your BIOS? Your network card firmware? Your disk controller software? Your CPU microcode?
This is why it's important to look at PRISM as a political issue and not merely a technical one, like I see a ton of people doing now. The best solution to government spying isn't to tell everyone to use Linux and DuckDuckGo, it's to change the spying itself.
Shifting use away from what Bruce Schneier calls feudal architectures both puts the Government on notice that its methods aren't appreciated, and creates a damaged class (the SaaS feudal lords: Google, Facebook, AWS, Apple, Salesforce, and others) who can petition the government to lay off the tactics because they're hurting business.
https://www.schneier.com/blog/archives/2013/06/more_on_feuda...
Hell, push this hard enough and a sufficiently feasible decentralized VoIP might become common enough to put the wireless carriers out of the voice business, relegating them to carrying encrypted bits. They might know your handset location, your data usage, and the Tor entry point you're using, but that's it. It's something I've been giving thought to.
Indeed - but the political changes (if we get them at all) will take time - time probably measured in years or political terms.
The "merely technical" solutions are going to be important in the meantime. Duckduckgo, encfs, Tarsnap, GPG, Tor, ForceSSL - things like that will (probably) help in the meantime (especially if we can help convince "regular users" to use them), as will encouraging places like DDG to implement TLS cyphers that use forward secrecy.
Or because their entire business differentiation from the beginning has been privacy.
This does not mean that it is not compromised, of course. But it's why people would believe it isn't compromised. And it's a much better reason than your sarcastic imitation.
I have no idea. I (as I said) was not trying to assert that DuckDuckGo is not compromised, I was asserting that the reason that people believe it to be secure is because they've differentiated on privacy since the beginning.
Mostly, I don't like seeing condescending, inaccurate statements.
There is nothing inaccurate about claiming that people believe that DDG is protecting their privacy because of how the website presents itself and the claims made by the company. That is exactly why people believed (and many continue to believe) that Hushmail is protecting their privacy. The way companies advertise themselves is not necessarily reflective of reality.
Exactly - I think what we might need is a non-US-based alternative (perhaps in Iceland or NZ). The moment DuckDuckGo becomes relevant enough (i.e., significant traffic), it's game over IMO.
Ask Kim Dotcom about how well NZ's liberal laws worked out in practice when the US copyright police showed up asking the local cops to wildly overstep their legal authority…
I mean _seriously?_ Helicopters, silenced assault rifles, security dogs, and 72 cops - sent in against someone accused of _copyright infringement?_ And then a Hollywood showreel of the raid gets produced and publicised?
I _like_ New Zealand; they talk the talk, but when it comes to walking the walk, they're led around by the nose to do whatever the US wants.
The issue is with HTTPS. To read that traffic, the NSA collects private keys from a limited number of [big] companies, since collecting them from all companies would be a very public affair. Thus, from the NSA's price/performance point of view, smaller enterprises may still be off the hook (for now).
They could get a valid key pair from a CA and MITM the connection. It could be detected if the user knows what the public key should be and compares it with what they received, but that seems pretty unlikely.
>It could be detected if the user knows what the public key should be and compares it with what they received, but that seems pretty unlikely.
There is no need to know what the public key should be - only to notice that there are several [more than expected] different keys in circulation. Any distributed organization (including Google itself, which can be fully expected to monitor which certs its users receive, especially after the Iran/DigiNotar story) could notice that and thus identify the MITM. So Google must be onto this already. Thus there's no need for the NSA to involve extra certs from a CA - though of course I'm not disputing their ability to obtain them.
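That kind of monitoring is simple to sketch: record a known-good fingerprint of the server's certificate and compare it on each connection. A minimal version in Python (the pinned value is a placeholder, and a mismatch can also mean nothing more sinister than a routine re-issue):

    import hashlib, socket, ssl

    PINNED_SHA256 = "0000...placeholder..."  # hypothetical known-good value

    ctx = ssl.create_default_context()
    with socket.create_connection(("example.com", 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            der = tls.getpeercert(binary_form=True)
            fingerprint = hashlib.sha256(der).hexdigest()
            if fingerprint != PINNED_SHA256:
                print("cert changed: possible MITM, or just a re-issue")

Run from enough vantage points, this is exactly the "several different keys" signal described above.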
Things like PRISM makes me completely want to back out of the Google ecosystem.
Part of that would be replacing Gmail. That can be done, but what good (free) options exist for a webmail solution?
I'd also love to cut gtalk this instant (or "Hangouts" as it's called now - hopeless), but Google just declared war on XMPP, so setting up your own node will land you on your own tiny island.
The trend is clear though: Google is stuffing the exit-holes while the US government is requiring more and more of Google's data.
If you haven't started moving out yet, you better get started. And for the love of God, ditch Chrome. Support someone who supports the open web and respects your privacy.
I know, it's ridiculous. But people do want to see how many likes the story has, and cross-service comments are pretty next-wave, so you have to have widgets now for 5 social sites, 7 comment systems, global analytics, live analytics, sitewide ad, contextual ad, site cookie and maybe add two more because I haven't thought of them... that's 19 right there!
We really need something better than having these MASSIVE amounts of callouts. It's like those pictures of Internet Explorer totally taken over by toolbars, except it's a different set on every single site on the web. Bah!
It turns out that if you block all the trackers and 3rd-party widgets, the website continues to function as the user desires, and nothing of value is lost.
Everyone was OK with Google's tracking for ads and commercial purposes; even though it felt kinda weird, we had a feeling it wasn't evil somehow.
Now I can't even look at Google anymore, they're like a spouse that cheated on you. You knew the spouse may have been watching you, but at least they weren't fucking with someone else while you trusted them.
Is anyone going to say it? DuckDuckGo's search results are still really low quality. I WANT them to be a legitimate competitor in the space (in fact, I've had similar hopes for Bing for years) but it's just not there yet.
Glad to see them succeeding, but personally the privacy of my web searches doesn't bother me - as long as they aren't being passed along with personally identifying information. I'm far more worried about emails, messaging, video, storage etc.
Can someone explain to me (or point me in the direction of something that explains) what Google and Bing store in terms of tracking when you are not logged in?
Obviously you can use VPNs or TOR to be really safe, but do you need to go that far if you want an untracked search on Google and Bing?
They have access to IP+time, your search query, and cookies for correlation to other requests. It's valuable information and Google openly documents that they are keeping and using it, along with anything else you leak to them: http://www.google.com/policies/privacy/
IP+time is enough to get your personal identity information from your ISP (physical location of the endpoint, billing information), I have no idea if Google's relationship with ISPs is good enough to buy that or if it's only available to cops.
> They have access to IP+time, your search query, and
> cookies for correlation to other requests. It's
> valuable information
Valuable for blackmail, but not really useful for anything else; the commercial value of information rapidly degrades over time. Knowing I want to buy a new fridge today is very valuable, knowing I wanted to buy one last month is nearly worthless.
> IP+time is enough to get your personal identity
> information from your ISP (physical location of the
> endpoint, billing information), I have no idea if
> Google's relationship with ISPs is good enough to
> buy that or if it's only available to cops.
Despite what the RIAA think, a user agent's IP is nowhere near accurate enough for use as identification.
I hope that ISPs do not release personal identity or billing data to arbitrary third parties. I know my ISP (sonic.net) claims they don't[1], and even privacy-insensitive companies such as AT&T have privacy policies that would forbid them from selling personal data[2].
Even if it were possible for random companies to obtain personal data from an ISP, I doubt that Google would have any interest in participating.
I love targeted advertising because it is so blatantly obvious and hilariously over-optimistic.
I search for a lot of random crap with more curiosity than intent to buy. I looked up the price of several windmills, the late-19th/early-20th-century style (~$1000, by the way). For weeks or months afterwards, I saw windmill ads on a sizable fraction of the websites I visited.
To be fair, it's far more likely that I am going to buy a windmill than a random ad viewer is, but the probability is still staggeringly low. There had to be a hundred other products I was more likely to buy than the windmill that would have been more valuable to show me. But no! I had viewed their product and I! Must! Be! Targeted!
I really wonder what the set of products that do well from targeted ads looks like.
It does well because the probability of you buying a windmill, multiplied by the profit per sale, is still worth more than something non-targeted like tampons (some advertisers get the numbers wrong, but not the ones operating at scale).
Interestingly, from the CPMs I've seen, re-targeted/re-marketed ads perform on par with or below contextually targeted ads. No one even comes close to Google for contextually targeted ad inventory (unless you are operating in a narrow niche and selling inventory directly - and even then it takes lots of time and money to match them).
>> Valuable for blackmail, but not really useful for anything else; the commercial value of information rapidly degrades over time. Knowing I want to buy a new fridge today is very valuable, knowing I wanted to buy one last month is nearly worthless.
I think that blackmail alone is already bad enough. Consider which topics somebody might want to learn about on the internet - various diseases, for example.
>> I hope that ISPs do not release personal identity or billing data to arbitrary third parties.
I hope that too. But this information is gathered and stored somewhere, in flawed systems operated by humans who might decide to follow their own interests rather than the interests of the customers. I know of at least one story in which an employee of a search engine used his privileges to stalk other people.
> I hope that ISPs do not release personal identity or billing data to arbitrary third parties.
My ISP, Time Warner/RoadRunner, claims that they will do so. It's kind of ambiguous because their declaration combines several services and kinds of data covered by different laws. I think the applicable part for their cable ISP service is:
In the course of providing Time Warner Cable Services
to you, we may disclose your personally identifiable
information to [...] consumer and market research firms,
credit reporting agencies and authorized representatives
of governmental bodies.
Selling their DHCP logs and customer records to a commercial data aggregator (who could then sell it to anyone) appears to be compliant with their privacy policy.
"Valuable for blackmail, but not really useful for anything else; the commercial value of information rapidly degrades over time. Knowing I want to buy a new fridge today is very valuable, knowing I wanted to buy one last month is nearly worthless."
That depends on what they can get out of the data, besides the obvious. I'm thinking of the story about Target knowing that a girl was pregnant before even her father did[1]. Even longer trends can probably be derived, regarding personality traits, income, etc. That information is worth a lot even months or years after it was captured.
"Despite what the RIAA think, a user agent's IP is nowhere near accurate enough for use as identification."
Maybe not when I'm coming from our company's NAT (700+ employees behind a single IP address) - but the number of people on my Comcast connection is limited.
The IP address from which you send your request to a search engine's website may be regarded as personally identifying information. If that information ever becomes publicly available, the connection between your search terms and your IP address is exposed. In fact, this has happened in the past, and there was a searchable database of the leaked information online where you could look up search terms.
I don't know what data Google and Bing are collecting, but here is one quote from the Wikipedia entry on internet privacy concerning the AOL search engine:
A search engine takes all of its users and assigns each one a specific ID number. Those in control of the database often keep records of where on the Internet each member has traveled to. AOL’s system is one example. AOL has a database 21 million members deep, each with their own specific ID number. The way that AOLSearch is set up, however, allows for AOL to keep records of all the websites visited by any given member. Even though the true identity of the user isn’t known, a full profile of a member can be made just by using the information stored by AOLSearch. By keeping records of what people query through AOLSearch, the company is able to learn a great deal about them without knowing their names.
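In code, the profiling step is trivial - which is rather the point. A sketch (the ID and example queries are ones widely reported from the 2006 AOL search-log leak):

    from collections import defaultdict

    # A per-user query log: an "anonymous" ID plus raw queries.
    log = [
        ("4417749", "landscapers in lilburn ga"),
        ("4417749", "numb fingers"),
        ("4417749", "dog that urinates on everything"),
    ]

    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)

    # No name appears anywhere, yet this profile was enough for reporters
    # to re-identify a real person from the leaked AOL data.
    print(profiles["4417749"])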
Based on the leaked PRISM presentation, they could send your search log to NSA, and NSA can identify you based on your IP address that your ISP provides.
"If the exit node is compromised you're f*cked for good."
It's not that simple, otherwise there wouldn't be any value in using an Onion architecture. Assuming you're using HTTPS, which every decent search engine supports, they either also need to create a fake but acceptable certificate for the domain, or to also control entry nodes and match the entering requests with the exit ones.
The NSA might be able to do it, but it's not just a matter of controlling an exit node.
It might be used internally for additional ranking signals. But the privacy policy states it can never be tied back to you as an individual, so there's nothing to worry about.
From memory, the main reason they do this is to let downstream websites determine that a user was referred to them by DuckDuckGo without seeing the actual search term. I.e., you know they came from DDG, but with no query leakage.
I run searchcode.com (which provides a lot of the code docs and sample results), and since this was done I can tell how much referral traffic actually comes from DDG, but I have no idea what you were searching for when you click through.
This irritated me, as it makes me somewhat skeptical of the "we don't track" claim.
However, I noticed that if you use their HTML version (i.e., duckduckgo.com/html/<query> instead), they don't do the click-tracking. The only downside I've noticed is that there's no infinite-scrolling mode; you have to hit "next".
Whether they're still tracking the search queries, though, I have no idea...
DDG just claims they don't store your history. Presumably they could use redirected clicks as quality signal, without tying it to your browser cookies.
Whenever I think about DDG, I think "Oh, you think Gabriel Weinberg suddenly cares about your privacy after he sold a truckload of user information to Classmates.com?"
In the case of DDG, that would be difficult. DDG uses SSL. If you make a mistake and type "duckduckgo.com" instead of "https://duckduckgo.com", it will automatically redirect you to the secure page. Unfortunately, that redirect gives a man-in-the-middle an opportunity to hijack your connection, even with SSL; however, that's tricky enough that it's hard to imagine anyone pulling it off without ever being noticed.
HSTS allows a site to indicate that in the future it should always be loaded over a secure connection, so you only have an interceptable connection the very first time you visit that site. Both Firefox and Chrome allow sites to add themselves to a list to "preload" HSTS enforcement, so even that initial connection which is man-in-the-middle-able doesn't happen.
I don't see them in the current lists, so DDG should contact Mozilla and Google to get added to their preloaded HSTS lists[1][2] so all connections will automatically happen only over HTTPS.
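Until a site is on those preload lists, the server-set header is the whole mechanism. A minimal sketch of what it looks like, as a toy Python WSGI app (a real deployment would set the header in the web server config instead):

    # Toy WSGI app showing the HSTS response header; "preload" signals
    # eligibility for the browsers' built-in lists mentioned above.
    def app(environ, start_response):
        start_response("200 OK", [
            ("Content-Type", "text/plain"),
            ("Strict-Transport-Security",
             "max-age=31536000; includeSubDomains; preload"),
        ])
        return [b"served over HTTPS from now on\n"]

The header only counts when delivered over HTTPS; browsers ignore it on plain HTTP, which is exactly why the preload lists exist.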
The initial request/redirect response is insecure. So a MITM can intercept the redirect response and replace it with his own content. That content could be, for example, a 200 response status and HTML pulled from the attacker's HTTPS connection to the target site.
So rather than being redirected to a secure connection, I happily communicate with the attacker instead.
They don't need the existing SSL cert. The "beauty" of SSL is that they can use a cert generated by any CA trusted by your browser - or even a second one from the same CA - even if there's already a cert issued by one.
It could also be named PRISM as a form of misdirection, to make people think that the codename referred to upstream-collection operations. (It's beyond doubt that upstream collection is still ongoing too, though.) Or it could be that spy organisations just like optical metaphors. FWIW the You Should Use Both slide seems to use PRISM to refer specifically to the "direct collection" and not the upstream collection capability.
I have a private writing app, and I can confirm that this NSA thing has been good for me too - and, I'm betting, for lots of other services concerned with privacy. I saw a 5x increase in users in the past week, and reviews in blogs are getting more numerous.
I wonder though if now all services regardless of their real focus will start marketing privacy as a feature and muddy the waters, making it hard for consumers to discern who is really about privacy and who just uses it as a marketing ploy.
Anyone else wondering if DDG will come out with their own version of email? Maybe something that uses end-to-end encryption (perhaps working together with the Mozilla Foundation)?
I would prefer it if someone else did it. Not because I don't trust DDG, but because I don't like the idea of one company providing everything (or too much). Why follow in the footsteps of the dinosaurs?
I'd think adding E2EE to email would be like what PageRank did for the search engine. Why build from the ground up when you can stand on the shoulders of giants?
I agree, and that's why I was thinking that a company that is still growing and is known for their privacy practices could be good (beach-head?) at doing this (at least being able to advertise it on their own services to get some traction and feedback).
It's not like Google or anyone else is going to do it. And looking at DDG's traffic, it seems like a growing need. Then again, how do you monetize encrypted email? Contextual encrypted ads? ;)
This is where they would kill it. Search would be supplementary, but if they pivoted and focused on email (eventually rebranding it), then that's a home run in my book. I'd like to see this, and would be interested in beta testing if/when it happens.
I'd prefer they became more competitive on search first: I try DDG a few times a year and it's a pretty big self-handicap when everything takes minutes rather than seconds to find.
I've been using DDG for nearly a year now, and have rarely turned back to Google. I only do so when I really cannot find what I'm looking for on DDG, and I try to tell DDG about the bad results (which I've actually seen get fixed). Getting bad results happens infrequently though, so the benefit of using DDG really outweighs the occasional inconvenience.
One thing that I prefer about DDG is that it doesn't try to guess what language I want to search in other than from my input. It is ridiculous that Google forces me to wade through worse results based solely on my location; it shouldn't matter where you are from.
DDG has been my primary search engine for about a year now, and I do what I can to proselytize for it. Its results aren't always as comprehensive as the bigger engines', but I can usually find what I need.
Somewhat related, but I've been canceling my subscriptions to services I can't properly secure (Office365 in my case)...
I'm also in the process of selling my Surface Pro and going back to a Linux laptop (OneNote 2013 sucks with touch on the desktop for me... and Microsoft's Windows 8 version won't let you avoid using SkyDrive).
Also, I pay Microsoft... why won't they let me save my OneNote docs in a secure way using their Windows 8 apps?
http://commons.wikimedia.org/wiki/File:Upstream_slide_of_the...