Everybody gets WebSockets (cloudflare.com)
193 points by jgrahamc on May 5, 2016 | hide | past | favorite | 72 comments



Neat. We've been using CF websockets for some time now on the Enterprise plan. If you're intending to use CF websockets, be prepared for random (and potentially massive) connection drops, and be sure you're architected to handle these disconnects gracefully. Cloudflare rolling restarts have caused hundreds of thousands of websocket connections to be disconnected in a matter of minutes for us. If you plan to operate at that scale, make sure you're able to tolerate the thundering herd of reconnecting clients as Cloudflare disconnects them.
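A standard client-side mitigation for that thundering herd is randomized exponential backoff on reconnect. A minimal sketch (the constants are illustrative, not anything Cloudflare prescribes):

```python
import random

def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: pick a random delay in
    [0, min(cap, base * 2**attempt)] so a mass disconnect doesn't
    turn into a synchronized mass reconnect."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

Each client sleeps `reconnect_delay(attempt)` seconds before retrying, so reconnects spread out over the window instead of arriving in one spike.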


As you know, when terminating a WebSocket connection due to a release, CloudFlare now signals this to both the client and the origin server by sending the 1001 status code (aka "going away"; see section 7.4.1 of RFC 6455), so both sides are aware that the WebSocket termination is only a transient event, and that they can expect to immediately re-establish the connection on retry.

We're working on additional refinements to the release process to minimize disruptions.

(I work at CloudFlare.)


Aye! Well aware. I actually think you added that for us specifically! Anyways, it's worth noting for anyone interested that they'll need to be able to handle spiky reconnects.

In our case, we keep a buffer of the last N websocket messages sent to the client, and when the client reconnects, it sends the last sequence id that the client saw, and the server catches it up.

https://discordapp.com/developers/docs/topics/gateway#resumi...
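The resume scheme described above can be sketched roughly like this (names and sizes are illustrative, not Discord's actual implementation):

```python
from collections import deque

class SessionBuffer:
    """Server-side buffer of the last N messages, tagged with sequence
    ids, so a reconnecting client can be caught up instead of resynced."""

    def __init__(self, max_messages: int = 4096):
        self.buffer = deque(maxlen=max_messages)  # (seq, payload) pairs
        self.seq = 0

    def send(self, payload) -> int:
        self.seq += 1
        self.buffer.append((self.seq, payload))
        return self.seq

    def resume(self, last_seq_seen: int):
        """Return the messages the client missed, or None if they have
        fallen off the buffer and a full resync is required."""
        if self.buffer and last_seq_seen < self.buffer[0][0] - 1:
            return None
        return [payload for seq, payload in self.buffer if seq > last_seq_seen]
```

On reconnect the client sends its last seen sequence id; the server replays `resume(last_seq_seen)` or, if that returns None, tells the client to start a fresh session.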


Ugh, how do you shard websocket connections when you have more connections than one server can handle?


If you use Elixir/Phoenix, you can have many servers running the same code, connected to each other as Erlang nodes. When a client wants to broadcast something to the whole network, the message is broadcast locally but also sent to all other servers, and each server broadcasts it to its local clients. It's very easy to write.
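The fan-out pattern described above is language-agnostic: deliver locally, forward to every peer node, and each peer delivers only to its own clients (no re-forwarding, so there are no loops). A toy sketch with in-memory nodes standing in for Erlang nodes:

```python
class Node:
    """One server: a list of local client connections plus a peer list."""

    def __init__(self, name: str):
        self.name = name
        self.clients = []    # stand-ins for local websocket connections
        self.peers = []      # the other nodes in the cluster
        self.delivered = []  # (client, message) log, for illustration

    def local_broadcast(self, msg):
        for client in self.clients:
            self.delivered.append((client, msg))

    def broadcast(self, msg):
        # Deliver to local clients, then fan out once to every peer;
        # peers deliver locally only, so the message never loops.
        self.local_broadcast(msg)
        for peer in self.peers:
            peer.local_broadcast(msg)

a, b = Node("a"), Node("b")
a.peers, b.peers = [b], [a]
a.clients, b.clients = ["c1"], ["c2", "c3"]
a.broadcast("hello")
```

In a real deployment the peer forwarding would go over the network (distributed Erlang, Redis pub/sub, etc.) rather than a method call.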


It's not ideal for sure but you can use DNS to divvy it up by adding several IPs to your domain name.


You can also do it out of band and have an API call that returns the server you're supposed to be connected to.
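A minimal sketch of that out-of-band approach: an endpoint that deterministically maps a user to one of the websocket servers, so the same user keeps landing on the same box. All names here are hypothetical; consistent hashing would reduce churn when the server list changes, but plain hashing shows the idea:

```python
import hashlib

# Hypothetical server pool; in practice this would come from service discovery.
WS_SERVERS = ["ws1.example.com", "ws2.example.com", "ws3.example.com"]

def assign_server(user_id: str) -> str:
    """Stable mapping: the same user id always maps to the same server
    as long as WS_SERVERS doesn't change."""
    digest = hashlib.sha256(user_id.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(WS_SERVERS)
    return WS_SERVERS[index]
```

The client calls a plain HTTP API that returns `assign_server(user_id)`, then opens its websocket directly to that host.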


This is extremely wonderful, thank you!

CF WS support means I will no longer have to keep a separate, direct-to-server (sub)domain around for my WebSockets projects, and it also means goodbye to any websockets-related SSL certificate hassle in this regard.


It's nice but it was never a real technological barrier as much as it was a psychological barrier. You could always have added your WebSocket service under a different subdomain and then turned off CF for that specific subdomain only (CF always had this feature).

You can have different subdomains map to the same server IP address so I can't think of a reason why this would be a problem. That's what we did for https://baasil.io/

It's still nice though from the point of view that you no longer have to think about it. A lot of users wrongly assumed that you couldn't have Cloudflare AND WebSockets but you always could. At least now there won't be any confusion since it will just work by default.

Also, now that CF supports WebSocket as part of their offering, maybe they will also offer features like WebSocket rate limiting and such (for protecting against DoS)... And that would be useful.


I don't really see what the benefit is here for the developer. With HTTP, CloudFlare can do caching etc. to take load off the origin, but the origin server still has to deal with each websocket (right?)


The benefit is that people who previously had to reveal their origin IP in order to support websockets (that is, non-business-class users) will now be able to hide the origin behind Cloudflare.

Cloudflare can't protect you from DoS attacks if the attacker knows the IP address of your origin.




Pretty trivial to get a new IP address these days.


IP enumeration is also pretty trivial. I guess that's what attackers might be using to "unhide" domains: scan the net and request the required domain name?


That won't work if the origin firewall allows only CloudFlare IP ranges.
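At the application layer the same check can be sketched with the stdlib `ipaddress` module. The ranges below are just two example entries; the authoritative list is published by Cloudflare and changes over time, so it should be fetched rather than hard-coded:

```python
import ipaddress

# Illustrative entries only; pull the current list from Cloudflare.
CLOUDFLARE_RANGES = [ipaddress.ip_network(n) for n in (
    "173.245.48.0/20",
    "103.21.244.0/22",
)]

def is_cloudflare(addr: str) -> bool:
    """True if addr falls inside one of the known Cloudflare ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLOUDFLARE_RANGES)
```

In practice you would enforce this at the network firewall (so non-Cloudflare packets never reach the origin at all), with an application-level check as a belt-and-suspenders measure.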



If you're interested in CDN-like support for WebSockets that reduces burden on the origin server, check out https://fanout.io/ (disclaimer: founder). We translate incoming WebSocket messages into HTTP requests sent to the origin server, and you can send messages to multiple client connections at once using the API.


All the things CloudFlare provides but not the caching (e.g. WAF, DDoS protection, DNS, SSL, ...)


The biggest thing for me is that you can use secure WebSockets (wss://) without having to setup TLS on your origin server. This greatly improves the ability to establish WebSocket connections across proxies.


So what you want to do is fool clients into believing there is transport encryption while actually there is none, and the communication with the origin server happens in the clear?


From what I remember, CF actually requires an SSL certificate, but they'll accept a self-signed one (because they already validate ownership themselves).




It's not wrong, given that it's a reply to this statement: "... without having to setup TLS on your origin server." Strict mode is optional. It's certainly possible (and highly recommended!) to use transport encryption in both directions with CloudFlare, but that's not what jephir described here.


Why wouldn't you set up TLS?? That makes no sense. By "setup" he means buying a cert.


You don't need to buy certificates. There are at least four CAs offering free certificates, with at least two (Let's Encrypt and StartSSL) offering API-based issuance. Getting a publicly-trusted certificate from Let's Encrypt is roughly the same amount of work as finding out how to get the OpenSSL CLI to issue a self-signed certificate, or using CloudFlare's tool to get one from their CA.


Cloudflare is underrated, I am personally looking forward to their future.


I like Cloudflare a lot, but I am worried about their prevalence.

Similar to other companies I give up a good control of my life to, Cloudflare essentially becoming a new internet backbone is ever so slightly terrifying.

Cloudflare itself is a "single point of failure." (Though massively unlikely with their incredible infra).

Cloudflare could shut off all of its servers, or its nameservers could go haywire, and there would be a massive internet outage. Much like when AWS goes down.


I'm a little confused by the pricing page: https://support.cloudflare.com/hc/en-us/articles/200169466-C...

It's not clear to me what a "low" or "high" volume of connections is, and if that is number of connections or throughput.


The rest of that page suggests they haven't decided yet. They say "Barring abuse or attack, we will not impose limits or errors for any application without contacting the customer. Customers whose usage claims a disproportionate percentage of resources for their current plan level may be asked to upgrade to the plan level that matches their needs." I currently have a $200/month business subscription (for https://cloud.sagemath.com), which was required for websockets before today, and will very likely switch to a $20/month pro subscription. We typically have up to about 500 concurrent connections on our site. Incidentally, we switched to using CloudFlare about a month ago after being hit by a DDoS attack (400MB/s incoming traffic).

[Edit: actually, we're not switching since only the business plan provides "Advanced DDoS protection - layer 3 and 4", which was why we are using CloudFlare in the first place.]


Curious why they hadn't been doing this all along. This seems like low hanging fruit to add value. Maybe I'm missing something.


We were doing lots of other things :-)


I don't know what effect that many simultaneous long-running connections have at their scale, along with the inability to cache.


Did you read it


Anyone else watch the video and notice all the falsehoods?

Allegedly, previous to WS, the only alternatives were flash and polling! Except that long-polling isn't really polling in any meaningful sense (they very clearly describe polling that isn't so-called long-polling), and in particular doesn't have the latency-vs-bandwidth tradeoff and impact that actual polling has.

Allegedly, WS are "exponentially more efficient"! Except that they're not, they're more efficient by a constant factor.

Allegedly, the range of bandwidth saving is 99.8% to 99.9%! Except that obviously if you're passing enough data around in few enough responses, the bandwidth savings could be more like 5% or 10%, and I find it hard to believe that typical uses actually save over 50%.
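That dependence on payload size is easy to sanity-check with rough numbers. Assume ~700 bytes of HTTP request+response headers per polling round trip versus a handful of bytes of WebSocket frame overhead per message (both figures are ballpark assumptions, not measurements from the video):

```python
HTTP_OVERHEAD = 700  # assumed header bytes per HTTP polling round trip
WS_OVERHEAD = 6      # assumed framing bytes per WebSocket message

def overhead_saving(payload_bytes: int) -> float:
    """Fraction of total bytes saved by WS framing versus HTTP polling,
    for one message of the given payload size."""
    http_total = HTTP_OVERHEAD + payload_bytes
    ws_total = WS_OVERHEAD + payload_bytes
    return 1 - ws_total / http_total
```

Under these assumptions a 10-byte payload saves well over 90%, which is where figures like "99.8%" come from, while a 100 KB payload saves under 1%: the saving is a constant per message, not a multiplier on the data.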

I sure do think web sockets are sweet, but wouldn't it be nice if there was some way to hold liars accountable?


I get that the overhead of an individual message over an individual connection is lower, but for that efficiency you give up layer 7 routing capability, make load balancing difficult, have more long-lived connections to your servers. Does HN generally feel like these are worthwhile tradeoffs?


Absolutely yes.

Connections are very very cheap to maintain, and very very expensive to build up and break down. Even in environments with load balancers, stateful firewalls, etc., it is much easier to keep a socket open than it is to create a new one.

If you're doing any kind of realtime data, websockets are the way to go. It is way, way easier to do stream processing and deal with backpressure than it is to have a gigantic number of clients doing http polling.

When Cloudflare mentions that websockets haven't had a gigantic uptake, I wonder if it's because Amazon's ELB doesn't support them and people don't want to roll their own haproxy solution.


ELBs support websockets fine with TCP/Secure TCP listeners.

What they -don't- support at that layer is sticky sessions, ip hashing, or similar; when a connection drops, you could reconnect to -any- instance behind the ELB. You have to engineer for that.

They also don't support Socket.IO if you have more than one instance behind the ELB (as, presumably, you do). This is because Socket.IO doesn't just open a websocket connection, and on failure fall back to something else; instead, it makes a REST call to the backend to see what connections it supports, then makes the websocket call. Both of these calls have to hit the same Socket.IO instance, because they're tied together with a session identifier. If they hit separate ones, then Socket.IO rejects the websocket, because that instance has never seen that session identifier before. But this is particular to Socket.IO's implementation, and not websockets themselves.


How is that better than HTTP with keep-alive? With that approach you also have a long-standing connection, but when the connection is broken it'll be re-established seamlessly.


There is more to this trade off, because push enables many neat things:

- You can halve the number of DB requests, because refreshing is done by message passing half of the time instead of a new request triggering DB calls.

- Cache invalidation is proactive, and linked to events instead of just time.

- You can have much more live stuff: settings, routing, configuration in general. You only need to load it once from your store; after that, any update is propagated. A CMS becomes cheap.

- Making clients talk to each other is easy.

- Websockets don't have CORS issues. Your client code can query your live site, or a local backend, and synchronize both.

- You can use the same channel of communication between your user clients and internal (micro-services) clients. The architecture becomes simpler.

- If you have a heavy authentication process and complex session data, they can now be stored with the persistent connection instead of being queried for every request.

Of course, not all that is enabled by default. You need to use some framework such as crossbar.io or meteorjs to get those benefits for free. Yet, it's very sweet.


Maybe you should ask the people using your web site if they care.


I hate how unspecific they are in this table: https://support.cloudflare.com/hc/en-us/articles/200169466-C...

How many is "Important to your operations"? If there's no number, it's hard to predict if I'll need Business or Enterprise.


The WebSocket clustering problem comes to mind.

Nvm, I have a central Redis server with Socket.IO.

And does CF limit the transfer rate, like 128k/sec, for free plans? If so, then it is just an experimental toy anyway.


What features do websockets offer over SSE/EventSource?

Ok, I know you can send data, but you can do that with HTTP/Ajax + SSE.


Wonderful news. This was the main thing I was hoping for from them.


So much for the purely stateless HTTP protocol.


[flagged]


That was possible before with a separate subdomain for DDP/WS:

1. export DDP_DEFAULT_CONNECTION_URL="http(s)://ws.domain.tld"

2. Disable the CloudFlare proxy (switch to "grey cloud") for that subdomain.


Did you read the post?


Not Tor users


Precisely.

Because of CloudFlare's position regarding Tor users, along with the erroneous idea that an identity is an IP address, these changes do nothing for the Tor user and developer community at large.

Tor is used for more than just routing around censorship. I use it to create a seamless network of all my computers all via hidden services. So every machine has a "hidden service domain name" of [hash].onion . Knowing all the hashes of my machines means I can then use all my machines as a computing cloud.

I've also figured out how to seamlessly handle DNS resolution of onion addresses at the resolver level, meaning all Linux programs that can handle DNS names can also handle Onion names. Effectively that means that tools like Puppet and Chef work over Tor as well.

Cloudflare serves to undo and retard growth of Tor and I2P (which gets much less attention). And there are definite positives of using Tor... along with anti-censorship and strong anonymity claims.


An example where CloudFlare could be very useful is the following (since I couldn't edit my parent comment):

I have a site. Because the site is... disparaging to political figures, I want to run behind Tor as a hidden service. Now, CloudFlare is good at their core business, so I hire them to cache my .onion site for mass consumption. It also defends my .onion site from being slashdotted/reddit hug of death/HNbombed.

All the better if fellow Tor users access my site: I want people to consume/use my site. That's why it's published.

And here is CloudFlare, destroying Tor users' ability to use any site that hires CloudFlare. It's a completely ridiculous situation, and the horrid solution they subscribe to does nobody any favors.

Now, they do have some valid reasons. And there can also be technological ways to solve it without shitcanning every Tor user.

1. Tarpit defense. Slow down connections that show 'harassing behaviors'. No, you don't need to see that webpage 100 times in a second.

2. Offer a CloudFlare.onion hidden service. There's no reason they can't get into Tor HS as well. Facebook already is.

3. Limit bandwidth to known Tor exit nodes. Don't block.
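Options 1 and 3 are both essentially per-client rate limiting. A token-bucket sketch (the numbers are illustrative, and a real deployment would key buckets by exit-node IP or circuit):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` per
    second; clients that exceed it get slowed or rejected, not blocked."""

    def __init__(self, rate=5.0, capacity=20.0, now=time.monotonic):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.now = now          # injectable clock, for testing
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A tarpit variant would delay the response instead of returning False; either way, ordinary browsing fits inside the bucket while a scripted hammering does not.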


re. 1) and 3):

CloudFlare is not blocking exit nodes because of volumetric DDoS attacks. That's not really viable over Tor anyway. It's mostly to block comment spam, crawlers and vulnerability scans (e.g. SQLi), which is unfortunately often done through Tor in automated ways.

Also: Website owners using CloudFlare can whitelist Tor traffic if they so choose.

I don't see how 2) would change anything with regards to the blocking situation. All the aforementioned problems would still apply, the only difference would be that they'd lose the IP address as an identifier, making it even harder to filter malicious traffic. I'm not against CloudFlare implementing something like that, but I fail to see the relevancy and I do wonder if tunneling all hidden service traffic for a site through a centralized service (which needs access to the plaintext in order to do any kind of caching or filtering) is a good idea for a project like Tor.


> Knowing all the hashes of my machines means I can then use all my machines as a computing cloud.

went looking through your github and blog for some code and/or a writeup

any plans to post? god of magic :p


Yeah, I was making some big changes on my blog, and things kind of exploded in my face. My fault, it wasn't terribly critical so I nuked and reinstalled.

Just install Tor as you normally would, and turn on Hidden Services from port 22 to port 22 (for SSH). Keep track of the generated Tor onion hostname when you restart with Hidden Services enabled.

______________________________________

But here's the 'magic' part how to get .onion resolution across a Linux system:

get the following packages (Ubuntu, Debian)

    sudo apt-get install tor iptables dnsmasq dnsutils
Add the following to the /etc/tor/torrc file

    VirtualAddrNetworkIPv4 10.192.0.0/10
    AutomapHostsOnResolve 1
    TransPort 9040
    DNSPort 53
    DNSListenAddress 127.0.0.2
Restart TOR

    sudo service tor restart
Edit /etc/dnsmasq.conf and add the following:

    listen-address=127.0.0.1
    resolv-file=/etc/realresolv.conf
    server=/onion/127.0.0.2
Make a new file, called /etc/realresolv.conf . Add this in the file:

    nameserver 107.170.95.180 (or whatever nameserver you choose)
    nameserver 8.8.8.8
Restart DNSmasq:

    sudo service dnsmasq restart
Run the IPtables firewall update for redirection

    sudo iptables -t nat -A OUTPUT -p tcp -d 10.192.0.0/10 -j REDIRECT --to-ports 9040
Also, this script must be run at every boot, so add this in /etc/rc.local, ABOVE the "exit 0"

    /sbin/iptables -t nat -A OUTPUT -p tcp -d 10.192.0.0/10 -j REDIRECT --to-ports 9040
________________________________________________


thanks for the quick write up, i'll have to take a weekend to play with it

god of magic form: https://upload.wikimedia.org/wikipedia/en/thumb/2/2a/Angelke...


I certainly had a laugh :)

And was really, one of the best villains I've ever seen in any game/movie. Intentionally poisoning everyone, for expediency, is disgustingly evil.


Tor for remote access? There are much faster ways to create seamless networks of machines.


Indeed.

I have my IoT system across multiple networks, some of which I do not control the router. One possible solution was to have my VPS machine provide tunnels in a star topology. It's pretty sucky for the star center, because they end up getting all the traffic. And it's also pretty hard.

Another solution was to dyndns, port forwarding, router redirection, poking through NATs and all that... for each location I'm in. That's bad. Real bad.

Perhaps I could do some point-to-point trickery... but that doesn't work when both machines are NATted.

Tor Hidden Services provides a way to automatically breach the network seamlessly, and provide a routable address to that machine, no matter where that machine is. I take it to a cafe in Washington DC? Within 10 seconds, it's back on Tor. South America? 10 seconds.

The topology, once done, looks like a humongous ethernet hub, with no promiscuity mode. And each node is the 16 char hash.

Then, I can code against .onion addresses. They just work, and I know if I establish a connection, I can send data.

I'm already sending MQTT telemetry data from one network to my broker in another house 30 miles away. And it's sending pictures and metadata both. And it just works.

EDIT response:

>What kind of throughput & latency do you usually see?

Latency is a bigger one, obviously. It depends on the construction of the bridge. If I'm not using any overlays (OBFS3, OBFS4, scramblesuit, etc) then initial lag times can go in excess of 30 seconds. Once that initial connection is established, then lag times go down to about 200-300ms range.

Using overlays, because the network blocks various vanilla types of Tor, can take a lot longer. That's because those overlays are beat on by China and Iran. Sometimes they will dead-route packets (5% of the time). Initial transit I've seen up to 1 minute, with avg ping times around 500 ms.

Throughput is a different beast. The only network I haven't saturated was my 1 Gbps desktop at work. I can stream movies directly with the speeds I routinely get. Just that initial bridge construction will make you think something went wrong.

(responded here because "I'm responding too fast")


Not to diminish your TOR advocacy, but check out tinc.


What kind of throughput & latency do you usually see?

I imagine this is a lot better than hitting an exit node.


Except if you are behind a restrictive corporate firewall that filters anything not in RFC 2616.


If you are using secure web sockets (wss), this shouldn't be an issue as the traffic goes over port 443 and the proxy won't be able to tell if it's HTTP or another type of traffic.


Don't many companies install their own certificate on company-owned machines so they can MITM secure traffic too?


I can't speak to that vulnerability, but from what I have read, websockets are less likely to break on firewalls because they use a well-known port (either 80 or 443). Websockets also start out looking like a standard HTTP request. Compared to, say, MQTT or another protocol that uses its own port, websockets don't require any special setup. However, that really only applies to encrypted websockets; unencrypted ones have been known to cause issues with older proxies.


Furthermore, websockets mask the payload so that naive proxies doing deep packet inspection to find HTTP requests don't fall into the poisoned-cache trap.

http://security.stackexchange.com/questions/36930/how-does-w...
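The masking in question is defined in RFC 6455 section 5.3: every client-to-server frame's payload is XORed with a fresh random 4-byte key, so a fixed byte pattern (like a smuggled HTTP request) never appears verbatim on the wire. Since XOR is its own inverse, unmasking is the same operation:

```python
import os

def mask_payload(payload: bytes, key: bytes = None):
    """XOR the payload with a 4-byte masking key, repeating the key,
    as described in RFC 6455 section 5.3. Returns (masked, key)."""
    key = key or os.urandom(4)
    masked = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return masked, key
```

The server applies the same XOR with the received key to recover the original bytes; the point is not secrecy (the key travels in the frame) but making the on-wire bytes unpredictable to intermediaries.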


A firewall is more than a port filter.

Websockets don't use standard HTTP requests other than using standard ports. The header lines not mentioned in RFC 2616 get removed by a lot of corporate firewalls (that includes filtering proxies etc.).


I've never seen that. Also, this would break many websites that are part of the HPKP preload list I would think (need to double check), which include most popular websites.

However, I've often seen port 443 being blocked :|.


It's quite common in corporate networks. Locally installed CA certificates (i.e. anything that's not in the original root CA list) are exempt from HPKP rules for this particular reason.


Usually not much of a problem behind HTTPS (or wss:// in this case).



I'm aware of corporate MitM proxies, but it's less common to run into these kinds of issues if you simply use wss://, because a) they're not quite as common as your typical ancient HTTP caching proxy and b) there's a better chance they either support new protocols natively or at least support HTTP CONNECT (like squid).



