So ES has insecure defaults, I get that and it's been discussed to death.
But who the heck, in this day and age, exposes clusters directly to internet traffic? I don't care what defaults or security measures you have. DON'T EXPOSE SERVERS.
Place them inside a VPC, preferably a private one (in AWS parlance, behind a NAT gateway). Use _something else_ to send traffic to them. If you are on AWS or similar (but not Azure, I guess), add a load balancer in front. Now access requires creating a new load balancer, pointing it at the servers in question, adding listeners on the desired ports, and configuring the appropriate security groups. Only then can you send external traffic, and only on the specific ports you configured on both the listeners and the security groups.
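A rough sketch of that wiring with boto3, purely to illustrate the moving parts - the VPC, subnet, instance, and certificate identifiers below are hypothetical placeholders, and an application load balancer in front of Elasticsearch is just one way to arrange it:

```python
import boto3

ec2 = boto3.client("ec2")
elbv2 = boto3.client("elbv2")

# Hypothetical placeholders.
VPC_ID = "vpc-0123456789abcdef0"
PUBLIC_SUBNETS = ["subnet-0aaa111", "subnet-0bbb222"]
INSTANCE_IDS = ["i-0aaa111", "i-0bbb222"]
CERT_ARN = "arn:aws:acm:us-east-1:123456789012:certificate/example"

# Security group for the load balancer: HTTPS only, and only from a known range.
lb_sg = ec2.create_security_group(
    GroupName="es-lb-sg", Description="HTTPS into the LB only", VpcId=VPC_ID
)["GroupId"]
ec2.authorize_security_group_ingress(
    GroupId=lb_sg,
    IpPermissions=[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                    "IpRanges": [{"CidrIp": "203.0.113.0/24"}]}],
)

# Security group for the ES nodes: port 9200 reachable only from the LB's group.
node_sg = ec2.create_security_group(
    GroupName="es-node-sg", Description="ES reachable only from the LB", VpcId=VPC_ID
)["GroupId"]
ec2.authorize_security_group_ingress(
    GroupId=node_sg,
    IpPermissions=[{"IpProtocol": "tcp", "FromPort": 9200, "ToPort": 9200,
                    "UserIdGroupPairs": [{"GroupId": lb_sg}]}],
)
# (node_sg still has to be attached to the instances' network interfaces.)

# Load balancer, target group, and a listener on the one port you actually expose.
lb = elbv2.create_load_balancer(
    Name="es-front", Subnets=PUBLIC_SUBNETS, SecurityGroups=[lb_sg],
    Scheme="internet-facing", Type="application",
)["LoadBalancers"][0]
tg = elbv2.create_target_group(
    Name="es-nodes", Protocol="HTTP", Port=9200, VpcId=VPC_ID, TargetType="instance"
)["TargetGroups"][0]
elbv2.register_targets(
    TargetGroupArn=tg["TargetGroupArn"],
    Targets=[{"Id": i} for i in INSTANCE_IDS],
)
elbv2.create_listener(
    LoadBalancerArn=lb["LoadBalancerArn"], Protocol="HTTPS", Port=443,
    Certificates=[{"CertificateArn": CERT_ARN}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
)
```

The point is the number of deliberate steps: nothing is reachable from the outside until the listener and both security groups agree on it.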
Do this everywhere and you are in much better shape. You still need to configure the servers correctly, but if you mess up, nothing happens unless you also mess up several other things in an error cascade.
Even the official Elastic training - Elasticsearch Engineer 1 - gives you everything you need to hurt yourself: how to set up, use, and admin a cluster. But no security is covered unless you sign up for the next course and pay another couple of thousand dollars. Security really should be covered by default in EE1.
I'm a huge fan of beginner tutorials that include security as a default, rather than leaving it as the thing you do last. In actual project work it commonly goes like this: all the development gets done against an insecure cluster in dev, then someone turns security on at the end, everything breaks, and you now have a group of stressed-out people incentivized only to remove the thing that is delaying the project at the very last moment. Makes for some easy mental gymnastics.
With RavenDB, you cannot set up an unsecured server unless you are _really_ trying. And we worked on making the secured setup a click-through process that takes under 10 minutes for a whole cluster.
That was done explicitly because of issues like this. Security isn't a feature, and the fact that your product keeps leaking data is not the user's fault the 100th time it happens.
This is completely unrelated, but I remember your blog about dotnet development. I followed it about 10 years ago and remember when you started with RavenDb. I haven't done any dotnet development in about 7 years, but you taught me a lot about programming properly. Thanks.
I think a big part of it is that there are so many "here's how easy it is to set up!!!" guides out there, none of which actually tackles security first, or in a way that reflects the training (or lack thereof) of the folks who are actually doing the work.
It's nice to assume that everyone setting up backend services for the multitude of companies out there has gone through accredited training and has years of strong production experience with security chops.
The reality is that because the gap between technically inclined and technically clueless is so large, anyone who can stumble through an online tutorial can be seen as "experienced" to someone who isn't.
I'm not sure what the answer is, but this is going to keep happening - maybe a 3rd party service that evaluates "Getting Started" guides for backend services? If basic security protocols are not covered, they get a red mark, and business owners could use that as an indicator of whether their tech folks could potentially screw it up.
Also, don't use 0.0.0.0/0 in a security group rule!
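A minimal boto3 sketch of what a scoped rule looks like instead - the group ID and CIDR below are hypothetical placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# Open port 9200 to one known range (e.g. an office VPN) rather than the whole internet.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical security group
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 9200,
        "ToPort": 9200,
        "IpRanges": [{"CidrIp": "203.0.113.0/24", "Description": "office VPN only"}],
    }],
)
```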
P.S. Azure has load balancers and security groups too - in fact, their security groups are better than AWS's in some ways, such as supporting thousands of rules instead of only 50.
Azure can even configure mutual authentication between the LB & the underlying servers, which would cause any direct server access to result in a 401[0].
0 - For API servers. I'm not sure if you could configure this with services like Elasticsearch.
That’s also a thing on Azure, but you can actually deploy certificates for mutual authentication as well. That way if somehow the network layer is pierced, you have another layer of protection.
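As a generic illustration of that second layer (not Azure-specific; the certificate file names are placeholders), this is roughly what requiring a client certificate looks like with Python's standard ssl module:

```python
import ssl
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Server-side TLS context that also *requires* a client certificate
# signed by the CA we trust (hypothetically, the load balancer's CA).
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.pem", keyfile="server.key")
context.load_verify_locations(cafile="trusted-lb-ca.pem")
context.verify_mode = ssl.CERT_REQUIRED

server = HTTPServer(("0.0.0.0", 8443), SimpleHTTPRequestHandler)
server.socket = context.wrap_socket(server.socket, server_side=True)
server.serve_forever()
```

In this bare sketch a caller without a valid certificate fails the TLS handshake before reaching the application at all; whether the client instead sees an HTTP status like 401 depends on how the gateway in front is configured.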
AWS Elasticsearch was one of the last services to gain VPC support, which didn't arrive until late 2017 [1]. Moreover, if you had created a cluster without VPC support, the migration is very cumbersome: application changes (to enable double writes) are required to execute it without any downtime [2].
Not to mention it's a massive pain to use Elasticsearch with serverless, especially if you want it in a private VPC. Adding ES raised our monthly bill significantly, since it also required a NAT gateway (which double-dips on data transfer charges: actual bandwidth out plus NAT-processed data), the cost was replicated across multiple stages (dev/test/prod), and it increased cold start times (before they did work to optimize that), since the Lambdas accessing ES needed to be in the VPC too.
I can see developers crunched for time (or businesses, for money) not taking the additional steps to get there... And it's the same deal for Redis/memcache, which is another reason I think we see those exposed sometimes too.
(To be clear, the additional costs are minor compared to a big business's budget; they would be more detrimental to a low-budget (~<$400/month) project, or to a team that can't dedicate 300-600 man-hours to implement this.)
I'm not sure if that's better. You're just introducing more complexity into your network.
If you don't have the budget for a nat/load balancer or want to just keep it simple, a simple iptables rule would do! Then test with nmap regularly to see if it's correct.
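If you go that route, here's a minimal sketch of the kind of rules meant, applied from Python via subprocess - the port and trusted CIDR are hypothetical, and you'd want the rules persisted (e.g. with iptables-persistent) rather than applied ad hoc:

```python
import subprocess

# Hypothetical values: Elasticsearch's default HTTP port and a trusted app-server range.
PORT = "9200"
TRUSTED_CIDR = "10.0.1.0/24"

def iptables(*args):
    # Apply one rule; check=True raises if iptables rejects it.
    subprocess.run(["iptables", *args], check=True)

# Allow the trusted range on the ES port, drop everyone else.
iptables("-A", "INPUT", "-p", "tcp", "--dport", PORT, "-s", TRUSTED_CIDR, "-j", "ACCEPT")
iptables("-A", "INPUT", "-p", "tcp", "--dport", PORT, "-j", "DROP")

# Then verify from an *external* host, e.g.:  nmap -p 9200 your.server.example
```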
> If you don't have the budget for a nat/load balancer
It's not that big a budget item on any of the big cloud providers.
And my point is: if you don't need access, you won't even have such a load balancer. Unless someone goes out of their way to provide access to your server, no external access will exist.
It's usually not intentional. It's common either to assume the service will listen on 127.0.0.1 by default and connect to it that way, or, with cloud VMs, to spin up a VM for non-public use but check the box that adds a public IP and forget it's there.
It's a design problem in my opinion. By default, listening on 0.0.0.0 should exclude loopback interfaces at the OS level. That way, anything making incorrect assumptions would fail and would require correction. Second, cloud firewall rules should imply deny-all when "none" is selected. That way, having no protection takes the same effort as adding at least one manual rule.
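To illustrate the assumption being made, a tiny sketch with Python sockets showing the difference between binding to loopback and binding to every interface (the ports are arbitrary here):

```python
import socket

# Reachable only from the same machine; a connection from another host is refused.
loopback_only = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
loopback_only.bind(("127.0.0.1", 9200))
loopback_only.listen(1)

# Reachable on every interface, including a public IP if the VM happens to have one.
all_interfaces = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
all_interfaces.bind(("0.0.0.0", 9201))
all_interfaces.listen(1)
```

The catch is that the second socket also accepts connections from 127.0.0.1, so the usual "it works from localhost" test never reveals that it is exposed.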
I have seen people take the easy way out when setting up dev environments, where these security holes are ignored. Hopefully, in the near future, everyone will use something like tailscale [1] or nebula [2] instead of taking the easy way out and leaving it all open.
What's the difference between a VPC and iptables? I agree that you shouldn't expose insecure services. But why do I need to introduce an entire private address space and cloud-managed SDN services to achieve that goal? If it weren't industry status quo, I'd almost call you a shill for the union of ops teams working to secure jobs for years to come. Almost.. (;
I would think that the post you're responding to either supposes a hosted service (where you do not control the server and its iptables), or that multiple layers of protection are good for something critical.
But yes, if it's your own server, everyone should remember that regular Linux features are darn powerful, too.
A VPC with security groups can be much easier, depending on how your instances are managed. You're sure that all the instances have the same rules and that the rules are always up to date, even if you use auto-scaling.
If you use some provisioning tool you can have the same thing with iptables. But why are you on AWS if you don't want to use the features Amazon provides?
I am astonished that ES still does not recognize that there should be at least minimal protection against exposure by default. I mean, it's not super hard to generate a good password on install, and if it's not necessary, it can always be manually disabled. It's astonishing that "let somebody else worry about security" is still a thing...
(https://www.theregister.com/2020/07/17/ufo_vpn_database/)
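For the sake of argument, generating a decent bootstrap password at install time is a few lines - a sketch with Python's secrets module, where the length and character set are arbitrary choices:

```python
import secrets
import string

# Hypothetical install-time step: emit a random bootstrap password instead of
# shipping with no authentication at all.
alphabet = string.ascii_letters + string.digits
password = "".join(secrets.choice(alphabet) for _ in range(24))
print(f"Generated admin password (store it somewhere safe): {password}")
```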