So: I'm trading the need to cluefully and actively manage and audit my security group policies for... the need to cluefully and actively manage and audit an entire network topology? Which requires a lot more knowledge and moving parts, does it not? Can't you mess up your VPC configuration just as easily as you mess up security groups?
Just because various famous companies failed to properly audit their security group settings doesn't mean it's rocket science. It just means that they didn't notice the problem in advance.
Write a script that captures your SG policies and writes them to a file in some nice parseable format. The dumbest possible version of that might be:
ec2-describe-group | sort > /home/MY_SG_SETTINGS
where I threw in a `sort` because I don't know that ec2-describe-group returns rows in a consistent order.
Write another script that runs the above command every hour and yells loudly if the output ever changes. Wire that script to (e.g.) Nagios.
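Here is a minimal sketch of that second script. It assumes a POSIX `sh`, `diff`, and the Nagios plugin exit-code convention (0 OK, 2 CRITICAL, 3 UNKNOWN). `SG_DUMP_CMD` and `check_sg_drift` are hypothetical names. The hourly schedule would come from outside the script, for example a Nagios service with a 60-minute check interval, or cron.

```shell
#!/bin/sh
# Nagios-style drift check for security groups: compare a fresh, sorted
# dump of the SG rules against a saved baseline file.
# Exit codes follow the Nagios plugin convention: 0 OK, 2 CRITICAL, 3 UNKNOWN.

# Hypothetical knob: the command that dumps your SG rules. It defaults to
# ec2-describe-group and is word-split on purpose, so a command with
# arguments also works.
SG_DUMP_CMD=${SG_DUMP_CMD:-ec2-describe-group}

# Print the SG dump sorted, so row order alone never triggers an alert.
sg_snapshot() {
    raw=$($SG_DUMP_CMD) || return 1
    printf '%s\n' "$raw" | sort
}

# check_sg_drift BASELINE_FILE
check_sg_drift() {
    baseline=$1
    if [ ! -r "$baseline" ]; then
        echo "SG UNKNOWN - cannot read baseline $baseline"
        return 3
    fi
    current=$(sg_snapshot) || {
        echo "SG UNKNOWN - dump command failed: $SG_DUMP_CMD"
        return 3
    }
    if printf '%s\n' "$current" | diff -u "$baseline" - > /dev/null; then
        echo "SG OK - security groups match baseline"
        return 0
    fi
    echo "SG CRITICAL - security groups changed since baseline"
    printf '%s\n' "$current" | diff -u "$baseline" -   # show what changed
    return 2
}

# Act as a plugin only when given a baseline path, e.g.
#   check_sg_drift.sh /home/MY_SG_SETTINGS
if [ "$#" -gt 0 ]; then
    check_sg_drift "$1"
    exit $?
fi
```

Create the baseline once with the one-liner above. Refresh it deliberately whenever you change a group on purpose, so that the only alerts are for changes nobody planned.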
Now, use the time you've saved not implementing VPC to write some tests that attempt to connect to various important ports from outside your security group. Log lots of scary warnings if they ever succeed. Wire those to Nagios, too.
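Those outside-in tests could be sketched roughly like this. The sketch assumes the probe runs on a host outside the security group that has `bash` (for its `/dev/tcp` redirection) and coreutils `timeout`. `check_closed`, `port_is_open`, and `PROBE_TIMEOUT` are hypothetical names. The script reports CRITICAL if any listed HOST:PORT accepts a TCP connection.

```shell
#!/bin/sh
# Outside-in exposure test: run from a host OUTSIDE the security group.
# Any HOST:PORT that accepts a TCP connection is reported as CRITICAL
# (Nagios exit code 2); otherwise OK (exit code 0).
# Needs bash (for its /dev/tcp redirection) and coreutils `timeout`.
PROBE_TIMEOUT=${PROBE_TIMEOUT:-3}   # seconds per connection attempt

# port_is_open HOST PORT: succeeds if a TCP connection can be opened.
port_is_open() {
    timeout "$PROBE_TIMEOUT" bash -c ': < "/dev/tcp/$1/$2"' _ "$1" "$2" \
        2> /dev/null
}

# check_closed HOST:PORT [HOST:PORT ...]
check_closed() {
    open=""
    for target in "$@"; do
        host=${target%:*}
        port=${target##*:}
        if port_is_open "$host" "$port"; then
            open="$open $target"
        fi
    done
    if [ -n "$open" ]; then
        echo "EXPOSURE CRITICAL - reachable from outside:$open"
        return 2
    fi
    echo "EXPOSURE OK - $# target(s) not reachable"
    return 0
}

if [ "$#" -gt 0 ]; then
    check_closed "$@"
    exit $?
fi
```

For example, you might run `check_exposure.sh db.example.com:3306 app.example.com:22` (hypothetical hosts) and wire it to Nagios like the drift check. Timeouts count as "closed", so a probe host that has lost its own network would also report OK. Pairing it with one target that should be open guards against that.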
Unless your security groups are numerous and complicated, or your developers demand the power to open and close arbitrary ports on arbitrary machines seven times per working day, this would seem adequate for most use cases.
I know there are use cases for VPC, but it doesn't feel like this is one.
I'm not saying that either solution removes the need to set things up sanely. In both cases, you need to configure things properly or you can shoot yourself in the foot.
However, depending on the systems your company uses and your configurations, one solution might be simpler than the other. Network topologies tend to be very static: you might have one public subnet and a few private subnets.
Over time, you might add web servers to the public subnet or remove them, or add various services to the private subnets. This is especially true with a Service Oriented Architecture (SOA), where you might have many different services that need to interact with each other but not with the public internet.
When you are using security groups, you need separate SGs for each category of machine (public, private), then you have to manage them per service, and you also have to manage the interactions between multiple SGs. For example, Web should be reachable on port 80 from the internet, but the DB should only be reachable on port XX from the Web SG. It must also be reachable on that same port from a Service SG, and so on.
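To make that bookkeeping concrete, here is a sketch of the Web/DB/Service layout using the AWS CLI's `aws ec2 authorize-security-group-ingress`. The group IDs are hypothetical placeholders. `DB_PORT` defaults to the literal `XX` from the text, so set it to your real port. By default the script only prints the commands; `APPLY=1` (a hypothetical switch) actually runs them.

```shell
#!/bin/sh
# The Web/DB/Service SG layout, expressed as AWS CLI ingress rules.
# Group IDs are hypothetical placeholders; DB_PORT defaults to the literal
# "XX" from the text, so set it to your real database port.
# Dry run by default (commands are printed); APPLY=1 actually runs them.
WEB_SG=${WEB_SG:-sg-web-placeholder}
DB_SG=${DB_SG:-sg-db-placeholder}
SERVICE_SG=${SERVICE_SG:-sg-service-placeholder}
DB_PORT=${DB_PORT:-XX}

run() {
    if [ "${APPLY:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

apply_sg_layout() {
    # Web tier: HTTP from anywhere on the internet.
    run aws ec2 authorize-security-group-ingress --group-id "$WEB_SG" \
        --protocol tcp --port 80 --cidr 0.0.0.0/0
    # DB tier: only members of the Web SG may reach the DB port ...
    run aws ec2 authorize-security-group-ingress --group-id "$DB_SG" \
        --protocol tcp --port "$DB_PORT" --source-group "$WEB_SG"
    # ... plus members of the Service SG. Each new service that needs the DB
    # means another rule like this one.
    run aws ec2 authorize-security-group-ingress --group-id "$DB_SG" \
        --protocol tcp --port "$DB_PORT" --source-group "$SERVICE_SG"
}
```

The last rule is the one that keeps growing. Every new service tier that needs the database means another `--source-group` rule, and that is exactly the ongoing tweaking this approach demands.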
So as you keep iterating on your services and deploying new ones, you constantly need to tweak the security group configurations. With VPC, once you have a sane public/private topology, you can forget about it.
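A one-time public/private topology of that kind might be sketched with the AWS CLI as follows. This rests on several assumptions: a single /16 VPC with one public and one private /24, CIDRs chosen only for illustration, and no NAT for outbound traffic from the private subnet. In dry-run mode the IDs are placeholders. `run_id` and the `APPLY=1` switch are hypothetical helpers.

```shell
#!/bin/sh
# One-time public/private VPC layout with the AWS CLI. CIDRs are illustrative.
# By default this is a dry run: each command is printed to stderr and a
# placeholder ID stands in for its result. Set APPLY=1 (hypothetical switch)
# to actually create the resources.

# run_id PLACEHOLDER CMD...: print the ID that CMD returns (or PLACEHOLDER).
run_id() {
    placeholder=$1
    shift
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*" >&2
        echo "$placeholder"
    fi
}

build_vpc() {
    vpc=$(run_id vpc-PLACEHOLDER aws ec2 create-vpc \
        --cidr-block 10.0.0.0/16 --query Vpc.VpcId --output text) || return 1
    pub=$(run_id subnet-PUBLIC aws ec2 create-subnet --vpc-id "$vpc" \
        --cidr-block 10.0.1.0/24 --query Subnet.SubnetId --output text) || return 1
    priv=$(run_id subnet-PRIVATE aws ec2 create-subnet --vpc-id "$vpc" \
        --cidr-block 10.0.2.0/24 --query Subnet.SubnetId --output text) || return 1
    igw=$(run_id igw-PLACEHOLDER aws ec2 create-internet-gateway \
        --query InternetGateway.InternetGatewayId --output text) || return 1
    run_id - aws ec2 attach-internet-gateway \
        --internet-gateway-id "$igw" --vpc-id "$vpc" > /dev/null || return 1
    # Only the public subnet gets a default route to the internet gateway.
    rt=$(run_id rtb-PLACEHOLDER aws ec2 create-route-table --vpc-id "$vpc" \
        --query RouteTable.RouteTableId --output text) || return 1
    run_id - aws ec2 create-route --route-table-id "$rt" \
        --destination-cidr-block 0.0.0.0/0 --gateway-id "$igw" > /dev/null || return 1
    run_id - aws ec2 associate-route-table \
        --route-table-id "$rt" --subnet-id "$pub" > /dev/null || return 1
    # The private subnet stays on the VPC's main route table, which by default
    # routes only traffic local to the VPC.
    echo "public=$pub private=$priv"
}
```

Services then land in the private subnet by choice of subnet rather than by a new ingress rule. That is the sense in which the topology can be left alone once it exists.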
Additionally, most SOAs try to provide some form of high availability. For us, that means cross-region and cross-AZ replication and availability. Cross-AZ is fairly simple in both EC2 and VPC, but cross-region in EC2 is a pure nightmare: you cannot apply a security policy across regions, so you have no simple way to allow your nodes to communicate.
Since VPC acts as a distinct private network, we can simply use site-to-site VPN connections between our regions, and nodes can communicate with each other easily and freely. There is nothing to worry about, since the private subnets are connected over VPN and use hostnames that resolve only to addresses routable within our private network.
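A check in the same spirit as the Nagios checks earlier in the thread could keep verifying that private hostnames stay private. This is a sketch that assumes Linux with glibc's `getent ahostsv4`, and VPC CIDRs inside the RFC 1918 ranges (10/8, 172.16/12, 192.168/16). It resolves a name and alerts unless every address is private. `is_rfc1918` and `check_private_only` are hypothetical names.

```shell
#!/bin/sh
# Check that an internal hostname resolves, and only to private addresses.
# Assumes Linux/glibc `getent ahostsv4` and VPC CIDRs inside the RFC 1918
# ranges. Nagios-style exit codes: 0 OK, 2 CRITICAL.

# is_rfc1918 IPV4: true for 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16.
# (Pattern match on dotted-quad text; it does not validate the address.)
is_rfc1918() {
    case $1 in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_private_only HOSTNAME
check_private_only() {
    name=$1
    addrs=$(getent ahostsv4 "$name" | awk '{print $1}' | sort -u | tr '\n' ' ')
    if [ -z "$addrs" ]; then
        echo "PRIVATE-DNS CRITICAL - $name does not resolve"
        return 2
    fi
    bad=""
    for ip in $addrs; do
        is_rfc1918 "$ip" || bad="$bad $ip"
    done
    if [ -n "$bad" ]; then
        echo "PRIVATE-DNS CRITICAL - $name resolves to non-private:$bad"
        return 2
    fi
    echo "PRIVATE-DNS OK - $name -> $addrs"
    return 0
}

if [ "$#" -gt 0 ]; then
    check_private_only "$1"
    exit $?
fi
```

Running this from a node in each region against the other regions' service names catches a name that accidentally starts resolving to a public address.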
Don't get me wrong, security groups can be properly used to provide a totally secure environment where only trusted nodes are allowed to communicate. You can add monitoring and configuration testing easily as well. But once you try to scale up past a few servers, move to a SOA, and provide cross-region availability, VPC becomes the simpler alternative.