No, my entire infrastructure is not based on AWS; this was a small side project,...

scarface74 · on April 5, 2019

> PS. This may be a little of pot and kettle, but what makes it so important to you that I'm convinced that AWS is unicorns and rainbows and having more control and visibility is a bad thing?

Because I was developing and maintaining web servers, database servers, queueing systems, load balancers, mail servers and doing development at small companies before AWS in its current form existed. I’ve done both.

The problem most posters here have isn’t lack of visibility, it’s not understanding the tools - it’s a poor carpenter who doesn’t understand their tools.

> What am I supposed to do to satisfy you here? After finding the network does weird things, and not being able to get timely help with it, I should conclude that AWS will solve my needs, if only I paid them more money? If only I had wasted more time watching videos that don't come up against reasonable search terms, maybe I wouldn't have come to the conclusion that having an amorphous, uninspectable, blob of a network between my servers and the internet is a bad thing?

Do you usually jump in on a new technology or framework and get frustrated because you don’t know how something works when you didn’t take the time to learn the technology you use?

I’m sure you’ve never had a problem with any piece of open source software that was “out of your control” or did you just download the software and patch it yourself?

The truth is that yes, any piece of technology is complex, and you thought you could just jump in without any research and start using it. I don’t do that for any technology. When I found out that we were going to be using ElasticSearch for a project, I spent a month learning it and still ran into some issues because of what I didn’t know. That didn’t mean there was something wrong with ES.

toast0 · on April 5, 2019

> Do you usually jump in on a new technology or framework and get frustrated because you don’t know how something works when you didn’t take the time to learn the technology you use?

I usually jump in and try to solve my problem with the tools that seem right for the job. Reviewing the documentation as needed. I get frustrated when the documentation doesn't match what I have observed. In this case, the connection limits of the default stateful firewall on EC2 were not mentioned anywhere I could find (knowing exactly what to look for and where to look, I think I could find a vague mention of the general limit today). The specific limits per instance type certainly still aren't. I found forum posts about the connection limit with useless responses from employees. I reached out to my support contact and got useless information (they did get me access to bigger instances though, which just have a larger, unspecified, connection limit, that was still small enough that I could hit it). I reached the conclusion that AWS must be great, because people love it, but it has very non-transparent networking, and useless support for non-trivial issues. I'm sure if we were spending more, we would get better support people and they'd share insights into their networks -- I've seen that with our other hosting providers, although network insights there weren't needed for everyday things, just more details were nice when the network broke, so we could help them detect future issues and plan for failures.

If the technology mostly works, but I need deeper knowledge, maybe for optimizing, I'll seek out deeper references, but relying on third-party references during discovery is very dangerous. When there is a conflict between what is observable, what is documented in first-party sources and what is documented in third-party sources, I would have no context to know which documentation indicates intended behavior and which is most out of date.

> I’m sure you’ve never had a problem with any piece of open source software that was “out of your control” or did you just download the software and patch it yourself?

Download and patch myself, and upstream the patches if I have the time and patience. Isn't that the point of open source? Everything is broken, but at least I can inspect and fix parts of the system where I can see the code. Hell, binary patching is a thing, although I wouldn't want to do that on the regular. I've had patches accepted in the FreeBSD kernel, OpenSSL, Haproxy to fix issues, some longstanding I ran into.

At the end of the day, I'm responsible for the whole stack, because if it's not working, my users can't use my product. It doesn't matter if it's the network, software I wrote, open source software, managed software; even software on the client devices is my problem if it doesn't work. The more I can inspect, the better.

scarface74 · on April 5, 2019

> The specific limits per instance type certainly still aren't

In the official documentation they do mention that different instance sizes have different networking capabilities. This is stuff you learn early on when learning AWS - from the official books.

> In this case, the connection limits of the default stateful firewall on EC2 were not mentioned anywhere I could find

That security groups are stateful and NACLs are stateless is a question I ask junior AWS folks when interviewing. That’s one of the level 1 questions you ask to decide whether to actually bring them in for an on-site.

> and useless support for non-trivial issues. I'm sure if we were spending more, we would get better support people and they'd share insights into their networks

So what did you find when you turned on VPC logging for the network interface of the EC2 instance? Again this level of troubleshooting is a question I would ask junior admins during an interview process.
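For readers who have not used this kind of logging, here is a minimal sketch of what looking through such logs could involve. It assumes the records are VPC Flow Logs in the default version 2 space-separated format; the sample records, `parse_record`, and `rejected_flows` are illustrative names and data, not anything from this thread. It only filters for flows marked REJECT and makes no claim about whether connection-tracking drops would show up that way.

```python
from typing import Dict, Iterable, List

# Field order of the default (version 2) flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)

# Illustrative sample records (made-up account and interface IDs).
SAMPLE_LINES = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]


def parse_record(line: str) -> Dict[str, str]:
    """Split one default-format record into a field-name -> value mapping."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def rejected_flows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the parsed records whose action is REJECT, skipping blank lines."""
    records = (parse_record(line) for line in lines if line.strip())
    return [r for r in records if r["action"] == "REJECT"]
```

Calling `rejected_flows(SAMPLE_LINES)` returns only the record for the rejected connection to port 3389; a custom log format would need a different `FIELDS` tuple.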

I’ve had 100% success rate with our business support using live chat. With things that were a lot hairier and with my own PEBKAC issues.

> Download and patch myself, and upstream the patches if I have the time and patience. Isn't that the point of open source? Everything is broken, but at least I can inspect and fix parts of the system where I can see the code. Hell, binary patching is a thing, although I wouldn't want to do that on the regular. I've had patches accepted in the FreeBSD kernel, OpenSSL, Haproxy to fix issues, some longstanding I ran into.

I’m sure my company wouldn’t have any trouble with approving our running our own patched version of our production database and OpenSSL....

toast0 · on April 5, 2019

> So what did you find when you turned on VPC logging for the network interface of the EC2 instance? Again this level of troubleshooting is a question I would ask junior admins during an interview process.

Hey --- this sounds like something my support tech should have asked me, or should have been mentioned by the employee response in the forum. I didn't look at VPC logs; I did see that SYN packets (or maybe SYN+ACK responses, I don't remember) were mysteriously missing in tcpdump between the ec2 instance and the server.
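The kind of check described here, spotting handshakes that never complete, can be sketched as follows. This is a simplified illustration and not the analysis that was actually run: it assumes packets have already been extracted from a capture into `Packet` tuples (a hypothetical structure), and it reports SYNs for which no SYN+ACK was seen in the reverse direction.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed TCP packet summary."""
    src: str
    sport: int
    dst: str
    dport: int
    flags: FrozenSet[str]


FlowKey = Tuple[str, int, str, int]


def unanswered_syns(packets: Iterable[Packet]) -> List[Packet]:
    """Return the first SYN of every flow that never received a SYN+ACK."""
    pending: Dict[FlowKey, Packet] = {}
    for p in packets:
        if p.flags == {"SYN"}:
            # Remember the first SYN; retransmissions reuse the same key.
            pending.setdefault((p.src, p.sport, p.dst, p.dport), p)
        elif p.flags == {"SYN", "ACK"}:
            # A SYN+ACK answers the SYN sent in the opposite direction.
            pending.pop((p.dst, p.dport, p.src, p.sport), None)
    return list(pending.values())


# Illustrative capture: the first handshake is answered, the second is not.
SAMPLE_CAPTURE = [
    Packet("10.0.0.5", 40000, "203.0.113.7", 443, frozenset({"SYN"})),
    Packet("203.0.113.7", 443, "10.0.0.5", 40000, frozenset({"SYN", "ACK"})),
    Packet("10.0.0.5", 40001, "203.0.113.7", 443, frozenset({"SYN"})),
    Packet("10.0.0.5", 40001, "203.0.113.7", 443, frozenset({"SYN"})),
]
```

Running the same check on captures taken at both ends of the connection would show whether the SYN or the SYN+ACK is the packet that goes missing in between.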

Either way, I'm not interviewing for an AWS position; I was just trying to use a managed off-the-shelf service. Looking through the docs now, I did find the mention of connection tracking [1], but even there, there's no mention of a limit (of course, as someone familiar with firewalls, I know there's always a limit with connection tracking, which is why I only rarely write stateful rules, and wouldn't have assumed default rules were stateful). I had read the bit about the default security group, which says:

> A default security group is named default, and it has an ID assigned by AWS. The following are the default rules for each default security group:

> Allows all inbound traffic from other instances associated with the default security group (the security group specifies itself as a source security group in its inbound rules)

> Allows all outbound traffic from the instance.

> You can add or remove inbound and outbound rules for any default security group.

Unfortunately, "allows all outbound traffic" was misleading, because what it really means is "allows all outbound traffic, subject to connection limits."
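Why a stateful "allow all" rule still implies a limit can be shown with a toy model. This is a sketch of connection tracking in general, not of how AWS implements security groups; the `ConnTrackTable` class and the limit of 2 are made up for illustration, since the actual limits are, as noted, not published.

```python
from typing import Set, Tuple

Flow = Tuple[str, int, str, int]  # (src address, src port, dst address, dst port)


class ConnTrackTable:
    """Toy stateful firewall: every allowed flow occupies one table entry."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries  # hypothetical limit, for illustration
        self._flows: Set[Flow] = set()

    def allow_outbound(self, flow: Flow) -> bool:
        """Apply an 'allow all outbound' rule, subject to table capacity."""
        if flow in self._flows:
            return True  # already tracked, so it costs nothing extra
        if len(self._flows) >= self.max_entries:
            return False  # table full: the new connection is silently dropped
        self._flows.add(flow)
        return True

    def close(self, flow: Flow) -> None:
        """Forget a finished flow, freeing its entry."""
        self._flows.discard(flow)


table = ConnTrackTable(max_entries=2)
first = ("10.0.0.5", 40000, "203.0.113.7", 443)
second = ("10.0.0.5", 40001, "203.0.113.7", 443)
third = ("10.0.0.5", 40002, "203.0.113.7", 443)
results = [table.allow_outbound(f) for f in (first, second, third)]
```

In this model the rule permits everything, yet the third connection fails because the table is full. A stateless rule would not need the table at all, which is the trade-off being described.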

> I’m sure my company wouldn’t have any trouble with approving our running our own patched version of our production database and OpenSSL....

I'm assuming that's sarcastic, and you're actually saying you don't think you would be able to get approval to run a patched version of software. Are you saying that if you find a problem in OpenSSL (or whatever enabling technology) that causes your system to be unreliable, your employer will not let you fix it; you'll need to wait for a fixed release from upstream? From experience, upstream releases often take weeks and some upstreams are less than diligent about providing clean updates; not to pick on OpenSSL, but a lot of their updates will fix important bugs and break backwards compatibility, occasionally breaking important bits of the API that were useful. I guess, if you're in an environment where you have no ability to fix broken stuff in a timely fashion, it really doesn't matter whose responsibility it is to fix it, since it won't be fixed.

I really hope you didn't need my patch for your database systems; but maybe you want/wanted it for your https frontends if you were doing RSA_DHE with windows 8.1 era Internet Explorer or windows mobile 8.1 so that your clients could actually connect reliably. Anyway, if you're running OpenSSL 1.0.2k or later, or 1.1.0d or later (might have been 1.1.0c), you've got my patch, so you're welcome. It fixes the issue that is well described here [2].

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-ne...

[2] https://security.stackexchange.com/questions/104845/dhe-rsa-...

scarface74 · on April 5, 2019

You specifically mentioned OpenSSL. Our auditors would have been up in arms at my previous company where we did do everything on prem if we ran a custom version of OpenSSL. Do you really think we could pass either HIPAA compliance or PCI compliance with our own unvetted version of it?