Security Architecture Anti-Patterns (ncsc.gov.uk)
231 points by napolux on Jan 15, 2020 | 39 comments



This is pretty unhelpful; the case I would make is that it's providing security largely by defining the problem away. For instance: it's usually unrealistic to require that all administration happen through clean-room systems that don't ever browse the web.

The real-world practice of security is in large part the deployment of risky systems with mitigations in place for the likely attacks that will target them. So, for instance, getting everyone to talk to the admin console on a purpose-built Chromebook with no Internet access is probably not a realistic option, but getting every system with admin console access MDM'd and requiring access to admin consoles to route through an IdP like Okta to enforce 2FA is much more realistic, and thus likely to happen.
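A minimal sketch of the mitigation policy described above. It assumes the IdP tells the admin console that the login went through it with a completed second factor, and that the MDM inventory is available as a set of device IDs. `AccessRequest` and `may_access_admin_console` are hypothetical names. In practice this check lives in the IdP or an access proxy, not in hand-written application code.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class AccessRequest:
    user: str
    device_id: str
    via_idp: bool       # session was established through the IdP (e.g. Okta), not a local login
    mfa_verified: bool  # the IdP reports that a second factor was completed


def may_access_admin_console(req: AccessRequest, managed_devices: Set[str]) -> Tuple[bool, str]:
    """Allow admin console access only from an MDM-enrolled device, via the IdP, with 2FA."""
    if req.device_id not in managed_devices:
        return False, "device is not MDM-enrolled"
    if not req.via_idp:
        return False, "login did not go through the IdP"
    if not req.mfa_verified:
        return False, "second factor missing"
    return True, "ok"
```

Returning the reason together with the decision makes denials easy to log and audit.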

The patterns in here that aren't unrealistic are pretty banal. I don't doubt that UK NCSC sees systems designed to be unpatchable, but modern engineering norms (Docker, blue/green, staging/cert environments) --- norms that have really nothing to do with security and are common to pretty much every serious engineering shop --- address that directly anyways.
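To illustrate why these norms make patching routine, here is a toy model of a blue/green cutover. It is a sketch, not any particular tool's API, and it assumes a health check exists for each environment. The patch goes to the idle environment, traffic moves only if the health check passes, and the previously live environment stays untouched as the rollback target.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Two identical environments; only the 'live' one receives traffic."""
    versions: Dict[str, str] = field(default_factory=lambda: {"blue": "v1", "green": "v1"})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_patch(self, new_version: str, health_check: Callable[[str, str], bool]) -> bool:
        """Patch the idle environment, then switch traffic only if it passes the health check."""
        target = self.idle
        previous = self.versions[target]
        self.versions[target] = new_version
        if not health_check(target, new_version):
            self.versions[target] = previous  # roll the idle side back; live traffic never moved
            return False
        self.live = target  # cut over; the old live side remains available for rollback
        return True
```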

Other patterns don't really make sense; for instance: you should design to make your systems patchable (sure, that's again a basic engineering requirement anyways), but also make sure your dev and staging environments aren't continuously available. Why? Those are conflicting requirements.


I respectfully disagree. I have seen many of these antipatterns in production in many medium & large size orgs, and I think the six scenarios presented in this doc are more common than you think.

The "browse-up" scenario is extremely common because engineers/administrators usually prefer to remote directly onto the systems they're working on from their main machine rather than endure the inconvenience of needing to securely connect to another host first. Many of these admins/engineers would think it inconceivable for their machines to be vulnerable, but have no issues downloading dev tools, libraries and dependencies onto their machines from third-party & untrusted sources (e.g. GitHub, npm, etc.).

"Docker, blue/green, staging/cert environments": believe it or not, these are seen as emerging trends in many orgs rather than the norm you suggest here.

And regarding designing systems to be patchable, you say: "sure, that's again a basic engineering requirement anyways", but again I'd counter that I've come across many systems that haven't been patched in months or years because it's deemed too hard. Another similar issue I've come across is where an org's DR processes have not been properly tested because it's too hard to failover without causing significant disruption. Both can easily be designed for early on, but for legacy systems that were implemented without this foresight it still remains an issue.


The way I'm reading the "browse-up" scenario, however, isn't how you're describing it. Admins wouldn't "securely connect to another host"; they'd have to use a trusted, known-clean device to perform all administrative activities. Connecting to that device from another host (i.e. using it as a "jump box") seems to be specifically called out as an anti-pattern.


That's not how I read it. This passage in particular:

> There are many ways in which you can build a browse-down approach. You could use a virtual machine on the administrative device to perform any activities on less trusted systems.

The point is to tailor your risk to the systems you're accessing. You should interact with less trusted content in more secure ways if you're also interacting with high-security systems.

So if you're using Firejail/bubblewrap to consume less trusted content (web, email, videos, etc.) plus SELinux/AppArmor, I think your system would match their description of browse-down for most low- to mid-security systems. For high security, maybe Qubes/VMs. At the highest security level, you start thinking about multiple machines with KVM switches.
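As a rough sketch of the Firejail/bubblewrap option, the function below only builds a `bwrap` command line; it does not run anything. The bind layout is an assumption for a merged-/usr Linux distro, and a real browser would additionally need its display socket, fonts, and so on, so check the flags against `bwrap --help` on your system. What it shows is the core idea: the sandboxed program gets a throwaway home directory, so credentials sitting in the real one (SSH keys, cloud configs) are not visible to it.

```python
from typing import List


def bwrap_command(program: List[str], home: str, share_net: bool = True) -> List[str]:
    """Build an argv that runs `program` under bubblewrap with an empty, throwaway home.

    Assumes a merged-/usr distro; adjust the binds for your system.
    """
    args = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",     # system binaries and libraries, read-only
        "--symlink", "usr/bin", "/bin",  # merged-/usr compatibility links
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--ro-bind", "/etc", "/etc",     # system config (e.g. resolv.conf), read-only
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--tmpfs", home,                 # empty tmpfs over home: the real files are hidden
        "--unshare-all",                 # new namespaces for everything...
    ]
    if share_net:
        args.append("--share-net")       # ...except the network, which a browser needs
    args += ["--die-with-parent", "--new-session"]
    return args + program
```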


Correct, the guidance serious shops give is to create privileged access workstations (PAWs) for critically sensitive work (think AD domain admin work or network engineers, etc.). You wouldn't expect most devs to be down in the weeds, but who knows.


Another approach to browse up would be to not grant god access to a single administrator. Require all changes to go through a pull request that requires another admin’s thumbs up, etc.
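A minimal sketch of that two-person rule, assuming changes are represented as reviewable requests. `ChangeRequest` is a hypothetical class; in practice this is usually enforced by the code host's branch-protection or required-review settings rather than by hand-written code. The key check is that the author can never count as their own approver.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ChangeRequest:
    author: str
    description: str
    approvals: Set[str] = field(default_factory=set)

    def approve(self, reviewer: str, admins: Set[str]) -> None:
        if reviewer not in admins:
            raise PermissionError(f"{reviewer} is not an admin")
        if reviewer == self.author:
            raise PermissionError("authors cannot approve their own change")
        self.approvals.add(reviewer)

    def can_apply(self, required: int = 1) -> bool:
        """A change is applied only once enough *other* admins have approved it."""
        return len(self.approvals) >= required
```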


That seems unfeasible to me during an outage.


If we are talking engineering, there are OT systems that are not patchable. You cannot blue/green Docker-deploy a machine that is running an industrial system. It is all nice and easy if you run a web farm where you can just balance traffic to another server.

For the first one, I would say you could make admins use "clean Chromebooks", but probably no one is going to pay for that.

For the other, banal ones, I would say it is good to remind people that "management bypasses" are not a good idea.


If just one machine is running an industrial system that can’t go down, you have a serious problem. Hot standby predates web servers.
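A minimal sketch of the heartbeat logic behind a classic hot standby, assuming the primary sends a heartbeat every `interval_s` seconds and the standby promotes itself after `max_missed` consecutive misses. `StandbyMonitor` is hypothetical, and a real deployment also needs fencing so that a slow-but-alive primary and the newly promoted standby never act at the same time (split brain). Detection alone takes roughly `interval_s * max_missed`, which is acceptable for many services but not for control loops with millisecond deadlines.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Promote the standby after the primary misses `max_missed` consecutive heartbeats."""
    interval_s: float
    max_missed: int = 3
    last_heartbeat: float = 0.0
    promoted: bool = False

    def heartbeat(self, now: float) -> None:
        # Called whenever a heartbeat from the primary arrives.
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        # Count whole heartbeat intervals elapsed since the last one we saw.
        missed = int((now - self.last_heartbeat) // self.interval_s)
        if not self.promoted and missed >= self.max_missed:
            self.promoted = True  # take over; fencing of the old primary would happen here
        return self.promoted
```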


Depends on the kind of control it's doing. Industrial control requiring milliseconds level availability is extremely hard to fail over.

This is how, e.g., chip fabs and chemical plants suffer such losses during power outages. A power plant might have on the order of a single second of sustain.

The solution there is to actually make the full scale redundancy and compartmentalization or accept the losses.

Most of the time we're not dealing with that.


I have trouble understanding what is meant by the "browse-up" scenario. If it's "don't use devices that can download stuff from the internet", I would deem that extremely impractical advice.


I agree.

This is more a list of anecdotes that are definitely bad. But they’re not representative of the kinds of common mistakes I would call out.


This is amazingly concrete and understandable from a technical perspective for a government security document. Where can I find more like this?

Everything I’ve seen in ISO security standards, for example, is written at an abstract theoretical level about the design of security bureaucracy rather than the design of actual systems.

One bone to pick: basically all tech companies expect you to be on call for your services via your laptop. They’re not paying anybody to sit in the office overnight, and commuting in when you get paged would seriously delay mitigation. Is “browsing down” even possible under those circumstances?


> Is “browsing down” even possible under those circumstances?

From TFA:

> There are many ways in which you can build a browse-down approach. You could use a virtual machine on the administrative device to perform any activities on less trusted systems. Or you could browse-down to a remote machine over a remote desktop or shell protocol. The idea is that if the dirty (less trusted) environment gets compromised, then it’s not ‘underneath’ the clean environment in the processing stack, and the malware operator would have their work cut out to get access to your clean environment.


Yeah, tech workers aren’t going to tolerate doing everything except admin consoles in a VM or RDP session. Maybe on a special purpose workstation but not the daily driver company Macbook.


Ideally you'd do the least amount of browsing and reading email possible on your work laptop and sandbox whatever is left if possible.

Something like Qubes OS (or maybe manually using containers or virtual machines) could be an option. Running snaps and flatpaks also ensures some level of sandboxing if I'm not mistaken. Using a separate user for riskier activities is also worth thinking about.

I think it's also true that all OSes are moving towards more sandboxing by default (permission to read files, permission to start at runtime, admin access, etc.) so it's less of a risk than it used to be.


> Ideally you'd do the least amount of browsing and reading email possible on your work laptop and sandbox whatever is left if possible.

Ideally.

How many people are posting here from their work laptops? And how many have SSH access to at least one "secure" system?

Granted, HN is unlikely to be a threat, but other sources may be. There has been progress in sandboxing, but dev machines are especially vulnerable, as in many cases people need to be admins on them to do their jobs effectively.


> Is "browsing down" even possible under those circumstances?

Not a security expert, but based on their explanation of "browsing down," I think it's possible if the laptop is sufficiently locked-down. The issue isn't fundamentally with the management device being remote, it's being less-trusted. In the limit case, you could have separate management-only laptops that get passed around to the on-duty employee.


> Is “browsing down” even possible under those circumstances?

Seems like it could be done by having a mobile workstation that doesn't read email or browse the web, just acts as a secure 'satellite' administration device that does little more than VPN back into the administrative network. From there, you jump off to a terminal server if you need to browse or email.

The termination of that admin VPN would probably need to be a distinct endpoint from the general VPN access concentration, and have additional security/authentication measures in place.


At that high a level, getting too granular about actual systems just ends up with people throwing your standards out because their special snowflake of a use case cannot possibly work under it.

The reason it ends up focusing on the bureaucracy is because they hope if you can get the bureaucratic part right, the organization can have the relevant expertise in house to make informed decisions about risk, the minimization and mitigation of which is really the goal of the security function.


I agree, it's a great doc. Some more concrete examples.

For number 1, administering a Windows Active Directory domain controller from a desktop that is also used to browse the public Internet and check email.

For number 6, networking groups use this a lot as the reason to not patch routers.


On a personal note, the advice not to browse up (i.e. not to administer systems from less trusted devices) often means an organisation supplying a trusted device.

That potentially conflicts with IR35 for contractors who would then not be supplying their own equipment.

I've also seen it result in a contractor's *nix laptop being swapped out for a Windows laptop (built by a junior employee) with mandated "phone home" software installed. My personal biases persuade me that this wasn't necessarily an improvement in the security of the system.

I should say, I'm generally a fan of NCSC advice and I think it's great they're putting their thoughts out there.


This is a great list and I'll keep all this in mind, even if I see number one as near unfeasible.

The one I found most surprising was number four. If using PaaS is better practice, then I certainly won't feel lazy anymore for not wanting to deal with the administrative overhead of IaaS (or Kubernetes, but that's a different beast entirely, one I try to avoid by using Nomad). This will help my impostor syndrome, though calling it that is probably presumptuous on my part.

That aside, why is this site using React to the point that it needs JavaScript enabled to run? What could a site like this possibly need all that interactivity for? Also, why is what seems to be the entire CSS embedded in the head? That's just weird.


The main page's source code actually doesn't contain CSS. That tag is written dynamically by the JavaScript.


Is that a common practice now (or maybe a practice I missed entirely)? Last time I used webpack, for example, we were still creating a minified CSS file that was meant to be linked in index.html, but I haven't done frontend in a while.


> Anti-pattern 4: Building an ‘on-prem’ solution in the cloud

It didn't use this as an example, but good heavens, why do I still have to understand VPCs and NAT to use half of the cloud?

10./192. was never a security measure; it was an IPv4 rationing scheme that IPv6 made obsolete.

AWS made a big deal of the fact that Lambda functions can now be launched in a VPC in less than 15s.

Why are people doing that at all????

Because of bearded network admins who set up their on-prem network in the cloud.

Now networking in the cloud is so complicated that people are turning to FaaS because it has a better chance of skipping the morass.

---

P.S. Don't get security groups confused with NAT. AWS had security groups long before its VPC service was even a thing.


This ends up tougher than expected because most enterprise folks don't deploy entire systems to cloud providers. They deploy some subset, and still need connections back on prem. Instead of abstracting those connections into APIs, they just spin up Direct Connect or a VPN, and boom, you have tight coupling and not much more than an extension of on prem.


While 10./192. private addresses in IPv4 were largely designed to help deal with address space exhaustion, they are also important because organizations can use them without having to own the addresses or register them in any way with IANA (or equivalent), since they are not publicly routable. IPv6 still maintains this feature with unique local addresses: the entire fc00::/7 range is allocated to private networks and is not routable on the public internet (not that AWS uses these; any IPv6 address they assign to you is globally routable).

A lot of stuff just still doesn't support IPv6 yet (RDS, for example: https://aws.amazon.com/premiumsupport/knowledge-center/rds-i...), so your options are to either give that endpoint a public address and manage your security groups well, or give it only a private address. The latter gives you the added benefit of the endpoint not being publicly routable (a nice second layer of security beyond security groups); the downside is that the things that need to talk to it must now also live in your private subnet, hence Lambda launching in a VPC.
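The address ranges discussed here can be checked with Python's standard `ipaddress` module. This sketch hard-codes the RFC 1918 IPv4 blocks and the IPv6 unique local range `fc00::/7` instead of using `ip_address(...).is_private`, because `is_private` also covers loopback, link-local and other special-purpose ranges. The function name `is_private_range` is just for illustration.

```python
import ipaddress

PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
    ipaddress.ip_network("fc00::/7"),        # RFC 4193 unique local addresses
]


def is_private_range(address: str) -> bool:
    """True if the address falls in a private (non-publicly-routable) block above."""
    addr = ipaddress.ip_address(address)
    # Compare only against networks of the same IP version.
    return any(addr.version == net.version and addr in net for net in PRIVATE_RANGES)
```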


> A lot of stuff still doesn't support IPv6 yet (RDS for example)

Almost all software (OS's, browsers, databases, load balancers, etc.) supports IPv6.

Some third-party services don't. But that's usually irrelevant to my point. For example, RDS allocates public IPv4 addresses at no extra cost. In any case, my point is that cloud providers are unwisely shoehorning NAT into everything; citing AWS as a contributing factor just reinforces that.

---

There's no longer any need for local IPs, except so that we can still design 90s-style networks.


> Why are people doing that at all????

From a security standpoint: because it dramatically improves your security posture. Defense-in-depth is a real thing, and people who care about security use it (the people who pay big bucks).


Excellent and to the point. I see this apply to many technology SMB companies as well. We once compiled a few actionable recommendations for smaller companies that host on AWS and that post ended up being our most popular article https://www.templarbit.com/blog/2018/11/21/security-recommen...


> You need to enable JavaScript to run this app.

Nah.


There's no way you would have been able to find it without the page loading, but for anyone else in the same position, the direct PDF is available at https://www.ncsc.gov.uk/pdfs/whitepaper/security-architectur....

I would maybe question whether an article that can be perfectly embedded in a static PDF without any changes or downgrades really needs an entire React stack and a Service Worker for the browser, but :shrug:. Every org is free to make their own engineering choices.

It does seem to be a pretty good list, so worth taking a look at.


Is this because the site is using React? I had a look at the source of the page and I'm guessing this is React-based? Are the benefits gained from using React worth it for the limitations you get?


Bloody hell, this document is great, it's like reading Ross Anderson. Exactly the type of security advice we need to get out to IT people.


Top comment: "This is pretty unhelpful." Ya, requiring nonfree JavaScript to learn about security anti-patterns is an anti-pattern. Fuck corn, fuck bread.


Great description. How do you get security architecture into the design phase of a system when you are doing dynamic and iterative product development?


They are only mutually exclusive if your business and product management teams deprioritize security. In my experience, the typical reason that security gets neglected (as opposed to just making reasonable trade-offs) is that management and product management both care too much about just shipping shiny things and don't care enough about doing right by the end user. I've seen better and worse teams. Most teams fall into a category of "you're lucky you're not big enough to be a target."

General best practices I can think of, in broad organization level strokes:

1. Make sure security is implemented at the dev ops layer through practices such as logged just-in-time access to production systems, secret vaults for service keys and certificates, airgapped machines for handling secret keys, etc.

2. Make sure security best practices are implemented by default in your APIs (CORS, TLS 1.3, whitelist-based firewalls between services that shouldn't need to talk to each other, etc.) and make them transparent to the API caller, at least when it's your own services talking to your own services.

3. Make security an element of design and code reviews. Square, for example, did this by having subject matter experts advise teams on security design when projects were still in the ideation/design phase.
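To make the first point concrete, here is a minimal sketch of logged just-in-time access: grants are requested with a reason, expire automatically, and every grant and access decision is written to an audit log. `JitAccessBroker` is a hypothetical class, the clock is passed in explicitly to keep it testable, and a real setup would use a dedicated access broker or the cloud provider's temporary-credential mechanism with tamper-resistant logging.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class JitAccessBroker:
    """Hypothetical broker: grants expire automatically and every decision is logged."""
    max_ttl_s: int = 3600
    grants: Dict[Tuple[str, str], float] = field(default_factory=dict)  # (user, system) -> expiry
    audit_log: List[str] = field(default_factory=list)

    def request(self, user: str, system: str, reason: str, ttl_s: int, now: float) -> None:
        ttl = min(ttl_s, self.max_ttl_s)  # never grant longer than the policy allows
        self.grants[(user, system)] = now + ttl
        self.audit_log.append(f"{now:.0f} GRANT {user} -> {system} for {ttl}s: {reason}")

    def check(self, user: str, system: str, now: float) -> bool:
        expiry = self.grants.get((user, system))
        allowed = expiry is not None and now < expiry
        self.audit_log.append(f"{now:.0f} {'ALLOW' if allowed else 'DENY'} {user} -> {system}")
        return allowed
```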

Ultimately, security costs a non-trivial amount of time, and it requires training your developers to be able to reason about security.


This is super helpful. Good find.



