Why Authorization Is Hard

jzelinskie · on Sept 15, 2021

Disclaimer: I'm a founder of Authzed (YC W21), a productized form of Zanzibar

I'd like to reiterate that Policy Engines and Zanzibar-like systems are orthogonal and can be used together very successfully. However, the article claims that ABAC cannot be done with ReBAC systems which is false[0] and it claims that Zanzibar systems do not support the concept of "public", when the system at Google does[1]. The availability of Zanzibar-like systems outside of Google is still relatively new, so the user experience can be greatly improved. For example, the Authzed Schema Language[2] is a vast improvement over Zanzibar's raw userset rewrites.

That being said, I think the Oso UX is quite nice in comparison to many products in the space, but architects should always spend the time to figure out what's best for their requirements. If you're just starting to explore AuthZ, this article is a pretty good primer for the problems in the space and why you're unlikely to design something great on the first go if you build it yourself. It's really hard to write about this subject in a digestible fashion, so props to the team!

I especially liked the quote "[...] authorization is a topic as cool as moving to Kubernetes!". Considering almost all of our team is ex-CoreOS and has deep ties to Kubernetes, we truly believe authorization is cool enough to stop working on Kubernetes ;)

[0]: https://link.springer.com/chapter/10.1007/978-3-662-43936-4_...

[1]: https://www.youtube.com/watch?v=mstZT431AeQ

[2]: https://play.authzed.com

samjs · on Sept 15, 2021

Hey! Yep, you can totally use a policy engine like XACML or OPA with Zanzibar. We talk about that a little further down in the post.

The "public" example was meant to be a simple example of attribute-based access control but you could replace that with other similar ABAC examples for why you might need to bring in an additional policy engine.

stevenpetryk · on Sept 15, 2021

I've never used it in prod, but have heard great things from former Intercom coworkers about how easy it was to rewrite the whole auth layer in Oso. Never heard someone describe being "blown away" building an auth layer but here we are.

Congrats to the Oso team for building a great product :)

samjs · on Sept 15, 2021

Thanks! I'll pass it on to the team :D

I've got to say, the folks at Intercom made it particularly fun. They were sending us traces and graphs from their internal systems when we were trying to figure out some issues with them (e.g. we ran into this Datadog context problem: https://github.com/DataDog/dd-trace-rb/issues/1389)

lmeyerov · on Sept 16, 2021

We have been enjoying Casbin (ABAC):

- technically: embedded in your DB, so not another infra problem + can stay in-DB for hotpath queries (ex: view all). As db-native RLS etc keeps maturing, my bet is this is where a lot will end up anyways. This is orthogonal to their discussion of service vs monolith - it enables working at the data tier vs app tier.

- governance: our app does not depend on an outside company

At the same time, it was surprisingly slim pickings for such a core thing, so more diversity the better!

Edit: For context, we have been thinking a lot about authorization recently and investing here, so here is a recent post, 'The Sharing Paradox', on how we view it as an important way to grow successful team use, esp. with modern features like friendly ABAC UIs: https://www.graphistry.com/blog/100x-sharing-paradox

sizediterable · on Sept 16, 2021

Hey Sam! Just a minor correction: the unified data access layer at Facebook is the Ent Framework, which can be backed by different data stores such as MySQL, TAO, or ZippyDB. It is also primarily accessed by one monolithic service, so it's not the best example for the microservices scenario you illustrated. I can understand getting this misconception from the Prisma post. I'm curious where they got their bad intel from (I guess it's just good marketing to compare oneself to FB engineering, facts aside)

samjs · on Sept 16, 2021

Oh, and if you’d be up for sharing any more info on how authorization _is_ done at fb, I’d love to chat. Email is in my profile :)

samjs · on Sept 16, 2021

Hey!

Thanks for the correction. I’ll get on updating that.

ianstormtaylor · on Sept 15, 2021

Wow, it's really refreshing to see authorization explained in a way that dives into the fine details. Most articles I've seen introduce global role-based restrictions and leave it at that which is super frustrating.

Oso looks really well considered. Wish I had this in the past. Well done!

samjs · on Sept 15, 2021

Thank you! You might also enjoy the series we've been writing, Authorization Academy: https://www.osohq.com/developers/authorization-academy

It builds up a bit more gently and introduces a lot more concepts than I could fit into one (already long) blog post.

KineticLensman · on Sept 16, 2021

This is really well done and for me stands out as an example of something written by experts that is actually accessible to newbies like myself, e.g. in terms of explaining new concepts and motivating why specific things are important.

jillesvangurp · on Sept 16, 2021

Dealing with complex authentication and authorization frameworks is also hard. The solution to authentication and authorization is rarely picking a framework and then calling it a day. If you don't know how the framework works and what its failure modes are, you are likely to make some mistakes.

I was taught one useful thing about security: the triple gold (AU) standard; which refers to the need to do authentication, authorization, and auditing. People seem to always forget about the latter but it's actually equally important. If bad people try to get in, you need to know. If there's a security bug causing people to get in that shouldn't, you need to know. Etc. That's auditing. Being able to audit what happens and who is authenticated and authorized for what and why is important and quite often also a legal requirement.

I always start from that angle: so I need detailed security logs with context I can make sense of. I need observable software basically.
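A minimal sketch of what "security logs with context" could look like, assuming a hypothetical `audited_check` wrapper and a decision function supplied by the caller (neither comes from any particular framework). Every decision, allowed or denied, is logged with who asked for what:

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("authz.audit")

def audited_check(user_id, action, resource, decide):
    """Run a decision function and log the outcome together with its context."""
    allowed = bool(decide(user_id, action, resource))
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    }
    # Denials are logged as well: failed attempts are often the interesting ones.
    audit_log.info(json.dumps(record))
    return allowed

# Illustrative usage with a toy decision function.
logging.basicConfig(level=logging.INFO)
audited_check("alice", "read", "doc-1", lambda u, a, r: u == "alice")
```

Emitting one structured record per decision keeps the log machine-readable, which helps when you later need to answer "who was authorized for what, and why".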

I've implemented custom role based security in various projects. It's not that hard but you need to understand some basic design patterns. Users have context. That context includes roles that are associated with privileges that have a particular scope in which they apply. It's basically the set of parameters that drive the calculation of which list of privileges this user has; which is a function that has to live somewhere and that needs to be tested extremely thoroughly.

You assign roles to users but you verify privileges. This decouples user management from policy changes. The process of authorization is verifying that a given principal has the right privileges given their user context and the context of the privilege (the requested scope). The process of authentication is verifying the user's identity and then bootstrapping the user context.
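As a rough illustration of "assign roles, verify privileges", here is a small sketch. The role names, privilege strings, and scope ids are all made up for the example; the point is only that the check asks about a privilege in a scope and never about a role name:

```python
from dataclasses import dataclass, field

# Hypothetical role -> privilege mapping; changing it is a policy change,
# not a user-management change.
ROLE_PRIVILEGES = {
    "editor": {"article:read", "article:edit"},
    "viewer": {"article:read"},
}

@dataclass(frozen=True)
class RoleAssignment:
    role: str
    scope: str  # e.g. an organization or project id

@dataclass
class UserContext:
    user_id: str
    assignments: list = field(default_factory=list)

def privileges_for(ctx, scope):
    """Compute the set of privileges the user holds within one scope."""
    privs = set()
    for assignment in ctx.assignments:
        if assignment.scope == scope:
            privs |= ROLE_PRIVILEGES.get(assignment.role, set())
    return privs

def is_authorized(ctx, privilege, scope):
    """Verify a privilege in the requested scope; role names never appear here."""
    return privilege in privileges_for(ctx, scope)

alice = UserContext(
    "alice",
    [RoleAssignment("editor", "org-1"), RoleAssignment("viewer", "org-2")],
)
```

With this shape, `privileges_for` is the single function that has to be tested thoroughly, and application code only ever calls `is_authorized`.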

If you use JWTs or similar technology, you can actually serialize that user context, sign it, and pass it around. That's why they are so popular in micro services. It also leads to an unfortunate tendency of developers to confuse authentication and authorization. Checking the signature is valid would be authenticating; using the signed information for granting or denying access would be authorizing.
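To make the split concrete, here is a deliberately simplified sketch of a signed user context. It is not a real JWT implementation and uses no JWT library: the token format, the `SECRET` value, and the function names are assumptions for illustration. One function only validates the signature, the other only makes the access decision:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: a shared secret; real systems need proper key management

def sign_context(context):
    """Serialize the user context and append an HMAC-SHA256 signature."""
    payload = base64.urlsafe_b64encode(json.dumps(context, sort_keys=True).encode())
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature

def verify_token(token):
    """Signature check: return the user context only if the signature is valid."""
    payload, _, signature = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode()))

def may(context, privilege):
    """Authorization: decide using the already-verified context."""
    return privilege in context.get("privileges", [])

token = sign_context({"sub": "alice", "privileges": ["article:read"]})
context = verify_token(token)
```

A valid signature only tells you the context can be trusted; whether `may(context, "article:edit")` is true is a separate question.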

Where it gets hard is that some authorization logic is conditional on things outside the user context. Like the request they are making, the time of day, some business context they have, or the state of a particular thing. This is where bad things happen when developers who don't understand this topic deeply get a little bit creative to meet the requirements.

stonecharioteer · on Sept 16, 2021

I've used Oso at Visa and it was a fascinating product. Until it clicked, Polar was super confusing, but after speaking to the team over Slack, I realized how easy it was to use.

Their documentation has improved a lot since January, and while I haven't used Oso at Merkle Science (my current employer), I plan to. I owe the folks at Oso a blog article, which has been sitting in my drafts folder for 9 months now. I am going to dust it off this weekend and get back to it. It's the least I can do for them.

alex-olivier · on Sept 16, 2021

Disclaimer: I am Product Lead @ Cerbos[0] - an open-source authorization service

This article is a great summary of exactly why this area is ripe for innovation, and we love seeing different approaches to solving the headache of authz. Coincidentally, I have just published a write-up of why authorization has been so hard from a Product Management and requirements perspective. https://alexolivier.me/posts/the-never-ending-product-requir...

With Cerbos we have taken the approach of having an opinion on how things should be done, to help you start building out policies. One thing about other solutions is that you can do anything with them, which is great, but with authorization there are a few themes that come up time and time again: application permissions, product packaging, enterprise readiness and multi-tenancy.

By giving you a structure and an open-source[1] service to run in your own stack that can work with anything that can make an http/grpc call we hope to simplify the whole system.

[0] https://cerbos.dev [1] https://github.com/cerbos/cerbos

motohagiography · on Sept 15, 2021

Dealing with this issue now with some institutional clients. Authz as a service is the right way to go, and it's probably the most interesting topic in security, imo.

There is an emerging security architecture role I'm seeing that is basically application governance design, and I don't know whether it will go down the stack into a security dev/ops role using a highly expressive policy DSL, or up the stack into a kind of in-house technical counsel who specifies it to a provider, who in turn implements it as a service and is compensated for taking on risk for enterprise app authz decisions. e.g. is there the right level of risk to externalize it to a provider, or little enough to keep it buried in dev?

I worked on the design of some authN products (universe of UMA2, SAML, OATH, EMV, etc.) and did some early xacml design for a policy engine, and the enterprise market is just catching up to decade old federation technologies now, and that's just for IAM. If that trajectory is any clue, federated authorization as a service probably has a 7-10 year runway in front of it as well.

The article is so valuable because it articulates how inextricable business logic and authorization often are. Personally I think the main reason that's hard is because of poorly thought out abstractions in the business logic and unwinding these systems will only progress one enterprise architect funeral at a time.

If I were making a bet, for the above reason I would ask whether it may be reasonable to treat enterprises as a sales tarpit for something this important and cool, and focus on new companies with clear growth that will be huge in 10 years instead of waiting for one that's already big enough to want this to roll over.

thinrich · on Sept 24, 2021

(OPA [0] creator and Styra [1] founder here.). Love how this article calls out 3 key challenges for authz. I realized by the end that whenever I talk through this OPA diagram [2] for folks, it's those same three key ideas that we cover.

One bit of context I'll add here is that there's a broad spectrum of authorization use cases: kubernetes admission control, database access control, microservice/application authorization, etc. Despite them all being authorization, they each have their own requirements around enforcement points, data dependencies, modeling/expressiveness, performance, etc. So it's not surprising that with such a broad space of requirements we end up with such an interesting and rich landscape of technology choices.

The other bit of context to add is that this article seems to focus primarily on the custom application use case from the perspective of the software engineer (e.g. who can change the code, pull in libraries, and/or rearchitect the app). Other teams in orgs (security, compliance teams, and operators) have their own challenges around authorization, in part because they can't change the code but are responsible for its health nevertheless.

And I totally agree that there are plenty of folks who believe your quote: "authorization is a topic as cool as moving to Kubernetes"

[0]: https://www.openpolicyagent.org/

[1]: https://www.styra.com/

[2]: https://www.openpolicyagent.org/docs/latest/

defanor · on Sept 16, 2021

These "why X is hard" articles seem to be based heavily on one's experience. My immediate thought about authorization's complexities was that the rules tend to be poorly defined and changing: users usually want to have full access to everything, non-users tend to restrict them too much, and then there are some odd exceptions like users being between roles, or belonging to a role but with some exceptions.

This article lists complexity of enforcement, mentioning that it's done in many places. That's not quite my experience, although enforcement is indeed hard to do and easy to mess up when done that way. Given that most of the projects where authorization is needed are about database access/interfaces, a DBMS itself is a good candidate for security policy enforcement in a single place.

The next point is decision architecture, talking about lack of database access when an authorization-related decision needs to be made, but it won't arise with authorization happening in the database. The article mentions that it becomes a problem if you try to use authorization in multiple microservices or something like that, but pretty sure that in most cases it is unnecessary.

The last point, modelling, is the closest to how I'd answer the title question. Oso claims to solve it by introducing a declarative policy language, which is what DBMSes provide too; PostgreSQL, for instance, even allows rather advanced (arbitrary SQL queries) programming of policies (via row security policies). Still doesn't quite solve the issue of requirements being a vague and changing mess.

For the first two points, I'd rather call it "how to make authorization hard".

Edit: To be fair, the third problem is man-made too, and could be avoided; it's just rarely under the programmer's control, unlike the other two.

anonymousDan · on Sept 16, 2021

Interesting - so essentially you are saying to push all authorization logic into the DB? What if your app also needs to access other services (e.g. a message queue, S3 etc)?

defanor · on Sept 16, 2021

With multiple data sources requiring authorization, which don't support it on their own, it indeed makes sense to handle at least some of the authorization in the software accessing them all, and then it's indeed more complex. I just doubt it's commonly needed, to the point of listing it under things making authorization in general hard.

MaxRS · on Sept 15, 2021

Implementation of Google's Zanzibar: https://github.com/ory/keto

chromatin · on Sept 15, 2021

I am definitely watching the entire Ory stack. Too immature for me to adopt now, but I’m trying to design with hooks and processes in mind for future integration!

jackliusr · on Sept 16, 2021

I listed the three areas of casbin based on "Why Authorization Is Hard". I think casbin is more interesting.

Enforcing

   Separating concerns is particularly hard for enforcement:

      casbin supports multiple models and you can shift to advanced models as your application grows.

   Data Filtering

     filtered policy, domain

Decision architecture

   library

   distributed:  Casbin Service[0]

Modeling

   casbin is super flexible and it supports many models[1]

[0] https://casbin.org/docs/en/service

[1] https://casbin.org/docs/en/supported-models

newusertoday · on Sept 16, 2021

any idea about performance? if rules grow linearly with users, how does it affect performance? as per the casbin performance docs https://casbin.org/docs/en/performance if there are more than 100 rules it might require tuning. however, if the model is such that we need to check whether each user has permission to access some resource, say a route, then the rule count would easily grow linearly with the number of users. any idea how to solve this? i am not able to find any good documentation on this.

jackliusr · on Sept 16, 2021

The result of benchmark shows it supports 11000 rules (10000 users, 1000 roles) in 2.258262 ms. It is very performant.

[0] https://casbin.org/docs/en/benchmark

lmeyerov · on Sept 16, 2021

you can partition what it considers, such as per-tenant where you index on org id and only consider same-org interactions for most things. likewise, we picked it because it can reuse your existing sql db, so you can get around problems like auth-dependent view all queries by doing your own sql.
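A rough sketch of the per-tenant idea in plain SQL, using Python's built-in `sqlite3`. This is not Casbin's API; the `docs` and `grants` tables and their columns are hypothetical. The "view all" query is answered in the database and only ever touches rows for the caller's org:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, org_id TEXT, title TEXT);
CREATE TABLE grants (org_id TEXT, user_id TEXT, doc_id INTEGER);
-- Index on org id so a lookup only considers same-org grants.
CREATE INDEX grants_by_org ON grants (org_id, user_id);
INSERT INTO docs VALUES (1, 'acme', 'Roadmap'), (2, 'acme', 'Payroll'), (3, 'globex', 'Plans');
INSERT INTO grants VALUES ('acme', 'alice', 1), ('globex', 'carol', 3);
""")

def visible_docs(org_id, user_id):
    """'View all' as a single query restricted to the caller's org."""
    rows = conn.execute(
        """
        SELECT d.title FROM docs d
        JOIN grants g ON g.doc_id = d.id AND g.org_id = d.org_id
        WHERE g.org_id = ? AND g.user_id = ?
        ORDER BY d.id
        """,
        (org_id, user_id),
    )
    return [title for (title,) in rows]
```

Here `visible_docs("acme", "alice")` never evaluates grants from other orgs, which is the partitioning effect described above.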

if others have scaling+perf tricks, am curious :)

xcskier56 · on Sept 15, 2021

Question if anyone from the Oso team is here: From what I've read, the Oso core is built in rust, but then is it called from Ruby or is it re-implemented in Ruby? I couldn't see any native extension like code in the ruby portion of the repo but could easily have missed it.

gkaemmer · on Sept 15, 2021

Hey, Oso engineer here. Good question.

The rust core is indeed called from the ruby library (as it is with all of our 5 other host libraries). The core itself is pretty complex (there's a whole parser/interpreter in there), so maintaining it in a bunch of languages would be a bit hectic.

There are some files inside `lib/oso/polar/ffi` that define the C bindings used by the rest of the library. Here's an example: https://github.com/osohq/oso/blob/main/languages/ruby/lib/os...

We use the ffi gem to make that work: https://github.com/ffi/ffi

EDIT: hobofan beat me to it! :)

xcskier56 · on Sept 16, 2021

Thanks for the explanation and the code locations!

hobofan · on Sept 15, 2021

It is a wrapper around the Rust code:

- It triggers the rust build in the Makefile: https://github.com/osohq/oso/blob/1d3bf5a4a997a574c2b19084a0...

- There is a bunch of FFI code (also in the ffi directory): https://github.com/osohq/oso/blob/1d3bf5a4a997a574c2b19084a0...

xcskier56 · on Sept 16, 2021

I figured it was in there somewhere. Thanks for pointing me to the spots! Gonna dig in now and try and grok it (hopefully)

ogazitt · on Sept 17, 2021

Great article, thanks for sharing! We completely share your point of view that authorization is (at least) as interesting as moving to k8s :)

Disclaimer: I'm a co-founder of Aserto, a developer API for authorization.

I particularly like the section about the architectural options, and the evolution from a monolith into the various combinatorics of { service architecture, decision logic, data }. Many people gloss over those details, but they are critical.

Most developers we talk to want to have the best of both worlds - an authorization system with latency and availability characteristics akin to a library, but a managed experience around the artifacts that are involved in an authorization decision - the policy, the user attributes, and the data.

As you've described, there are many ways to approach authorization, and the challenge is to tune the system to be opinionated in some areas while being general-purpose enough to fit many use-cases. We've chosen to be less opinionated about the data model and more opinionated about the architecture. Our philosophy is described here [0].

You mentioned OPA as a general-purpose decision engine, and since we use it as our decision engine, we have some experience to share. You noted that one can compile policies to WASM and execute them on the client/browser, bringing tighter coupling between the server and the client. But there's a different/better way to do this - namely to define different decisions for allowing an operation at the policy enforcement point and for making a UI element visible or enabled. You can package (and evaluate) all of these decisions in a single policy file, which helps you keep all your authorization logic in a single place, rather than have to update it in many places. We describe this in more detail here [1].

[0] https://www.aserto.com/blog/five-principles-of-authorization

[1] https://www.aserto.com/blog/addressing-challenges-with-githu...

bullen · on Sept 16, 2021

One solution is to build all features as equal privileges = user generated content. It benefits everyone in the long run! I spent 20 years building new user handling before I finally settled on the solution used by my own database: http://root.rupy.se (see User.java for details) it uses https://datatracker.ietf.org/doc/html/rfc2289 for security.

kkajla · on Sept 16, 2021

A great introductory read on such an expansive topic. Authz is a long-standing problem without a real standardized solution yet. I think improving everyone's understanding of the core problems that authz presents is a nice first step to building better standards/best practices.

Whether you're building something in-house or evaluating a third party library/service, I've found that OWASP has great content, guidelines, and best practices around authz[0] and access control[1]. They've been a go-to reference for me throughout my software engineering career.

We also followed a lot of OWASP's guidelines/best practices while building Warrant[2] (YC S21 - I'm one of the co-founders), so that developers don't need to think about it and can follow best practices just by integrating our authz service into their application.

[0]: https://cheatsheetseries.owasp.org/cheatsheets/Authorization...

[1]: https://cheatsheetseries.owasp.org/cheatsheets/Access_Contro...

[2]: https://warrant.dev/

brabel · on Sept 15, 2021

Seems to be an alternative to OPA: https://www.openpolicyagent.org/

xtracto · on Sept 15, 2021

From my limited research looking for authorization systems [1], it is but it is not: OPA can actually get quite slow for "real world" production scenarios. For example, until recently Ory Keto used OPA and had that issue: https://github.com/open-policy-agent/opa/issues/1443 . It seems to me that Oso, although based on OPA and Zanzibar ideas, may offer some performance improvements and better ease of use.

[1] (we are planning to change a half-assed internal auth system to a pre-existing one)

awinter-py · on Sept 16, 2021

> It's embedded in your application, it uses your data models

(from their homepage)

'uses your data models' as an alternative to API is really interesting -- risky in some ways (you're including code), less risky in others (you own the data). Also leans on sophisticated ORM + migration features that aren't standard in every language / framework, but probably will be some day

lifeisstillgood · on Sept 16, 2021

I have been noodling a few things in my head, trying to work out what my next projects will be.

I think Authentication is going to have to move over to FIDO-like solutions only. That's an effort but all the pieces are in place.

But Authorisation strikes me as needing a rethink on how everyone everywhere handles data. I cannot work out how to handle "can this person see this data" unless all data is, well, labelled.

Having little pieces of custom code written in each app to do custom checking just seems like it's the wrong way round.

I like the idea of Twitter's Strato (mentioned here I think), which roughly seems to be "we labelled every field in every database", and then a data access layer handles accessing those fields and validating the permissions.

I get that enforcement still needs other things - but without that data access layer i think complexity will kill you.

tdrdt · on Sept 16, 2021

At the moment I am building a toy PHP framework where you must pass a user into the service/business layer that will go all the way up to the data access layer.

In my opinion authorization should be done at read/write level because then all layers will benefit from the same authorization security.

It seems Oso is doing just that, which is great.

In fact this is how linux/unix has been working for years. But on the internet sometimes it feels like every wheel is being reinvented. Most frameworks behave as if you would open a file in Word and then Word will decide if you are authorized to view or edit the file.

greeklish · on Sept 19, 2021

I'm working on a project to combine other access control attributes, such as rate limiting and user privacy lists (white & black-lists), based on Groups, Users & Nodes.

Reverse indexing would be very hard e.g. for an ever-growing threaded discussion where participants may be of different groups, and where sub-thread access would need to be moderated (moving posts, restricting access to specific people etc.)

xarope · on Sept 16, 2021

This article really hits the mark. I had to spend 2 months writing an auth system for my current project, which had to deliver both ACL permissions (am I allowed to call this API) as well as data filtering (am I allowed to work on the data I'm requesting), and even then barely scratched the data filtering part (albeit "good enough" for now to get the team moving on other parts).

spopejoy · on Sept 17, 2021

If TFA is true, why not do like blockchain does (well, when done correctly) and have no stateful authorization at all, combined with a yubikey or similar to sign every request? Taking state out of the picture simplifies things tremendously, and techniques like multisig offer lots of room for managing risk.

I just wonder if a lot of session-based auth is that way because of historical bias.

nixpulvis · on Sept 15, 2021

Is there anything that CanCan can't can?

mijkal · on Sept 16, 2021

Reminds me of Hawaiian Pidgin :-)

'If can can. If no can no can'

n42 · on Sept 15, 2021

how much can can CanCan can; can CanCan can more can?

forks · on Sept 16, 2021

buffalo

titive · on Sept 15, 2021

Wish we had this when we just spent 4 months building this 3 separate times across 3 separate products.

rad_gruchalski · on Sept 15, 2021

What did you end up using?

gambler · on Sept 16, 2021

I've fought with several auth systems a long while ago. Eventually came up with a pattern that worked well for all my needs. Here is what I did.

Big picture:

  user <-> something <-> ... <-> something <-> action

You give the permission system the starting point and the ending point, asking it whether this user can perform this action. The system loads or generates the graph in the middle and tries to find a path from one side to the other.

...

More details follow.

API for checking permissions was implemented as a boolean function of the following form:

  CanDo(user, action)

Action was represented by a string of a particular format, e.g. "articles.edit.987". This is semi-arbitrary, as long as you stick with your convention.

The function expanded user into a list of appropriate "roles".

Roles were handled in such a way that they could represent real users (user.12434) or groups (group.admins) or virtual groups (virtual.loggedIn).

The database stored information about what various roles could do in a table that had two essential columns: role id and permission pattern. This can be done in a variety of ways, depending on what storage mechanisms you use.

The resolution was as follows.

1. Find all rules for all the roles the user is in.

2. Take each pattern and match it against the action supplied to the CanDo() function.

3. If there is a match, return true. If you exhaust the list, return false.

This setup is surprisingly flexible, as long as you understand how to use it. The trick is that you can add more entities between "user" and "action" and that will not change the API, just the resolution process. Moreover, since you are effectively searching for a path in an acyclic graph between user and actions, you can search in any order and in any direction. And you can store permissions in all kinds of formats, again, without changing the application API.
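The three resolution steps can be sketched roughly as follows. This is only one possible reading: the in-memory `GROUPS` and `RULES` stand in for the database table, the role ids are invented, and the pattern format (dot-separated segments with an optional trailing `*`) is an assumption, since the pattern syntax is left open above:

```python
# Hypothetical data; in the described design these would live in the database.
GROUPS = {"user.12434": ["group.editors"]}
RULES = [
    ("group.editors", "articles.edit.*"),
    ("virtual.loggedIn", "articles.read.*"),
]

def pattern_matches(pattern, action):
    """Compare dot-separated segments; a trailing '*' matches whatever follows (possibly nothing)."""
    p_parts = pattern.split(".")
    a_parts = action.split(".")
    for i, part in enumerate(p_parts):
        if part == "*" and i == len(p_parts) - 1:
            return True
        if i >= len(a_parts) or part != a_parts[i]:
            return False
    return len(p_parts) == len(a_parts)

def roles_for(user_id):
    """Expand a user into roles: the user itself, a virtual group, and real groups."""
    return [user_id, "virtual.loggedIn"] + GROUPS.get(user_id, [])

def can_do(user_id, action):
    """Steps 1-3: gather rules for the user's roles, match each pattern, stop at the first hit."""
    roles = set(roles_for(user_id))
    return any(
        role in roles and pattern_matches(pattern, action)
        for role, pattern in RULES
    )
```

Adding another kind of entity between user and action would change `roles_for` or the rule lookup, while callers keep using `can_do(user, action)` unchanged.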

All the business logic unrelated to authentication is handled in user code. E.g., if you want to allow someone to post articles on Tuesdays only, you write something like the following code:

  if (TodayIsTuesday() && CanDo(currentUser, "articles.editOnTuesdays")){
    //do stuff
  } else {
    //error
  }

Technically, "pattern" for permissions could be anything. You could use a regex, but I would advise against it. If you use a simple hierarchy, you could expand it and run the match in a single DB query.
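One way to read "expand it and run the match in a single DB query" is sketched below, assuming dot-separated actions and patterns that are either exact or end in `*`. The `rules` table layout and the `expand` helper are hypothetical. The action is expanded into every pattern that could cover it, and the database does a plain equality lookup:

```python
import sqlite3

def expand(action):
    """List every pattern in the hierarchy that would cover this action."""
    parts = action.split(".")
    wildcards = [".".join(parts[:i] + ["*"]) for i in range(len(parts))]
    return wildcards + [action]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rules (role_id TEXT, pattern TEXT)")
conn.executemany(
    "INSERT INTO rules VALUES (?, ?)",
    [("group.admins", "*"), ("group.editors", "articles.edit.*")],
)

def can_do(roles, action):
    """Single query: does any of the user's roles hold any covering pattern?"""
    if not roles:
        return False
    patterns = expand(action)
    sql = "SELECT 1 FROM rules WHERE role_id IN (%s) AND pattern IN (%s) LIMIT 1" % (
        ",".join("?" * len(roles)),
        ",".join("?" * len(patterns)),
    )
    return conn.execute(sql, list(roles) + patterns).fetchone() is not None
```

Because the match is an equality test on stored strings, it can use an ordinary index, which is part of why a simple hierarchy is easier to live with than regexes.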

neandrake · on Sept 16, 2021

Do you know where there is more information about this type of design or what it might be called, in blog or book format? I ended up designing a very simplified (and modified) version of this same thing a decade ago for a web application:

Data is owned by some account (user or group). Users have membership/roles associated with an account (either a share or a membership) which enumerate all the operations that user is allowed to perform on data owned by that account. During a request a graph is built mapping the current user to the data's owning account via user->account shares/memberships or group->account shares/memberships. The graph is traversed ensuring that the action being taken exists on every role through the path (e.g. a user can share data with another user but can only grant a maximum of the same permissions that user has).
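A small sketch of that traversal, under the assumption that each share or membership is an edge carrying the set of actions its role allows. The node names and the `EDGES` structure are invented for illustration. The search only follows edges that permit the requested action, so a path counts only if every hop allows it:

```python
from collections import deque

# Hypothetical edges: (source, target) -> actions granted by that share or membership.
EDGES = {
    ("user:alice", "group:staff"): {"read", "write"},
    ("group:staff", "account:acme"): {"read"},
    ("user:bob", "account:acme"): {"read", "write"},
}

def can_perform(user, account, action):
    """Breadth-first search that only follows edges whose role includes the action."""
    seen = {user}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if node == account:
            return True
        for (src, dst), actions in EDGES.items():
            if src == node and action in actions and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return False
```

In this toy data, alice can reach `account:acme` for "read" via the staff group, but not for "write", because the group's share on the account only grants "read".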

The big challenge I had was getting others to expand on this system as the product grew. Particularly as new features were added there was reluctance or ignorance that the actions introduced by features had to be enumerated in the roles/permissions. Though, when adding new actions it is difficult to know how to insert those into existing defined roles (as our roles are entirely customizable and not some constant set). That required account managers to go in and update their role definitions to specify where new actions should be allowed. The desire to avoid doing that led others to "just do an admin check" that would proliferate through the codebase as a means to simplify the logic, defeating the purpose of this flexible system. Early on I had eliminated the isAdmin check by ensuring all accounts had an un-editable Admin role whose list of permitted actions always contained everything. However an isAdmin check was added later which looked for the user having the role with all the permissions, instead of doing a check for that specific action.

Edit: Apparently this type of design is called Role-Based Access Control and information about it can be found by searching this term. And oso (the product made by the people who wrote the parent article) is a library for implementing an RBAC.

https://en.wikipedia.org/wiki/Role-based_access_control