Token revocation (biscuitsec.org)
105 points by geal on July 4, 2023 | 34 comments



> It is now recommended to have a refresh token with an expiration date, that can be long, and have that refresh token be single use. When it is sent to the authorization server to get a new access token, the authorization server will revoke the old refresh token and issue new refresh and access tokens.

I've been wondering briefly about this specific flow. It seems prone to a problem: if the refresh request is sent successfully (and the old refresh token is then invalidated), but the reply is never received for any reason, then the authorization chain is broken. There's no way to get a new token. Is there a good 'standard' way to handle this problem?

On brief reflection, while writing this comment, it seems like the only solution is to fall back to some other (long-lived) identity (from which the original OAuth token was derived). But that can be an inconvenient fallback.

> The interesting property here is that if the authorization server sees the same refresh token twice, it means that the token was stolen: either the thief or the legitimate client already used the refresh token, and the other one is now requesting an access token too. In that case, the authorization server must revoke all current refresh and access tokens for this client.

That seems like an invalid conclusion to draw though. Multiple requests could simply be caused by failures (including race conditions). Should the refresh token really be single-use?

One alternative that I could imagine is that the refresh token can be used multiple times, but whenever it's used, then only the most-recently created tokens remain valid -- with all prior tokens (that were created from it) being invalidated. This would enable token refresh to survive failures, while also making any tampering evident (due to the appearance of unexpectedly invalidated tokens).

Edit: Another strategy might be to link together all access tokens into one "session": whenever a refresh token is used to create new tokens, those all count as the same "session" and the session is what's invalidated. (The 'session' would be established during the process that issues the first tokens.)
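A minimal sketch of how the "session" idea could combine with single-use refresh tokens, assuming an in-memory store. The `AuthServer` class and its method names are hypothetical and not taken from the article or from any library. Every refresh token remembers the session it belongs to, and replaying an already-used token revokes the whole session rather than a single token.

```python
import secrets


class AuthServer:
    """Toy in-memory authorization server (hypothetical, for illustration only)."""

    def __init__(self):
        # refresh token -> {"session": session id, "used": bool}
        self.refresh_tokens = {}
        self.revoked_sessions = set()

    def login(self):
        """Start a new session and issue its first refresh token."""
        session_id = secrets.token_hex(8)
        return self._issue(session_id)

    def _issue(self, session_id):
        token = secrets.token_urlsafe(32)
        self.refresh_tokens[token] = {"session": session_id, "used": False}
        return token

    def refresh(self, token):
        """Rotate a refresh token; replaying a used token revokes the session."""
        record = self.refresh_tokens.get(token)
        if record is None or record["session"] in self.revoked_sessions:
            raise PermissionError("unknown or revoked token")
        if record["used"]:
            # The same token was seen twice: invalidate everything that
            # belongs to this session, not just this one token.
            self.revoked_sessions.add(record["session"])
            raise PermissionError("refresh token reuse detected")
        record["used"] = True
        return self._issue(record["session"])

    def session_active(self, token):
        """True if the token is known and its session has not been revoked."""
        record = self.refresh_tokens.get(token)
        return record is not None and record["session"] not in self.revoked_sessions
```

With this shape, a lost reply still locks the legitimate client out on retry, which is exactly the failure mode raised above. The sketch only shows that revocation can be scoped to one session instead of to every token the client holds.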


> On brief reflection, while writing this comment, it seems like the only solution is to fall back to some other (long-lived) identity (from which the original OAuth token was derived).

I have been working on an OAuth service provider for the past few months and ran into this scenario. We came up with a solution: instead of immediately expiring the refresh token after it's used, we set its expiry to X seconds (<30s) in the future and put it in a leeway state. If another call with the same refresh token is received within the X seconds and the refresh token is in the leeway state, a new token pair is created. If the refresh token is used after X seconds, it's no longer valid and a new authorization has to be generated.

Of course this isn't a foolproof solution, it has its own caveats but it's better than the alternative of forcing the user to go through the authorization process again or keeping the refresh token alive forever.
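Roughly what the leeway state could look like, as a sketch under stated assumptions: the store is in-memory, `LeewayRefreshStore` and its methods are made-up names, and 30 seconds is picked only because the description says "<30s". The clock is injectable so the window can be exercised without sleeping.

```python
import secrets
import time

LEEWAY_SECONDS = 30  # assumed value; the description above only says "X seconds (<30s)"


class LeewayRefreshStore:
    """Hypothetical in-memory store for refresh tokens with a leeway state."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        # refresh token -> None (never used) or the deadline of its leeway window
        self.leeway_until = {}

    def issue(self):
        token = secrets.token_urlsafe(32)
        self.leeway_until[token] = None
        return token

    def refresh(self, token):
        if token not in self.leeway_until:
            raise PermissionError("unknown refresh token")
        deadline = self.leeway_until[token]
        now = self.clock()
        if deadline is None:
            # First use: enter the leeway state instead of expiring right away.
            self.leeway_until[token] = now + LEEWAY_SECONDS
        elif now > deadline:
            # Leeway window is over: the token is dead, re-authorization needed.
            del self.leeway_until[token]
            raise PermissionError("refresh token expired; re-authorize")
        # First use, or a retry inside the leeway window: hand out a new pair.
        return {
            "access_token": secrets.token_urlsafe(32),
            "refresh_token": self.issue(),
        }
```

Note that each retry inside the window mints a fresh pair, so a real implementation would probably also want to invalidate the pair issued by the earlier attempt.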


One option could be to require that the access token needs to be used within X seconds after it has been issued. During this period:

* Old refresh token can be used. Using it will revoke previous "new" tokens (and possibly generate some warnings, especially if new tokens are used after this)

* Using new access token will revoke old refresh token and access tokens, possibly requiring access on dedicated path (something like lookup, essentially confirming that you received it)

This probably should be rate limited to avoid a malicious/buggy client filling the revoke list with junk. And it may also make sense to decrease the lifetime of the old refresh token (e.g. if it was originally going to expire in 24h, make it expire in 30 minutes) or set a maximum number of times this swap can happen.

Of course this would still cause issues if, e.g., the database server where these were stored failed and the client had to restore from backups.


I do think it's a practical control if you are responsible for both the server AND all the client code, i.e., it's feasible for you to micro-manage the fine detail of the auth dance.

However if you are providing an oauth / oidc API endpoint to be consumed by arbitrary developers I wouldn't advise this.

The way many clients work is that refresh token stuff happens in the background as needed, "piggybacking" off the thread using the access token. Depending on how everything is set up, parallel requests can be generated.

Providing support for oidc / oauth token flows is already extremely difficult because customers will usually be using an ecosystem-specific library and usually don't understand the spec, let alone whatever stricter "best practices" you might be enforcing.


This is an ongoing struggle with providers like Azure AD. In our setup, we have function apps that are receiving OIDC tokens from our AAD provider - either from our tenant or others via B2B collab. In this arrangement, it is possible for a user to pass authn for up to 1 hour before revocation in the hosting tenant takes effect. You can reduce this expiration but it can make the experience really bad across the board for all apps.

My current strategy for dealing with this is to add application-specific safeguards that re-verify the assigned roles of whatever user principal is present during more sensitive operations. If we detect that the user principal+token is no longer authorized, we can revoke our session bound to the AAD token and any further access is effectively restricted.

I've seen some other approaches, but I don't think you will get one that fits like a glove without making some alterations to the actual application which is consuming these tokens.


You're still stuck with a delay between you fetching the list of revoked tokens and someone getting to your service.

And that delay allows someone to come in with a compromised token.

However, with short expiration dates you achieve the same. You shorten the duration that a token is valid for.

Assume a token is revoked. It can be used against any target until they refresh their revocation list.

Same goes with just tokens with a short expiration date. It can be used against any target until it expires.

I guess the difference is users extending their token lifetime more often (more load for your signing server) vs offering an API that you can use to share revoked tokens which needs to be checked every time (in which case it's no longer stateless) or checked every once in a while.

Pick your poison.


You can do something like the TrueTime trick. Services refresh the revocation list every 1s. If they haven't been able to refresh for more than 10s, they reject all authorization. Then on revocation you just need to wait 15s or so to be sure that the token will no longer be accepted.

This of course means that your downtime tolerance for the token distribution is quite low, but still better than checking revocation on each request.

You can also flip it and track the state of every app server, revocation waits until they have all updated (or been confirmed dead) but now revocation time is unbounded. (And you are tracking more mutable state)
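The first variant (pull every second, fail closed once the list is too stale) could be sketched like this. `RevocationCache` and `fetch_revoked` are hypothetical names; `fetch_revoked` stands in for whatever call returns the current set of revoked token IDs, and the 1s/10s numbers are the ones from the comment.

```python
import time

REFRESH_INTERVAL = 1.0  # seconds between refresh attempts
MAX_STALENESS = 10.0    # reject everything once the list is older than this


class RevocationCache:
    """Hypothetical per-service cache of the revocation list."""

    def __init__(self, fetch_revoked, clock=time.monotonic):
        self.fetch_revoked = fetch_revoked  # assumed: callable returning revoked token IDs
        self.clock = clock
        self.revoked = set()
        self.last_success = None  # time of the last successful refresh

    def try_refresh(self):
        """Meant to be called every REFRESH_INTERVAL seconds, e.g. from a timer."""
        try:
            revoked = set(self.fetch_revoked())
        except Exception:
            return False  # keep the old list; its staleness keeps growing
        self.revoked = revoked
        self.last_success = self.clock()
        return True

    def is_accepted(self, token_id):
        if self.last_success is None:
            return False  # never managed to load a list: fail closed
        if self.clock() - self.last_success > MAX_STALENESS:
            return False  # list is too old to trust: fail closed
        return token_id not in self.revoked
```

Because no service accepts a token against a list older than 10s, whoever revokes a token only has to wait a bit longer than that (the 15s above) before assuming it is dead everywhere.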


Unless you push the revocation list changes immediately to the service instances.


Which makes it stateful, but instead of a bunch of systems pulling your revocation list you're pushing it.


How about short-lived access tokens and refresh tokens?


Depends on your use case. On a device it's harder to steal a token, especially on iOS when you put all that stuff in the Secure Enclave.

Fortify that with certificate pinning on your application and it suddenly becomes REALLY hard to intercept traffic.


Biscuit has been one of those pieces of tech that makes me re-think the way we organize our stack, with a couple of key ideas that make us more user-centric and enable decentralized systems.

Granted, I haven't used it, on a toy project or in anger, but it sounds really neat.


It depends on the scenario. I can think of the following, although they're pretty similar.

1. Lost/stolen access token. 2. Lost/stolen refresh token. 3. Disabled account.

In the case of systems like Azure where access tokens have an "audience", they could theoretically send a revocation message to the audience endpoint (which would only need to care about revocations younger than the duration of access tokens, much like a CRL).

Revoking a refresh token would probably need to revoke all access (and perhaps identity) tokens associated with it.

Disabling an account would just need to revoke all tokens associated with it.
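To illustrate the CRL-like point about the audience endpoint, here is a sketch assuming a one-hour access token lifetime. `AudienceRevocations` is a made-up name, not an Azure API. A token revoked at time T was issued before T, so it expires on its own by T plus the lifetime, and the revocation entry can be dropped after that.

```python
import time

ACCESS_TOKEN_LIFETIME = 3600.0  # assumed: access tokens live for one hour


class AudienceRevocations:
    """Hypothetical revocation store kept by the audience (resource server)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.revoked_at = {}  # token ID -> time the revocation message arrived

    def revoke(self, token_id):
        self.revoked_at[token_id] = self.clock()

    def prune(self):
        # Entries older than one token lifetime refer to tokens that have
        # already expired by themselves, so they no longer need tracking.
        cutoff = self.clock() - ACCESS_TOKEN_LIFETIME
        self.revoked_at = {
            token_id: ts for token_id, ts in self.revoked_at.items() if ts > cutoff
        }

    def is_revoked(self, token_id):
        return token_id in self.revoked_at
```

This only works if the audience also rejects expired tokens, since a pruned entry says nothing about whether the token is still within its lifetime.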


I'm not sure I understand the argument that expirations are inadequate: what's the likelihood of ending up with a token compromise without the client machine also being compromised? Further, if the client machine is compromised, isn't token revocation not much more than a fig leaf? If the machine is working correctly, and the user wants to explicitly terminate a session, why is erasing / forgetting the session secret not adequate?

I feel like I'm missing some critical, but non-obvious threat model here.


One threat model that people worry about is whether credentials can be lifted from a compromised machine once and then used to have permanent ongoing access -- without requiring ongoing access to the compromised machine.

If you have ongoing access to a compromised machine, then all bets are off. However, one security goal in these kinds of situations is to be able to rapidly "lock down" and quarantine a suspected breach; which in this case means revoking all of the credentials that the machine had access to. You want to be able to do this, and then once you've done this, be confident that the attacker has no further access.

If an attacker can lift a 'refresh token' from the machine, and use it to generate their own unlimited number of new credentials (that can be periodically refreshed indefinitely), then the challenge of revoking compromised credentials is more difficult; by the time you add a compromised token to a revocation list, it may have already been used to create another.

So you can't just say: "What credentials were on the machine? Revoke them all." That's not enough if the attacker can create their own new credentials using the refresh token.

If access tokens can be used to create additional access credentials, then it's more difficult to track and revoke all of them -- you'd need to revoke some kind of 'session' that all of the access tokens can be attributed to.


They mention lifecycle at the top: logout now, log out specific devices (I forgot to logout of Netflix after I left the AirBnB), log out unknown/old devices. I don't know Biscuit, but this is just normal security design.

Besides normal lifecycle, compromises can happen any number of ways. You won't know them all ahead of time, but having something like revocation designed into your system means you have one more mitigation option when something goes wrong.

Examples: all user-side compromises would be covered by "lifecycle" (give them a way to log out if they accidentally pasted a cookie somewhere, logged in to a public computer, etc). On the application side: you discovered a security flaw (a CVE in a library, or a flaw in your own design), discovered suspicious activity in your network, or suspect some browser or other exploit is allowing tokens to be stolen. After deploying fixes, you might want to revoke some set of tokens immediately instead of waiting for them to expire.

Having revocation as an option might mean that you can make default expirations longer. Maybe we're ok with letting users be logged in for one week instead of 24 hours if we have the option to revoke. Otherwise if a token is compromised, an attacker has a whole week to play with the token.


Instead of assuming tokens are good and looking up blacklisted tokens, how about look up whitelisted tokens?


Snarky peer comment, but if you can do this, then yeah, it sidesteps all of the complexity of a distributed authorization system. The trade off is the single point of failure: when your token verifier goes down, so does the entire system.


A blacklist checker is also a single point of failure.


An unavailable whitelist checker fails deadly for all items.

An unavailable blacklist checker fails deadly for only blacklisted items.

It only becomes a single point of failure if your system design must fail safe in all scenarios (and most do, to be fair).


Defeats the purpose of a token. You've just reinvented the session cookie.


What do we call a token that's stored in a cookie, sent via the HTTP 'Authorization' header to the API with each request, and Redis-cached on the server for say 5 minutes after looking it up in the users/token service? I still call it a token, just not a JWT. Maybe I should change my terminology?


I think that's reasonable terminology. "Token" is an overloaded term.

I'd be careful about "stored in a cookie" (really "sent in a cookie") because that would not be how an auth token would be sent or received. Not in a literal cookie, but another HTTP header.

I think it's fair to say that all cookies are tokens. The distinction between a typical cookie and a token in this context (i.e. a token that is difficult to revoke) is:

If a token needs to be looked up to know its authorization scope, it is easy to revoke (just update it or clear it in the lookup database). This is equivalent to a session cookie.

The challenge is when the token contains the auth scope. This might be used when the two systems do not share a lookup mechanism. These can be difficult to revoke before their built-in expiration time. This (token revocation) is the "hard part" about JWTs.


Don't reference tokens solve this problem?


At the cost of having to lookup the validity of the reference token AND its claims on every request.

People use bearer tokens (with bearer claims) to improve system performance and availability … at the cost of increased complexity as now bearer tokens need both expiration and revocation mechanisms.


Yes but you can cache the lookup for as short a period as is desired :P
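Something like the following, as a sketch: `CachedIntrospector` and the `lookup` callable are hypothetical stand-ins for a call to the token service, and the 5-minute TTL is just an example value. The TTL is also the upper bound on how long a revoked reference token keeps working on this server.

```python
import time

CACHE_TTL = 300.0  # assumed: cache lookups for 5 minutes


class CachedIntrospector:
    """Hypothetical cache in front of a reference-token lookup service."""

    def __init__(self, lookup, ttl=CACHE_TTL, clock=time.monotonic):
        self.lookup = lookup  # assumed: token -> claims dict, or None if invalid
        self.ttl = ttl
        self.clock = clock
        self.cache = {}  # token -> (expires_at, claims)

    def claims_for(self, token):
        now = self.clock()
        hit = self.cache.get(token)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: no round trip to the token service
        claims = self.lookup(token)
        # Negative results (None) are cached too, so invalid tokens
        # do not hit the token service on every request.
        self.cache[token] = (now + self.ttl, claims)
        return claims
```

Shrinking `ttl` trades more lookups for faster revocation; at `ttl=0` every request goes back to the token service.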


I'm still skeptical whether expiration times wouldn't be adequate for many applications, assuming these times are short enough, like five minutes?


A malicious actor can do quite a lot in 5 minutes. And now you've got to have your users/services renew their authentication at least every 5 minutes, meaning there has to be some central authentication authority to be renewing through... which completely defeats the whole decentralization thing and is more complicated than just issuing randomized tokens and keeping hashes of those in Redis.

At best, you've got a system where a malicious actor doesn't think to renew their token fast enough.
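For comparison, "randomized tokens with hashes in Redis" is about this much code. In this sketch a plain dict stands in for Redis, and `OpaqueTokenStore` is a made-up name. Only the SHA-256 digest is stored, so a dump of the store does not hand out usable tokens.

```python
import hashlib
import secrets


class OpaqueTokenStore:
    """Hypothetical store of random tokens, keyed by their SHA-256 digest."""

    def __init__(self):
        self.by_hash = {}  # stands in for Redis: digest -> user ID

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id):
        token = secrets.token_urlsafe(32)  # random, carries no claims
        self.by_hash[self._digest(token)] = user_id
        return token

    def verify(self, token):
        """Return the user ID for a valid token, or None."""
        return self.by_hash.get(self._digest(token))

    def revoke(self, token):
        # Revocation is a single delete; it takes effect on the next lookup.
        self.by_hash.pop(self._digest(token), None)
```

The cost is the one discussed elsewhere in the thread: every verification is a lookup against shared state.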


You can make a lot of decentralized requests with an access token, before needing one centralized request with a refresh token.


This approach doesn't really solve anything. If you have expiration times that short you will need a mechanism for renewing tokens and a compromised token can be renewed all the same. All you have is slightly higher server load because your regular users need to renew their tokens all the time.


If your access token is compromised, you would normally need your refresh token to get a new access token? So it would increase security, but if you lose your refresh token, you def have the same problem.

Or am I missing some context?


Depends. Some systems allow for access tokens to be extended, some don't.

We only use refresh tokens for mobile devices as those can be securely stored.

Access token renewal is allowed for browsers for as long as we detect a valid session.

And that session cannot be extended. Every 8 hours it's back to the authentication page with your YubiKey.


Same idea with certificates, right? No one checks certificate revocation lists, so Google is shortening maximum lifetimes to reduce the chance of long-term malicious use.


Right. SSL certificate revocation lists have been called "broken in practice". In perfect practice, any time you want to use a cert you have to check the CRL, which means you have to pull the whole CRL or have it on a short enough refresh to satisfy your risk profile. If the attempt to access the CRL fails, then what? Do you trust the cert or not? https://en.wikipedia.org/wiki/Certificate_revocation_list#Pr...



