Why Does OAuth v2 Have Both Access and Refresh Tokens?

22

836

Section 4.2 of the draft OAuth 2.0 protocol indicates that an authorization server can return both an access_token (which is used to authenticate oneself with a resource) as well as a refresh_token, which is used purely to create a new access_token:

https://www.rfc-editor.org/rfc/rfc6749#section-4.2

Why have both? Why not just make the access_token last as long as the refresh_token and not have a refresh_token?

Rubie answered 15/8, 2010 at 15:25 Comment(0)

551

The idea of refresh tokens is that if an access token is compromised, because it is short-lived, the attacker has a limited window in which to abuse it.

Refresh tokens, if compromised, are useless because the attacker requires the client id and secret in addition to the refresh token in order to gain an access token.
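As a rough illustration of why the refresh token alone is not enough for a confidential client, here is a minimal sketch of the refresh request described in RFC 6749 section 6, with the client authenticating via HTTP Basic. The function name and all credential values are hypothetical, and the request is only built, never sent.

```python
import base64
from urllib.parse import urlencode


def build_refresh_request(client_id, client_secret, refresh_token):
    """Return (headers, body) for a refresh-token grant; nothing is sent."""
    # The client proves its own identity with its id and secret...
    credentials = f"{client_id}:{client_secret}".encode()
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode(),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    # ...and presents the refresh token as proof of the user's earlier grant.
    body = urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    })
    return headers, body


# Hypothetical values, for illustration only.
headers, body = build_refresh_request("my-client", "s3cret", "abc123")
```

An attacker holding only the refresh token cannot produce the Authorization header, which is the point being made above. Public clients, which have no secret, do not get this protection.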

Having said that, because every call to both the authorization server and the resource server is done over SSL - including the original client id and secret when they request the access/refresh tokens - I am unsure as to how the access token is any more "compromisable" than the long-lived refresh token and clientid/secret combination.

This of course is different to implementations where you don't control both the authorization and resource servers.

Here is a good thread talking about uses of refresh tokens: OAuth Archives.

A quote from the above, talking about the security purposes of the refresh token:

Refresh tokens... mitigates the risk of a long-lived access_token leaking (query param in a log file on an insecure resource server, beta or poorly coded resource server app, JS SDK client on a non https site that puts the access_token in a cookie, etc)

Stickup answered 26/8, 2011 at 18:52 Comment(14)

Catchdave is right, but I thought I would add that things have evolved since his initial reply. The use of SSL is now optional (this was probably still being debated when catchdave answered). For example, MAC tokens (currently under development) provide the ability to sign the request with a private key so that SSL is not required. Refresh tokens thus become very important, since you want to have short-lived MAC tokens. – Glisson 12/7, 2012 at 2:33

"Refresh tokens, if compromised, are useless because the attacker requires the client id and secret in addition to the refresh token in order to gain an access token." But the client id and secret are also stored on the device, aren't they? So an attacker with access to the device can get them. Then why? Here, github.com/auth0/lock/wiki/Using-a-Refresh-Token , it is written that whoever gets hold of a lost refresh token can request as many auth tokens as they want. Maybe not in Google's scenario, but what if I am implementing my own OAuth2 server? – Sabin 4/1, 2015 at 18:33

"The attacker requires the client id and secret in addition to the refresh token in order to gain an access token": then what's the difference between using a refresh token and simply signing in again? – Bagasse 2/10, 2015 at 8:54

Refresh token can be used by a third party that can renew the access token without any knowledge of user credentials. – Octastyle 22/12, 2015 at 15:58

@MarekDec I thought the client id+secret and users credentials were the same thing? Do you need the id+secret/user credentials to get a new access token or not? – Fleeman 22/8, 2016 at 0:43

@KevinWheeler No, the client ID and secret are credentials for the OAuth client, not the user. When talking about OAuth the "client" is usually a server (for example the stackoverflow web server) which interfaces with an authorization or resource API server (for example the facebook auth provider). The user's credentials are only passed between the user and the OAuth API server, and never known to the client. The client secret is only passed from the client to the OAuth API server, and is never known to the user. – Candlemas 14/9, 2016 at 17:56

@machineyearning If my understanding is right, isn't knowing the shared secret all that's needed for the auth server to sign and give out a new, valid access token? If the client is already sending its shared secret along with the refresh token, then what role does the refresh token play in the process? Is there any data inside a refresh token that becomes beneficial in the refresh step, other than I guess the auth server knowing if it's an expired token or not? – Burkes 20/10, 2016 at 21:30

@Burkes You need to pass 2 challenges to be issued an access token. First you need to prove you're a registered client, which always happens by specifying your client ID and secret. Second you need to prove you have permission to access the user's resource on his behalf. This second part happens in 2 ways in the auth code flow. At first you get the user to log in directly to the auth provider, and you get back a code to your redirect URI. Subsequently you use the refresh token instead of the auth code, so the user doesn't have to login again once the original access token expires. – Candlemas 20/10, 2016 at 21:40

@machineyearning I kind of get the initial auth part that involves logins/credentials to retrieve access and refresh tokens. So the refresh token just doubles as your credentials, without actually requiring your credentials again but client id/secrets are still required? Does the refresh step usually involve dealing with state? Like checking if the refresh token is still valid/not revoked if you have such an option implemented? – Burkes 20/10, 2016 at 22:0

@Burkes yeah you can kinda think of it as a reference to the previous user access grant, which would imply not only the user's credentials have been entered but also specifically which resources they granted you access to. I've seen the refresh token be issued along with some metadata before, such as TTL, if that's what you're wondering. But usually the auth server MUST give you a relevant error response when you try to use an expired, revoked, or otherwise invalid refresh token. Look at the oauth2 spec section 4.1 "authorization code grant". It's pretty readable as far as specs go – Candlemas 20/10, 2016 at 22:9

If the access token was compromised, we can most likely assume the refresh token and the rest of the credentials were too. This whole needlessly complex authentication flow is absolutely pointless. – Cyrstalcyrus 4/7, 2020 at 16:42

I have a question: do we expire the old refresh token on every login by the user? Let's suppose I have an expired auth token, so I use the refresh token to get a new auth token. But if the user has logged in again and the refresh token that I have has been expired, won't this whole concept of refresh tokens fail? Should I not mark the refresh token as expired on every login by the user? – Ungrudging 5/10, 2020 at 15:38

Can a public client use a refresh token? Since it has no secret, what does it include in the refresh request for authentication? – Kohl 12/10, 2020 at 9:52

"I am unsure as to how the access token is any more "compromisable" than the long-lived refresh token and clientid/secret combination" - I can imagine this difference being real in cases where the "resource server" (which never sees refresh tokens) is less secure than the authorisation server. For instance, one can imagine the "resource server" at a tech giant really being many different resource servers implemented with different tech stacks, any one of which could be leaking access tokens to a weakly-defended log server or something. The authorisation server is likely easier to harden. – Auliffe 19/6, 2021 at 15:4

673

The link to the discussion provided by Catchdave has another valid point (original, dead link) made by Dick Hardt, which I believe is worth mentioning here in addition to what's been written above:

My recollection of refresh tokens was for security and revocation. <...>

revocation: if the access token is self contained, authorization can be revoked by not issuing new access tokens. A resource does not need to query the authorization server to see if the access token is valid. This simplifies access token validation and makes it easier to scale and support multiple authorization servers. There is a window of time when an access token is valid, but authorization is revoked.

Indeed, in the situation where the Resource Server and the Authorization Server are the same entity, and where the connection between the user and either of them is (usually) equally secure, there is not much sense in keeping the refresh token separate from the access token.

However, as mentioned in the quote, another role of refresh tokens is to ensure that access can be revoked at any time by the User (via the web interface in their profile, for example) while keeping the system scalable.

Generally, tokens can either be random identifiers pointing to a specific record in the Server's database, or they can contain all the information themselves (in which case this information has to be signed, with a MAC, for example).

How the system with long-lived access tokens should work

The server allows the Client to access the User's data within a pre-defined set of scopes by issuing a token. Because we want the token to be revocable, we must store it in the database along with a "revoked" flag (otherwise, how would you revoke a self-contained token?). The database can contain as many as len(users) x len(registered clients) x len(scope combinations) records. Every API request must then hit the database. Although it is fairly trivial to make lookups in such a database run in O(1), the single point of failure itself can hurt the scalability and performance of the system.
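A minimal sketch of this design, assuming an in-memory dict as a stand-in for the token table. All function and field names here are hypothetical. The point to notice is that every authorization check performs a lookup.

```python
import secrets

# Stand-in for the server's token table: token -> record.
token_db = {}


def issue_token(user, client, scopes):
    """Create an opaque (random) access token and store its record."""
    token = secrets.token_urlsafe(32)
    token_db[token] = {
        "user": user,
        "client": client,
        "scopes": set(scopes),
        "revoked": False,
    }
    return token


def revoke_token(token):
    """Flip the 'revoked' flag; takes effect on the very next request."""
    if token in token_db:
        token_db[token]["revoked"] = True


def authorize_request(token, required_scope):
    """Check an API request. This hits the 'database' on every call."""
    record = token_db.get(token)
    return (
        record is not None
        and not record["revoked"]
        and required_scope in record["scopes"]
    )
```

Revocation is immediate in this model, at the cost of one lookup per API request.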

How the system with long-lived refresh token and short-lived access token should work

Here we issue two keys: a random refresh token with a corresponding record in the database, and a signed, self-contained access token that contains, among other fields, an expiration timestamp.

As the access token is self-contained, we don't have to hit the database at all to check its validity. All we have to do is decode the token and validate the signature and the timestamp.

Nonetheless, we still have to keep the database of refresh tokens, but the number of requests to this database is generally defined by the lifespan of the access token (the longer the lifespan, the lower the access rate).

To revoke a Client's access to a particular User's data, we mark the corresponding refresh token as "revoked" (or remove it completely) and stop issuing new access tokens. There is, however, a window during which the refresh token has been revoked but its access token may still be valid.
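The revocation window can be seen in a small sketch. It assumes an in-memory dict for the refresh-token table and, to stay short, represents the access token as a plain dict rather than a signed string. The names and the 300-second lifespan are hypothetical.

```python
import secrets

ACCESS_TTL = 300  # seconds; hypothetical access token lifespan

# Stand-in for the refresh-token table: refresh token -> record.
refresh_db = {}


def issue_refresh_token(user):
    token = secrets.token_urlsafe(32)
    refresh_db[token] = {"user": user, "revoked": False}
    return token


def revoke(refresh_token):
    if refresh_token in refresh_db:
        refresh_db[refresh_token]["revoked"] = True


def refresh_access(refresh_token, now):
    """Issue a new access token, or None if the grant was revoked."""
    record = refresh_db.get(refresh_token)  # the only database hit
    if record is None or record["revoked"]:
        return None
    # A real server would sign this; a plain dict keeps the sketch short.
    return {"user": record["user"], "exp": now + ACCESS_TTL}


def access_is_valid(access, now):
    """Resource-server check: expiry only, no database lookup."""
    return now < access["exp"]


rt = issue_refresh_token("alice")
access = refresh_access(rt, now=0)
revoke(rt)
still_valid_in_window = access_is_valid(access, now=100)   # True: the "window"
new_access_after_revoke = refresh_access(rt, now=100)      # None: no renewal
valid_after_expiry = access_is_valid(access, now=300)      # False: window closed
```

Shortening `ACCESS_TTL` narrows the window but increases how often clients hit the refresh-token table.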

Tradeoffs

Refresh tokens partially eliminate the SPoF (Single Point of Failure) of the access token database, yet they have some obvious drawbacks.

1. The "window": the timeframe between the events "user revokes the access" and "access is guaranteed to be revoked".
2. The complication of the Client logic.

without refresh token
- send API request with access token
- if access token is invalid, fail and ask user to re-authenticate

with refresh token
- send API request with access token
- if access token is invalid, try to update it using refresh token
- if refresh request passes, update the access token and re-send the initial API request
- if refresh request fails, ask user to re-authenticate
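The "with refresh token" steps could be sketched on the client side roughly as follows. The exception classes and the `call_api` and `refresh` callables are hypothetical placeholders for whatever HTTP layer the client actually uses.

```python
class Unauthorized(Exception):
    """Raised by the (hypothetical) callables when a token is rejected."""


class ReauthenticationRequired(Exception):
    """Signals that the user must go through the login flow again."""


def call_with_refresh(call_api, refresh, tokens):
    """Call the API, refreshing the access token once if it is rejected.

    call_api(access_token) -> result, raises Unauthorized if invalid
    refresh(refresh_token) -> new access token, raises Unauthorized on failure
    tokens: dict with 'access_token' and 'refresh_token' (updated in place)
    """
    try:
        return call_api(tokens["access_token"])
    except Unauthorized:
        pass  # access token is invalid; try to update it

    try:
        tokens["access_token"] = refresh(tokens["refresh_token"])
    except Unauthorized as exc:
        # Refresh failed: the only option left is asking the user to log in.
        raise ReauthenticationRequired("refresh request failed") from exc

    # Refresh passed: re-send the initial API request with the new token.
    return call_api(tokens["access_token"])
```

Without refresh tokens, the function would reduce to the first `try` block followed directly by raising `ReauthenticationRequired`.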

I hope this answer makes sense and helps somebody make a more thoughtful decision. I'd also like to note that some well-known OAuth2 providers, including GitHub and Foursquare, adopt the protocol without refresh tokens and seem happy with that.

Labannah answered 14/10, 2012 at 19:38 Comment(5)

@RomannImankulov If I understand it correctly, we can save refresh tokens in the db and delete them any time we want to revoke access, so why not save the access tokens themselves? – Afra 12/11, 2014 at 7:52

@Afra the short version of my post is, if you save the access token in the database, you hit the database on every request to your API (which may or may not be a problem in your particular case). If you save refresh tokens and keep access tokens "self-contained", you hit the database only when the client decides to refresh the access token. – Labannah 14/11, 2014 at 9:42

Personally I don't like this approach of not hitting the database to gain performance if it is going to compromise security (even if only for the timespan of the window). One should be able to revoke an access_token immediately if necessary as almost always we are dealing with sensitive user information (otherwise we would likely not be using OAuth in the first place). I wonder which approach bigger companies like Facebook and Google use. – Antitoxin 20/9, 2015 at 21:0

"Nonetheless, we still have to keep the database of refresh tokens" -> No? We can just keep a database of access tokens, but only hit it once we receive an access token that is expired! Right? Or am I missing something? – Spindrift 27/2, 2023 at 7:26

@Spindrift What would the purpose of storing expired access tokens be? The reason we store refresh tokens in a database is for a blacklist - to invalidate future access tokens from being created using a specific refresh token. Storing refresh tokens is specifically for invalidation, not validation. If you're implying that access tokens can make other access tokens - they can't, only refresh tokens are able to generate new access tokens without requiring the user to manually reauthenticate. – Almagest 16/3 at 21:27