The content below is an AI-generated translation. This is an experimental feature and may contain errors.

Why the sub Claim is Required in ID Tokens

Introduction

The other day, I was looking into ID Tokens.
While checking the claims that are mandatory in an ID Token, I couldn't quite see why the sub claim exists.
Given what I understood the purpose of an ID Token to be at the time, I couldn't think of a scenario where the sub claim would actually be used.
After some research, I found one such use and the significance of the sub claim became clearer, so I decided to write this article.
I am still learning, so I would appreciate any feedback if there are errors or omissions.
Note that this article assumes the Authorization Code Flow.

About ID Tokens

What is an ID Token?

Before diving into the sub claim, I will first explain what an ID Token is. *
*To be honest, Mr. Kawasaki's Qiita article and the specifications published by the OpenID Foundation are better organized and more accurate than what follows.
The descriptions below are strongly colored by my own learning process.
According to OpenID Connect Core 1.0, an ID Token is described as follows:

The ID Token is a security token that contains Claims about the authentication of an End-User by an Authorization Server when using a Client.

When the authentication/authorization flow is completed as shown below, you receive a token from the authorization server that includes information about the user who performed step ②. This token is called an ID Token.
Quoted from https://qiita.com/TakahikoKawasaki/items/4ee9b55db9f7ef352b47
OpenID Connect defines the specifications for user authentication by defining this ID Token.

Structure of an ID Token

In authorization flows using OAuth 2.0, access tokens are often issued as JWTs.
However, neither the OAuth 2.0 RFCs nor the related documents require JWT as the access token format.
ID Tokens, on the other hand, are explicitly required to be JWTs.

The ID Token is a JSON Web Token (JWT) [JWT].

And there is also the following statement:

The ID Token MUST be signed using JWS [JWS].

From this, we know that an ID Token is a JWT signed using JWS.
Regarding the structure of a JWS, RFC 7515 states that it consists of the following parts:

o JOSE Header
o JWS Payload
o JWS Signature

From the above, the structure of an ID Token is clear.
I had wondered what would happen if a token containing user information were tampered with, but we can see that protecting it with a signature is mandatory.
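To make the three-part structure concrete, here is a minimal sketch that splits a compact-serialized JWS into its header, payload, and signature. The names decodeCompactJws and base64UrlToJson are hypothetical helpers I made up for illustration, and the sketch assumes a runtime where atob and TextDecoder are available as globals. It only decodes the token; it does not verify the signature.

```typescript
// Hypothetical helper types and names, for illustration only.
// This decodes a compact JWS; it does NOT verify the signature.
interface DecodedJws {
  header: Record<string, unknown>;
  payload: Record<string, unknown>;
  signature: string; // left base64url-encoded
}

function base64UrlToJson(segment: string): Record<string, unknown> {
  // base64url -> base64: swap the URL-safe characters and restore padding.
  let base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  while (base64.length % 4 !== 0) {
    base64 += "=";
  }
  // atob yields one character per byte; convert to bytes, then decode as UTF-8.
  const bytes = Uint8Array.from(atob(base64), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

function decodeCompactJws(token: string): DecodedJws {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("A compact JWS must have exactly three dot-separated parts");
  }
  return {
    header: base64UrlToJson(parts[0]), // JOSE Header
    payload: base64UrlToJson(parts[1]), // JWS Payload (the ID Token claims)
    signature: parts[2], // JWS Signature
  };
}
```

Anyone can decode the payload this way, which is exactly why the signature matters: decoding tells you what the token says, not whether the issuer actually said it.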

Rules for the Payload Section

We have confirmed that the format of the ID Token structure is defined. Next, let's look at the rules for the payload section.
That being said, since Mr. Kawasaki's Qiita article covers this in detail, I won't go into specifics here.
However, please keep in mind that the sub claim is clearly stated to be mandatory.

sub
REQUIRED. Subject Identifier. A locally unique and never reassigned identifier within the Issuer for the End-User, intended to be consumed by the Client (e.g., 24400320 or AItOawmwtWwcT0k51BayewNvutrJUqsvl6qs7A4, etc.). This value MUST NOT exceed 255 ASCII characters. The sub value is case-sensitive.

In other words, an ID Token must always include an identifier for the End-User.
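The constraints in that definition can be sketched as a small format check. The function name isValidSub is hypothetical, and rejecting non-ASCII characters is my own reading of "255 ASCII characters", so treat that part as an assumption rather than something the specification spells out.

```typescript
// Hypothetical helper: checks only the format rules quoted above,
// not whether the value really identifies a particular user.
function isValidSub(sub: unknown): sub is string {
  // Must be present and must be a string.
  if (typeof sub !== "string" || sub.length === 0) {
    return false;
  }
  // MUST NOT exceed 255 characters.
  if (sub.length > 255) {
    return false;
  }
  // Assumption: only ASCII code points (0x00 to 0x7f) are acceptable.
  return /^[\x00-\x7f]+$/.test(sub);
}

// The value is case-sensitive, so comparisons must be exact.
function isSameSub(a: string, b: string): boolean {
  return a === b;
}
```

Note that a check like this says nothing about who the subject is. It only confirms that the value has an acceptable shape.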

Verification of the ID Token

When you obtain an ID Token, you need to verify whether it is valid. The specifications also describe what needs to be verified.
Below are the verification requirements when an ID Token is obtained via the Authorization Code Flow.

  1. If the ID Token is encrypted, decrypt it using the keys and algorithms that the Client specified during Registration and that the OP used to encrypt the ID Token. If encryption was negotiated with the OP during Registration but the ID Token is not encrypted, the RP SHOULD reject it.
  2. The Issuer Identifier for the OpenID Provider (typically obtained through Discovery) MUST exactly match the value of the iss (issuer) Claim.
  3. The Client MUST validate that the aud (audience) Claim contains its client_id, registered with the Issuer identified by the iss (issuer) Claim, as an audience. The aud (audience) Claim MAY contain an array with multiple elements. If the ID Token does not list the Client as a valid audience, or if it contains additional audiences not trusted by the Client, the ID Token MUST be rejected.
  4. If the ID Token contains multiple audiences, the Client SHOULD verify that an azp Claim is present.
  5. If an azp (authorized party) Claim is present, the Client SHOULD verify that its value is the Client's client_id.
  6. If the ID Token is received via direct communication between the Client and the Token Endpoint (which it is in this flow), TLS server validation MAY be used to validate the issuer instead of checking the token signature. The Client MUST validate the signature of all other ID Tokens according to JWS [JWS] using the algorithm specified in the JWT alg Header Parameter. The Client MUST use the keys provided by the Issuer.
  7. The alg value SHOULD be the default of RS256 or the algorithm specified by the Client as the id_token_signed_response_alg parameter during Registration.
  8. If the JWT alg Header Parameter uses a MAC-based algorithm such as HS256, HS384, or HS512, the UTF-8 representation of the client_secret corresponding to the client_id contained in the aud (audience) Claim is used for signature validation. For MAC-based algorithms, behavior is not defined if aud has multiple values or if there is an azp value different from the aud value.
  9. The current time MUST be before the time represented by the exp Claim.
  10. The iat Claim can be used to reject tokens issued too far in the past and limits the time a nonce must be stored to prevent attacks. The acceptable range is a matter of Client policy.
  11. If a nonce value was sent in the Authentication Request, a nonce Claim MUST be present and its value checked to ensure it matches the value sent in the Authentication Request. The Client SHOULD check the nonce value for replay attacks. The precise method for detecting replay attacks is a matter of Client policy.
  12. If an acr Claim was requested, the Client SHOULD check whether the asserted Claim value is appropriate. The value and meaning of the acr Claim are outside the scope of this specification.
  13. If an auth_time Claim was requested, either through a specific request for this Claim or via the max_age parameter, the Client SHOULD check the auth_time Claim value and request re-authentication if it determines that too much time has elapsed since the last user authentication.
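A subset of the steps above (items 2, 3, 4, 5, 9, and 11) can be sketched as a function over already-decoded claims. All names here (IdTokenClaims, ExpectedValues, validateIdTokenClaims) are hypothetical. The sketch assumes decryption and signature validation (items 1 and 6 to 8) happen elsewhere, ignores clock skew, does not check for untrusted additional audiences, and treats the SHOULD-level azp checks as hard failures, which is a policy choice on my part.

```typescript
// Hypothetical types for illustration; not taken from any library.
interface IdTokenClaims {
  iss: string;
  sub: string;
  aud: string | string[];
  exp: number; // seconds since the epoch
  iat: number;
  azp?: string;
  nonce?: string;
}

interface ExpectedValues {
  issuer: string; // Issuer Identifier of the OP
  clientId: string; // this Client's client_id
  nonce?: string; // nonce sent in the Authentication Request, if any
  nowSeconds: number; // current time in seconds since the epoch
}

// Returns a list of problems; an empty list means these checks passed.
function validateIdTokenClaims(claims: IdTokenClaims, expected: ExpectedValues): string[] {
  const errors: string[] = [];

  // Item 2: iss must exactly match the OP's Issuer Identifier.
  if (claims.iss !== expected.issuer) errors.push("iss mismatch");

  // Item 3: aud must contain this Client's client_id.
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  if (audiences.indexOf(expected.clientId) === -1) errors.push("aud does not contain client_id");

  // Item 4: with multiple audiences, azp should be present.
  if (audiences.length > 1 && claims.azp === undefined) errors.push("azp missing");

  // Item 5: if azp is present, it should be this Client's client_id.
  if (claims.azp !== undefined && claims.azp !== expected.clientId) errors.push("azp mismatch");

  // Item 9: the current time must be before exp.
  if (expected.nowSeconds >= claims.exp) errors.push("token expired");

  // Item 11: if a nonce was sent, the nonce claim must match it.
  if (expected.nonce !== undefined && claims.nonce !== expected.nonce) errors.push("nonce mismatch");

  return errors;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to log everything that was wrong with a rejected token.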

I have quoted the verification requirements as they are, but what I want you to notice is that verifying the sub claim is not mentioned anywhere.
Even though the sub claim is mandatory in an ID Token, it does not appear at all in the verification steps.
This is quite mysterious.
That said, specifications often state only the minimum requirements, so actual implementations might verify more.
So, let's look at auth0-spa-js, a library provided by Auth0, one of the leading IDaaS providers.
This library implements the client-side behavior for a SPA frontend acting as an OpenID Connect client.
Let's look at the part of jwt.ts, which handles ID Token verification in this library, that checks sub.

if (!decoded.user.sub) {
  throw new Error(
    "Subject (sub) claim must be a string present in the ID token"
  );
}

There is a branch for it, but it is strictly an existence check.
It doesn't check whether sub is actually the user's identifier, so an arbitrary non-empty value will not cause a verification error.
So, despite the sub claim being mandatory in ID Tokens, it doesn't seem to receive much attention during verification.
So, when is this sub claim used?
The answer lies in the UserInfo Endpoint, which I'll introduce next.

What is the UserInfo Endpoint?

Checking OpenID Connect Core 1.0, it is defined as follows:

The UserInfo Endpoint is an OAuth 2.0 Protected Resource that returns Claims about the authenticated End-User. To obtain the requested Claims about the End-User, the Client makes a request to the UserInfo Endpoint using an Access Token obtained through OpenID Connect Authentication.

First, the UserInfo Endpoint is a resource for obtaining information about the End-User.
Through this UserInfo Endpoint, you can retrieve additional user information that is not included in the ID Token.
And it is clearly stated that an access token, not an ID Token, is used when requesting user information from the UserInfo Endpoint.
From this, we can understand that it is an endpoint for retrieving user information from a resource server. *
*This part is not something I understood solely by reading the specifications. I wrote this with reference to the following tweet. While I have written it as if I found it myself, I would like to clarify that there is a reference source.
https://twitter.com/ritou/status/1804186193372025233
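As a rough sketch of what such a request looks like, the following builds a UserInfo Request that carries the access token as a Bearer token in the Authorization header, using HTTP GET. The function name buildUserInfoRequest and the UserInfoRequest type are hypothetical, and the endpoint URL is whatever the OP publishes; nothing here is specific to a particular provider.

```typescript
// Hypothetical request description; plain data so it can be passed to any HTTP client.
interface UserInfoRequest {
  url: string;
  method: "GET";
  headers: { Authorization: string };
}

function buildUserInfoRequest(userInfoEndpoint: string, accessToken: string): UserInfoRequest {
  if (accessToken.length === 0) {
    throw new Error("An access token is required to call the UserInfo Endpoint");
  }
  return {
    url: userInfoEndpoint,
    method: "GET",
    // The access token, not the ID Token, is what authorizes this request.
    headers: { Authorization: `Bearer ${accessToken}` },
  };
}
```

With a fetch-capable runtime, the result could be sent with something like fetch(req.url, { method: req.method, headers: req.headers }). The ID Token does not appear anywhere in the request.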
Furthermore, it is not the case that the desired user information can only be obtained from the UserInfo Endpoint.
As seen in the following description, it seems perfectly fine to include it in the ID Token as well.

These can be requested to be returned either in the UserInfo Response as shown in Section 5.3.2, or in the ID Token as shown in Section 2.

Since the timing of acquisition and the issuing server may differ between the UserInfo Endpoint and the ID Token, it seems necessary to use them appropriately according to the use case.

Examining the Necessity of the sub Claim

We have confirmed the existence of ID Tokens and the UserInfo Endpoint. Now, let's finally dive into the main topic: the sub claim. The reason why the sub claim is mandatory in an ID Token can be understood by looking at the Successful UserInfo Response section of OpenID Connect Core 1.0.

The UserInfo Response MUST include the sub (subject) Claim.

Note: Due to the possibility of token substitution attacks (see Section 16.11), the UserInfo Response is not guaranteed to be about the End-User identified by the sub Claim of the ID Token. Therefore, the sub Claim in the UserInfo Response MUST be verified to match the sub Claim in the ID Token. If they do not match, the UserInfo Response MUST NOT be used.

The UserInfo Endpoint, which is used to retrieve user information, must always return a user identifier.
The client must then confirm that this identifier belongs to the authenticated End-User.
This is where the sub claim of the ID Token comes in.
The ID Token can serve this purpose because it is the token that carries information about the authenticated End-User.
In other words, in OpenID Connect, only the ID Token guarantees that the user is the authenticated End-User.
The UserInfo Response, on the other hand, is retrieved with an access token.
An access token by itself does not guarantee that it holds information about the authenticated End-User, so the UserInfo Response is not guaranteed to describe that End-User.
Therefore, the client compares the sub claim of the UserInfo Response with the sub claim of the ID Token to confirm that the response is about the authenticated End-User.
From the above, we can understand why the sub claim is necessary in an ID Token.
Considering the existence of the UserInfo Endpoint, it makes sense that the sub claim must be mandatory in the ID Token.
I previously didn't understand the necessity of the sub claim because I had overlooked the UserInfo Endpoint, so finding this out has been very clarifying.
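The comparison described in this section can be sketched in a few lines. The function name verifyUserInfoSubject is hypothetical, and the sketch assumes the ID Token has already been validated and that the UserInfo Response has already been parsed as JSON.

```typescript
// Hypothetical helper: returns the UserInfo claims only if their sub
// exactly matches the sub from the already-validated ID Token.
function verifyUserInfoSubject<T extends { sub?: unknown }>(idTokenSub: string, userInfo: T): T {
  // The UserInfo Response MUST include sub.
  if (typeof userInfo.sub !== "string") {
    throw new Error("UserInfo response does not contain a sub claim");
  }
  // Exact, case-sensitive comparison against the ID Token's sub.
  if (userInfo.sub !== idTokenSub) {
    throw new Error("sub mismatch: the UserInfo response must not be used");
  }
  return userInfo;
}
```

Throwing on a mismatch, rather than returning a boolean, makes it harder for a caller to accidentally keep using a response that failed the check.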

A Slight Doubt: Do Token Substitution Attacks Occur in the Authorization Code Flow as Well?

I have confirmed the reasons why the sub claim is necessary in an ID Token.
What was written certainly made sense to me.
However, a small question has also arisen.
That is whether a token substitution attack actually occurs when a token is obtained via the Authorization Code Flow.
Today, the state parameter and PKCE are used to ensure that the authorization code, and therefore the access token, ends up only with the client session that started the flow for the resource owner.
If that's the case, I suspect that token substitution attacks are already mitigated in the Authorization Code Flow.
This leads me to feel that, in the Authorization Code Flow, the user information obtained from the UserInfo Endpoint could be guaranteed to belong to the End-User even without checking it against the ID Token.
Well, OpenID Connect isn't just about the Authorization Code Flow, and there's no harm in performing the verification, so I don't think verification is unnecessary.
However, I felt a bit curious about whether token substitution attacks occur when the Authorization Code Flow is properly implemented, so I wrote this as a side note.

Conclusion

In this post, we looked into the sub claim of the ID Token.
Initially, I started researching to resolve my doubt about whether the sub claim was necessary, but I'm glad I could deepen my understanding of the UserInfo Endpoint in the process.
There are still many things I don't fully understand, so I will continue to look into them.
Thank you for reading this far.
