Using the Persistent ID as a user identifier attribute

The Persistent ID is a privacy-preserving user identifier shared between the Identity Provider (IdP) and the Service Provider (SP). The IdP generates it the first time a user accesses a specific SP. When the IdP is configured according to the SWITCHaai deployment guides, the Persistent ID is stored in a relational database. If no database is configured, the Persistent ID is recomputed each time using the predefined salt. Either way, the Persistent ID remains the same for all further sessions between the same user and the same Service Provider.

The Persistent ID is a triple with the format:

<name for the source of the identifier>!<name for the intended audience of the identifier>!<opaque identifier for the principal>

Exposed as an environment variable to a web application, it looks like this:
https://aai-logon.switch.ch/idp/shibboleth!https://attribute-viewer.aai.switch.ch/shibboleth!a6c2c4d4-08b9-4ca7-8ff9-43d83e6e1d35
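
An application that needs the individual parts of the triple can split the value at the "!" separators. The following is a minimal sketch, assuming that neither entityID contains a "!" character; the names PersistentId and parse_persistent_id are hypothetical and not part of Shibboleth.

```python
from typing import NamedTuple


class PersistentId(NamedTuple):
    source: str     # entityID of the IdP that issued the identifier
    audience: str   # entityID of the SP the identifier is intended for
    opaque_id: str  # opaque identifier for the principal


def parse_persistent_id(value: str) -> PersistentId:
    """Split a full '<source>!<audience>!<opaque id>' value into its parts.

    Assumption: the entityIDs themselves contain no '!' character.
    """
    parts = value.split("!")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a three-part Persistent ID: {value!r}")
    return PersistentId(*parts)


example = (
    "https://aai-logon.switch.ch/idp/shibboleth!"
    "https://attribute-viewer.aai.switch.ch/shibboleth!"
    "a6c2c4d4-08b9-4ca7-8ff9-43d83e6e1d35"
)
pid = parse_persistent_id(example)
print(pid.source)
print(pid.audience)
print(pid.opaque_id)
```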

Privacy: For each Service Provider, a different Persistent ID is generated for a given user. Therefore, Persistent IDs cannot be used to correlate user data, even if several Service Providers try to aggregate their data. This results in better user privacy.
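
Why a salted, per-SP identifier is stable for one Service Provider yet differs between Service Providers can be illustrated with a hash. This is only a sketch of the idea and not the IdP's actual algorithm: the hash function, the input order and the function name sketch_computed_id are assumptions chosen for illustration.

```python
import base64
import hashlib


def sketch_computed_id(salt: bytes, user: str, sp_entity_id: str) -> str:
    """Derive an opaque value from the SP, the user and a secret salt.

    Illustration only: SHA-256 and this input layout are assumptions,
    not the Shibboleth IdP implementation.
    """
    digest = hashlib.sha256()
    digest.update(sp_entity_id.encode("utf-8"))
    digest.update(b"!")
    digest.update(user.encode("utf-8"))
    digest.update(b"!")
    digest.update(salt)
    return base64.b64encode(digest.digest()).decode("ascii")


salt = b"hypothetical-secret-salt"
id_sp_a = sketch_computed_id(salt, "jdoe", "https://sp-a.example.org/shibboleth")
id_sp_a_again = sketch_computed_id(salt, "jdoe", "https://sp-a.example.org/shibboleth")
id_sp_b = sketch_computed_id(salt, "jdoe", "https://sp-b.example.org/shibboleth")

print(id_sp_a == id_sp_a_again)  # same user, same SP: identical value
print(id_sp_a == id_sp_b)        # same user, different SP: different value
```

Without knowing the salt, two Service Providers cannot tell that their two values belong to the same person.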

Length of the Persistent ID

According to the eduPerson schema, the length of the Persistent ID MUST NOT exceed 2306 characters: per the SAML format definition, the identifier portion MUST NOT exceed 256 characters, and the source and audience URI values MUST NOT exceed 1024 characters each (1024 + 1024 + 256, plus the two "!" separators). The Shibboleth IdP database definition for storing the Persistent ID consists of three parts: the local entityID (<name for the source of the identifier>), the peer entityID (<name for the intended audience of the identifier>) and the computed string called persistentId (<opaque identifier for the principal>):

localEntity VARCHAR(1024) NOT NULL,
peerEntity VARCHAR(1024) NOT NULL,
persistentId VARCHAR(36) NOT NULL
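
An application that stores the Persistent ID can check a received value against the limits quoted above before writing it to its own database. The following is a minimal sketch, assuming the full three-part format; the function name length_problems is hypothetical.

```python
from typing import List

MAX_URI_LENGTH = 1024  # limit for the source and the audience URI
MAX_ID_LENGTH = 256    # limit for the opaque identifier portion
# Two URIs, the identifier and the two "!" separators.
MAX_TOTAL_LENGTH = 2 * MAX_URI_LENGTH + MAX_ID_LENGTH + 2


def length_problems(value: str) -> List[str]:
    """Return a list of limit violations; an empty list means the value fits."""
    parts = value.split("!")
    if len(parts) != 3:
        return ["expected three '!'-separated parts"]
    source, audience, opaque_id = parts
    problems = []
    if len(source) > MAX_URI_LENGTH:
        problems.append("source URI exceeds 1024 characters")
    if len(audience) > MAX_URI_LENGTH:
        problems.append("audience URI exceeds 1024 characters")
    if len(opaque_id) > MAX_ID_LENGTH:
        problems.append("identifier portion exceeds 256 characters")
    return problems


example = (
    "https://aai-logon.switch.ch/idp/shibboleth!"
    "https://attribute-viewer.aai.switch.ch/shibboleth!"
    "a6c2c4d4-08b9-4ca7-8ff9-43d83e6e1d35"
)
print(MAX_TOTAL_LENGTH)
print(length_problems(example))
```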

Depending on the application, the Persistent ID may be too long to be handled correctly. To shorten it, the peerEntity part can be omitted, since it is simply the entityID of the Service Provider that protects the application. The shortening is configured in the attribute-map.xml of the Shibboleth configuration by setting the formatter as follows:

formatter="$NameQualifier!$Name"
instead of
formatter="$NameQualifier!$SPNameQualifier!$Name"
With the shorter formatter, the value
https://aai-logon.switch.ch/idp/shibboleth!https://attribute-viewer.aai.switch.ch/shibboleth!a6c2c4d4-08b9-4ca7-8ff9-43d83e6e1d35
is reduced to
https://aai-logon.switch.ch/idp/shibboleth!a6c2c4d4-08b9-4ca7-8ff9-43d83e6e1d35
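
If full-length values have already been stored or logged before the formatter was changed, they can be brought into the short form afterwards. The following is a minimal sketch, assuming no entityID contains a "!" character; the function name shorten_persistent_id is hypothetical.

```python
def shorten_persistent_id(value: str) -> str:
    """Drop the audience (SP entityID) part of a full Persistent ID.

    Values that already have the two-part short form are returned unchanged.
    """
    parts = value.split("!")
    if len(parts) == 2:
        return value
    if len(parts) != 3:
        raise ValueError(f"unexpected Persistent ID format: {value!r}")
    source, _audience, opaque_id = parts
    return f"{source}!{opaque_id}"


full = (
    "https://aai-logon.switch.ch/idp/shibboleth!"
    "https://attribute-viewer.aai.switch.ch/shibboleth!"
    "a6c2c4d4-08b9-4ca7-8ff9-43d83e6e1d35"
)
short = shorten_persistent_id(full)
print(short)
```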

Prefer Persistent ID over Unique ID

When AAI-enabling a new application, we recommend using the Persistent ID as the identifier attribute instead of the Unique ID. All IdPs within the SWITCHaai Federation support it, so there is no reason to wait. The sooner an application starts collecting Unique ID/Persistent ID pairs, the sooner it will be ready for migration. "Targeted ID/Persistent ID" needs to be defined as a required attribute in the Resource Registry. To clarify: the attribute eduPersonTargetedID contains the same value as the Persistent ID. Using the Persistent ID is recommended because it is always available in the environment.

Account Checking

Two tools can check whether an account still exists, using the Persistent ID of a user:

SP Resolvertest
AccountChecker

Although the tool is called Account Checker, it also allows the account information to be updated, as it returns the current values. This feature can only be used with the Persistent ID, not with the Unique ID, and only if the Identity Provider stores the Persistent ID in a MySQL database, as deployed in SWITCHaai.

How to convert an Application from using Unique ID to Persistent ID

Collecting IDs

To collect Unique ID and Persistent ID pairs for a later migration to the Persistent ID, the Apache web server can be used to log both attributes and store them in a mapping file. The logging can be restricted to the AAI-protected locations of the web server, i.e. the part where the user is actually required to have a Shibboleth session, leaving out the public part. CustomLog can be made conditional so that only certain requests are logged; in the example, only requests whose URL contains the word "secure" are logged.
With %{FOOBAR}e, the contents of the environment variable FOOBAR are written to the log file. Basically, the log file only needs to contain the Unique ID and the Persistent ID. In some cases it may be preferable to also log the name or email address, which may be useful when debugging a problem after the transition to the Persistent ID. But do not forget to treat this personal information adequately, e.g. regarding protection and deletion. Finally, it must be defined how the log file is named, where it is saved and which environment variable enables the logging.

  SetEnvIf Request_URI secure mypath
  LogFormat " %{persistent-id}e %{uniqueID}e \"%{surname}e\" \"%{givenName}e\" \
  %{%Y%m%d-%H:%M:%S}t" uid-persistent 
  CustomLog ${APACHE_LOG_DIR}/migration.log uid-persistent env=mypath

The above example logs more than is strictly necessary. It would be sufficient to log only the Persistent ID and the Unique ID, but logging the user's name and the time the user logged in to your application is useful for debugging purposes as well. Start collecting Persistent IDs as early as possible if there are applications behind the web server that still use Unique IDs as identifiers.
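
The collected log can later be turned into a Unique ID to Persistent ID mapping. The following is a minimal sketch, assuming the field order of the LogFormat example above (Persistent ID first, Unique ID second) and that Apache writes "-" for an unset variable; the function name build_mapping and the sample values are hypothetical.

```python
import shlex
from typing import Dict, Iterable


def build_mapping(lines: Iterable[str]) -> Dict[str, str]:
    """Map each Unique ID to the Persistent ID logged with it."""
    mapping: Dict[str, str] = {}
    for line in lines:
        try:
            # shlex keeps the quoted surname and given name as single fields.
            fields = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if len(fields) < 2:
            continue
        persistent_id, unique_id = fields[0], fields[1]
        if "-" in (persistent_id, unique_id):
            continue  # one of the attributes was not set for this request
        mapping[unique_id] = persistent_id
    return mapping


sample_log = [
    ' https://idp.example.org/idp/shibboleth!abc-123 1001@example.org "Muster" "Anna Maria" 20240101-12:00:00',
    ' - 1002@example.org "Doe" "John" 20240101-12:05:00',
    ' https://idp.example.org/idp/shibboleth!def-456 1003@example.org "Doe" "Jane" 20240102-08:30:00',
]
mapping = build_mapping(sample_log)
for unique_id, persistent_id in mapping.items():
    print(unique_id, "->", persistent_id)
```

Because users log in repeatedly, the same pair appears many times in the log; the dictionary keeps one entry per Unique ID.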

On most UNIX-based operating systems, the directory /etc/logrotate.d contains a file that configures the log rotation of the Apache log files. Disabling log rotation for migration.log ensures that all data is stored in a single file.

Migrate

Before switching the identification attribute from Unique ID to Persistent ID, ensure that the Unique ID/Persistent ID pairs have been collected for most of the active users. To that end, collect the Persistent IDs for an extended period of time (e.g. a few months). The collection process can be accelerated by asking users to log in to the application if they still need their account. At a certain point, the transition can be started even though some Persistent IDs might still be missing. It is best to create a transition script that displays the Unique IDs that could not be converted. You may never collect all Unique IDs, as some users will never log in. For those users, the application has probably stored some information such as an email address or name. In this case you could try to motivate the users to log in, or try to establish the person's identity via the Home Organization's IdP.
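
Whether enough pairs have been collected can be estimated by comparing the application's known Unique IDs with the collected mapping. The following is a minimal sketch; the function name migration_coverage and the sample data are hypothetical, and the threshold at which to start the transition is left to the operator.

```python
from typing import Dict, Iterable, List, Tuple


def migration_coverage(
    app_unique_ids: Iterable[str], mapping: Dict[str, str]
) -> Tuple[float, List[str]]:
    """Return the fraction of Unique IDs with a collected Persistent ID
    and the sorted list of Unique IDs that are still missing one."""
    known = set(app_unique_ids)
    if not known:
        return 1.0, []
    missing = sorted(uid for uid in known if uid not in mapping)
    return (len(known) - len(missing)) / len(known), missing


app_users = ["1001@example.org", "1002@example.org", "1003@example.org", "1004@example.org"]
collected = {
    "1001@example.org": "https://idp.example.org/idp/shibboleth!abc-123",
    "1002@example.org": "https://idp.example.org/idp/shibboleth!def-456",
    "1003@example.org": "https://idp.example.org/idp/shibboleth!ghi-789",
}
coverage, missing = migration_coverage(app_users, collected)
print(f"{coverage:.0%} of the accounts can be converted")
print("still missing:", missing)
```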

Clean-up

Make sure to disable the Apache logging and delete the collected data after a successful migration. This is also an opportunity to tidy up the application and remove unused accounts.

A very simple migration example

The example uses a small PHP program that stores the number of visits of a user in a file. All it does is display the number of visits, first name and last name of a user by reading the user attributes from the environment. A second small example application performs the transition of the user database file: it searches for the corresponding Persistent ID in the migration.log and replaces the Unique IDs with the Persistent IDs. To finish the transition, change the environment variable in the application that identifies the user.
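
The replacement step described above can be sketched as follows. The record layout is an assumption made for illustration (one record per line, identifier and visit count separated by a tab) and is not the format of the original example application; the function name migrate_records is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def migrate_records(
    lines: Iterable[str], mapping: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Replace Unique IDs by Persistent IDs in 'identifier<TAB>visits' records.

    Records without a collected pair are kept unchanged and their
    identifiers are reported so they can be handled manually.
    """
    migrated: List[str] = []
    unconverted: List[str] = []
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue
        identifier, separator, rest = record.partition("\t")
        if identifier in mapping:
            migrated.append(mapping[identifier] + separator + rest)
        else:
            migrated.append(record)
            unconverted.append(identifier)
    return migrated, unconverted


user_db = ["1001@example.org\t17\n", "1004@example.org\t3\n"]
pairs = {"1001@example.org": "https://idp.example.org/idp/shibboleth!abc-123"}
new_db, not_converted = migrate_records(user_db, pairs)
print(new_db)
print("not converted:", not_converted)
```

Working on a copy of the user database file and keeping the original until the migration has been verified avoids losing data.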

$identifier = $_SERVER["uniqueID"];
becomes
$identifier = $_SERVER["persistent-id"];

If the application uses REMOTE_USER to identify the user, define the order of precedence for REMOTE_USER in shibboleth2.xml (only the first available value is used).

REMOTE_USER="persistent-id targeted-id uniqueID"
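
The "first available value" rule can be pictured as follows. This sketch only mimics the described behaviour for illustration and is not Shibboleth code; the function name resolve_remote_user and the sample attribute values are hypothetical.

```python
from typing import Mapping, Optional


def resolve_remote_user(
    attributes: Mapping[str, str],
    precedence: str = "persistent-id targeted-id uniqueID",
) -> Optional[str]:
    """Return the first non-empty attribute in the given order of precedence."""
    for name in precedence.split():
        value = attributes.get(name)
        if value:
            return value
    return None


# Session in which only the Unique ID was released.
print(resolve_remote_user({"uniqueID": "1001@example.org"}))

# Session with both attributes: the Persistent ID wins.
print(resolve_remote_user({
    "uniqueID": "1001@example.org",
    "persistent-id": "https://idp.example.org/idp/shibboleth!abc-123",
}))
```

With this order, an application that has stored Persistent IDs is served the Persistent ID whenever it is released and falls back to the Unique ID only for sessions without it.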

If there is no way to change the application code, it is still possible to rename attributes in the attribute-map.xml. However, remember that this special configuration might be overwritten when Shibboleth is updated.