Introduction to Kanidm

Kanidm is an identity management server, acting as an authority on account information, authentication and authorisation within a technical environment.

The intent of the Kanidm project is to:

  • Provide a single source of truth for authorisation and authentication.
  • Make system, network, application and web authentication easy and accessible.
  • Be secure and reliable by default, aiming for the highest levels of quality and stability.

Why do I want Kanidm?

Whether you work in a business, a volunteer organisation, or are an enthusiast who manages their personal services, you need methods of authenticating and identifying to your systems. These systems also need to determine what authorisation and privileges you have while accessing them.

We've probably all been in workplaces where you end up with multiple accounts on various systems - one for a workstation, different SSH keys for different tasks, maybe some shared account passwords. Not only is it difficult for people to manage all these different credentials and what they have access to, but it also means that sometimes these credentials have more access or privilege than they require. In the worst case this can lead to weak credentials (corpname123 is a common example) or credentials that are disclosed via git repos.

Kanidm solves this problem by acting as a central authority of accounts in your organisation. This allows each account to associate many devices and strong credentials with different privileges. An example of how this looks:

Kanidm Use Case Diagram

A key design goal is that you authenticate with your device in some manner, and then your device will continue to authenticate you in the future. Each of these different types of credentials, from SSH keys, application passwords, RADIUS passwords and others, are "things your device knows" or "things your device has". Each credential has limited capability and scope, and can only access that exact service or resource.

This helps improve security; a compromise of the service or the network transmission does not grant you unlimited access to your account and all its privileges. As the credentials are specific to a device, if a device is compromised you can revoke its associated credentials. If a specific service is compromised, only the credentials for that service need to be revoked.

Because this model centres the device and uses more per-service credentials, Kanidm adds workflows and automation to reduce the amount of manual credential handling required.

For Developers

Looking for the rustdoc documentation for the libraries themselves? Click here!

Supported Features

This is a list of supported features and standards within Kanidm:

  • Authorisation
  • Cryptography
  • Data Import
  • Database
  • LDAP
  • OAuth2 / OpenID Connect
  • RADIUS
  • Replication
  • Unix Client (PAM/nsswitch client authentication)
  • Webauthn

Evaluation Quickstart

This section will guide you through a quick setup of Kanidm for evaluation. It's recommended that for a production deployment you follow the steps in the installation chapter instead as there are a number of security considerations you should be aware of for production deployments.

Requirements

  • docker or podman
  • x86_64 cpu supporting x86_64_v2 OR aarch64 cpu supporting neon

Get the software

docker pull kanidm/server:latest

Configure the container

docker volume create kanidmd
docker create --name kanidmd \
  -p 443:8443 \
  -p 636:3636 \
  -v kanidmd:/data \
  kanidm/server:latest

Configure the server

Create server.toml

#   The webserver bind address. Requires TLS certificates.
#   If the port is set to 443 you may require the
#   NET_BIND_SERVICE capability.
#   Defaults to "127.0.0.1:8443"
bindaddress = "[::]:8443"
#
#   The read-only ldap server bind address. Requires
#   TLS certificates. If set to 636 you may require
#   the NET_BIND_SERVICE capability.
#   Defaults to "" (disabled)
# ldapbindaddress = "[::]:3636"
#
#   HTTPS requests can be reverse proxied by a loadbalancer.
#   To preserve the original IP of the caller, these systems
#   will often add a header such as "Forwarded" or
#   "X-Forwarded-For". If set to true, then this header is
#   respected as the "authoritative" source of the IP of the
#   connected client. If you are not using a load balancer
#   then you should leave this value as default.
#   Defaults to false
# trust_x_forward_for = false
#
#   The path to the kanidm database.
db_path = "/data/kanidm.db"
#
#   If you have a known filesystem, kanidm can tune the 
#   database page size to match. Valid choices are:
#   [zfs, other]
#   If you are unsure about this leave it as the default
#   (other). After changing this
#   value you must run a vacuum task.
#   - zfs:
#     * sets database pagesize to 64k. You must set
#       recordsize=64k on the zfs filesystem.
#   - other:
#     * sets database pagesize to 4k, matching most
#       filesystems block sizes.
# db_fs_type = "zfs"
#
#   The number of entries to store in the in-memory cache.
#   Minimum value is 256. If unset
#   an automatic heuristic is used to scale this.
#   You should only adjust this value if you experience
#   memory pressure on your system.
# db_arc_size = 2048
#
#   TLS chain and key in pem format. Both must be present
tls_chain = "/data/chain.pem"
tls_key = "/data/key.pem"
#
#   The log level of the server. May be one of info, debug, trace
#
#   NOTE: this can be overridden by the environment variable
#   `KANIDM_LOG_LEVEL` at runtime
#   Defaults to "info"
# log_level = "info"
#
#   The DNS domain name of the server. This is used in a
#   number of security-critical contexts
#   such as webauthn, so it *must* match your DNS
#   hostname. It is used to create
#   security principal names such as `william@idm.example.com`
#   so that in a (future) trust configuration it is possible
#   to have unique Security Principal Names (spns) throughout
#   the topology.
#
#   ⚠️  WARNING ⚠️
#
#   Changing this value WILL break many types of registered
#   credentials for accounts including but not limited to
#   webauthn, oauth tokens, and more.
#   If you change this value you *must* run
#   `kanidmd domain_name_change` immediately after.
domain = "idm.example.com"
#
#   The origin for webauthn. This is the url to the server,
#   with the port included if it is non-standard (any port
#   except 443). This must match or be a descendent of the
#   domain name you configure above. If these two items are
#   not consistent, the server WILL refuse to start!
#   origin = "https://idm.example.com"
origin = "https://idm.example.com:8443"
#
[online_backup]
#   The path to the output folder for online backups
path = "/data/kanidm/backups/"
#   The schedule to run online backups (see https://crontab.guru/)
#   every day at 22:00 UTC (default)
schedule = "00 22 * * *"
#   four times a day, at 3 minutes past the hour, every 6 hours
# schedule = "03 */6 * * *"
#   We also support non-standard cron syntax, with the following format:
#   sec  min   hour   day of month   month   day of week   year
#   (very similar to the standard cron syntax, it just allows you to specify
#   seconds at the beginning and the year at the end)
#   Number of backups to keep (default 7)
# versions = 7

Add configuration to container

docker cp server.toml kanidmd:/data/server.toml

Generate evaluation certificates

docker run --rm -i -t -v kanidmd:/data \
  kanidm/server:latest \
  kanidmd cert-generate

Start Kanidmd Container

docker start kanidmd

Recover the Admin Role Passwords

The admin account is used to configure Kanidm itself.

docker exec -i -t kanidmd \
  kanidmd recover-account admin

The idm_admin account is used to manage persons and groups.

docker exec -i -t kanidmd \
  kanidmd recover-account idm_admin

Setup the client configuration

# ~/.config/kanidm

uri = "https://localhost:443"
verify_ca = false

Check you can login

kanidm login --name idm_admin

Create an account for yourself

kanidm person create <your username> <Your Displayname>

Setup your account credentials

kanidm person credential create-reset-token <your username>

Then follow the presented steps.

What next?

You can now follow the steps in the administration section.

Installing the Server

This chapter will describe how to plan, configure, deploy and update your Kanidm instances.

Choosing a Domain Name

Throughout this book, Kanidm makes reference to a "domain name". This is your chosen DNS domain name that you intend to use for Kanidm. Choosing this domain name, however, is not simple, as there are a number of considerations you need to be careful of.

Kani Warning Take note!
Incorrect choice of the domain name may have security impacts on your Kanidm instance, not limited to credential phishing, theft, session leaks and more. It is critical you follow the advice in this chapter.

Considerations

Domain Ownership

It is recommended you use a domain name within a domain that you own. While many examples in this book use example.com, it is not recommended to use this outside of testing. Another example of a risky domain is local. While these may seem appealing, because you do not have unique ownership of them, if you move your machine to a foreign network it is possible you may leak credentials or other cookies to these domains. TLS will, in a majority of cases, protect you from such leaks, but it should not be relied upon as a sole line of defence.

Failure to use a unique domain you own may allow DNS hijacking or other credential leaks in some circumstances.

Subdomains

Due to how web browsers and webauthn work, any matching domain name or subdomain of an effective domain may have access to cookies within a browser session. An example is that host.a.example.com has access to cookies from a.example.com and example.com.

For this reason your Kanidm host (or hosts) should be on a unique subdomain, with no other services registered under that subdomain. For example, consider idm.example.com as a subdomain for exclusive use of Kanidm. This is the inverse of Active Directory, which often has its domain name selected to be the parent (top-level) domain (example.com).

Failure to use a unique subdomain may allow cookies to leak to other entities within your domain, and may allow webauthn to be used on entities you did not intend, which could lead to phishing scenarios.

Examples

Good Domain Names

Consider that we own kanidm.com. If we were to run geographical instances and have testing environments, the following domain and hostnames could be used.

Production Domain Name

  • origin: https://idm.kanidm.com
  • domain name: idm.kanidm.com
  • host names: australia.idm.kanidm.com, newzealand.idm.kanidm.com

This allows us to have named geographical instances such as https://australia.idm.kanidm.com which still works with webauthn and cookies which are transferable between instances.

It is critical no other hosts are registered under this domain name.

Testing Domain Name

  • origin: https://idm.dev.kanidm.com
  • domain name: idm.dev.kanidm.com
  • host names: australia.idm.dev.kanidm.com, newzealand.idm.dev.kanidm.com

Note that due to the name being idm.dev.kanidm.com vs idm.kanidm.com, the testing instance is not a subdomain of production, meaning the cookies and webauthn tokens can NOT be transferred between them. This provides proper isolation between the instances.

Bad Domain Names

idm.local - This is a bad example as .local is an mDNS domain name suffix, which means that client machines, if they visit another network, may try to contact idm.local believing they are on their usual network. If TLS certificate verification were disabled, this would allow leaking of credentials.

kanidm.com - This is bad because the use of the top level domain means that any subdomain can access the cookies issued by kanidm.com, effectively leaking them to all other hosts.

Second instance overlap:

Production

  • origin: https://idm.kanidm.com
  • domain name: idm.kanidm.com

Testing

  • origin: https://dev.idm.kanidm.com
  • domain name: dev.idm.kanidm.com

While the production instance has a valid and well defined subdomain that doesn't conflict, because the dev instance is a subdomain of production, it allows production cookies to leak to dev. Dev instances may have weaker security controls in some cases which can then allow compromise of the production instance.

Preparing for your Deployment

Software Installation Method

We provide docker images for the server components on Docker Hub. You can fetch these by running the commands:

docker pull kanidm/server:latest
docker pull kanidm/radius:latest
docker pull kanidm/tools:latest

NOTE Our preferred deployment method is in containers, and this documentation assumes you're running in docker. Kanidm will alternatively run as a daemon/service, and server builds are available for multiple platforms if you prefer this option. You may need to adjust the example commands throughout this document to suit your desired server type if you choose not to use containers.

Development Version

If you are interested in running the latest code from development, you can do this by changing the docker tag to kanidm/server:devel instead. Many people run the development version, and it is extremely reliable, but occasional rough patches may occur. If you report issues, we will make every effort to help resolve them.

System Requirements

CPU

Kanidm relies on modern CPU optimisations for many operations. As a result, your CPU must be either:

  • x86_64 supporting x86_64_v2 operations.
  • aarch64 supporting neon_v8 operations.

On older or unsupported hardware, Kanidm may raise a SIGILL (Illegal Instruction) as those CPUs are not supported by the project.

Kani Alert Tip
You can check your cpu flags on Linux with the command `lscpu`
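
For example, a quick and non-exhaustive check for a few of the flags required by the x86_64_v2 level might look like the following (the flag names here are common x86_64_v2 requirements, not an official list):

lscpu | grep -o -E 'sse4_2|ssse3|popcnt' | sort -u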

Memory

Kanidm extensively uses memory caching, trading memory consumption to improve parallel throughput. You should expect to see 64KB of RAM per entry in your database, depending on cache tuning and settings.

Disk

You should expect to use up to 8KB of disk per entry you plan to store. At that rate, a 10,000 entry database will consume up to 80MB, and a 100,000 entry database up to 800MB.

For best performance, you should use non-volatile memory express (NVME), or other Flash storage media.

TLS

You'll need a volume where you can place configuration, certificates, and the database:

docker volume create kanidmd

You should have a chain.pem and key.pem in your kanidmd volume. The reason for requiring Transport Layer Security (TLS, which replaces the deprecated Secure Sockets Layer, SSL) is explained in why tls. In summary, TLS is our root of trust between the server and clients, and a critical element of ensuring a secure system.

The key.pem should be a single PEM private key, with no encryption. The file content should be similar to:

-----BEGIN PRIVATE KEY-----
MII...<base64>
-----END PRIVATE KEY-----

The chain.pem is a series of PEM formatted certificates. The leaf certificate, or the certificate that matches the private key, should be the first certificate in the file. This should be followed by the series of intermediates, and the final certificate should be the CA root. For example:

-----BEGIN CERTIFICATE-----
<leaf certificate>
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
<intermediate certificate>
-----END CERTIFICATE-----
[ more intermediates if needed ]
-----BEGIN CERTIFICATE-----
<ca/root certificate>
-----END CERTIFICATE-----

HINT If you are using Let's Encrypt the provided files "fullchain.pem" and "privkey.pem" are already correctly formatted as required for Kanidm.
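
For example, assuming a typical certbot layout where the live certificates are kept under /etc/letsencrypt/live/idm.example.com (the path is an assumption for your deployment), you could copy them to the file names used in this guide:

cp /etc/letsencrypt/live/idm.example.com/fullchain.pem chain.pem
cp /etc/letsencrypt/live/idm.example.com/privkey.pem key.pem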

You can validate that the leaf certificate matches the key with the command:

# ECDSA
openssl ec -in key.pem -pubout | openssl sha1
1c7e7bf6ef8f83841daeedf16093bda585fc5bb0
openssl x509 -in chain.pem -noout -pubkey | openssl sha1
1c7e7bf6ef8f83841daeedf16093bda585fc5bb0

# RSA
# openssl rsa -noout -modulus -in key.pem | openssl sha1
d2188932f520e45f2e76153fbbaf13f81ea6c1ef
# openssl x509 -noout -modulus -in chain.pem | openssl sha1
d2188932f520e45f2e76153fbbaf13f81ea6c1ef

If your chain.pem contains the CA certificate, you can validate this file with the command:

openssl verify -CAfile chain.pem chain.pem

If your chain.pem does not contain the CA certificate (Let's Encrypt chains do not contain the CA for example) then you can validate with this command.

openssl verify -untrusted fullchain.pem fullchain.pem

NOTE Here the "-untrusted" flag means that a list of further certificates used to build the chain up to the root is provided, while the system CA roots are still consulted. Verification is NOT bypassed or allowed to be invalid.

If these verifications pass you can now use these certificates with Kanidm. To put the certificates in place you can use a shell container that mounts the volume such as:

docker run --rm -i -t -v kanidmd:/data -v /my/host/path/work:/work opensuse/leap:latest \
    /bin/sh -c "cp /work/* /data/"

OR for a shell into the volume:

docker run --rm -i -t -v kanidmd:/data opensuse/leap:latest /bin/sh

Configuring the Server

In this section we will configure your server and create its container instance.

Configuring server.toml

You need a configuration file in the volume named server.toml (within the container it should be /data/server.toml). The following is a commented example configuration.

The full options and explanations are in the kanidmd_core::config::ServerConfig documentation for your particular build.

#   The webserver bind address. Requires TLS certificates.
#   If the port is set to 443 you may require the
#   NET_BIND_SERVICE capability.
#   Defaults to "127.0.0.1:8443"
bindaddress = "[::]:8443"
#
#   The read-only ldap server bind address. Requires
#   TLS certificates. If set to 636 you may require
#   the NET_BIND_SERVICE capability.
#   Defaults to "" (disabled)
# ldapbindaddress = "[::]:3636"
#
#   HTTPS requests can be reverse proxied by a loadbalancer.
#   To preserve the original IP of the caller, these systems
#   will often add a header such as "Forwarded" or
#   "X-Forwarded-For". If set to true, then this header is
#   respected as the "authoritative" source of the IP of the
#   connected client. If you are not using a load balancer
#   then you should leave this value as default.
#   Defaults to false
# trust_x_forward_for = false
#
#   The path to the kanidm database.
db_path = "/data/kanidm.db"
#
#   If you have a known filesystem, kanidm can tune the 
#   database page size to match. Valid choices are:
#   [zfs, other]
#   If you are unsure about this leave it as the default
#   (other). After changing this
#   value you must run a vacuum task.
#   - zfs:
#     * sets database pagesize to 64k. You must set
#       recordsize=64k on the zfs filesystem.
#   - other:
#     * sets database pagesize to 4k, matching most
#       filesystems block sizes.
# db_fs_type = "zfs"
#
#   The number of entries to store in the in-memory cache.
#   Minimum value is 256. If unset
#   an automatic heuristic is used to scale this.
#   You should only adjust this value if you experience
#   memory pressure on your system.
# db_arc_size = 2048
#
#   TLS chain and key in pem format. Both must be present
tls_chain = "/data/chain.pem"
tls_key = "/data/key.pem"
#
#   The log level of the server. May be one of info, debug, trace
#
#   NOTE: this can be overridden by the environment variable
#   `KANIDM_LOG_LEVEL` at runtime
#   Defaults to "info"
# log_level = "info"
#
#   The DNS domain name of the server. This is used in a
#   number of security-critical contexts
#   such as webauthn, so it *must* match your DNS
#   hostname. It is used to create
#   security principal names such as `william@idm.example.com`
#   so that in a (future) trust configuration it is possible
#   to have unique Security Principal Names (spns) throughout
#   the topology.
#
#   ⚠️  WARNING ⚠️
#
#   Changing this value WILL break many types of registered
#   credentials for accounts including but not limited to
#   webauthn, oauth tokens, and more.
#   If you change this value you *must* run
#   `kanidmd domain_name_change` immediately after.
domain = "idm.example.com"
#
#   The origin for webauthn. This is the url to the server,
#   with the port included if it is non-standard (any port
#   except 443). This must match or be a descendent of the
#   domain name you configure above. If these two items are
#   not consistent, the server WILL refuse to start!
#   origin = "https://idm.example.com"
origin = "https://idm.example.com:8443"
#
[online_backup]
#   The path to the output folder for online backups
path = "/data/kanidm/backups/"
#   The schedule to run online backups (see https://crontab.guru/)
#   every day at 22:00 UTC (default)
schedule = "00 22 * * *"
#   four times a day, at 3 minutes past the hour, every 6 hours
# schedule = "03 */6 * * *"
#   We also support non-standard cron syntax, with the following format:
#   sec  min   hour   day of month   month   day of week   year
#   (very similar to the standard cron syntax, it just allows you to specify
#   seconds at the beginning and the year at the end)
#   Number of backups to keep (default 7)
# versions = 7

This example is located in examples/server_container.toml.

Kani Warning Warning!
You MUST set the `domain` name correctly, aligned with your `origin`, else the server may refuse to start or some features (e.g. webauthn, oauth) may not work correctly!

Check the configuration is valid

You should test that your configuration is valid before you proceed. This defaults to using -c /data/server.toml.

docker run --rm -i -t -v kanidmd:/data \
    kanidm/server:latest /sbin/kanidmd configtest

Run the Server

Now we can run the server so that it can accept connections. This defaults to using -c /data/server.toml.

docker run -p 443:8443 -v kanidmd:/data kanidm/server:latest

Using the NET_BIND_SERVICE capability

If you plan to run without using docker port mapping or some other reverse proxy, and your bindaddress or ldapbindaddress port is less than 1024, you will need the NET_BIND_SERVICE capability in docker to allow these port binds. You can add this with --cap-add in your docker run command.

docker run --cap-add NET_BIND_SERVICE --network [host OR macvlan OR ipvlan] \
    -v kanidmd:/data kanidm/server:latest
Kani Alert Tip
However you choose to run your server, you should document and keep note of the docker run / create command you chose to start the instance. This will be used in the upgrade procedure.

Default Admin Accounts

Now that the server is running, you can initialise the default admin accounts. There are two parallel admin accounts that have separate functions: admin, which manages Kanidm's configuration, and idm_admin, which manages accounts and groups in Kanidm.

You should consider these as "break-glass" accounts. They exist to allow the server to be bootstrapped and accessed in emergencies. They are not intended for day-to-day use.

These commands will generate a new random password for the admin accounts. You must run the commands as the same user as the kanidmd process or as root. This defaults to using -c /data/server.toml.

docker exec -i -t <container name> \
  kanidmd recover-account admin
#  new_password: "xjgG4..."
docker exec -i -t <container name> \
  kanidmd recover-account idm_admin
#  new_password: "9Eux1..."

Security Hardening

Kanidm ships with a secure-by-default configuration; however, that is only as strong as the environment that Kanidm operates in. This means the security of your container environment and server is extremely important when running Kanidm.

This chapter will detail a number of warnings and security practices you should follow to ensure that Kanidm operates in a secure environment.

The main server is a high-value target for a potential attack, as Kanidm serves as the authority on identity and authorisation in a network. Compromise of the Kanidm server is equivalent to a full-network takeover, also known as "game over".

The unixd resolver is also a high-value target, as it could be abused to allow unauthorised access to a server, to intercept communications to the server, and more. It must also be protected carefully.

For this reason, Kanidm's components must be secured and audited. Kanidm avoids many classic attacks by being developed in a memory safe language, but risks still exist in the operating environment.

Startup Warnings

At startup Kanidm will warn you if the environment it is running in is suspicious or has risks. For example:

kanidmd server -c /tmp/server.toml
WARNING: permissions on /tmp/server.toml may not be secure. Should be readonly to running uid. This could be a security risk ...
WARNING: /tmp/server.toml has 'everyone' permission bits in the mode. This could be a security risk ...
WARNING: /tmp/server.toml owned by the current uid, which may allow file permission changes. This could be a security risk ...
WARNING: permissions on ../insecure/ca.pem may not be secure. Should be readonly to running uid. This could be a security risk ...
WARNING: permissions on ../insecure/cert.pem may not be secure. Should be readonly to running uid. This could be a security risk ...
WARNING: permissions on ../insecure/key.pem may not be secure. Should be readonly to running uid. This could be a security risk ...
WARNING: ../insecure/key.pem has 'everyone' permission bits in the mode. This could be a security risk ...
WARNING: DB folder /tmp has 'everyone' permission bits in the mode. This could be a security risk ...

Each warning highlights an issue that may exist in your environment. It is not possible for us to prescribe an exact configuration that may secure your system. This is why we only present possible risks and you must make informed decisions on how to resolve them.

Should be Read-only to Running UID

Files, such as configuration files, should be read-only to the UID of the Kanidm daemon. If an attacker is able to gain code execution, they are then unable to modify the configuration, to write or over-write files in other locations, or to tamper with the system's configuration.

This can be prevented by changing the file's ownership to another user, or removing the "write" bits from the group.

'everyone' Permission Bits in the Mode

This means that given a permission mask, "everyone" or all users of the system can read, write or execute the content of this file. This may mean that if an account on the system is compromised the attacker can read Kanidm content and may be able to further attack the system as a result.

This can be prevented by removing "everyone" execute bits from parent directories containing the configuration, and removing "everyone" bits from the files in question.

Owned by the Current UID, Which May Allow File Permission Changes

File permissions in UNIX systems are a discretionary access control system, which means the named UID owner is able to further modify the access of a file regardless of the current settings. For example:

[william@amethyst 12:25] /tmp > touch test
[william@amethyst 12:25] /tmp > ls -al test
-rw-r--r--  1 william  wheel  0 29 Jul 12:25 test
[william@amethyst 12:25] /tmp > chmod 400 test
[william@amethyst 12:25] /tmp > ls -al test
-r--------  1 william  wheel  0 29 Jul 12:25 test
[william@amethyst 12:25] /tmp > chmod 644 test
[william@amethyst 12:26] /tmp > ls -al test
-rw-r--r--  1 william  wheel  0 29 Jul 12:25 test

Notice that even though the file was set to "read only" for william, with no permissions for any other users, user "william" can still change the bits to add write permissions back or to grant permissions to other users.

This can be prevented by making the file owner a different UID than the running process for Kanidm.

A Secure Example

Between these three issues it can be hard to see a possible strategy to secure files; however, one way exists - group read permissions. The most effective method to secure resources for Kanidm is to set configurations to:

[william@amethyst 12:26] /etc/kanidm > ls -al server.toml
-r--r-----   1 root           kanidm      212 28 Jul 16:53 server.toml

The Kanidm server should be run as "kanidm:kanidm" with the appropriate user and user private group created on your system. This applies to unixd configuration as well.

For the database your data folder should be:

[root@amethyst 12:38] /data/kanidm > ls -al .
total 1064
drwxrwx---   3 root     kanidm      96 29 Jul 12:38 .
-rw-r-----   1 kanidm   kanidm  544768 29 Jul 12:38 kanidm.db

This means 770 root:kanidm. This allows Kanidm to create new files in the folder, but prevents Kanidm from being able to change the permissions of the folder. Because the folder does not have "everyone" mode bits, the content of the database is secure because other users can not cd into or read from the directory.

Configurations for clients, such as /etc/kanidm/config, should be secured with read-only permissions and owned by root:

[william@amethyst 12:26] /etc/kanidm > ls -al config
-r--r--r--    1 root  root    38 10 Jul 10:10 config

This file should be "everyone"-readable, which is why the bits are defined as such.

Running as Non-root in docker

The commands provided in this book will run kanidmd as "root" in the container to make the onboarding smoother. However, this is not recommended in production for security reasons.

You should allocate unique UID and GID numbers for the service to run as on your host system. In this example we use 1000:1000.

You will need to adjust the permissions on the /data volume to ensure that the process can manage the files. Kanidm requires the ability to write to the /data directory to create the database files. This UID/GID number should match the above. You could consider the following commands to help isolate these changes:

docker run --rm -i -t -v kanidmd:/data opensuse/leap:latest /bin/sh
mkdir /data/db/
chown 1000:1000 /data/db/
chmod 750 /data/db/
sed -i -e "s/db_path.*/db_path = \"\/data\/db\/kanidm.db\"/g" /data/server.toml
chown root:root /data/server.toml
chmod 644 /data/server.toml

Note that the example commands all run inside the docker container.

You can then use this to run the Kanidm server in docker with a user:

docker run --rm -i -t -u 1000:1000 -v kanidmd:/data kanidm/server:latest /sbin/kanidmd ...

HINT You need to use the UID or GID number with the -u argument, as the container can't resolve usernames from the host system.

Minimum TLS key lengths

We enforce minimum RSA and ECDSA key sizes. If your key is insufficiently large, the server will refuse to start and inform you of this.

Currently accepted key sizes are minimum 2048 bit RSA and 224 bit ECDSA.
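
If you are unsure of your key size, you can inspect it with openssl. As a rough sketch, the first line of output reports the key length:

# ECDSA
openssl ec -in key.pem -noout -text | head -n 1
# RSA
openssl rsa -in key.pem -noout -text | head -n 1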

Updating the Server

Docker doesn't follow a "traditional" method of updates. Rather, you remove the old version of the container and recreate it with a newer version. This document will walk you through that process.

Kani Alert Tip
You should have documented and preserved your kanidm container create / run command from the server preparation guide. If not, you'll need to use "docker inspect" to work out how to recreate these parameters.

Preserving the Previous Image

You may wish to preserve the previous image before updating. This is useful if an issue is encountered in upgrades.

docker tag kanidm/server:latest kanidm/server:<DATE>
docker tag kanidm/server:latest kanidm/server:2022-10-24

Update your Image

Pull the latest version of Kanidm.

docker pull kanidm/server:latest
docker pull kanidm/radius:latest
docker pull kanidm/tools:latest

Perform a backup

See backup and restore

Update your Instance

Kani Warning WARNING
Downgrades are not possible. It is critical you know how to backup and restore before you proceed with this step.

Docker updates operate by deleting and recreating the container. All state that needs to be preserved is within your storage volume.

docker stop <previous instance name>

You can test that your configuration is correct with the new version, and the server should correctly start.

docker run --rm -i -t -v kanidmd:/data \
    kanidm/server:latest /sbin/kanidmd configtest

You can then follow through with the upgrade by running the create / run command with your existing volume.

docker run [Your Arguments Here] -v kanidmd:/data \
    OTHER_CUSTOM_OPTIONS \
    kanidm/server:latest

Once you confirm the upgrade is successful, you can delete the previous instance.

docker rm <previous instance name>

If you encounter an issue you can revert to the previous version.

docker stop <new instance name>
docker start <previous instance name>

If you deleted the previous instance, you can recreate it from your preserved tag instead.

docker run [Your Arguments Here] -v kanidmd:/data \
    OTHER_CUSTOM_OPTIONS \
    kanidm/server:<DATE>

If the server from your previous version fails to start, you will need to restore from backup.

Client tools

To interact with Kanidm as an administrator, you'll need to use our command line tools. If you haven't installed them yet, install them now.

Kanidm configuration

You can configure kanidm to help make commands simpler by modifying ~/.config/kanidm or /etc/kanidm/config.

uri = "https://idm.example.com"
ca_path = "/path/to/ca.pem"

The full configuration reference is in the definition of KanidmClientConfig.

Once configured, you can test this with:

kanidm self whoami --name anonymous

Session Management

To authenticate as a user (for use with the command line), you need to use the login command to establish a session token.

kanidm login --name USERNAME
kanidm login --name admin
kanidm login -D USERNAME
kanidm login -D admin

Once complete, you can use kanidm without re-authenticating for a period of time for administration.

You can list active sessions with:

kanidm session list

Sessions will expire after a period of time. To remove these expired sessions locally you can use:

kanidm session cleanup

To log out of a session:

kanidm logout --name USERNAME
kanidm logout --name admin

Installing Client Tools

NOTE Running different release versions will likely present incompatibilities. Ensure you're running matching release versions of client and server binaries. If you have any issues, check that you are running the latest version of Kanidm.

From packages

Kanidm currently is packaged for the following systems:

  • OpenSUSE Tumbleweed
  • OpenSUSE Leap 15.4/15.5/15.6
  • MacOS
  • Arch Linux
  • NixOS
  • Fedora 38
  • CentOS Stream 9

The kanidm client has been built and tested on Windows, but is not (yet) packaged routinely.

OpenSUSE Tumbleweed / Leap 15.6

Kanidm is available in Tumbleweed and Leap 15.6. You can install the clients with:

zypper ref
zypper in kanidm-clients

OpenSUSE Leap 15.4/15.5

Using zypper you can add the Kanidm leap repository with:

zypper ar -f obs://network:idm network_idm

Then you need to refresh your metadata and install the clients.

zypper ref
zypper in kanidm-clients

MacOS - Brew

Homebrew allows the addition of third-party repositories for installing tools. On MacOS you can use this to install the Kanidm tools.

brew tap kanidm/kanidm
brew install kanidm

Arch Linux

Kanidm on AUR

NixOS

Kanidm in NixOS

Fedora / Centos Stream

Kani Warning Take Note!
Kanidm frequently uses new Rust versions and features; however, Fedora and CentOS are frequently behind in Rust releases. As a result, they may not always have the latest Kanidm versions available.

Fedora has limited support through the development repository. You need to add the repository metadata into the correct directory:

# Fedora
wget https://download.opensuse.org/repositories/network:/idm/Fedora_38/network:idm.repo
# Centos Stream 9
wget https://download.opensuse.org/repositories/network:/idm/CentOS_9_Stream/network:idm.repo
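
These .repo files normally need to be placed where dnf can find them, typically /etc/yum.repos.d/ (a standard dnf layout; adjust if your system differs):

mv network:idm.repo /etc/yum.repos.d/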

You can then install with:

dnf install kanidm-clients

Tools Container

In some cases, if your distribution does not have native kanidm-client support and you can't use cargo for the install for some reason, you can use the CLI tools from a docker container instead.

This is a "last resort" and we don't really recommend this for day-to-day usage.

echo '{}' > ~/.cache/kanidm_tokens
chmod 666 ~/.cache/kanidm_tokens
docker pull kanidm/tools:latest
docker run --rm -i -t \
    --network host \
    -v /etc/kanidm/config:/etc/kanidm/config:ro \
    -v ~/.config/kanidm:/home/kanidm/.config/kanidm:ro \
    -v ~/.cache/kanidm_tokens:/home/kanidm/.cache/kanidm_tokens \
    kanidm/tools:latest \
    /sbin/kanidm --help

If you have a ca.pem, you may need to bind mount this in as well.

TIP You can alias the docker run command to make the tools easier to access such as:

alias kanidm="docker run ..."

Cargo

The tools are available via cargo if you have a Rust toolchain available. To install Rust you should follow the documentation for rustup. The tools will be installed into your home directory. To update them, re-run the install command. You will likely need to install additional development libraries, specified in the Developer Guide.

cargo install kanidm_tools

Administration Tasks

This chapter describes some of the routine administration tasks for running a Kanidm server, such as making backups and restoring from backups, testing server configuration, reindexing, verifying data consistency, and renaming your domain.

Backup and Restore

With any Identity Management (IDM) software, it's important you have the capability to restore in case of a disaster - be that physical damage or a mistake. Kanidm supports backup and restore of the database with three methods.

Method 1 - Automatic Backup

Automatic backups can be generated online by a kanidmd server instance by including the [online_backup] section in the server.toml. This allows you to run regular backups, defined by a cron schedule, and maintain the number of backup versions to keep. An example is located in examples/server.toml.
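
A minimal version of that section looks like the following (the values here mirror the example configuration shown earlier):

[online_backup]
path = "/data/kanidm/backups/"
schedule = "00 22 * * *"
# versions = 7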

Method 2 - Manual Backup

This method uses the same process as the automatic process, but is manually invoked. This can be useful for pre-upgrade backups.

To take the backup (assuming our docker environment) you first need to stop the instance:

docker stop <container name>
docker run --rm -i -t -v kanidmd:/data -v kanidmd_backups:/backup \
    kanidm/server:latest /sbin/kanidmd database backup -c /data/server.toml \
    /backup/kanidm.backup.json
docker start <container name>

You can then restart your instance. DO NOT modify the backup.json as it may introduce data errors into your instance.

To restore from the backup:

docker stop <container name>
docker run --rm -i -t -v kanidmd:/data -v kanidmd_backups:/backup \
    kanidm/server:latest /sbin/kanidmd database restore -c /data/server.toml \
    /backup/kanidm.backup.json
docker start <container name>

Method 3 - Manual Database Copy

This is a simple backup of the data volume containing the database files. Ensure you copy the whole folder, rather than individual files in the volume!

docker stop <container name>
# Backup your docker's volume folder
# cp -a /path/to/my/volume /path/to/my/backup-volume
docker start <container name>

Restoration is the reverse process where you copy the entire folder back into place.
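
For example, mirroring the backup steps above (the volume paths are placeholders for your own system):

docker stop <container name>
# cp -a /path/to/my/backup-volume /path/to/my/volume
docker start <container name>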

Database Maintenance

Reindexing

In some (rare) cases you may need to reindex. Please note the server will sometimes reindex on startup as a result of the project changing its internal schema definitions. This is normal and expected - you may never need to start a reindex yourself as a result!

You only need to reindex if you add custom schema elements and you see a message in your logs such as:

Index EQUALITY name not found
Index {type} {attribute} not found

This indicates that an index of type equality has been added for name, but the indexing process has not been run. The server will continue to operate and the query execution code will correctly process the query - however, it will not be the optimal method of delivering the results, as we need to disregard this part of the query and act as though it's unindexed.

Reindexing will resolve this by forcing all indexes to be recreated based on their schema definitions.

docker stop <container name>
docker run --rm -i -t -v kanidmd:/data \
    kanidm/server:latest /sbin/kanidmd reindex -c /data/server.toml
docker start <container name>

Vacuum

Vacuuming is the process of reclaiming unused pages from the database freelists, as well as performing some data reordering tasks that may make some queries more efficient. It is recommended that you vacuum after a reindex is performed or when you wish to reclaim space in the database file.

Vacuum is also able to change the pagesize of the database. After changing db_fs_type (which affects pagesize) in server.toml, you must run a vacuum for this to take effect:

docker stop <container name>
docker run --rm -i -t -v kanidmd:/data \
    kanidm/server:latest /sbin/kanidmd vacuum -c /data/server.toml
docker start <container name>

Verification

The server ships with a number of verification utilities to ensure that data is consistent, such as referential integrity or memberof.

Note that verification really is a last resort - the server does a lot to prevent and self-heal from errors at run time, so you should rarely if ever require this utility. This utility was developed to guarantee consistency during development!

You can run a verification with:

docker stop <container name>
docker run --rm -i -t -v kanidmd:/data \
    kanidm/server:latest /sbin/kanidmd verify -c /data/server.toml
docker start <container name>

If you have errors, please contact the project to help support you to resolve these.

Rename the domain

There are some cases where you may need to rename the domain. You should have configured this initially in the setup; however, situations such as a business changing its name, a merger, or other needs may prompt this to be changed.

WARNING: This WILL break ALL u2f/webauthn tokens that have been enrolled, which MAY cause accounts to be locked out and unrecoverable until further action is taken. DO NOT CHANGE the domain name unless REQUIRED and you have a plan on how to manage these issues.

WARNING: This operation can take an extensive amount of time as ALL accounts and groups in the domain MUST have their Security Principal Names (SPNs) regenerated. This WILL also cause a large delay in replication once the system is restarted.

You should make a backup before proceeding with this operation.

When you have created a migration plan and a strategy for handling the invalidation of webauthn, you can then rename the domain.

First, stop the instance.

docker stop <container name>

Second, change domain and origin in server.toml.

Third, trigger the database domain rename process.

docker run --rm -i -t -v kanidmd:/data \
    kanidm/server:latest /sbin/kanidmd domain rename -c /data/server.toml

Finally, you can now start your instance again.

docker start <container name>

Monitoring the platform

The monitoring design of Kanidm is still very much in its infancy - take part in the discussion at github.com/kanidm/kanidm/issues/216.

kanidmd status endpoint

kanidmd currently responds to HTTP GET requests at the /status endpoint with a JSON object of either "true" or "false". true indicates that the platform is responding to requests.

URL: <hostname>/status
Example URL: https://example.com/status
Expected response: one of either true or false (without quotes)
Additional Headers: x-kanidm-opid
Content Type: application/json
Cookies: kanidm-session
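
For example, a quick check from the command line (assuming curl is available and the certificate is trusted by your system):

curl https://idm.example.com/status
# true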

OpenTelemetry Tracing

Configure OTLP trace exports by setting an otel_grpc_endpoint in the server configuration. This will enable OpenTelemetry traces to be sent for observability use cases.
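
For example, in server.toml (the endpoint below is a placeholder for your own OTLP collector; 4317 is the conventional OTLP gRPC port):

otel_grpc_endpoint = "http://localhost:4317"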

Troubleshooting

Max Span Size Exceeded

On startup, we run some big processes that might hit a "max trace size" in certain configurations. Grafana Tempo defaults to 5MB, which is sensible for most things, but ... 😁

Grafana Tempo config to allow larger spans:

distributor:
  receivers: 
    otlp:
      protocols:
        grpc:
          max_recv_msg_size_mib: 20

Recycle Bin

The recycle bin is a storage of deleted entries from the server. This allows recovery from mistakes for a period of time.

Kani Warning Warning!
The recycle bin is a best effort - when recovering in some cases not everything can be "put back" the way it was. Be sure to check your entries are valid once they have been revived.

Where is the Recycle Bin?

The recycle bin is stored as part of your main database - it is included in all backups and restores, just like any other data. It is also replicated between all servers.

How do Things Get Into the Recycle Bin?

Any delete operation of an entry will cause it to be sent to the recycle bin. No configuration or specification is required.

How Long Do Items Stay in the Recycle Bin?

Currently they stay up to 1 week before they are removed. This may change in the future though.

Managing the Recycle Bin

You can display all items in the Recycle Bin with:

kanidm recycle-bin list --name admin

You can show a single item with:

kanidm recycle-bin get --name admin <uuid>

An entry can be revived with:

kanidm recycle-bin revive --name admin <uuid>

Edge Cases

The recycle bin is a best effort to restore your data - there are some cases where the revived entries may not be the same as they were when they were deleted. This generally revolves around reference types such as group membership, or when the reference type includes supplemental map data such as the oauth2 scope map type.

An example of this data loss is the following steps:

add user1
add group1
add user1 as member of group1
delete user1
delete group1
revive user1
revive group1

In this series of steps, due to the way that referential integrity is implemented, the membership of user1 in group1 would be lost. To explain why:

add user1
add group1
add user1 as member of group1 // refint between the two established, and memberof added
delete user1 // group1 removes member user1 from refint
delete group1 // user1 now removes memberof group1 from refint
revive user1 // re-add groups based on directmemberof (empty set)
revive group1 // no members

These issues could be looked at again in the future, but for now we think that deletes of groups are rare - we expect the recycle bin to save you in "oops" moments, and in a majority of cases you may delete a group or a user and then restore them. Handling this series of steps requires extra code complexity in how we flag operations. For more, see this issue on GitHub.

Accounts and groups

Accounts and Groups are the primary reasons for Kanidm to exist. Kanidm is optimised as a repository for these data. As a result, there are many concepts and important details to understand.

Service Accounts vs Person Accounts

Kanidm separates accounts into two types. Person accounts (or persons) are intended for use by humans that will access the system in an interactive way. Service accounts are intended for use by computers or services that need to identify themselves to Kanidm. Generally a person or group of persons will be responsible for and will manage service accounts. Because of this distinction these classes of accounts have different properties and methods of authentication and management.

Groups

Groups represent a collection of entities. This generally is a collection of persons or service accounts. Groups are commonly used to assign privileges to the accounts that are members of a group. This allows easier administration over larger systems where privileges can be assigned to groups in a logical manner, and then only membership of the groups needs administration, rather than assigning privileges to each entity directly and uniquely.

Groups may also be nested, where a group can contain another group as a member. This allows hierarchies to be created, again for easier administration.
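
For example, creating a group and adding a member (the group and member names here are purely illustrative):

kanidm group create demo_group --name idm_admin
kanidm group add-members demo_group demo_user --name idm_admin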

Default Accounts and Groups

Kanidm ships with a number of default service accounts and groups. This is to give you the best out-of-box experience possible, as well as supplying best practice examples related to modern Identity Management (IDM) systems.

There are two "break-glass" system administration accounts.

admin is the default service account which has privileges to configure and administer Kanidm as a whole. This account can manage access controls, schema, integrations and more. However the admin can not manage persons by default.

idm_admin is the default service account which has privileges to create persons and to manage these accounts and groups. They can perform credential resets and more.

Both the admin and the idm_admin user should NOT be used for daily activities - they exist for initial system configuration, and for disaster recovery scenarios. You should delegate permissions as required to named user accounts instead.

The majority of the builtin groups are privilege groups that provide rights over Kanidm administrative actions. These include groups for account management, person management (personal and sensitive data), group management, and more.

admin and idm_admin both inherit their privileges from these default groups. This allows you to assign persons to these roles instead.

Reauthentication and Session Privilege

Kanidm sessions have a concept of session privilege. Conceptually you can consider this like sudo on UNIX systems or UAC on Windows. This allows a session to briefly access its write permissions by reauthenticating with the same credential it logged in with.

This allows safe assignment of high privilege roles to persons since their sessions do not have access to their write privileges by default. They must reauthenticate and use their privileges within a short time window.

However, these sessions always retain their read privileges - meaning that they can still access and view high levels of data at any time without reauthentication.

In high-risk environments you should still consider assigning separate administration accounts to users if this is considered a risk.

Recovering the Initial Admin Accounts

By default the admin and idm_admin accounts have no password and can not be accessed. They need to be "recovered" from the server that is running kanidmd.

You should have already recovered the admin account during your setup process. If not, refer to the server configuration chapter on how to recover these accounts.

These accounts will be used through the remainder of this document for managing the server.

Viewing Default Groups

You should take some time to inspect the default groups which are related to default roles and permissions. Each group has a description to explain its purpose. These can be viewed with:

kanidm group list --name idm_admin
kanidm group get <name>

People Accounts

A person represents a human's account in Kanidm. The majority of your users will be persons, who will use their accounts in their daily activities. These entries may contain personally identifying information that is considered by Kanidm to be sensitive. Because of this, there are default limits to who may access this data.

Creating Person Accounts

Members of the idm_people_admins group have the privileges to create new persons in the system. By default idm_admin has this permission.

kanidm login --name idm_admin
kanidm person create demo_user "Demonstration User" --name idm_admin
kanidm person get demo_user --name idm_admin

Kanidm allows person accounts to include personally identifying attributes, such as their legal name and email address.

Initially, a person does not have these attributes. If desired, a person may be modified to have these attributes.

# Note, both the --legalname and --mail flags may be omitted
kanidm person update demo_user --legalname "initial name" --mail "initial@email.address"

You can also use anonymous to view accounts - note that you won't see certain fields due to the limits of the anonymous access control profile.

kanidm login --name anonymous
kanidm person get demo_user --name anonymous

NOTE: only members of idm_people_pii_read and idm_people_admins may read personal information by default.

Kani Warning Warning!
Persons may change their own displayname, name, and legal name at any time. You MUST NOT use these values as primary keys in external systems. You MUST use the `uuid` attribute present on all entries as an external primary key.

Account Validity

Kanidm supports accounts that are only able to authenticate between a pair of dates and times; the "valid from" and "expires" timestamps define these points in time. By default members of idm_people_admins may change these values.

The account validity can be displayed with:

kanidm person validity show demo_user --name idm_admin
user: demo_user
valid after: any time
expire: never
kanidm person validity show demo_user --name idm_admin
valid after: 2020-09-25T21:22:04+10:00
expire: 2020-09-25T01:22:04+10:00

These datetimes are stored in the server as UTC, but presented according to your local system time to aid correct understanding of when the events will occur.

You may set these time and date values in any timezone you wish (such as your local timezone), and the server will transform these to UTC. These time values are in ISO8601 format, and you should specify this as:

YYYY-MM-DDThh:mm:ss±hh:mm (or with a Z suffix for UTC)
Year-Month-Day T hour:minutes:seconds ± timezone offset

Set the earliest time the account can start authenticating:

kanidm person validity begin-from demo_user '2020-09-25T11:22:04+00:00' --name idm_admin

Set the expiry or end date of the account:

kanidm person validity expire-at demo_user '2020-09-25T11:22:04+00:00' --name idm_admin

To unset or remove these values the following can be used, where any|clear means you may use either any or clear.

kanidm person validity begin-from demo_user any|clear --name idm_admin
kanidm person validity expire-at demo_user clear|epoch|now --name idm_admin

To "lock" an account, you can set the expire_at value to now or epoch. Even in the situation where the "valid from" is after the expire_at, the expire_at will be respected.
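
For example, to "lock" the demo_user account immediately:

kanidm person validity expire-at demo_user now --name idm_admin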

These validity settings impact all authentication functions of the account (kanidm, ldap, radius).

Allowing people accounts to change their mail attribute

By default, Kanidm allows an account to change some attributes, but not their mail address.

Adding the user to the idm_people_self_write_mail_priv group, as shown below, allows the user to edit their own mail.

kanidm group add-members idm_people_self_write_mail_priv demo_user --name idm_admin

Authentication and Credentials

A primary job of a system like Kanidm is to manage credentials for persons. This can involve a range of operations from new user onboarding, credential resets, and self service.

Types of Credentials

Passkeys

This is the preferred method of authentication in Kanidm. Passkeys represent "all possible cryptographic" authenticators that support Webauthn. Examples of this include Yubikeys, TouchID, Windows Hello, TPMs, and more.

These devices are unphishable, self contained multifactor authenticators and are considered the most secure method of authentication in Kanidm.

Kani Warning Warning!
Kanidm's definition of Passkeys may differ from that of other systems. This is because we adopted the term very early, before its meaning changed and evolved.

Attested Passkeys

These are the same as Passkeys, except that the device must present a cryptographic certificate of origin during registration. This allows account policy to be defined to only allow the use of certain models of authenticator. In general only FIDO2 keys or TPMs are capable of meeting attestation requirements.

Password + TOTP

This is a classic Time-based One Time Password (TOTP) combined with a password. Unlike other systems, Kanidm will prompt for the TOTP first, before the password. This is to prevent drive-by brute-force attacks against the account's password and testing whether the password is vulnerable.

While this authentication method is mostly secure, we do not advise it for high-security environments because it is still possible to perform real-time phishing attacks.

Resetting Person Account Credentials

Members of the groups idm_people_admins, idm_people_on_boarding and idm_service_desk have the rights to initiate a credential reset for a person.

NOTE: If the person is a member of idm_high_privilege then these resets are not allowed. This is to prevent idm_service_desk and similar roles from privilege escalation by resetting the credentials of a higher privileged account. If a person who is a member of idm_high_privilege requires a credential reset, this must be initiated by a member of idm_people_admins.

Onboarding a New Person / Resetting Credentials

These processes are very similar. You can send a credential reset link to a user so that they can directly enroll their own credentials. To generate this link or QR code:

kanidm person credential create-reset-token <account_id> [<time to live in seconds>]
kanidm person credential create-reset-token demo_user --name idm_admin
kanidm person credential create-reset-token demo_user 86400 --name idm_admin
# The person can use one of the following to allow the credential reset
#
# Scan this QR Code:
#
# █████████████████████████████████████████████
# █████████████████████████████████████████████
# ████ ▄▄▄▄▄ █▄██ ▀▀▀▄▀▀█ ▄▀▀▀▀▄▀▀▄█ ▄▄▄▄▄ ████
# ████ █   █ █▀   ▄▄▄▀█  █▀ ██ ▀ ▀▄█ █   █ ████
# ████ █▄▄▄█ █ █▄█  ▀   ▄███▄ ▀▄▀▄ █ █▄▄▄█ ████
# ████▄▄▄▄▄▄▄█ █▄▀▄█▄█ █▄▀▄▀▄█▄█ █▄█▄▄▄▄▄▄▄████
# ████ ▀█▀ ▀▄▄▄ ▄▄▄▄▄▄▄█▀ ▄█▀█▀  ▄▀ ▄   █▀▄████
# ████▄ █ ▀ ▄█▀█ ▀█   ▀█▄ ▀█▀ ▄█▄ █▀▄▀██▄▀█████
# ████ ▀▀▀█▀▄██▄▀█ ▄▀█▄▄█▀▄▀▀▀▀▀▄▀▀▄▄▄▀ ▄▄ ████
# ████ █▄▀ ▄▄ ▄▀▀ ▀ █▄█ ▀▀ █▀▄▄█▄   ▀  ▄ ▀▀████
# ████ █▀▄ █▄▄  █ █▀▀█▀█▄ ▀█▄█▄█▀▄▄ ▀▀ ▄▄ ▄████
# █████ ▀█▄▀▄▄▀▀ ██▀▀█▄█▄█▄█ █▀▄█ ▄█  ▄▄▀▀█████
# ████▄▄▀  ▄▄ ▀▀▄▀▀ ▄▄█ ▄ █▄ ▄▄ ▀▀▀▄▄ ▀▄▄██████
# ████▄▄▀ ▀▀▄▀▄  ▀▀▀▀█▀█▄▀▀ ▄▄▄ ▄ ▄█▀  ▄ ▄ ████
# ████▀▄  ▀▄▄█▀█▀▄ ▄██▄█▀ ▄█▀█ ▀▄ ███▄█ ▄█▄████
# ██████ ▀▄█▄██▀ ▀█▄▀ ▀▀▄ ▀▀█ ██▀█▄▄▀██  ▀▀████
# ████▄▄██▄▄▄▄  ▀▄██▀█ ███▀ ██▄▀▀█ ▄▄▄ ███ ████
# ████ ▄▄▄▄▄ █▄ ▄▄  ▀█▀ ▀▀ █▀▄▄▄▄█ █▄█ ▀▀ ▀████
# ████ █   █ █▄█▄▀  ██▀█▄ ▀█▄▀▄ ▀▀▄   ▄▄▄▀ ████
# ████ █▄▄▄█ ██▀█ ▀▄▀█▄█▄█▄▀▀▄▄ ▀ ▄▄▄█▀█  █████
# ████▄▄▄▄▄▄▄█▄█▄▄▄▄▄▄█▄█▄██▄█▄▄▄█▄██▄███▄▄████
# █████████████████████████████████████████████
# ▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
#
# This link: https://localhost:8443/ui/reset?token=8qDRG-AE1qC-zjjAT-0Fkd6
# Or run this command: kanidm person credential use-reset-token 8qDRG-AE1qC-zjjAT-0Fkd6

If the user wishes you can direct them to https://idm.mydomain.name/ui/reset where they can manually enter their token value.

Once the credential reset has been committed the token is immediately invalidated and can never be used again. By default the token is valid for 1 hour. You can request a longer token validity time when creating the token. Tokens are only allowed to be valid for a maximum of 24 hours.

Resetting Credentials Directly

You can perform a password reset on the demo_user, for example as the idm_admin user, which holds these rights by default. The lines below prefixed with # show the interactive credential update interface. This allows the administrator to directly manage the credentials of another account.

Kani Warning Warning!
Don't use the direct credential reset to lock or invalidate an account. You should expire the account instead.
kanidm person credential update demo_user --name idm_admin
# spn: demo_user@idm.example.com
# Name: Demonstration User
# Primary Credential:
# uuid: 0e19cd08-f943-489e-8ff2-69f9eacb1f31
# generated password: set
# Can Commit: true
#
# cred update (? for help) # : pass
# New password:
# New password: [hidden]
# Confirm password:
# Confirm password: [hidden]
# success
#
# cred update (? for help) # : commit
# Do you want to commit your changes? yes
# success
kanidm login --name demo_user
kanidm self whoami --name demo_user

Credential Deletion

When a person deletes a credential, all sessions that were created by that credential are immediately logged out and invalidated.

Reauthentication / Privilege Access Mode

To allow for longer lived sessions, Kanidm by default issues sessions in a "privilege capable" but read-only mode. In order to access privileges for a short time, you must re-authenticate. This re-issues your session with a short, time-limited read-write component. You can consider this to be like sudo on a UNIX system or UAC on Windows, where you reauthenticate for short periods to access higher levels of privilege.

When using a user command that requires these privileges you will be warned:

kanidm person credential update william
# Privileges have expired for william@idm.example.com - you need to re-authenticate again.

To reauthenticate

kanidm reauth -D william

NOTE During reauthentication an account must use the same credential that was used to initially authenticate to the session. The reauth flow will not allow any other credentials to be used!

Groups

Groups are a collection of other entities that exist within Kanidm.

Creating Groups

Members of idm_group_admins can create new groups. idm_admin by default has these privileges.

kanidm group create demo_group --name idm_admin
kanidm group add-members demo_group demo_user --name idm_admin
kanidm group list-members demo_group --name idm_admin

After addition, you will see a reverse link from our demo_user showing that it is now a member of the group demo_group. Kanidm makes all group membership determinations by inspecting an entry's "memberof" attribute.

kanidm person get demo_user --name idm_admin

Nested Groups

Kanidm supports groups being members of groups, allowing nested groups. These nesting relationships are shown through the "memberof" attribute on groups and accounts, so nested group memberships are reflected on the accounts they contain.

An example can be easily shown with:

kanidm group create group_1 --name idm_admin
kanidm group create group_2 --name idm_admin
kanidm person create nest_example "Nesting Account Example" --name idm_admin
kanidm group add-members group_1 group_2 --name idm_admin
kanidm group add-members group_2 nest_example --name idm_admin
kanidm person get nest_example --name anonymous

This should result in output similar to:

memberof: idm_all_persons@localhost
memberof: idm_all_accounts@localhost
memberof: group_2@localhost
memberof: group_1@localhost
name: nest_example

Delegated Administration

Kanidm supports delegated administration through the "entry managed by" field. This allows specifying a group or user account that is the "entry manager" of a group. The entry manager can then modify the group without the need to define new access controls.

The entry_managed_by attribute of a group may only be modified by members of idm_access_control_admins. During entry creation idm_group_admins may set entry_managed_by, but may not change it post creation.

kanidm group create <NAME> [ENTRY_MANAGED_BY]
kanidm group create delegated_access_group demo_group --name idm_admin
kanidm group get delegated_access_group --name idm_admin

Now, as our demo_user is a member of demo_group they have delegated administration of delegated_access_group.

kanidm login --name demo_user

                                note the use of demo_user --\
                                                            v
kanidm group add-members delegated_access_group admin --name demo_user
kanidm group get delegated_access_group --name demo_user

Service Accounts

Creating Service Accounts

Members of idm_service_account_admins have the privileges to create new service accounts. By default idm_admin has this access.

When creating a service account you must delegate entry management to another group or account. This allows other users or groups to update the service account.

The entry_managed_by attribute of a service account may be created and modified by members of idm_service_account_admins.

NOTE: If a service account is a member of idm_high_privilege its entry_managed_by may only be modified by members of idm_access_control_admins

kanidm service-account create <ACCOUNT_ID> <display-name> <entry-managed-by>
kanidm service-account create demo_service "Demonstration Service" demo_group --name idm_admin
kanidm service-account get demo_service --name idm_admin

By delegating the administration of this service account to demo_group, our demo_user is able to administer the service account.

Using API Tokens with Service Accounts

Service accounts can have API tokens generated and associated with them. These tokens can be used to identify the service account, and to grant extended access rights that the service account may not otherwise have had. API tokens can also have expiry times and other auditing information attached.

To show API tokens for a service account:

kanidm service-account api-token status --name ENTRY_MANAGER ACCOUNT_ID
kanidm service-account api-token status --name demo_user demo_service

By default API tokens are issued to be "read only", so they are unable to make changes on behalf of the service account they represent. To generate a new read only API token with optional expiry time:

kanidm service-account api-token generate --name ENTRY_MANAGER ACCOUNT_ID LABEL [EXPIRY]
kanidm service-account api-token generate --name demo_user demo_service "Test Token"
kanidm service-account api-token generate --name demo_user demo_service "Test Token" 2020-09-25T11:22:02+10:00

If you wish to issue a token that is able to make changes on behalf of the service account, you must add the --rw flag during the generate command. It is recommended you only add --rw when the API token is performing writes to Kanidm.

kanidm service-account api-token generate --name ENTRY_MANAGER ACCOUNT_ID LABEL [EXPIRY] --rw
kanidm service-account api-token generate --name demo_user demo_service "Test Token" --rw
kanidm service-account api-token generate --name demo_user demo_service "Test Token" 2020-09-25T11:22:02+10:00 --rw

To destroy (revoke) an API token you will need its token id. This can be shown with the "status" command.

kanidm service-account api-token status --name ENTRY_MANAGER ACCOUNT_ID
kanidm service-account api-token status --name demo_user demo_service
kanidm service-account api-token destroy --name ENTRY_MANAGER ACCOUNT_ID TOKEN_ID
kanidm service-account api-token destroy --name demo_user demo_service 4de2a4e9-e06a-4c5e-8a1b-33f4e7dd5dc7

API Tokens with LDAP

API tokens can also be used to gain extended search permissions with LDAP. To do this you can bind with a dn of dn=token and provide the API token as the password.

ldapwhoami -H ldaps://URL -x -D "dn=token" -w "TOKEN"
ldapwhoami -H ldaps://idm.example.com -x -D "dn=token" -w "..."
# u: demo_service@idm.example.com

Anonymous Account

Within Kanidm there is a single "special" account. This is the anonymous service account. This allows clients without any credentials to perform limited read actions against Kanidm.

The anonymous account is primarily used by stateless unix clients to read account and group information.

Authentication

Even though anonymous does not have credentials it still must authenticate to establish a session to access Kanidm. To achieve this there is a special anonymous credential method. Anonymous is the only account that may use this credential method.
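
For example, you can establish an anonymous session with the standard client tools and inspect it:

kanidm login --name anonymous
kanidm self whoami --name anonymous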

OAuth2 / OIDC

Because anonymous is a service account, it is prevented from using OAuth2/OIDC to access other applications.

Access

By default anonymous has limited access to information in Kanidm. Anonymous may read the following data.

NOTE: The Name attribute is the user's public username. This is different to their private and sensitive LegalName attribute.

People

  • Name
  • DisplayName
  • MemberOf
  • Uuid
  • GidNumber
  • LoginShell
  • SshPublicKey

Groups

  • Name
  • Member
  • DynMember
  • GidNumber

Disabling the Anonymous Account

The anonymous account is like any other account and can be expired to prevent its use. See the account validity section.
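
A sketch of expiring it, assuming the service-account subcommand mirrors the person validity interface shown earlier (verify against your Kanidm version before relying on it):

kanidm service-account validity expire-at anonymous now --name idm_admin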

When disabled, this will prevent stateless unix clients from authenticating to Kanidm.

Account Policy

Account Policy defines the security requirements that accounts must meet and influences users' sessions.

Policy is defined on groups so that membership of a group influences the security of its members. This allows you to express that if you can access a system or resource, then the account must also meet the policy requirements.

All account policy settings may be managed by members of idm_account_policy_admins. This is assigned to idm_admin by default.

Default Account Policy

A default Account Policy is applied to idm_all_accounts. This provides the defaults that influence all accounts in Kanidm. This policy can be modified the same as any other group's policy.
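
For example, to raise the default maximum session length for every account you could apply the auth-expiry setting (described below) to idm_all_accounts. This is a sketch only, using the policy commands covered later in this chapter:

kanidm group account-policy auth-expiry idm_all_accounts 86400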

Enforced Attributes

Auth Expiry

The maximum length in seconds that an authentication session may exist for.

Password Minimum Length

The minimum length for passwords (if they are allowed).

Privilege Expiry

The maximum length in seconds that privileges will exist after reauthentication to a read/write session.

Webauthn Attestation

The list of certificate authorities and device aaguids that must be used by members of this policy. This allows limiting devices to specific models.

To generate this list you should use fido-mds-tool.

Policy Resolution

When an account is affected by multiple policies, the strictest component from each policy is applied. This can mean that two policies interact and make their combination stricter than their parts.

  • auth-expiry: smallest value
  • password-minimum-length: largest value
  • privilege-expiry: smallest value
  • webauthn-attestation-ca-list: intersection of equal values

Example Resolution

If we had two policies where the first defined:

auth-session: 86400
password-minimum-length: 10
privilege-expiry: 600
webauthn-attestation-ca-list: [ "yubikey 5ci", "yubikey 5fips" ]

And the second

auth-session: 3600
password-minimum-length: 15
privilege-expiry: 3600
webauthn-attestation-ca-list: [ "yubikey 5fips", "feitian epass" ]

Because the auth-session value from the second policy is smaller, we take that (3600). We take the smaller privilege-expiry value (600) from the first policy, and the larger password-minimum-length (15) from the second. From the webauthn attestation CA lists we take only the elements present in both. This leaves:

auth-session: 3600
password-minimum-length: 15
privilege-expiry: 600
webauthn-attestation-ca-list: [ "yubikey 5fips" ]

Enabling Account Policy

Account Policy is enabled on a group with the command:

kanidm group account-policy enable <group name>
kanidm group account-policy enable my_admin_group

Setting Maximum Session Time

The auth-session value influences the maximum time in seconds that an authenticated session can exist. After this time, the user must reauthenticate.

This value provides a difficult balance - forcing frequent re-authentications can frustrate and annoy users. However extremely long sessions allow a stolen or disclosed session token/device to read data for an extended period. Kanidm's read/write separation mitigates this risk somewhat, as a disclosed session can only read data, not write it.

To set the maximum authentication session time

kanidm group account-policy auth-expiry <group name> <seconds>
kanidm group account-policy auth-expiry my_admin_group 86400

Setting Minimum Password Length

The password-minimum-length value defines the minimum number of characters for an acceptable password. There are no other password tunables in account policy; other settings such as complexity, symbols, numbers and so on have not been shown to matter in any real world attacks.

To set this value:

kanidm group account-policy password-minimum-length <group name> <length>
kanidm group account-policy password-minimum-length my_admin_group 12

Setting Maximum Privilege Time

The privilege-expiry time defines how long a session retains its write privileges after a reauthentication. After this time, the session returns to read-only mode.

To set the maximum privilege time

kanidm group account-policy privilege-expiry <group name> <seconds>
kanidm group account-policy privilege-expiry my_admin_group 900

Setting Webauthn Attestation CA Lists

The list should be generated with fido-mds-tool. This will emit JSON that can be directly used with Kanidm.

kanidm group account-policy webauthn-attestation-ca-list <group name> <attestation ca list json>
kanidm group account-policy webauthn-attestation-ca-list idm_all_persons '{"cas":{"D6E4b4Drh .... }'

NOTE: fido-mds-tool is available in the kanidm:tools container.

Global Settings

There are a small number of account policy settings that are set globally rather than on a per group basis.

Denied Names

Users of Kanidm can change their name at any time. However, there are some cases where you may wish to deny some name values from being usable. This can be due to conflicting system account names or to exclude insulting or other abusive terms.

To achieve this you can set names to be in the denied-name list:

kanidm system denied-names append <name> [<name> ...]
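
For example (the name root here is purely illustrative; any conflicting or abusive term is handled the same way):

kanidm system denied-names append root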

You can display the currently denied names with:

kanidm system denied-names show

To allow a name to be used again it can be removed from the list:

kanidm system denied-names remove <name> [<name> ...]

Password Quality

Kanidm enforces that all passwords are checked by the library "zxcvbn". This has a large number of checks for password quality. It also provides constructive feedback to users on how to improve their passwords if they are rejected.

Some of the things that zxcvbn looks for are use of the account name or email in the password, common passwords, low entropy passwords, dates, reversed words and more.

This library cannot be disabled - all passwords in Kanidm must pass this check.

Password Badlisting

Password badlisting is the process of configuring a list of passwords that are excluded from use. This is especially useful if a specific business has been notified of compromised accounts, allowing you to maintain a customised list of excluded passwords.

The other value to this feature is being able to badlist common passwords that zxcvbn does not detect, or from other large scale password compromises.

By default we ship with a preconfigured badlist that is updated over time as new password breach lists are made available.

The password badlist by default is append only, meaning it can only grow, but will never remove passwords previously considered breached.

You can display the current badlist with:

kanidm system pw-badlist show

You can update your own badlist with:

kanidm system pw-badlist upload "path/to/badlist" [...]

Multiple badlists can be supplied and uploaded at once. These are preprocessed to identify and remove passwords that zxcvbn and our password rules would already have rejected, which makes the badlist more efficient to check at run time.
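
For example, uploading two hypothetical badlist files in a single invocation:

kanidm system pw-badlist upload "/tmp/breach-2024.txt" "/tmp/company-banned.txt"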

Password Rotation

Kanidm will never support this "anti-feature". Password rotation encourages poor password hygiene and is not shown to prevent any attacks - rather it significantly weakens password security.

POSIX Accounts and Groups

Kanidm has features that enable its accounts and groups to be consumed on POSIX-like machines, such as Linux, FreeBSD, or others. Both service accounts and person accounts can be used on POSIX systems.

Notes on POSIX Features

Many design decisions have been made in the POSIX features of Kanidm that are intended to make distributed systems easier to manage and client systems more secure.

UID and GID Numbers

In Kanidm there is no difference between a UID and a GID number. On most UNIX systems a user will create all files with a primary user and group. The primary group is effectively equivalent to the permissions of the user. It is very easy to see scenarios where someone may change the account to have a shared primary group (ie allusers), but without changing the umask on all client systems. This can cause users' data to be compromised by any member of the same shared group.

To prevent this, many systems create a "user private group", or UPG. This group has the GID number matching the UID of the user, and the user sets their primary group ID to the GID number of the UPG.

As there is now an equivalence between the UID and GID number of the user and the UPG, there is no benefit in separating these values. As a result Kanidm accounts only have a GID number, which is also considered to be their UID number. This has the benefit of preventing the accidental creation of a separate group that has an overlapping GID number (the uniqueness attribute of the schema will block the creation).

UPG Generation

Due to the requirement that a user have a UPG for security, many systems create these as two independent items. For example in /etc/passwd and /etc/group:

# passwd
william:x:654401105:654401105::/home/william:/bin/zsh
# group
william:x:654401105:

Other systems like FreeIPA use a plugin that generates a UPG as a separate group entry on creation of the account. This means there are two entries for an account, and they must be kept in lock-step. This poses a risk of desynchronisation that can and will happen on these systems leading to possible issues.

Kanidm does neither of these. As the GID number of the user must be unique, and a user implies that the UPG must exist, we can generate UPGs on-demand from the account. This has an important side effect: you are unable to add any members to a UPG. Given the nature of a user private group, this is the point.

GID Number Generation

Kanidm will have asynchronous replication as a feature between writable database servers. In this case, we need to be able to allocate stable and reliable GID numbers to accounts on replicas that may not be in continual communication.

To do this, we use the last 32 bits of the account or group's UUID to generate the GID number.
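
As a purely illustrative sketch with a hypothetical UUID: the final 8 hexadecimal digits of the UUID are its last 32 bits, and their unsigned integer value becomes the GID number.

# Hypothetical entry UUID ending in ...eacb1f31
# Interpreting those 32 bits as an unsigned integer gives the GID number:
printf '%d\n' 0xeacb1f31
# 3939180337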

A valid concern is the possibility of duplication in the lower 32 bits. Given the birthday problem, if you have 77,000 groups and accounts, you have a 50% chance of duplication. With 50,000 you have a 20% chance, 9,300 you have a 1% chance and with 2900 you have a 0.1% chance.

We advise that if you have a site with >10,000 users you should use an external system to allocate GID numbers serially or consistently to avoid potential duplication events.

This design decision is made as most small sites will benefit greatly from the auto-allocation policy and the simplicity of its design, while larger enterprises will already have IDM or business process applications for HR/People that are capable of supplying this kind of data in batch jobs.

Enabling POSIX Attributes

Enabling POSIX Attributes on Accounts

To enable POSIX account features and IDs on an account, you require the permission idm_unix_admins. This is provided to idm_admins by default.

You can then use the following command to enable POSIX extensions on a person or service account.

kanidm [person OR service-account] posix set --name idm_admin <account_id> [--shell SHELL --gidnumber GID]

kanidm person posix set --name idm_admin demo_user
kanidm person posix set --name idm_admin demo_user --shell /bin/zsh
kanidm person posix set --name idm_admin demo_user --gidnumber 2001

kanidm service-account posix set --name idm_admin demo_account
kanidm service-account posix set --name idm_admin demo_account --shell /bin/zsh
kanidm service-account posix set --name idm_admin demo_account --gidnumber 2001

You can view the account's POSIX token details with:

kanidm person posix show --name anonymous demo_user
kanidm service-account posix show --name anonymous demo_account

Enabling POSIX Attributes on Groups

To enable POSIX group features and IDs on a group, you require the permission idm_unix_admins. This is provided to idm_admins by default.

You can then use the following command to enable POSIX extensions:

kanidm group posix set --name idm_admin <group_id> [--gidnumber GID]
kanidm group posix set --name idm_admin demo_group
kanidm group posix set --name idm_admin demo_group --gidnumber 2001

You can view the group's POSIX token details with:

kanidm group posix show --name anonymous demo_group

POSIX-enabled groups will supply their members as POSIX members to clients. There is no special or separate type of membership for POSIX members required.

Troubleshooting Common Issues

subuid conflicts with Podman

Due to the way that Podman operates, in some cases using the Kanidm client inside non-root containers with Kanidm accounts may fail with an error such as:

ERROR[0000] cannot find UID/GID for user NAME: No subuid ranges found for user "NAME" in /etc/subuid

This is a fault in Podman and how it attempts to provide non-root containers when UID/GIDs are greater than 65535. In this case you may manually allocate your user's GID number to be between 1000 and 65535, which may avoid triggering the fault.
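
For example, reusing the posix set command shown earlier to pin the affected account (demo_user here is illustrative) to a GID inside that range:

kanidm person posix set --name idm_admin demo_user --gidnumber 20000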

Service Integrations

This chapter describes interfaces that Kanidm provides that allows external services and applications to integrate with and trust Kanidm as an authentication and identity provider.

PAM and nsswitch

PAM and nsswitch are the core mechanisms used by Linux and BSD clients to resolve identities from an IDM service like Kanidm into accounts that can be used on the machine for various interactive tasks.

The UNIX Daemon

Kanidm provides a UNIX daemon that runs on any client that wants to use PAM and nsswitch integration. The daemon can cache the accounts for users who have unreliable networks, or who leave the site where Kanidm is hosted. The daemon is also able to cache missing-entry responses to reduce network traffic and Kanidm server load.

Additionally, running the daemon means that the PAM and nsswitch integration libraries can be small, helping to reduce the attack surface of the machine. Similarly, a tasks daemon is available that can create home directories on first login and supports several features related to aliases and links to these home directories.

We recommend you install the client daemon from your system package manager:

# OpenSUSE
zypper in kanidm-unixd-clients
# Fedora
dnf install kanidm-unixd-clients

You can check the daemon is running on your Linux system with:

systemctl status kanidm-unixd

You can check the privileged tasks daemon is running with:

systemctl status kanidm-unixd-tasks

NOTE The kanidm_unixd_tasks daemon is not required for PAM and nsswitch functionality. If disabled, your system will function as usual. It is however strongly recommended due to the features it provides supporting Kanidm's capabilities.

Both unixd daemons use the connection configuration from /etc/kanidm/config. This is covered in client_tools.

You can also configure some unixd-specific options with the file /etc/kanidm/unixd:

## Kanidm Unixd Service Configuration - /etc/kanidm/unixd

# Defines a set of POSIX groups where membership of any of these groups
# will be allowed to login via PAM. All POSIX users and groups can be
# resolved by nss regardless of PAM login status. This may be a group
# name, spn, or uuid.
#
# Default: empty set (no access allowed)

pam_allowed_login_groups = ["posix_group"]

# Kanidm unix will bind all cached credentials to a local Hardware Security
# Module (HSM) to prevent exfiltration and attacks against these. In addition,
# any internal private keys will also be stored in this HSM.
#
# * soft: A software hsm that encrypts all local key material
# * tpm: Use a tpm for all key storage and binding
#
# Default: soft

# hsm_type = "tpm"


# If using `hsm_type = "tpm"`, this allows configuration of the TCTI name of
# the tpm to use. For more, see: https://tpm2-tools.readthedocs.io/en/latest/man/common/tcti/
#
# You should leave this value as the default kernel resource manager.
#
# Default: device:/dev/tpmrm0

# tpm_tcti_name = "device:/dev/tpmrm0"


# Default shell for users if no value is set.
#
# Default: /bin/sh

# default_shell = "/bin/sh"


# The prefix prepended to where home directories are stored. Must end with a trailing `/`.
#
# Default: /home/

# home_prefix = "/home/"


# The attribute to use for the stable home directory path. Valid choices are
# `uuid`, `name`, `spn`.

# > **NOTICE:** All users in Kanidm can change their name (and their spn) at any time. If you change
# > `home_attr` from `uuid` you _must_ have a plan on how to manage these directory renames in your
# > system. We recommend that you have a stable ID (like the UUID), and symlinks from the name to the
# > UUID folder. Automatic support is provided for this via the unixd tasks daemon, as documented
# > here.

# Default: uuid

# home_attr = "uuid"


# The default token attribute used for generating symlinks pointing to the user's home
# directory. If set, this will become the value of the home path to nss calls. It is recommended you
# choose a "human friendly" attribute here. Valid choices are `none`, `uuid`, `name`, `spn`. Defaults
# to `spn`.
#
# Default: spn

# home_alias = "spn"


# Controls if home directories should be prepopulated with the contents of `/etc/skel`
# when first created.
#
# Default: false

# use_etc_skel = false


# Chooses which attribute is used for domain local users in presentation of the uid value.
#
# Default: spn
# NOTE: Users from a trust will always use spn.

# uid_attr_map = "spn"


# Chooses which attribute is used for domain local groups in presentation of the gid value.

# Default: spn
# NOTE: Groups from a trust will always use spn.

# gid_attr_map = "spn"


# `selinux` controls whether the `kanidm_unixd_tasks` daemon should detect and enable SELinux runtime
# compatibility features to ensure that newly created home directories are labeled correctly. This
# setting has no bearing on systems without SELinux, as these features will automatically be disabled
# if SELinux is not detected when the daemon starts. Note that `kanidm_unixd_tasks` must also be built
# with the SELinux feature flag for this functionality.
#
# Default: true

# selinux = true


# allows kanidm to "override" the content of a user or group that is defined locally when a name
# collision occurs. By default kanidm will detect when a user/group conflict with their entries from
# `/etc/passwd` or `/etc/group` and will ignore the kanidm entry. However if you want kanidm to
# override users or groups from the local system, you must list them in this field. Note that this can
# have many unexpected consequences, so it is not recommended to enable this.
#
# Default: Empty set (no overrides)

# allow_local_account_override = ["admin"]


NOTICE: All users in Kanidm can change their name (and their spn) at any time. If you change home_attr from uuid you must have a plan on how to manage these directory renames in your system. We recommend that you have a stable ID (like the UUID), and symlinks from the name to the UUID folder. Automatic support is provided for this via the unixd tasks daemon, as documented here.

NOTE: Ubuntu users please see: Why aren't snaps launching with home_alias set?

You can then check the communication status of the daemon:

kanidm-unix status

If the daemon is working, you should see:

working!

If it is not working, you will see an error message:

[2020-02-14T05:58:10Z ERROR kanidm-unix] Error ->
   Os { code: 111, kind: ConnectionRefused, message: "Connection refused" }

For more information, see the Troubleshooting section.

nsswitch

When the daemon is running you can add the nsswitch libraries to /etc/nsswitch.conf

passwd: compat kanidm
group: compat kanidm

You can create a user, then enable the POSIX feature on that user.
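
For example, using the commands covered in the POSIX accounts chapter (testunix is a hypothetical account matching the output below):

kanidm person create testunix "Test Unix" --name idm_admin
kanidm person posix set --name idm_admin testunix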

You can then test that the POSIX extended user is able to be resolved with:

getent passwd <account name>
getent passwd testunix
testunix:x:3524161420:3524161420:testunix:/home/testunix:/bin/sh

You can also do the same for groups.

getent group <group name>
getent group testgroup
testgroup:x:2439676479:testunix

HINT Remember to also create a UNIX password with something like kanidm account posix set_password --name idm_admin demo_user. Otherwise there will be no credential for the account to authenticate with.

PAM

WARNING: Modifications to PAM configuration may leave your system in a state where you are unable to login or authenticate. You should always have a recovery shell open while making changes (for example, root), or have access to single-user mode at the machine's console.

Pluggable Authentication Modules (PAM) is the mechanism that UNIX-like systems use to authenticate users and to control access to some resources. It is configured through a stack of modules that are executed in order to evaluate the request, and each module may request or reuse authentication token information.

Before You Start

You should backup your /etc/pam.d directory from its original state as you may change the PAM configuration in a way that will not allow you to authenticate to your machine.

cp -a /etc/pam.d /root/pam.d.backup

Configuration Examples

Documentation examples for the following Linux distributions are available:

SUSE / OpenSUSE

To configure PAM on SUSE you must modify four files, which control the various stages of authentication:

/etc/pam.d/common-account
/etc/pam.d/common-auth
/etc/pam.d/common-password
/etc/pam.d/common-session

IMPORTANT By default these files are symlinks to their corresponding -pc file, for example common-account -> common-account-pc. If you directly edit these you are updating the inner content of the -pc file and it WILL be reset on a future upgrade. To prevent this you must first copy the -pc files. You can then edit the files safely.

# These steps must be taken as root
rm /etc/pam.d/common-account
rm /etc/pam.d/common-auth
rm /etc/pam.d/common-session
rm /etc/pam.d/common-password
cp /etc/pam.d/common-account-pc  /etc/pam.d/common-account
cp /etc/pam.d/common-auth-pc     /etc/pam.d/common-auth
cp /etc/pam.d/common-session-pc  /etc/pam.d/common-session
cp /etc/pam.d/common-password-pc /etc/pam.d/common-password

The content should look like:

# /etc/pam.d/common-account
# Controls authorisation to this system (who may login)
account    [default=1 ignore=ignore success=ok] pam_localuser.so
account    sufficient    pam_unix.so
account    [default=1 ignore=ignore success=ok]  pam_succeed_if.so uid >= 1000 quiet_success quiet_fail
account    sufficient    pam_kanidm.so ignore_unknown_user
account    required      pam_deny.so

# /etc/pam.d/common-auth
# Controls authentication to this system (verification of credentials)
auth        required      pam_env.so
auth        [default=1 ignore=ignore success=ok] pam_localuser.so
auth        sufficient    pam_unix.so nullok try_first_pass
auth        requisite     pam_succeed_if.so uid >= 1000 quiet_success
auth        sufficient    pam_kanidm.so ignore_unknown_user
auth        required      pam_deny.so

# /etc/pam.d/common-password
# Controls flow of what happens when a user invokes the passwd command. Currently does NOT
# push password changes back to kanidm
password    [default=1 ignore=ignore success=ok] pam_localuser.so
password    required    pam_unix.so use_authtok nullok shadow try_first_pass
password    [default=1 ignore=ignore success=ok]  pam_succeed_if.so uid >= 1000 quiet_success quiet_fail
password    required    pam_kanidm.so

# /etc/pam.d/common-session
# Controls setup of the user session once a successful authentication and authorisation has
# occurred.
session optional    pam_systemd.so
session required    pam_limits.so
session optional    pam_unix.so try_first_pass
session optional    pam_umask.so
session [default=1 ignore=ignore success=ok] pam_succeed_if.so uid >= 1000 quiet_success quiet_fail
session optional    pam_kanidm.so
session optional    pam_env.so

WARNING: Ensure that pam_mkhomedir or pam_oddjobd are not present in any stage of your PAM configuration, as they interfere with the correct operation of the Kanidm tasks daemon.

Fedora / CentOS

WARNING: Kanidm currently has no support for SELinux policy - this may mean you need to run the daemon with permissive mode for the unconfined_service_t daemon type. To do this run: semanage permissive -a unconfined_service_t. To undo this run semanage permissive -d unconfined_service_t.

You may also need to run audit2allow for sshd and other types to be able to access the UNIX daemon sockets.

These files are managed by authselect as symlinks. You can either work with authselect, or remove the symlinks first.

Without authselect

If you choose to work without authselect, remove the symlinks first:
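
A minimal sketch of that step, assuming the stock authselect-managed files (run as root):

rm /etc/pam.d/password-auth
rm /etc/pam.d/system-auth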

Then create each file with the following content:

# /etc/pam.d/password-auth
auth        required                                     pam_env.so
auth        required                                     pam_faildelay.so delay=2000000
auth        [default=1 ignore=ignore success=ok]         pam_usertype.so isregular
auth        [default=1 ignore=ignore success=ok]         pam_localuser.so
auth        sufficient                                   pam_unix.so nullok try_first_pass
auth        [default=1 ignore=ignore success=ok]         pam_usertype.so isregular
auth        sufficient                                   pam_kanidm.so ignore_unknown_user
auth        required                                     pam_deny.so

account     sufficient                                   pam_unix.so
account     sufficient                                   pam_localuser.so
account     sufficient                                   pam_usertype.so issystem
account     sufficient                                   pam_kanidm.so ignore_unknown_user
account     required                                     pam_permit.so

password    requisite                                    pam_pwquality.so try_first_pass local_users_only
password    sufficient                                   pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    sufficient                                   pam_kanidm.so
password    required                                     pam_deny.so

session     optional                                     pam_keyinit.so revoke
session     required                                     pam_limits.so
-session    optional                                     pam_systemd.so
session     [success=1 default=ignore]                   pam_succeed_if.so service in crond quiet use_uid
session     required                                     pam_unix.so
session     optional                                     pam_kanidm.so

# /etc/pam.d/system-auth
auth        required                                     pam_env.so
auth        required                                     pam_faildelay.so delay=2000000
auth        sufficient                                   pam_fprintd.so
auth        [default=1 ignore=ignore success=ok]         pam_usertype.so isregular
auth        [default=1 ignore=ignore success=ok]         pam_localuser.so
auth        sufficient                                   pam_unix.so nullok try_first_pass
auth        [default=1 ignore=ignore success=ok]         pam_usertype.so isregular
auth        sufficient                                   pam_kanidm.so ignore_unknown_user
auth        required                                     pam_deny.so

account     sufficient                                   pam_unix.so
account     sufficient                                   pam_localuser.so
account     sufficient                                   pam_usertype.so issystem
account     sufficient                                   pam_kanidm.so ignore_unknown_user
account     required                                     pam_permit.so

password    requisite                                    pam_pwquality.so try_first_pass local_users_only
password    sufficient                                   pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password    sufficient                                   pam_kanidm.so
password    required                                     pam_deny.so

session     optional                                     pam_keyinit.so revoke
session     required                                     pam_limits.so
-session    optional                                     pam_systemd.so
session     [success=1 default=ignore]                   pam_succeed_if.so service in crond quiet use_uid
session     required                                     pam_unix.so
session     optional                                     pam_kanidm.so

With authselect

To work with authselect:

You will need to create a new profile.

First run the following command:

authselect create-profile kanidm -b sssd

A new folder, /etc/authselect/custom/kanidm, should be created. Inside that folder, create or overwrite the following three files: nsswitch.conf, password-auth, system-auth. password-auth and system-auth should be the same as above. nsswitch should be modified for your use case. A working example looks like this:

passwd: compat kanidm sss files systemd
group: compat kanidm sss files systemd
shadow:     files
hosts:      files dns myhostname
services:   sss files
netgroup:   sss files
automount:  sss files

aliases:    files
ethers:     files
gshadow:    files
networks:   files dns
protocols:  files
publickey:  files
rpc:        files

Then run:

authselect select custom/kanidm

to update your profile.

Troubleshooting PAM/nsswitch

Check POSIX-status of Group and Configuration

If authentication is failing via PAM, make sure that a list of groups is configured in /etc/kanidm/unixd:

pam_allowed_login_groups = ["example_group"]

Check the status of the group with kanidm group posix show example_group. If you get something similar to the following example:

> kanidm group posix show example_group
Using cached token for name idm_admin
Error -> Http(500, Some(InvalidAccountState("Missing class: account && posixaccount OR group && posixgroup")),
    "b71f137e-39f3-4368-9e58-21d26671ae24")

POSIX-enable the group with kanidm group posix set example_group. You should get a result similar to this when you search for your group name:

> kanidm group posix show example_group
[ spn: example_group@kanidm.example.com, gidnumber: 3443347205 name: example_group, uuid: b71f137e-39f3-4368-9e58-21d26671ae24 ]

Also, ensure the target user is in the group by running:

>  kanidm group list-members example_group

Increase Logging

For the unixd daemon, you can increase the logging with:

systemctl edit kanidm-unixd.service

And add the lines:

[Service]
Environment="RUST_LOG=kanidm=debug"

Then restart the kanidm-unixd.service.

The same pattern is true for the kanidm-unixd-tasks.service daemon.

To debug the pam module interactions add debug to the module arguments such as:

auth sufficient pam_kanidm.so debug

Check the Socket Permissions

Check that the /var/run/kanidm-unixd/sock has permissions mode 777, and that non-root readers can see it with ls or other tools.

Ensure that /var/run/kanidm-unixd/task_sock has permissions mode 700, and that it is owned by the kanidm unixd process user.
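
One way to inspect both sockets and their ownership, assuming the default paths above:

ls -al /var/run/kanidm-unixd/
stat /var/run/kanidm-unixd/sock /var/run/kanidm-unixd/task_sock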

Verify that You Can Access the Kanidm Server

You can check this with the client tools:

kanidm self whoami --name anonymous

Ensure the Libraries are Correct

You should have:

/usr/lib64/libnss_kanidm.so.2
/usr/lib64/security/pam_kanidm.so

The exact path may change depending on your distribution; pam_kanidm.so should be co-located with pam_unix.so. Look for it with the find command:

find /usr/ -name 'pam_unix.so'

For example, on a Debian machine, it's located in /usr/lib/x86_64-linux-gnu/security/.

Increase Connection Timeout

In some high-latency environments, you may need to increase the connection timeout. We set this low to improve responsiveness on LANs, but over the internet it may need to be increased. Increasing conn_timeout allows the daemon to operate over higher latency links, though some operations may take longer to complete.

Increasing cache_timeout means the daemon refreshes from the server less often, but an account lockout or group change may not take effect until the cache expires. Note that this has security implications:

# /etc/kanidm/unixd
# Seconds
conn_timeout = 8
# Cache timeout
cache_timeout = 60

Invalidate or Clear the Cache

You can invalidate the kanidm_unixd cache with:

kanidm-unix cache-invalidate

You can clear (wipe) the cache with:

kanidm-unix cache-clear

There is an important distinction between these two - invalidated cache items may still be yielded to a client request if the communication to the main Kanidm server is not possible. For example, you may have your laptop in a park without wifi.

Clearing the cache, however, completely wipes all local data about all accounts and groups. If you are relying on this cached (but invalid) data, you may lose access to your accounts until other communication issues have been resolved.

Home directories are not created via SSH

Ensure that UsePAM yes is set in sshd_config. Without this, the PAM session module won't be triggered, which prevents the home directory creation task from completing.

SSSD

SSSD is an alternative PAM and nsswitch provider that is commonly available on Linux.

Kani Warning WARNING
SSSD should be considered a "last resort". If possible, always use the native Kanidm pam and nsswitch tools instead.

Limitations

SSSD has many significant limitations compared to Kanidm's native PAM and nsswitch provider.

Performance

Kanidm's native provider significantly outperforms SSSD for both online and offline user resolution and operations. In addition, SSSD's design limitations can cause higher load on the Kanidm server.

Features

SSSD is not able to access all of the features of Kanidm, limiting the integration options available to you.

Security

By default Kanidm uses state of the art cryptographic methods with configurable TPM binding of cached local credentials. SSSD uses significantly weaker methods to cache passwords. This means that you should not be caching credentials with SSSD, limiting deployment flexibility.

In addition, Kanidm's providers are written in Rust rather than C, meaning they have less surface area for attack and compromise. These providers have been through multiple security audits performed by the SUSE product security teams.

Support

If you choose to use the SSSD provider the Kanidm project will only provide "best effort" for compatibility and issue resolution.

Configuration

An example configuration for SSSD is provided.

# Example configuration for SSSD to resolve accounts via Kanidm
#
# This should always be a "last resort". If possible you should always use the
# kanidm pam and nsswitch resolver as these will give you a better and more
# reliable setup.
#
# Changing the values of this config is not recommended.
#
# Support for environments using SSSD is "best effort".

# Setup for ssh keys
# Inside /etc/ssh/sshd_config add the lines:
#   AuthorizedKeysCommand /usr/bin/sss_ssh_authorizedkeys %u
#   AuthorizedKeysCommandUser nobody
# You can test with the command: sss_ssh_authorizedkeys <username>

[sssd]
services = nss, pam, ssh
config_file_version = 2

domains = ldap

[nss]
homedir_substring = /home

[domain/ldap]
# Uncomment this for more verbose logging.
# debug_level=3

id_provider = ldap
auth_provider = ldap
access_provider = ldap
chpass_provider = ldap
ldap_schema = rfc2307bis
ldap_search_base = o=idm

# Your URI must be LDAPS. Kanidm does not support StartTLS.
ldap_uri = ldaps://idm.example.com

# These allow SSSD to resolve user primary groups, which in Kanidm are implied by
# the existence of the user. Ensure you change the search base to your ldap_search_base.
ldap_group_object_class = object
ldap_group_search_base = o=idm?subtree?(|(objectClass=posixAccount)(objectClass=posixGroup))

# To use cacert dir, place *.crt files in this path then run:
# /usr/bin/openssl rehash /etc/openldap/certs
# or (for older versions of openssl)
# /usr/bin/c_rehash /etc/openldap/certs
# ldap_tls_cacertdir = /etc/openldap/certs

# Path to the cacert
# ldap_tls_cacert = /etc/openldap/certs/ca.crt

# Only users who match this filter can login and authorise to this machine. Note
# that users who do NOT match, will still have their uid/gid resolve, but they
# can't login.
#
# Note that because of how Kanidm presents group names, this value SHOULD be an SPN
ldap_access_filter = (memberof=idm_all_accounts@idm.example.com)

# Set the home dir override. Kanidm does not support configuration of homedirs as an
# attribute, so the uid number of the account is used instead. This is because users can
# change their name at any time, meaning home directories must be configured in a stable
# way that does not change.
#
# Beware that SSSD will incorrectly treat this value as a signed integer rather than unsigned,
# so some users will have a -uidnumber instead.
override_homedir = /home/%U

# This prevents an issue where SSSD incorrectly attempts to recursively walk all
# entries in Kanidm.
#
# ⚠️  NEVER CHANGE THIS VALUE ⚠️
ignore_group_members = False

# Disable caching of credentials by SSSD. SSSD uses less secure local password storage
# mechanisms, and is a risk for credential disclosure.
#
# ⚠️  NEVER CHANGE THIS VALUE ⚠️
cache_credentials = False

SSH Key Distribution

To securely support SSH authentication to a large set of hosts, Kanidm supports distribution of SSH public keys via the Kanidm server. Both persons and service accounts support SSH public keys on their accounts.

Configuring Accounts

To view the current SSH public keys on accounts, you can use:

kanidm person|service-account \
    ssh list-publickeys --name <login user> <account to view>
kanidm person|service-account \
    ssh list-publickeys --name idm_admin william

All users by default can self-manage their SSH public keys. To upload a key, a command like this is the best way to do so:

kanidm person|service-account \
    ssh add-publickey --name william william 'test-key' "`cat ~/.ssh/id_ecdsa.pub`"

To remove (revoke) an SSH public key, delete them by the tag name:

kanidm person|service-account ssh delete-publickey --name william william 'test-key'

Security Notes

As a security feature, Kanidm validates all public keys to ensure they are valid SSH public keys. Uploading a private key or other data will be rejected. For example:

kanidm person|service-account ssh add-publickey --name william william 'test-key' "invalid"
Enter password:
  ... Some(SchemaViolation(InvalidAttributeSyntax)))' ...

Server Configuration

Public Key Caching Configuration

If you have kanidm_unixd running, you can use it to locally cache SSH public keys. This means you can still SSH into your machines, even if your network is down, you move away from Kanidm, or some other interruption occurs.

The kanidm_ssh_authorizedkeys command is part of the kanidm-unix-clients package, so should be installed on the servers. It communicates to kanidm_unixd, so you should have a configured PAM/nsswitch setup as well.

You can test this is configured correctly by running:

kanidm_ssh_authorizedkeys <account name>

If the account has SSH public keys you should see them listed, one per line.

To configure servers to accept these keys, you must change their /etc/ssh/sshd_config to contain the lines:

PubkeyAuthentication yes
UsePAM yes
AuthorizedKeysCommand /usr/sbin/kanidm_ssh_authorizedkeys %u
AuthorizedKeysCommandUser nobody

Restart sshd, and then attempt to authenticate with the keys.

It's highly recommended you keep your client configuration and sshd_config in a configuration management tool such as salt or ansible.

NOTICE: With a working SSH key setup, you should also consider adding the following sshd_config options as hardening.

PermitRootLogin no
PasswordAuthentication no
PermitEmptyPasswords no
GSSAPIAuthentication no
KerberosAuthentication no

Direct Communication Configuration

In this mode, the authorised keys commands will contact Kanidm directly.

NOTICE: As Kanidm is contacted directly there is no SSH public key cache. Any network outage or communication loss may prevent you accessing your systems. You should only use this version if you have a requirement for it.

The kanidm_ssh_authorizedkeys_direct command is part of the kanidm-clients package, so should be installed on the servers.

To configure the tool, you should edit /etc/kanidm/config, as documented in clients

You can test this is configured correctly by running:

kanidm_ssh_authorizedkeys_direct -D anonymous <account name>

If the account has SSH public keys you should see them listed, one per line.

To configure servers to accept these keys, you must change their /etc/ssh/sshd_config to contain the lines:

PubkeyAuthentication yes
UsePAM yes
AuthorizedKeysCommand /usr/bin/kanidm_ssh_authorizedkeys_direct -D anonymous %u
AuthorizedKeysCommandUser nobody

Restart sshd, and then attempt to authenticate with the keys.

It's highly recommended you keep your client configuration and sshd_config in a configuration management tool such as salt or ansible.

OAuth2

OAuth is a web authorisation protocol that allows "single sign on". It's key to note that OAuth only provides authorisation, as the protocol in its default forms does not provide identity or authentication information. All that OAuth2 provides is information that an entity is authorised for the requested resources.

OAuth can tie into extensions allowing an identity provider to reveal information about authorised sessions. This extends OAuth from an authorisation only system to a system capable of identity and authorisation. Two primary methods of this exist today: RFC7662 token introspection, and OpenID connect.

How Does OAuth2 Work?

A user wishes to access a service (resource, resource server). The resource server does not have an active session for the client, so it redirects to the authorisation server (Kanidm) to determine if the client should be allowed to proceed, and has the appropriate permissions (scopes) for the requested resources.

The authorisation server checks the current session of the user and may present a login flow if required. Given the identity of the user known to the authorisation server, and the requested scopes, the authorisation server decides whether to allow the authorisation to proceed. The user is then prompted to consent to the authorisation from the authorisation server to the resource server, as some identity information may be revealed by granting this consent.

If successful and consent given, the user is redirected back to the resource server with an authorisation code. The resource server then contacts the authorisation server directly with this code and exchanges it for a valid token that may be provided to the user's browser.

The resource server may then optionally contact the token introspection endpoint of the authorisation server about the provided OAuth token, which yields extra metadata about the identity that holds the token from the authorisation. This metadata may include identity information, but also may include extended metadata, sometimes referred to as "claims". Claims are information bound to a token based on properties of the session that may allow the resource server to make extended authorisation decisions without the need to contact the authorisation server to arbitrate.

It's important to note that OAuth2 at its core is an authorisation system which has layered identity-providing elements on top.

Resource Server

This is the server that a user wants to access. Common examples could be Nextcloud, a wiki, or something else. This is the system that "needs protecting" and wants to delegate authorisation decisions to Kanidm.

It's important for you to know how your resource server supports OAuth2. For example, does it support RFC 7662 token introspection or does it rely on OpenID connect for identity information?

In general Kanidm requires that your resource server supports:

  • HTTP basic authentication to the authorisation server
  • PKCE S256 code verification
  • OIDC only - JWT ES256 for token signatures

Kanidm will expose its OAuth2 APIs at the following URLs:

  • user auth url: https://idm.example.com/ui/oauth2
  • api auth url: https://idm.example.com/oauth2/authorise
  • token url: https://idm.example.com/oauth2/token
  • rfc7662 token introspection url: https://idm.example.com/oauth2/token/introspect
  • rfc7009 token revoke url: https://idm.example.com/oauth2/token/revoke

OAuth2 Server Metadata - you need to substitute your OAuth2 :client_id: in the following URLs:

  • OAuth2 issuer uri: https://idm.example.com/oauth2/openid/:client_id:/
  • OAuth2 rfc8414 discovery: https://idm.example.com/oauth2/openid/:client_id:/.well-known/oauth-authorization-server

OpenID Connect discovery - you need to substitute your OAuth2 :client_id: in the following urls:

  • OpenID connect issuer uri: https://idm.example.com/oauth2/openid/:client_id:/
  • OpenID connect discovery: https://idm.example.com/oauth2/openid/:client_id:/.well-known/openid-configuration

For manual OpenID configuration:

  • OpenID connect userinfo: https://idm.example.com/oauth2/openid/:client_id:/userinfo
  • token signing public key: https://idm.example.com/oauth2/openid/:client_id:/public_key.jwk
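
As an illustration of how a resource server typically exchanges an authorisation code at the token url above, the following is a sketch of a standard OAuth2 request. The client id, secret, code, redirect URI and PKCE verifier are placeholders, and the parameter names come from the OAuth2 specification rather than anything Kanidm specific:

curl https://idm.example.com/oauth2/token \
  -u "nextcloud:CLIENT_SECRET" \
  -d grant_type=authorization_code \
  -d code=AUTHORISATION_CODE \
  -d "redirect_uri=https://nextcloud.example.com/callback" \
  -d code_verifier=PKCE_CODE_VERIFIER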

Scope Relationships

For an authorisation to proceed, the resource server will request a list of scopes, which are unique to that resource server. For example, when a user wishes to login to the admin panel of the resource server, it may request the "admin" scope from Kanidm for authorisation. But when a user wants to login, it may only request "access" as a scope from Kanidm.

As each resource server may have its own scopes and understanding of these, Kanidm isolates scopes to each resource server connected to Kanidm. Kanidm has two methods of granting scopes to accounts (users).

The first is scope mappings. These provide a set of scopes if a user is a member of a specific group within Kanidm. This allows you to create a relationship between the scopes of a resource server, and the groups/roles in Kanidm which can be specific to that resource server.

For an authorisation to proceed, all scopes requested by the resource server must be available in the final scope set that is granted to the account.

The second is supplemental scope mappings. These function the same as scope maps, where membership of a group provides a set of scopes to the account. However, these scopes are NOT consulted during authorisation decisions made by Kanidm. These scopes exist to allow optional properties to be provided (such as personal information about a subset of accounts to be revealed) or so that the resource server may make its own authorisation decisions based on the provided scopes.

This use of scopes is the primary means to control who can access what resources. These access decisions can take place either on Kanidm or the resource server.

For example, if you have a resource server that always requests a scope of "read", then users with scope maps that supply the read scope will be allowed by Kanidm to proceed to the resource server. Kanidm can then provide the supplementary scopes into provided tokens, so that the resource server can use these to choose if it wishes to display UI elements. If a user has a supplemental "admin" scope, then that user may be able to access an administration panel of the resource server. In this way Kanidm is still providing the authorisation information, but the control is then exercised by the resource server.
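
A minimal sketch of this pattern, using the commands documented in the Configuration section below, with a hypothetical client named wiki and hypothetical groups wiki_users and wiki_admins:

kanidm system oauth2 update-scope-map wiki wiki_users read
kanidm system oauth2 update-sup-scope-map wiki wiki_admins admin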

Configuration

Create the Kanidm Configuration

After you have understood your resource server requirements you first need to configure Kanidm. By default members of system_admins or idm_hp_oauth2_manage_priv are able to create or manage OAuth2 resource server integrations.

You can create a new resource server with:

kanidm system oauth2 create <name> <displayname> <origin>
kanidm system oauth2 create nextcloud "Nextcloud Production" https://nextcloud.example.com

You can create a scope map with:

kanidm system oauth2 update-scope-map <name> <kanidm_group_name> [scopes]...
kanidm system oauth2 update-scope-map nextcloud nextcloud_admins admin
WARNING
If you are creating an OpenID Connect (OIDC) resource server you MUST provide a scope map named openid. Without this, OpenID Connect clients WILL NOT WORK!

HINT OpenID Connect allows a number of scopes that affect the content of the resulting authorisation token. If one of the following scopes is requested by the OpenID client, then the associated claims may be added to the authorisation token. It is not guaranteed that all of the associated claims will be added.

  • profile - (name, family_name, given_name, middle_name, nickname, preferred_username, profile, picture, website, gender, birthdate, zoneinfo, locale, and updated_at)
  • email - (email, email_verified)
  • address - (address)
  • phone - (phone_number, phone_number_verified)
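
For example, to grant the openid, profile and email scopes to members of a hypothetical nextcloud_users group, a scope map as documented above could be used:

kanidm system oauth2 update-scope-map nextcloud nextcloud_users openid profile email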

You can create a supplemental scope map with:

kanidm system oauth2 update-sup-scope-map <name> <kanidm_group_name> [scopes]...
kanidm system oauth2 update-sup-scope-map nextcloud nextcloud_admins admin

Once created you can view the details of the resource server.

kanidm system oauth2 get nextcloud
---
class: oauth2_resource_server
class: oauth2_resource_server_basic
class: object
displayname: Nextcloud Production
oauth2_rs_basic_secret: hidden
oauth2_rs_name: nextcloud
oauth2_rs_origin: https://nextcloud.example.com
oauth2_rs_token_key: hidden

You can see "oauth2_rs_basic_secret" with:

kanidm system oauth2 show-basic-secret nextcloud
---
<secret>

Configure the Resource Server

On your resource server, you should configure the client ID as the oauth2_rs_name from Kanidm, and the password to be the value shown in oauth2_rs_basic_secret. Ensure that the code challenge/verification method is set to S256.

You should now be able to test authorisation.

Resetting Resource Server Security Material

In the case of disclosure of the basic secret, or some other security event where you may wish to invalidate a resource server's active sessions/tokens, you can reset the secret material of the server with:

kanidm system oauth2 reset-secrets

Each resource server has unique signing keys and access secrets, so this is limited to each resource server.

Custom Claim Maps

Some OIDC clients may consume custom claims from an id token for access control or other policy decisions. Each custom claim is a key:values set, where there can be many values associated to a claim name. Different applications may expect these values to be formatted (joined) in different ways.

Claim values are mapped based on membership of groups. When an account is a member of multiple groups that would receive the same claim, the values of these maps are merged.

To create or update a claim map on a client:

kanidm system oauth2 update-claim-map <name> <claim_name> <kanidm_group_name> [values]...
kanidm system oauth2 update-claim-map nextcloud account_role nextcloud_admins admin login ...

To change the join strategy for a claim name, use update-claim-map-join. Valid strategies are csv (comma separated value), ssv (space separated value) and array (a native JSON array). The default strategy is array.

kanidm system oauth2 update-claim-map-join <name> <claim_name> [csv|ssv|array]
kanidm system oauth2 update-claim-map-join nextcloud account_role csv
# Example claim formats
# csv
claim: "value_a,value_b"

# ssv
claim: "value_a value_b"

# array
claim: ["value_a", "value_b"]

To delete a group from a claim map:

kanidm system oauth2 delete-claim-map <name> <claim_name> <kanidm_group_name>
kanidm system oauth2 delete-claim-map nextcloud account_role nextcloud_admins

Public Client Configuration

Some applications are unable to provide client authentication. A common example is a single page web application that acts as the OAuth2 client, with its corresponding webserver acting as the resource server. In this case the SPA is unable to act as a confidential client since the basic secret would need to be embedded in every client.

Another common example is native applications that use a redirect to localhost. These can't have a client secret embedded, so must act as public clients.

For this reason, public clients require PKCE to bind a specific browser session to its OAuth2 exchange. PKCE can not be disabled for public clients.

To create an OAuth2 public resource server:

kanidm system oauth2 create-public <name> <displayname> <origin>
kanidm system oauth2 create-public mywebapp "My Web App" https://webapp.example.com

To enable or disable localhost redirection:

kanidm system oauth2 enable-localhost-redirects <name>
kanidm system oauth2 disable-localhost-redirects <name>
kanidm system oauth2 enable-localhost-redirects mywebapp

Extended Options for Legacy Clients

Not all resource servers support modern standards like PKCE or ECDSA. In these situations it may be necessary to disable these on a per-resource server basis. Disabling these on one resource server will not affect others. These settings are explained in detail in our FAQ.

WARNING
Changing these settings MAY have serious consequences on the security of your resource server. You should avoid changing these if at all possible!

To disable PKCE for a confidential resource server:

kanidm system oauth2 warning-insecure-client-disable-pkce <resource server name>

To enable legacy cryptography (RSA PKCS1-5 SHA256):

kanidm system oauth2 warning-enable-legacy-crypto <resource server name>

Example Integrations

Apache mod_auth_openidc

Add the following to a mod_auth_openidc.conf. It should be included in a mods_enabled folder or with an appropriate include.

OIDCRedirectURI /protected/redirect_uri
OIDCCryptoPassphrase <random password here>
OIDCProviderMetadataURL https://kanidm.example.com/oauth2/openid/<resource server name>/.well-known/openid-configuration
OIDCScope "openid"
OIDCUserInfoTokenMethod authz_header
OIDCClientID <resource server name>
OIDCClientSecret <resource server password>
OIDCPKCEMethod S256
OIDCCookieSameSite On
# Set the `REMOTE_USER` field to the `preferred_username` instead of the UUID.
# Remember that the username can change, but this can help with systems like Nagios which use this as a display name.
# OIDCRemoteUserClaim preferred_username

Other scopes can be added as required to the OIDCScope line, e.g. OIDCScope "openid scope2 scope3"

In the virtual host, to protect a location:

<Location />
    AuthType openid-connect
    Require valid-user
</Location>

Miniflux

Miniflux is a feed reader that supports OAuth 2.0 and OpenID Connect. It automatically appends the .well-known parts to the discovery endpoint. The application name in the redirect URL needs to match the OAUTH2_PROVIDER name.

OAUTH2_PROVIDER = "oidc";
OAUTH2_CLIENT_ID = "miniflux";
OAUTH2_CLIENT_SECRET = "<oauth2_rs_basic_secret>";
OAUTH2_REDIRECT_URL = "https://feeds.example.com/oauth2/kanidm/callback";
OAUTH2_OIDC_DISCOVERY_ENDPOINT = "https://idm.example.com/oauth2/openid/<oauth2_rs_name>";

Nextcloud

Install the module from the Nextcloud marketplace - it can also be found in the Apps section of your deployment as "OpenID Connect user backend".

In Nextcloud's config.php you need to allow connection to remote servers:

'allow_local_remote_servers' => true,

You may optionally choose to add:

'allow_user_to_change_display_name' => false,
'lost_password_link' => 'disabled',

If you forget to allow connection to remote servers, you may see the following error in the logs:

Host 172.24.11.129 was not connected to because it violates local access rules

This module does not support PKCE or ES256. You will need to run:

kanidm system oauth2 warning-insecure-client-disable-pkce <resource server name>
kanidm system oauth2 warning-enable-legacy-crypto <resource server name>

In the settings menu, configure the discovery URL and client ID and secret.

You can choose to disable other login methods with:

php occ config:app:set --value=0 user_oidc allow_multiple_user_backends

You can log in directly by appending ?direct=1 to your login page URL. You can re-enable other backends by setting the value to 1.

Velociraptor

Velociraptor supports OIDC. To configure it select "Authenticate with SSO" then "OIDC" during the interactive configuration generator. Alternately, you can set the following keys in server.config.yaml:

GUI:
  authenticator:
    type: OIDC
    oidc_issuer: https://idm.example.com/oauth2/openid/:client_id:/
    oauth_client_id: <resource server name>
    oauth_client_secret: <resource server secret>

Velociraptor does not support PKCE. You will need to run the following:

kanidm system oauth2 warning-insecure-client-disable-pkce <resource server name>

Initial users are mapped via their email in the Velociraptor server.config.yaml config:

GUI:
  initial_users:
  - name: <email address>

Accounts require the openid and email scopes to be authenticated. It is recommended you limit these to a group with a scope map due to Velociraptor's high impact.

# kanidm group create velociraptor_users
# kanidm group add-members velociraptor_users ...
kanidm system oauth2 update-scope-map <resource server name> velociraptor_users openid email

Vouch Proxy

WARNING Vouch Proxy requires a unique identifier but does not use the proper claim, "sub". It uses the fields "username" or "email" as primary identifiers instead. As a result, this can cause user or deployment issues, and at worst security bypasses. You should avoid Vouch Proxy if possible due to these issues.

Note: You need to run at least version 0.37.0.

Vouch Proxy supports multiple OAuth and OIDC login providers. To configure it you need to pass:

oauth:
  auth_url: https://idm.wherekanidmruns.com/ui/oauth2
  callback_url: https://login.wherevouchproxyruns.com/auth
  client_id: <oauth2_rs_name> # Found in kanidm system oauth2 get XXXX (should be the same as XXXX)
  client_secret: <oauth2_rs_basic_secret> # Found in kanidm system oauth2 get XXXX
  code_challenge_method: S256
  provider: oidc
  scopes:
    - email # Required due to vouch proxy reliance on mail as a primary identifier
  token_url: https://idm.wherekanidmruns.com/oauth2/token
  user_info_url: https://idm.wherekanidmruns.com/oauth2/openid/<oauth2_rs_name>/userinfo

The email scope needs to be passed and thus the mail attribute needs to exist on the account:

kanidm person update <ID> --mail "YYYY@somedomain.com" --name idm_admin

LDAP

While many applications can support external authentication and identity services through OAuth2, not all services can. Lightweight Directory Access Protocol (LDAP) has been the "universal language" of authentication for many years, with almost every application in the world being able to search and bind to LDAP. As many organisations still rely on LDAP, Kanidm can host a read-only LDAP interface for these legacy applications and services.

Warning!
The LDAP server in Kanidm is not a full LDAP server. This is intentional, as Kanidm wants to cover the common use cases - simple bind and search. The parts we do support are RFC compliant however.

What is LDAP

LDAP is a protocol to read data from a directory of information. It is not a server, but a way to communicate to a server. There are many famous LDAP implementations such as Active Directory, 389 Directory Server, DSEE, FreeIPA, and many others. Because it is a standard, applications can use an LDAP client library to authenticate users to LDAP, giving "one account" for many applications - an IDM just like Kanidm!

Data Mapping

Kanidm entries cannot be mapped 100% to LDAP's objects. This is because LDAP types are simple key-values on objects which are all UTF8 strings (or subsets thereof) based on validation (matching) rules. Kanidm internally implements complex structured data types such as tagging on SSH keys, or multi-value credentials. These can not be represented in LDAP.

Many of the structures in Kanidm do not correlate closely to LDAP. For example Kanidm only has a GID number, where LDAP's schemas define both a UID number and a GID number.

Entries in the database also have a specific name in LDAP, related to their path in the directory tree. Kanidm is a flat model, so we have to emulate some tree-like elements, and ignore others.

For this reason, when you search the LDAP interface, Kanidm will make some mapping decisions.

  • The Kanidm domain name is used to generate the DN of the suffix by default.
  • The domain_info object becomes the suffix root.
  • All other entries are direct subordinates of the domain_info for DN purposes.
  • Distinguished Names (DNs) are generated from the spn, name, or uuid attribute.
  • Bind DNs can be remapped and rewritten, and may not even be a DN during bind.

These decisions were made to make the path as simple and effective as possible, relying more on the Kanidm query and filter system rather than attempting to generate a tree-like representation of data. As almost all clients can use filters for entry selection we don't believe this is a limitation for the consuming applications.

Security

LDAPS vs StartTLS

StartTLS is not supported due to security risks such as credential leakage and MITM attacks that are fundamental in how StartTLS works. StartTLS can not be repaired to prevent this. LDAPS is the only secure method of communicating to any LDAP server. Kanidm will use its certificates for both HTTPS and LDAPS.

Writes

LDAP's structure is too simplistic for writing to the complex entries that Kanidm internally contains. As a result, writes are rejected for all users via the LDAP interface.

Access Controls

LDAP only supports password authentication. As LDAP is used heavily in POSIX environments the LDAP bind for any DN will use its configured posix password.

As the POSIX password is not equivalent in strength to the primary credentials of Kanidm (which in most cases is multi-factor authentication), the LDAP bind does not grant rights to elevated read permissions. All binds have the permissions of "anonymous" even if the anonymous account is locked.

The exception is service accounts which can use api-tokens during an LDAP bind for elevated read permissions.

The ability to bind with the POSIX password can be disabled to prevent password bruteforce attempts. This does not prevent api-token binds.

Filtering Objects

It is recommended that client applications filter for accounts that can authenticate with (class=account) and for groups with (class=group).
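
For example, a search for a single account that can authenticate might combine this class filter with a name match. This is a sketch that reuses the elided connection options from the examples later in this chapter:

ldapsearch ... -x '(&(class=account)(name=test1))'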

Server Configuration

To configure Kanidm to provide LDAP, add the following to the server.toml configuration:

ldapbindaddress = "127.0.0.1:3636"

You should configure TLS certificates and keys as usual - LDAP will reuse the Web server TLS material.
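
As a quick connectivity check once this is configured and the server restarted, an anonymous whoami over LDAPS should succeed. This is a sketch assuming your CA certificate is available as ca.pem:

LDAPTLS_CACERT=ca.pem ldapwhoami -H ldaps://127.0.0.1:3636 -x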

Showing LDAP Entries and Attribute Maps

By default Kanidm is limited in what attributes are generated or remapped into LDAP entries. However, the server internally contains a map of extended attribute mappings for application specific requests that must be satisfied.

An example is that some applications expect and require a 'CN' value, even though Kanidm does not provide it. If the application is unable to be configured to accept "name" it may be necessary to use Kanidm's mapping feature. Currently these are compiled into the server, so you may need to open an issue with your requirements for attribute maps.

To show what attribute maps exist for an entry, you can use the attribute search term '+'.

# To show Kanidm attributes
ldapsearch ... -x '(name=admin)' '*'

# To show all attribute maps
ldapsearch ... -x '(name=admin)' '+'

Attributes that are in the map can be requested explicitly, and this can be combined with requesting Kanidm native attributes.

ldapsearch ... -x '(name=admin)' cn objectClass displayname memberof

Group Memberships

Group membership is defined in rfc2307bis or Active Directory style. This means groups are determined from the "memberof" attribute which contains a DN to a group.

People Accounts

Persons can bind (authenticate) to the LDAP server if they are configured as a posix account and have a valid posix password set.

When a person is bound to the directory, they inherit the permissions of anonymous - not their account. This is because a posix password, as single factor authentication, is not as secure and should not grant the same privileges as the account's standard credentials.
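
As a sketch of the kind of commands involved (the exact subcommands, flags and required permissions may differ between versions), extending a person to a posix account and setting a posix password could look like:

kanidm person posix set --name idm_admin test1
kanidm person posix set-password --name idm_admin test1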

Service Accounts

If you have issued API tokens for a service account, they can be used to gain extended read permissions for those service accounts.

API tokens can also be used to gain extended search permissions with LDAP. To do this you can bind with a DN of dn=token and provide the API token as the password.

NOTE The dn=token keyword is guaranteed to not be used by any other entry, which is why it was chosen as the keyword to initiate API token binds.

ldapwhoami -H ldaps://URL -x -D "dn=token" -w "TOKEN"
ldapwhoami -H ldaps://idm.example.com -x -D "dn=token" -w "..."
# u: demo_service@idm.example.com

Changing the Basedn

By default the basedn of the LDAP server is derived from the domain name. For example a domain name of idm.example.com will become dc=idm,dc=example,dc=com.

However, you may wish to change this to something shorter or at a higher level within your domain name.

Warning!
Changing the LDAP Basedn will require you to reconfigure your client applications so they search the correct basedn. Be careful when changing this value!

As an admin you can change the domain ldap basedn with:

kanidm system domain set-ldap-basedn <new basedn>
kanidm system domain set-ldap-basedn o=kanidm -D admin

Basedns are validated to ensure they are either dc=, ou= or o=. They must have one or more of these components and must only contain alphanumeric characters.

After the basedn is changed, the new value will take effect after a server restart. If you have a replicated topology, you must restart all servers.
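
After restarting, clients would then search using the new suffix. For example, with the o=kanidm basedn set above:

ldapsearch ... -b 'o=kanidm' -x '(name=test1)'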

Disable POSIX Password Binds

If you do not have applications that require LDAP password binds, then you should disable this function to limit access.

kanidm system domain set-ldap-allow-unix-password-bind [true|false]
kanidm system domain set-ldap-allow-unix-password-bind -D admin false

Examples

Given a default install with domain "idm.example.com" the configured LDAP DN will be "dc=idm,dc=example,dc=com".

# from server.toml
ldapbindaddress = "[::]:3636"

This can be queried with:

LDAPTLS_CACERT=ca.pem ldapsearch \
    -H ldaps://127.0.0.1:3636 \
    -b 'dc=idm,dc=example,dc=com' \
    -x '(name=test1)'

# test1@example.com, idm.example.com
dn: spn=test1@idm.example.com,dc=idm,dc=example,dc=com
objectclass: account
objectclass: memberof
objectclass: object
objectclass: person
objectclass: posixaccount
displayname: Test User
gidnumber: 12345
memberof: spn=group240@idm.example.com,dc=idm,dc=example,dc=com
name: test1
spn: test1@idm.example.com
entryuuid: 22a65b6c-80c8-4e1a-9b76-3f3afdff8400

LDAP binds can use any unique identifier of the account. The following are all valid bind DNs for the object listed above.

ldapwhoami ... -x -D 'name=test1'
ldapwhoami ... -x -D 'spn=test1@idm.example.com'
ldapwhoami ... -x -D 'test1@idm.example.com'
ldapwhoami ... -x -D 'test1'
ldapwhoami ... -x -D '22a65b6c-80c8-4e1a-9b76-3f3afdff8400'
ldapwhoami ... -x -D 'spn=test1@idm.example.com,dc=idm,dc=example,dc=com'
ldapwhoami ... -x -D 'name=test1,dc=idm,dc=example,dc=com'

Troubleshooting

Can't contact LDAP Server (-1)

Most LDAP clients are very picky about TLS, can be very hard to debug, and rarely display useful errors. For example these commands:

ldapsearch -H ldaps://127.0.0.1:3636 -b 'dc=idm,dc=example,dc=com' -x '(name=test1)'
ldapsearch -H ldap://127.0.0.1:3636 -b 'dc=idm,dc=example,dc=com' -x '(name=test1)'
ldapsearch -H ldap://127.0.0.1:3389 -b 'dc=idm,dc=example,dc=com' -x '(name=test1)'

All give the same error:

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

This is despite the fact that:

  • The first command fails due to a certificate validation error.
  • The second uses plain ldap:// against a TLS (LDAPS) port.
  • The third uses an incorrect port.

To diagnose errors like this, you may need to add "-d 1" to your LDAP commands or client.
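
For example, re-running the first command with debugging enabled should reveal more detail about the underlying TLS failure:

ldapsearch -d 1 -H ldaps://127.0.0.1:3636 -b 'dc=idm,dc=example,dc=com' -x '(name=test1)'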

RADIUS

Remote Authentication Dial In User Service (RADIUS) is a network protocol that is commonly used to authenticate Wi-Fi devices or Virtual Private Networks (VPNs). While it should not be a sole point of trust/authentication to an identity, it's still an important control for protecting network resources.

Kanidm has a philosophy that each account can have multiple credentials which are related to their devices, and limited to specific resources. RADIUS is no exception and has a separate credential for each account to use for RADIUS access.

Disclaimer

It's worth noting some disclaimers about Kanidm's RADIUS integration.

One Credential - One Account

Kanidm normally attempts to have credentials for each device and application rather than the legacy model of one credential for one account.

The RADIUS protocol is only able to attest a single password based credential in an authentication attempt, which limits us to storing a single RADIUS password credential per account. However, despite this limitation, it still greatly improves the situation by isolating the RADIUS credential from the primary or application credentials of the account. This solves many common security concerns around credential loss or disclosure, and prevents rogue devices from locking out accounts as they attempt to authenticate to Wi-Fi with expired credentials.

Alternately, Kanidm supports mapping users via specially configured certificates, allowing some systems to use EAP-TLS for RADIUS authentication. This returns to the "per device" credential model.

Cleartext Credential Storage

RADIUS offers many different types of tunnels and authentication mechanisms. However, most client devices "out of the box" only attempt a single type when a WPA2-Enterprise network is selected: MSCHAPv2 with PEAP. This is a challenge-response protocol that requires clear text or Windows NT LAN Manager (NTLM) credentials.

As MSCHAPv2 with PEAP is the only practical, universal RADIUS-type supported on all devices with minimal configuration, we consider it imperative that it MUST be supported as the default. Esoteric RADIUS types can be used as well, but this is up to administrators to test and configure.

Due to this requirement, we must store the RADIUS material as clear text or NTLM hashes. It would be silly to think that NTLM is secure as it relies on the obsolete and deprecated MD4 cryptographic hash, providing only an illusion of security.

This means Kanidm stores RADIUS credentials in the database as clear text.

We believe this is a reasonable decision and is a low risk to security because:

  • The access controls around RADIUS secrets by default are strong, limited to only self-account read and RADIUS-server read.
  • As RADIUS credentials are separate from the primary account credentials and have no other rights, their disclosure is not going to lead to a full account compromise.
  • Having the credentials in clear text allows a better user experience as clients can view the credentials at any time to enroll further devices.

Service Accounts Do Not Have Radius Access

Due to the design of service accounts, they do not have access to RADIUS for credential assignment. If you require RADIUS usage with a service account, you may need to use EAP-TLS or some other authentication method.

Account Credential Configuration

For an account to use RADIUS they must first generate a RADIUS secret unique to that account. By default, all accounts can self-create this secret.

kanidm person radius generate-secret --name william william
kanidm person radius show-secret --name william william

Account Group Configuration

In Kanidm, accounts which can authenticate to RADIUS must be a member of an allowed group. This allows you to define which users or groups may use a Wi-Fi or VPN infrastructure, and provides a path for revoking access to the resources through group management. The key point of this is that service accounts should not be part of this group:

kanidm group create --name idm_admin radius_access_allowed
kanidm group add-members --name idm_admin radius_access_allowed william

RADIUS Server Service Account

To read these secrets, the RADIUS server requires an account with the correct privileges. This can be created and assigned through the group "idm_radius_servers", which is provided by default.

First, create the service account and add it to the group:

kanidm service-account create --name admin radius_service_account "Radius Service Account"
kanidm group add-members --name admin idm_radius_servers radius_service_account

Now reset the account password, using the admin account:

kanidm service-account credential generate --name admin radius_service_account
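
The RADIUS container itself authenticates to Kanidm with an API token rather than this password (see the auth_token value in the configuration template below). A sketch of generating one, assuming a token label of radius (the exact arguments may vary by version):

kanidm service-account api-token generate --name admin radius_service_account radius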

Deploying a RADIUS Container

We provide a RADIUS container that has all the needed integrations. This container requires some cryptographic material, with the following files placed in /etc/raddb/certs (modifiable in the configuration):

filename    description
ca.pem      The signing CA of the RADIUS certificate
dh.pem      The output of openssl dhparam -in ca.pem -out ./dh.pem 2048
cert.pem    The certificate for the RADIUS server
key.pem     The signing key for the RADIUS certificate

The configuration file (/data/kanidm) has the following template:

uri = "https://example.com" # URL to the Kanidm server
verify_hostnames = true     # verify the hostname of the Kanidm server

verify_ca = false           # Strict CA verification
ca = "/data/ca.pem"         # Path to the kanidm ca

auth_token = "ABC..."       # Auth token for the service account
                            # See: kanidm service-account api-token generate

# Default vlans for groups that don't specify one.
radius_default_vlan = 1

# A list of Kanidm groups which must be a member
# before they can authenticate via RADIUS.
radius_required_groups = [
    "radius_access_allowed@idm.example.com",
]

# A mapping between Kanidm groups and VLANS
radius_groups = [
    { spn = "radius_access_allowed@idm.example.com", vlan = 10 },
]

# A mapping of clients and their authentication tokens
radius_clients = [
    { name = "test", ipaddr = "127.0.0.1", secret  = "testing123" },
    { name = "docker" , ipaddr = "172.17.0.0/16", secret = "testing123" },
]

# radius_cert_path = "/etc/raddb/certs/cert.pem"
# the signing key for radius TLS
# radius_key_path = "/etc/raddb/certs/key.pem"
# the diffie-hellman output
# radius_dh_path = "/etc/raddb/certs/dh.pem"
# the CA certificate
# radius_ca_path = "/etc/raddb/certs/ca.pem"

A fully configured example

url = "https://example.com"

# The auth token for the service account
auth_token = "ABC..."

# default vlan for groups that don't specify one.
radius_default_vlan = 99

# if the user is in one of these Kanidm groups,
# then they're allowed to authenticate
radius_required_groups = [
    "radius_access_allowed@idm.example.com",
]

radius_groups = [
    { spn = "radius_access_allowed@idm.example.com", vlan = 10 }
]

radius_clients = [
    { name = "localhost", ipaddr = "127.0.0.1", secret = "testing123" },
    { name = "docker" , ipaddr = "172.17.0.0/16", secret = "testing123" },
]

Moving to Production

To expose this to a Wi-Fi infrastructure, add your NAS in the configuration:

radius_clients = [
    { name = "access_point", ipaddr = "10.2.3.4", secret = "<a_random_value>" }
]

Then re-create/run your docker instance and expose the ports by adding -p 1812:1812 -p 1812:1812/udp to the command.

If you have any issues, check the logs from the RADIUS output, as they tend to indicate the cause of the problem. To increase the logging level you can re-run your environment with debug enabled:

docker rm radiusd
docker run --name radiusd \
    -e DEBUG=True \
    -p 1812:1812 \
    -p 1812:1812/udp \
    --interactive --tty \
    --volume /tmp/kanidm:/etc/raddb/certs \
    kanidm/radius:latest

Note: the RADIUS container is configured to provide Tunnel-Private-Group-ID, so if you wish to use Wi-Fi-assigned VLANs on your infrastructure, you can assign these by groups in the configuration file as shown in the above examples.

Service Integration Examples

This chapter demonstrates examples of services and their configuration to integrate with Kanidm.

If you wish to contribute more examples, please open a PR in the Kanidm Project Book.

Kubernetes Ingress

Guard your Kubernetes ingress with Kanidm authentication and authorization.

Prerequisites

We recommend you have the following before continuing:

Instructions

  1. Create a Kanidm account and group:

    1. Create a Kanidm account. Please see the section Creating Accounts.
    2. Give the account a password. Please see the section Resetting Account Credentials.
    3. Make the account a person. Please see the section People Accounts.
    4. Create a Kanidm group. Please see the section Creating Accounts.
    5. Add the account you created to the group you created. Please see the section Creating Accounts.
  2. Create a Kanidm OAuth2 resource:

    1. Create the OAuth2 resource for your domain. Please see the section Create the Kanidm Configuration.
    2. Add a scope mapping from the resource you created to the group you created with the openid, profile, and email scopes. Please see the section Create the Kanidm Configuration.
  3. Create a Cookie Secret for the placeholder <COOKIE_SECRET> in step 4:

    docker run -ti --rm python:3-alpine python -c 'import secrets,base64; print(base64.b64encode(base64.b64encode(secrets.token_bytes(16))).decode("utf-8"));'
    
  4. Create a file called k8s.kanidm-nginx-auth-example.yaml with the block below. Replace every <string> (drop the <>) with appropriate values:

    1. <FQDN>: The fully qualified domain name with an A record pointing to your k8s ingress.
    2. <KANIDM_FQDN>: The fully qualified domain name of your Kanidm deployment.
    3. <COOKIE_SECRET>: The output from step 3.
    4. <OAUTH2_RS_NAME>: Please see the output from step 2.1 or get the OAuth2 resource you created in that step.
    5. <OAUTH2_RS_BASIC_SECRET>: Please see the output from step 2.1 or get the OAuth2 resource you created in that step.

    This will deploy the following to your cluster:

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: kanidm-example
      labels:
        pod-security.kubernetes.io/enforce: restricted
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: kanidm-example
      name: website
      labels:
        app: website
    spec:
      revisionHistoryLimit: 1
      replicas: 1
      selector:
        matchLabels:
          app: website
      template:
        metadata:
          labels:
            app: website
        spec:
          containers:
            - name: website
              image: modem7/docker-starwars
              imagePullPolicy: Always
              ports:
                - containerPort: 8080
              securityContext:
                allowPrivilegeEscalation: false
                capabilities:
                  drop: ["ALL"]
          securityContext:
            runAsNonRoot: true
            seccompProfile:
              type: RuntimeDefault
    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      namespace: kanidm-example
      name: website
    spec:
      selector:
        app: website
      ports:
        - protocol: TCP
          port: 8080
          targetPort: 8080
    
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      annotations:
        cert-manager.io/cluster-issuer: lets-encrypt-cluster-issuer
        nginx.ingress.kubernetes.io/auth-url: "https://$host/oauth2/auth"
        nginx.ingress.kubernetes.io/auth-signin: "https://$host/oauth2/start?rd=$escaped_request_uri"
      name: website
      namespace: kanidm-example
    spec:
      ingressClassName: nginx
      tls:
        - hosts:
            - <FQDN>
          secretName: <FQDN>-ingress-tls # replace . with - in the hostname
      rules:
      - host: <FQDN>
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: website
                port:
                  number: 8080
    
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        k8s-app: oauth2-proxy
      name: oauth2-proxy
      namespace: kanidm-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          k8s-app: oauth2-proxy
      template:
        metadata:
          labels:
            k8s-app: oauth2-proxy
        spec:
          containers:
          - args:
            - --provider=oidc
            - --email-domain=*
            - --upstream=file:///dev/null
            - --http-address=0.0.0.0:4182
            - --oidc-issuer-url=https://<KANIDM_FQDN>/oauth2/openid/<OAUTH2_RS_NAME>
            - --code-challenge-method=S256
            env:
            - name: OAUTH2_PROXY_CLIENT_ID
              value: <OAUTH2_RS_NAME>
            - name: OAUTH2_PROXY_CLIENT_SECRET
              value: <OAUTH2_RS_BASIC_SECRET>
            - name: OAUTH2_PROXY_COOKIE_SECRET
              value: <COOKIE_SECRET>
            image: quay.io/oauth2-proxy/oauth2-proxy:latest
            imagePullPolicy: Always
            name: oauth2-proxy
            ports:
            - containerPort: 4182
              protocol: TCP
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
          securityContext:
            runAsNonRoot: true
            seccompProfile:
              type: RuntimeDefault
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        k8s-app: oauth2-proxy
      name: oauth2-proxy
      namespace: kanidm-example
    spec:
      ports:
      - name: http
        port: 4182
        protocol: TCP
        targetPort: 4182
      selector:
        k8s-app: oauth2-proxy
    
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: oauth2-proxy
      namespace: kanidm-example
    spec:
      ingressClassName: nginx
      rules:
      - host: <FQDN>
        http:
          paths:
          - path: /oauth2
            pathType: Prefix
            backend:
              service:
                name: oauth2-proxy
                port:
                  number: 4182
      tls:
      - hosts:
        - <FQDN>
        secretName: <FQDN>-ingress-tls # replace . with - in the hostname
    
  5. Apply the configuration by running the following command:

    kubectl apply -f k8s.kanidm-nginx-auth-example.yaml
    
  6. Check your deployment succeeded by running the following commands:

    kubectl -n kanidm-example get all
    kubectl -n kanidm-example get ingress
    kubectl -n kanidm-example get Certificate
    

    You may use kubectl's describe and logs for troubleshooting. If there are ingress errors, see the Ingress NGINX documentation's troubleshooting page. If there are certificate errors, see the cert-manager documentation's troubleshooting page.

    Once it has finished deploying, you will be able to access it at https://<FQDN> which will prompt you for authentication.

Cleaning Up

  1. Remove the resources created for this example from k8s:

    kubectl delete namespace kanidm-example
    
  2. Remove the objects created for this example from Kanidm:

    1. Delete the account created in section Instructions step 1.
    2. Delete the group created in section Instructions step 1.
    3. Delete the OAuth2 resource created in section Instructions step 2.

References

  1. NGINX Ingress Controller: External OAUTH Authentication
  2. OAuth2 Proxy: OpenID Connect Provider

Traefik

Traefik is a flexible HTTP reverse proxy webserver that can be integrated with Docker to allow dynamic configuration and to automatically use LetsEncrypt to provide valid TLS certificates. We can leverage this in the setup of Kanidm by specifying the configuration of Kanidm and Traefik in the same Docker Compose configuration.

Example setup

Create a new directory and copy the following YAML file into it as docker-compose.yml. Edit the YAML to update the LetsEncrypt account email for your domain and the FQDN where Kanidm will be made available. Ensure you adjust this file or Kanidm's configuration to have a matching HTTPS port; the line traefik.http.services.kanidm.loadbalancer.server.port=8443 sets this on the Traefik side.

NOTE You will need to generate self-signed certificates for Kanidm, and copy the configuration into the kanidm_data volume. Some instructions are available in the "Installing the Server" section of this book.

docker-compose.yml

version: "3.4"

services:
  traefik:
    image: traefik:v2.6
    container_name: traefik
    command:
      - "--certificatesresolvers.http.acme.email=admin@example.com"
      - "--certificatesresolvers.http.acme.storage=/letsencrypt/acme.json"
      - "--certificatesresolvers.http.acme.tlschallenge=true"
      - "--entrypoints.websecure.address=:443"
      - "--entrypoints.websecure.http.tls=true"
      - "--entrypoints.websecure.http.tls.certResolver=http"
      - "--log.level=INFO"
      - "--providers.docker=true"
      - "--providers.docker.exposedByDefault=false"
      - "--serverstransport.insecureskipverify=true"
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "443:443"
  kanidm:
    container_name: kanidm
    image: kanidm/server:devel
    restart: unless-stopped
    volumes:
      - kanidm_data:/data
    labels:
      - traefik.enable=true
      - traefik.http.routers.kanidm.entrypoints=websecure
      - traefik.http.routers.kanidm.rule=Host(`idm.example.com`)
      - traefik.http.routers.kanidm.service=kanidm
      - traefik.http.serversTransports.kanidm.insecureSkipVerify=true
      - traefik.http.services.kanidm.loadbalancer.server.port=8443
      - traefik.http.services.kanidm.loadbalancer.server.scheme=https
volumes:
  kanidm_data: {}

Finally you may run docker-compose up to start up both Kanidm and Traefik.
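
For example, from the directory containing docker-compose.yml:

docker-compose up -d

The optional -d flag runs the containers in the background; omit it if you prefer to watch the logs in the foreground.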

Replication

Introduction

Replication allows two or more Kanidm servers to exchange their databases and keep their content synchronised. This is critical to allow multiple servers to act in failover groups for highly available infrastructure.

Kanidm replication is eventually consistent. This means that there are no elections or quorums required between nodes - all nodes can accept writes and distribute them to all other nodes. This is important for security and performance.

Because replication is eventually consistent, there can be small delays between different servers receiving a change. This may result in some users noticing discrepancies that are quickly resolved.

To minimise this, it's recommended that when you operate replication in a highly available deployment, you use a load balancer with sticky sessions so that users are redirected to the same server unless a failover event occurs. This will help to minimise discrepancies. Alternately you can treat replication as "active-passive" and have your load balancer fail over between the two nodes. Since replication is eventually consistent, there is no need for a failover or failback procedure.

In this chapter we will cover the details of planning, deploying and maintaining replication between Kanidm servers.

Vocabulary

Replication requires us to introduce specific terms so that we can describe the replication environment.

Change

An update made in the database.

Node

A server that is participating in replication.

Pull

The act of requesting data from a remote server.

Push

The act of supplying data to a remote server.

Node Configuration

A descriptor that allows a node to pull from another node.

Converge

To approach the same database state.

Topology

The collection of servers that are joined in replication and converge on the same database content. The topology is defined by the set of node configurations.

Replication

The act of exchanging data from one node to another.

Supplier

The node that is supplying data to another node.

Consumer

The node that is replicating content from a supplier.

Refresh

Deleting all of a consumer's database content, and replacing it with the content of a supplier.

Incremental Replication

When a supplier provides a "differential" between the state of the consumer and the supplier for the consumer to apply.

Conflict

If a consumer can not validate a change that a supplier provided, then the entry may move to a conflict state. All nodes will converge to the same conflict state over time.

Tombstone

A marker entry that indicates an entry has been deleted. This allows all servers to converge and delete the data.

Planning

WARNING
Replication is a newly developed feature. This means it requires manual configuration and careful monitoring. You should keep backups if you choose to proceed.

It is important that you plan your replication deployment before you proceed. You may have a need for high availability within a datacentre, geographic redundancy, or improvement of read scaling.

Improvement of Read Throughput

Addition of replicas can improve the number of read and authentication operations performed over the topology as a whole. This is because read operation throughput is additive between nodes.

For example, if you had two servers that can process 1000 authentications per second each, then when in replication the topology can process 2000 authentications per second.

However, while you may gain in read throughput, you must account for downtime - you should not always rely on every server to be online.

The optimal loading of any server is approximately 50%. This allows overhead to absorb load if nearby nodes experience outages. It also allows for absorption of load spikes or other unexpected events.

It is important to note however that as you add replicas the write throughput does not increase in the same way as read throughput. This is because for each write that occurs on a node, it must be replicated and written to every other node. Therefore your write throughput is always bounded by the slowest server in your topology. In reality there is a "slight" improvement in writes due to coalescing that occurs as part of replication, but you should assume that writes are not improved through the addition of more nodes.

Directing Clients to Live Servers

Operating replicas of Kanidm allows you to minimise outages if a single or multiple servers experience downtime. This can assist you with patching and other administrative tasks that you must perform.

However, there are some key limitations to this fault tolerance.

You require a method to fail over between servers. This generally involves a load balancer, which itself must be fault tolerant. Load balancers can be made fault tolerant through the use of protocols like CARP or VRRP, or by configuration of routers with anycast.

If you elect to use CARP or VRRP directly on your Kanidm servers, then be aware that you will be configuring your systems as active-passive, rather than active-active, so you will not benefit from improved read throughput. In contrast, anycast will always route to the closest Kanidm server and will fail over to nearby servers, so this may be an attractive choice.

You should NOT use DNS based failover mechanisms as clients can cache DNS records and remain "stuck" to a node in a failed state.

Maximum Downtime of a Server

Kanidm's replication protocol enforces limits on how long a server can be offline. This is due to how tombstones are handled. By default the maximum is 7 days. If a server is offline for more than 7 days a refresh will be required for that server to continue participation in the topology.

It is important you avoid extended downtime of servers to avoid this condition.

Deployment

WARNING
Replication is a newly developed feature. This means it requires manual configuration and careful monitoring. You should take regular backups if you choose to proceed.

Node Setup

On the servers that you wish to participate in the replication topology, you must enable replication in their server.toml to allow identity certificates to be generated.

# server.toml

# To proceed with replication, replace the line "ACK_HERE" with
# "i acknowledge that replication is in development = true" where the spaces
# are replaced with '_'

ACK_HERE

[replication]
# The hostname and port of the server that other nodes will connect to.
origin = "repl://localhost:8444"
# The bind address of the replication port.
bindaddress = "127.0.0.1:8444"

Once configured, deploy this config to your servers and restart the nodes.

Manual Node Configurations

NOTE In the future we will develop a replication coordinator so that you don't have to manually configure this. But for now, if you want replication, you have to do it the hard way.

Each node has an identity certificate that is internally generated and used to communicate with other nodes in the topology. This certificate is also used by other nodes to validate this node.

Let's assume we have two servers - A and B. We want B to consume (pull) data from A initially as A is our "first server".

First display the identity certificate of A.

# Server A
docker exec -i -t <container name> \
  kanidmd show-replication-certificate
# certificate: "MII....."

Now on node B, configure the replication node config.

[replication]
# ...

[replication."repl://origin_of_A:port"]
type = "mutual-pull"
partner_cert = "MII... <as output from A show-replication-cert>"

Now we must configure A to pull from B.

# Server B
docker exec -i -t <container name> \
  kanidmd show-replication-certificate
# certificate: "MII....."

Now on node A, configure the replication node config.

[replication]
# ...

[replication."repl://origin_of_B:port"]
type = "mutual-pull"
partner_cert = "MII... <as output from B show-replication-cert>"

Then restart both servers. Initially the servers will refuse to synchronise as their databases do not have matching domain_uuids. To resolve this you can instruct B to manually refresh from A with:

# Server B
docker exec -i -t <container name> \
  kanidmd refresh-replication-consumer

Partially Automated Node Configurations

NOTE In the future we will develop a replication coordinator so that you don't have to manually configure this. But for now, if you want replication, you have to do it the hard way.

This is the same as the manual process, but a single server is defined as the "primary" and the partner server is the "secondary". This means that if database issues occur the content of the primary will take precedence over the secondary. For our example we will define A as the primary and B as the secondary.

First display the identity certificate of A (the primary).

# Server A
docker exec -i -t <container name> \
  kanidmd show-replication-certificate
# certificate: "MII....."

Now on the secondary (B), configure the replication node config.

[replication]
# ...

[replication."repl://origin_of_A:port"]
type = "mutual-pull"
partner_cert = "MII... <as output from A show-replication-cert>"
automatic_refresh = true

Now we must configure A to pull from B.

# Server B
docker exec -i -t <container name> \
  kanidmd show-replication-certificate
# certificate: "MII....."

Now on node A, configure the replication node config. It is critical here that you do NOT set automatic_refresh.

[replication]
# ...

[replication."repl://origin_of_B:port"]
type = "mutual-pull"
partner_cert = "MII... <as output from B show-replication-cert>"
# automatic_refresh = false

Then restart both servers. B (secondary) will automatically refresh from A (primary) and then replication will continue bi-directionally from that point.

Administration

Renew Replication Identity Certificate

The replication identity certificate defaults to an expiry of 180 days.

To renew this run the command:

docker exec -i -t <container name> \
  kanidmd renew-replication-certificate
# certificate: "MII....."

You must then copy the new certificate to other nodes in the topology.

NOTE In the future we will develop a replication coordinator so that you don't have to manually renew this. But for now, if you want replication, you have to do it the hard way.

Refresh a Lagging Consumer

If a consumer has been offline for more than 7 days, its error log will display that it requires a refresh.

You can manually perform this on the affected node.

docker exec -i -t <container name> \
  kanidmd refresh-replication-consumer

Synchronisation Concepts

Introduction

In some environments Kanidm may be the first Identity Management system introduced. However, many environments have existing IDM systems that are well established and in use. To allow Kanidm to work with these, it is possible to synchronise data between these IDM systems.

Currently Kanidm can consume (import) data from another IDM system. There are two major use cases for this:

  • Running Kanidm in parallel with another IDM system
  • Migrating from an existing IDM to Kanidm

An incoming IDM data source is bound to Kanidm by a sync account. All synchronised entries will have a reference to the sync account that they came from defined by their sync_parent_uuid. While an entry is owned by a sync account we refer to the sync account as having authority over the content of that entry.

The sync process is driven by a sync tool. This tool extracts the current state of the sync from Kanidm, requests the set of changes (differences) from the IDM source, and then submits these changes to Kanidm. Kanidm will update and apply these changes and commit the new sync state on success.

In the event of a conflict or data import error, Kanidm will halt and roll back the synchronisation to the last good state. The sync tool should be reconfigured to exclude the conflicting entry or to remap its properties to resolve the conflict. The operation can then be retried.

This process can continue long term to allow Kanidm to operate in parallel to another IDM system. If this is for a migration however, the sync account can be finalised. This terminates the sync account and removes the sync parent uuid from all synchronised entries, moving authority of the entry into Kanidm.

Alternately, the sync account can be terminated, which removes all synchronised content that was submitted.

Creating a Sync Account

Creating a sync account requires administration permissions. By default this is available to members of the "system_admins" group, of which "admin" is a member by default.

kanidm system sync create <sync account name>
kanidm system sync create ipasync

Once the sync account is created you can then generate the sync token which identifies the sync tool.

kanidm system sync generate-token <sync account name> <token label>
kanidm system sync generate-token ipasync mylabel
token: eyJhbGci...
Warning!
The sync account token has a high level of privilege, able to create new accounts and groups. It should be treated carefully as a result!

If you need to revoke the token, you can do so with:

kanidm system sync destroy-token <sync account name>
kanidm system sync destroy-token ipasync

Destroying the token does NOT affect the state of the sync account and its synchronised entries. Creating a new token and providing that to the sync tool will continue the sync process.

Operating the Sync Tool

The sync tool can now be run to replicate entries from the external IDM system into Kanidm.

You should refer to the chapter for the specific external IDM system you are using for details on the sync tool configuration.

The sync tool runs in batches, meaning that changes from the source IDM service will be delayed in appearing in Kanidm. This is affected by how frequently you choose to run the sync tool.

If the sync tool fails, you can investigate details in the Kanidmd server output.

The sync tool can run "indefinitely" if you wish for Kanidm to always import data from the external source.

Yielding Authority of Attributes to Kanidm

By default Kanidm assumes that authority over synchronised entries is retained by the sync tool. This means that synchronised entries can not be written to in any capacity outside of a small number of internal Kanidm attributes.

An administrator may wish to allow synchronised entries to have some attributes written by the instance locally. An example is allowing passkeys to be created on Kanidm when the external synchronisation provider does not supply them.

In this case the synchronisation agreement can be configured to yield its authority over these attributes to Kanidm.

To configure the attributes that Kanidm can control:

kanidm system sync set-yield-attributes <sync account name> [attr, ...]
kanidm system sync set-yield-attributes ipasync passkeys

This command takes the set of attributes that should be yielded. To remove an attribute, declare the yield set with that attribute missing.

kanidm system sync set-yield-attributes ipasync passkeys
# To remove passkeys from being Kanidm controlled.
kanidm system sync set-yield-attributes ipasync

Finalising the Sync Account

If you are performing a migration from an external IDM to Kanidm, when that migration is completed you can nominate that Kanidm now owns all of the imported data. This is achieved by finalising the sync account.

Warning!
You can not undo this operation. Once you have finalised an agreement, Kanidm owns all of the synchronised data, and you can not resume synchronisation.
kanidm system sync finalise <sync account name>
kanidm system sync finalise ipasync
# Do you want to continue? This operation can NOT be undone. [y/N]

Once finalised, imported accounts can now be fully managed by Kanidm.

Terminating the Sync Account

If you decide to cease importing accounts or need to remove all imported accounts from a sync account, you can choose to terminate the agreement removing all data that was imported.

Warning!
You can not undo this operation. Once you have terminated an agreement, Kanidm deletes all of the synchronised data, and you can not resume synchronisation.
kanidm system sync terminate <sync account name>
kanidm system sync terminate ipasync
# Do you want to continue? This operation can NOT be undone. [y/N]

Once terminated all imported data will be deleted by Kanidm.

Synchronising from FreeIPA

FreeIPA is a popular open source LDAP and Kerberos provider, aiming to be "Active Directory" for Linux.

Kanidm is able to synchronise from FreeIPA for the purposes of coexistence or migration.

Installing the FreeIPA Sync Tool

See installing the client tools. The ipa sync tool is part of the tools container.

Configure the FreeIPA Sync Tool

The sync tool is a bridge between FreeIPA and Kanidm, meaning that the tool must be configured to communicate to both sides.

Like other components of Kanidm, the FreeIPA sync tool will read your /etc/kanidm/config if present to understand how to connect to Kanidm.

The sync tool specific components are configured in its own configuration file.

# The sync account token as generated by "system sync generate-token".
sync_token = "eyJhb..."

# A cron-like expression of when to run when in scheduled mode. The format is:
#   sec  min   hour   day of month   month   day of week   year
#
# The default of this value is "0 */5 * * * * *" which means "run every 5 minutes".
# schedule = ""

# If you want to monitor the status of the scheduled sync tool (you should)
# then you can set a bind address here.
#
# If not set, defaults to no status listener.
# status_bind = ""

# The LDAP URI to FreeIPA. This MUST be LDAPS. You should connect to a unique single
# server in the IPA topology rather than via a load balancer or dns srv records. This
# is to prevent replication conflicts and issues due to how 389-ds content sync works.
ipa_uri = "ldaps://specific-server.ipa.dev.kanidm.com"
# Path to the IPA CA certificate in PEM format.
ipa_ca = "/path/to/kanidm-ipa-ca.pem"
# The DN of an account with content sync rights. By default cn=Directory Manager has
# this access.
ipa_sync_dn = "cn=Directory Manager"
ipa_sync_pw = "directory manager password"
# The basedn to examine.
ipa_sync_base_dn = "dc=ipa,dc=dev,dc=kanidm,dc=com"

# By default Kanidm separates the primary account password and credentials from
# the unix credential. This allows the unix password to be isolated from the
# account password so that compromise of one doesn't compromise the other. However
# this can be surprising for new users during a migration. This boolean allows the
# user password to be set as the unix password during the migration for consistency
# and then after the migration they are "unlinked".
#
# sync_password_as_unix_password = false

# The sync tool can alter or exclude entries. These are mapped by their syncuuid
# (not their ipa-object-uuid). The syncuuid is derived from nsUniqueId in 389-ds.
# This is chosen over DN because DNs can change with modrdn, whereas nsUniqueId is
# immutable and only changes if an entry is deleted and recreated.

[ac60034b-3498-11ed-a50d-919b4b1a5ec0]
# my-problematic-entry
exclude = true

# Remap the uuid of this entry to a new uuid on Kanidm
#
# map_uuid = <uuid>

# Remap the name of this entry to a new name on Kanidm
#
# map_name = <name>

# Remap the gidnumber for groups, and uidnumber for users
#
# map_gidnumber = <number>

This example is located in examples/kanidm-ipa-sync.

In addition to this, you must make some configuration changes to FreeIPA to enable synchronisation.

You can find the name of your 389 Directory Server instance with:

# Run on the FreeIPA server
dsconf --list

Using this you can show the current status of the retro changelog plugin to see if you need to change its configuration.

# Run on the FreeIPA server
dsconf <instance name> plugin retro-changelog show
dsconf slapd-DEV-KANIDM-COM plugin retro-changelog show

You must modify the retro changelog plugin to include the full scope of the database suffix so that the sync tool can view the changes to the database. Currently dsconf can not modify the include-suffix so you must do this manually.

You need to change the nsslapd-include-suffix to match your FreeIPA baseDN here. You can access the basedn with:

ldapsearch -H ldaps://<IPA SERVER HOSTNAME/IP> -x -b '' -s base namingContexts
# namingContexts: dc=ipa,dc=dev,dc=kanidm,dc=com

You should ignore cn=changelog and o=ipaca as these are system internal namingContexts. You can then create an ldif file (change.ldif) for ldapmodify like the following.

dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-include-suffix
nsslapd-include-suffix: dc=ipa,dc=dev,dc=kanidm,dc=com

And apply it with:

ldapmodify -f change.ldif -H ldaps://<IPA SERVER HOSTNAME/IP> -x -D 'cn=Directory Manager' -W
# Enter LDAP Password:

You must then reboot your FreeIPA server.

Running the Sync Tool Manually

You can perform a dry run with the sync tool manually to check your configurations are correct and that the tool can synchronise from FreeIPA.

kanidm-ipa-sync [-c /path/to/kanidm/config] -i /path/to/kanidm-ipa-sync -n
kanidm-ipa-sync -i /etc/kanidm/ipa-sync -n

Running the Sync Tool Automatically

The sync tool can be run on a schedule if you configure the schedule parameter and provide the option "--schedule" on the cli.

kanidm-ipa-sync [-c /path/to/kanidm/config] -i /path/to/kanidm-ipa-sync --schedule
kanidm-ipa-sync -i /etc/kanidm/ipa-sync --schedule

As the sync tool is part of the tools container, you can run this with:

docker create --name kanidm-ipa-sync \
  --user uid:gid \
  -p 12345:12345 \
  -v /etc/kanidm/config:/etc/kanidm/config:ro \
  -v /path/to/ipa-sync:/etc/kanidm/ipa-sync:ro \
  kanidm-ipa-sync -i /etc/kanidm/ipa-sync --schedule
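Once created, the container can be started and its output followed with standard docker commands:

docker start kanidm-ipa-sync
docker logs -f kanidm-ipa-sync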

Monitoring the Sync Tool

When running in schedule mode, you may wish to monitor the sync tool for failures. Since failures block the sync process, this is important for ensuring a smooth and reliable synchronisation process.

You can configure a status listener that can be monitored via tcp with the parameter status_bind.

An example of monitoring this with netcat is:

# status_bind = "[::1]:12345"
# nc ::1 12345
Ok

It's important to note that no details are revealed via the status socket; it reports purely the Ok or Err status of the last sync. This status socket is suitable for monitoring from tools such as Nagios.
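As an illustration, a minimal Nagios-style check built around the same probe could look like the following. This is a sketch only; it assumes status_bind = "[::1]:12345" and a netcat that supports the -w timeout flag.

#!/bin/sh
# Query the sync tool status socket; it answers "Ok" when the last sync succeeded.
STATUS="$(nc -w 2 ::1 12345)"
if [ "$STATUS" = "Ok" ]; then
    echo "OK - last sync succeeded"
    exit 0
else
    echo "CRITICAL - last sync reported: ${STATUS:-no response}"
    exit 2
fi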

Synchronising from LDAP

If you have an LDAP server that supports sync repl (rfc4533 content synchronisation) then you are able to synchronise from it to Kanidm for the purposes of coexistence or migration.

If there is a specific Kanidm sync tool for your LDAP server, you should use that instead of the generic LDAP server sync.

Installing the LDAP Sync Tool

See installing the client tools.

Configure the LDAP Sync Tool

The sync tool is a bridge between LDAP and Kanidm, meaning that the tool must be configured to communicate to both sides.

Like other components of Kanidm, the LDAP sync tool will read your /etc/kanidm/config if present to understand how to connect to Kanidm.

The sync tool specific components are configured in its own configuration file.


# The sync account token as generated by "system sync generate-token".
sync_token = "eyJhb..."

# A cron-like expression of when to run when in scheduled mode. The format is:
#   sec  min   hour   day of month   month   day of week   year
#
# The default of this value is "0 */5 * * * * *" which means "run every 5 minutes".
# schedule = ""

# If you want to monitor the status of the scheduled sync tool (you should)
# then you can set a bind address here.
#
# If not set, defaults to no status listener.
# status_bind = ""

# The LDAP URI to the server. This MUST be LDAPS. You should connect to a unique single
# server in the LDAP topology rather than via a load balancer or dns srv records. This
# is to prevent replication conflicts and issues due to how 389-ds and openldap sync works.
ldap_uri = "ldaps://specific-server.ldap.kanidm.com"
# Path to the LDAP CA certificate in PEM format.
ldap_ca = "/path/to/kanidm-ldap-ca.pem"
# The DN of an account with content sync rights. On 389-ds, by default cn=Directory Manager has
# this access. On OpenLDAP you must grant this access.
ldap_sync_dn = "cn=Directory Manager"
ldap_sync_pw = "directory manager password"

# The basedn to search
ldap_sync_base_dn = "dc=ldap,dc=dev,dc=kanidm,dc=com"
# Filter the entries that are synchronised with this filter
# NOTE: attribute-value-assertions with spaces require quoting!
ldap_filter = "(|(objectclass=person)(objectclass=posixgroup))"
# ldap_filter = "(cn=\"my value\")"

# By default Kanidm separates the primary account password and credentials from
# the unix credential. This allows the unix password to be isolated from the
# account password so that compromise of one doesn't compromise the other. However
# this can be surprising for new users during a migration. This boolean allows the
# user password to be set as the unix password during the migration for consistency
# and then after the migration they are "unlinked".
#
# sync_password_as_unix_password = false

# The objectclass used to identify persons to import to Kanidm.
#
# If not set, defaults to "person"
# person_objectclass = ""

# Attribute mappings. These allow you to bind values from your directory server
# to the values that Kanidm will import.
#
# person_attr_user_name = "uid"
# person_attr_display_name = "cn"
# person_attr_gidnumber = "uidnumber"
# person_attr_login_shell = "loginshell"
# person_attr_password = "userpassword"

# If the password value requires a prefix for Kanidm to import it, this can be optionally
# provided here.
#
# person_password_prefix = ""

# The objectclass used to identify groups to import to Kanidm.
#
# If not set, defaults to "groupofnames"
# group_objectclass = ""

# Attribute mappings. These allow you to bind values from your directory server
# to the values that Kanidm will import.
#
# group_attr_name = "cn"
# group_attr_description = "description"
# group_attr_member = "member"
# group_attr_gidnumber = "gidnumber"


# The sync tool can alter or exclude entries. These are mapped by their syncuuid
# The syncuuid is derived from nsUniqueId in 389-ds, or entryUUID in OpenLDAP.
# This is chosen over DN because DNs can change with modrdn, whereas nsUniqueId/entryUUID is
# immutable and only changes if an entry is deleted and recreated.

[ac60034b-3498-11ed-a50d-919b4b1a5ec0]
# my-problematic-entry
exclude = true

# Remap the uuid of this entry to a new uuid on Kanidm
#
# map_uuid = <uuid>

# Remap the name of this entry to a new name on Kanidm
#
# map_name = <name>

# Remap the gidnumber for groups, and uidnumber for users
#
# map_gidnumber = <number>



This example is located in examples/kanidm-ldap-sync.

In addition to this, you may be required to make some configuration changes to your LDAP server to enable synchronisation.

OpenLDAP

You must enable the syncprov overlay in slapd.conf:

moduleload syncprov.la
overlay syncprov

In addition you must grant an account full read access and raise its search limits.

access to *
    by dn.base="cn=sync,dc=example,dc=com" read
    by * break

limits dn.exact="cn=sync,dc=example,dc=com" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited

For more details see the openldap administration guide.
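If your deployment manages slapd through the dynamic cn=config backend instead of slapd.conf, equivalent changes can be applied with ldapmodify. The following is a sketch only: the database DN (olcDatabase={1}mdb,cn=config) and the sync account DN are assumptions you must adjust to your directory layout.

dn: olcOverlay=syncprov,olcDatabase={1}mdb,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcLimits
olcLimits: dn.exact="cn=sync,dc=example,dc=com" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited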

389 Directory Server

You can find the name of your 389 Directory Server instance with:

dsconf --list

Using this you can show the current status of the retro changelog plugin to see if you need to change its configuration.

dsconf <instance name> plugin retro-changelog show
dsconf slapd-DEV-KANIDM-COM plugin retro-changelog show

You must modify the retro changelog plugin to include the full scope of the database suffix so that the sync tool can view the changes to the database. Currently dsconf can not modify the include-suffix so you must do this manually.

You need to change the nsslapd-include-suffix to match your LDAP baseDN here. You can access the basedn with:

ldapsearch -H ldaps://<SERVER HOSTNAME/IP> -x -b '' -s base namingContexts
# namingContexts: dc=ldap,dc=dev,dc=kanidm,dc=com

You should ignore cn=changelog as this is a system internal namingContext. You can then create an ldif file (change.ldif) for ldapmodify like the following.

dn: cn=Retro Changelog Plugin,cn=plugins,cn=config
changetype: modify
replace: nsslapd-include-suffix
nsslapd-include-suffix: dc=ldap,dc=dev,dc=kanidm,dc=com

And apply it with:

ldapmodify -f change.ldif -H ldaps://<SERVER HOSTNAME/IP> -x -D 'cn=Directory Manager' -W
# Enter LDAP Password:

You must then reboot your 389 Directory Server.

Running the Sync Tool Manually

You can perform a dry run with the sync tool manually to check your configurations are correct and that the tool can synchronise from LDAP.

kanidm-ldap-sync [-c /path/to/kanidm/config] -i /path/to/kanidm-ldap-sync -n
kanidm-ldap-sync -i /etc/kanidm/ldap-sync -n

Running the Sync Tool Automatically

The sync tool can be run on a schedule if you configure the schedule parameter and provide the option "--schedule" on the cli.

kanidm-ldap-sync [-c /path/to/kanidm/config] -i /path/to/kanidm-ldap-sync --schedule
kanidm-ldap-sync -i /etc/kanidm/ldap-sync --schedule

As the sync tool is part of the tools container, you can run this with:

docker create --name kanidm-ldap-sync \
  --user uid:gid \
  -p 12345:12345 \
  -v /etc/kanidm/config:/etc/kanidm/config:ro \
  -v /path/to/ldap-sync:/etc/kanidm/ldap-sync:ro \
  kanidm-ldap-sync -i /etc/kanidm/ldap-sync --schedule

Monitoring the Sync Tool

When running in schedule mode, you may wish to monitor the sync tool for failures. Since failures block the sync process, this is important for a smooth and reliable synchronisation process.

You can configure a status listener that can be monitored via tcp with the parameter status_bind.

An example of monitoring this with netcat is:

# status_bind = "[::1]:12345"
# nc ::1 12345
Ok

It's important to note that no details are revealed via the status socket; it reports purely the Ok or Err status of the last sync. This status socket is suitable for monitoring from tools such as Nagios.

Access Control

While Kanidm exists to make authorisation decisions on behalf of other services, internally it must make decisions about write operations to the entries within its database. To make these choices, Kanidm has an internal set of access controls which are the rules describing who may perform what actions.

Default Permissions

The project ships default access controls which are designed to limit and isolate the privileges of accounts whenever possible.

This separation is the reason why admin and idm_admin exist as separate accounts. There are two distinct access silos within Kanidm: access to manage Kanidm as a service (such as application integrations and domain naming), and access to manage people and groups. This is to limit the possible harm an attacker could do if they gain access to one of these roles.

Permission Delegation

A number of types in Kanidm allow permission delegation such as groups and service accounts. This allows entries to be assigned an entry manager who has write access to that entity but not all entities of the same class.

High Privilege Groups

Kanidm has a special group called idm_high_privilege. This acts as a "taint" on its members to indicate that they have an elevated level of access within Kanidm or other systems.

This taint flag exists to prevent lateral movement from other roles that have higher levels of privilege.

An example is idm_service_desk, which has the ability to trigger credential resets for users. This is an important aspect of the service desk role. However, a member of the service desk should not be able to modify the credentials of their peers, nor should they be able to escalate by accessing the credentials of users in a role such as idm_admins. Since idm_service_desk and idm_admins are both tainted with idm_high_privilege, this lateral movement is not possible. Only high privilege roles are able to reset the accounts of high privilege users.

You may add other groups to idm_high_privilege to achieve the same taint effect for other services.

Default Permission Groups

Kanidm ships with default permission groups. You can use these to enable accounts to perform certain tasks within Kanidm as required.

group name | description
domain_admins | modify the name of this domain
idm_access_control_admins | write access controls
idm_account_policy_admins | modify account policy requirements for user authentication
idm_group_admins | create and modify groups
idm_oauth2_admins | create and modify oauth2 integrations
idm_people_admins | create and modify persons
idm_people_on_boarding | create (but not modify) persons. Intended for use with service accounts
idm_people_pii_read | allow read to personally identifying information
idm_people_self_write_mail | allow self-modification of the mail attribute
idm_radius_servers | read user radius secrets. Intended for use with service accounts
idm_radius_service_admins | create and reset user radius secrets, and allow users to access radius
idm_recycle_bin_admins | modify and restore entries from the recycle bin
idm_schema_admins | add and modify elements of schema
idm_service_account_admins | create and modify service accounts
idm_unix_admins | enable posix attributes on accounts and groups

Default Roles

Kanidm ships with 3 high level permission groups. These roles have no inherent permissions of their own; their abilities come from being members of the default permission groups.

group name | description
idm_admins | manage persons and their groups
idm_service_desk | assist persons with credential resets or other queries
system_admins | manage the operation of Kanidm as a database and service

Troubleshooting

Some things to try.

Is the server started?

If you don't see "ready to rock! 🪨" in your logs, it's not started. Scroll back and look for errors!

Can you connect?

If the server's running on idm.example.com:8443 then a simple connectivity test can be done using curl.

Run the following command:

curl https://idm.example.com:8443/status

This is similar to what you should see:

➜ curl -vk https://idm.example.com:8443/status
*   Trying 10.0.0.14:8443...
* Connected to idm.example.com (10.0.0.14) port 8443 (#0)
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* (304) (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
* Server certificate:
*  subject: C=AU; ST=Queensland; L=Brisbane; O=INSECURE EXAMPLE; OU=kanidm; CN=idm.example.com
*  start date: Sep 20 09:28:18 2022 GMT
*  expire date: Oct 21 09:28:18 2022 GMT
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
> GET /status HTTP/1.1
> Host: idm.example.com:8443
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< cache-control: no-store, max-age=0
< content-length: 4
< content-type: application/json
< date: Tue, 20 Sep 2022 11:52:23 GMT
< pragma: no-cache
< set-cookie: kanidm-session=+LQJKwL0UdAEMoTc0Zrgne2hU+N2nB+Lcf+J1OoI9n4%3DNE7xuL9yCq7B0Ai+IM3gq5T+YZ0ckDuDoWZKzhPMHmSk3oFSscp9vy9n2a5bBFjWKgeNwdLzRbYc4rvMqYi11A%3D%3D; HttpOnly; SameSite=Strict; Secure; Path=/; Expires=Wed, 21 Sep 2022 11:52:23 GMT
< x-content-type-options: nosniff
< x-kanidm-opid: 8b25f050-7f6e-4ce1-befe-90be3c4f8a98
<
* Connection #0 to host localhost left intact
true

This means:

  1. you've successfully connected to a host (10.0.0.14),
  2. TLS worked
  3. Received the status response "true"

If you see something like this:

➜ curl -v https://idm.example.com:8443
*   Trying 10.0.0.1:8443...
* connect to 10.0.0.1 port 8443 failed: Connection refused
* Failed to connect to idm.example.com port 8443 after 5 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to idm.example.com port 8443 after 5 ms: Connection refused

Then either your DNS is wrong (it's pointing at 10.0.0.1) or you can't connect to the server for some reason.

If you get errors about certificates, try adding -k to skip certificate verification checking and just test connectivity:

curl -vk https://idm.example.com:8443/status

Server things to check

  • Has the config file got bindaddress = "127.0.0.1:8443" ? Change it to bindaddress = "[::]:8443", so it listens on all interfaces.
  • Is there a firewall on the server?
  • If you're running in docker, did you expose the port (-p 8443:8443) or configure the network to host/macvlan/ipvlan?

Client errors

When you receive a client error it will list an "Operation ID", sometimes also called the OpId or KOpId. This UUID matches the UUIDs in the logs, allowing you to precisely locate the server logs related to the failing operation.

Try running commands with RUST_LOG=debug to get more information:

RUST_LOG=debug kanidm login --name anonymous

Reverse Proxies not sending HTTP/1.1 requests

NGINX (and probably other proxies) sends HTTP/1.0 requests to the upstream server by default. This'll lead to errors like this in your proxy logs:

*17 upstream prematurely closed connection while reading response header from upstream, client: 172.19.0.1, server: example.com, request: "GET / HTTP/1.1", upstream: "https://172.19.0.3:8443/", host: "example.com:8443"

The fix for NGINX is to set the proxy_http_version to 1.1. This can go in the same block as the proxy_pass option.

proxy_http_version 1.1;
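For context, a minimal sketch of where this sits in an NGINX reverse proxy configuration (the upstream address is an assumption; use whatever address your kanidmd instance listens on):

server {
    listen 443 ssl;
    server_name idm.example.com;
    # TLS certificate directives elided.

    location / {
        # Kanidm requires HTTP/1.1 or later from the proxy.
        proxy_http_version 1.1;
        proxy_pass https://127.0.0.1:8443;
    }
}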

OpenTelemetry errors

If you see something like this:

OpenTelemetry trace error occurred. Exporter otlp encountered the following error(s): the grpc server returns error (The system is not in a state required for the operation's execution): , detailed error message: TRACE_TOO_LARGE: max size of trace (5000000) exceeded while adding 86725 bytes to trace a657b63f6ca0415eb70b6734f20f82cf for tenant single-tenant

Then you'll need to tweak the maximum trace size in your OTLP receiver. In Grafana Tempo you can add the following keys to your tempo.yaml; in this example we're setting it to 20MiB:

overrides:
  defaults:
    global:
      max_bytes_per_trace: 20971520 # 20MiB

Frequently Asked Questions

... or ones we think people might ask.

Why TLS?

You may have noticed that Kanidm requires you to configure TLS in your container or server install.

We are a secure-by-design rather than secure-by-configuration system, so TLS for all connections is considered mandatory and a default rather than an optional feature you add later.

Can Kanidm work without TLS?

No, it can not. TLS is required due to our use of secure-cookies. secure-cookies is a flag set in cookies that asks a client to transmit them back to the origin site if and only if the client sees HTTPS present in the URL.

Kanidm's authentication system is a stepped challenge response design, where you initially request an "intent" to authenticate. Once you establish this intent, the server sets up a session-id into a secure cookie, and informs the client of what authentication methods can proceed.

If you do NOT have a HTTPS URL, the cookie with the session-id is not transmitted. The server detects this as an invalid-state request in the authentication design, and immediately breaks the connection, because it appears insecure. This prevents credential disclosure since the authentication session was not able to be established due to the lost session-id cookie.

Simply put, we are trying to use settings like secure_cookies to add constraints to the server so that you must perform and adhere to best practices - such as having TLS present on your communication channels.

This is also why we do not allow the server to start without a TLS certificate being configured.

Why disallow HTTP (without TLS) between my load balancer and Kanidm?

Because Kanidm is one of the keys to a secure network, and insecure connections to them are not best practice. This can allow account hijacking, privilege escalation, credential disclosures, personal information leaks and more. The entire path between a client and the server must be protected at all times.

OAuth2

RFC6819 - OAuth2 Threat Model and Security Considerations is a comprehensive and valuable resource discussing the security of OAuth2 and influences OpenID Connect as well. In general Kanidm follows and implements many of the recommendations in this document, as well as choosing not to implement certain known insecure OAuth2 features.

Why is disabling PKCE considered insecure?

RFC7636 - Proof Key for Code Exchange by OAuth Public Clients exists to prevent authorisation code interception attacks. This is where an attacker can retrieve the authorisation code and then perform the code exchange without the user being aware. A successful code exchange issues the attacker with an access_token and optionally a refresh_token. The RFC has an excellent explanation of the attack. Additionally, this threat is discussed in RFC6819 Section 4.4.1.

As Kanidm aims for "secure by default" design, even with confidential clients, we deem it important to raise the bar for attackers. For example an attacker may have access to the client_id and client_secret of a confidential client as it was mishandled by a system administrator. While they may not have direct access to the client/application systems, they could still use this client_id+secret to then carry out the authorisation code interception attack listed.

For confidential clients (referred to as a basic client in Kanidm due to the use of HTTP Basic for client_id+secret presentation) PKCE may optionally be disabled. This can allow authorisation code attacks to be carried out - however if TLS is used and the client_secret never leaks, then these attacks will not be possible. Since there are many public references to system administrators mishandling secrets such as these, we should not rely on this as our sole defence.

For public clients (which have no client_id authentication) we strictly enforce PKCE since disclosure of the authorisation code to an attacker will allow them to perform the code exchange.

OpenID Connect internally has a nonce parameter in its operations. Commonly it is argued that this value removes the need for OpenID Connect clients to implement PKCE. It does not. This parameter is not equivalent to, nor a replacement for, PKCE. While the nonce can assist with certain attack mitigations, authorisation code interception is not prevented by the presence or validation of the nonce value.

We would strongly encourage OAuth2 client implementations to implement and support PKCE, as it provides defense in depth to known and exploited authorisation code interception attacks.
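For reference, the S256 transformation a client performs is small: the code_challenge is the base64url-encoded SHA-256 of the code_verifier (RFC 7636). A shell illustration, using openssl purely to show the derivation:

# Generate a high-entropy code_verifier (base64url, no padding).
code_verifier="$(openssl rand -base64 48 | tr '+/' '-_' | tr -d '=')"
# code_challenge = BASE64URL( SHA256( code_verifier ) )
code_challenge="$(printf '%s' "$code_verifier" | openssl dgst -sha256 -binary | openssl base64 | tr '+/' '-_' | tr -d '=')"
echo "$code_challenge"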

Why is RSA considered legacy?

While RSA is cryptographically sound, to achieve the same level of security as ECDSA it requires signatures and keys that are significantly larger. This has costs for network transmission and CPU time to verify these signatures. At this time (2024), to achieve the same level of security as a 256 bit ECDSA key, RSA requires a 3072 bit key. Similarly a 384 bit ECDSA key requires an 8192 bit RSA key for equivalent cryptographic strength, and a 521 bit ECDSA key would likely require a 16884 bit RSA key (or greater).

This means that going forward more applications will require ECDSA over RSA due to its equivalent strength at significantly smaller key sizes and faster operation.

Where this has more serious costs is our future desire to add support for Hardware Security Modules. Since RSA keys are much larger on these devices it may significantly impact performance of the HSM and may also limit the amount of keys we can store on the device. In the case of some HSM models, they do not even support RSA keys up to 8192 bits (but they do support ECDSA 384 and 521). An example of this is TPMs, which only support up to 4096 bit RSA keys at this time.

As a result, we want to guide people toward smaller, faster and more secure cryptographic standards like ECDSA. We want to encourage application developers to implement ECDSA in their OAuth2 applications as it is likely that limitations of RSA will be hit in the future.
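As an illustration of how lightweight this is for application developers, an ES256 (P-256) signing key pair can be generated with openssl:

# Generate a P-256 private key and extract the matching public key.
openssl ecparam -name prime256v1 -genkey -noout -out es256-private.pem
openssl ec -in es256-private.pem -pubout -out es256-public.pem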

Generally, it's also positive to encourage applications to review and update their cryptographic implementations over time too. Cryptography and security are not stagnant; they require continual review, assessment and improvement.

Can I change the database backend from SQLite to - name of favourite database here -

No, it is not possible to swap out the SQLite database for any other type of SQL server.

ATTEMPTING THIS WILL BREAK YOUR KANIDM INSTANCE IRREPARABLY

This question is normally asked because people want to set up multiple Kanidm servers connected to a single database.

Kanidm does not use SQL as a database. Kanidm uses SQL as a durable key-value store and implements its own database, caching, querying, optimisation and indexing on top of that key-value store.

Because Kanidm implements its own cache layer above the key-value store (SQLite in this example), if you were to connect two Kanidm instances to the same key-value store, each server would have its own cache layer with no contact between them. Writes made on one server could never be observed by the second, and if the second then wrote over those entries, the changes from the first server would be lost.

Kanidm now implements its own eventually consistent distributed replication, which also removes any need to consider external databases.

Why aren't snaps launching with home_alias set?

Snaps rely on AppArmor, and AppArmor doesn't follow symlinks. When home_alias is any value other than none, a symlink will be created pointing to home_attr. It is recommended to use alternative software packages to snaps.

All users in Kanidm can change their name (and their spn) at any time. If you change home_attr from uuid, you must have a plan for how to manage these directory renames in your system.

Why so many crabs?

It's a rust thing.

Will you implement -insert protocol here-

Probably, on an infinite time-scale! As long as it's not STARTTLS. Please log an issue and start the discussion!

Why do the crabs have knives?

Don't ask. They just do.

Why won't you take this FAQ thing seriously?

Look, people just haven't asked many questions yet.

Glossary

This is a glossary of terms used throughout this book. While we make every effort to explain terms and acronyms when they are used, this may be a useful reference if something feels unknown to you.

Domain Names

  • domain - This is the domain you "own". It is the highest level entity. An example would be example.com (since you do not own .com).
  • subdomain - A subdomain is a domain name space under the domain. Subdomains of example.com include a.example.com and b.example.com. Each subdomain can have further subdomains.
  • domain name - This is any named entity within your domain or its subdomains. This is the umbrella term, referring to all entities in the domain. example.com, a.example.com, host.example.com are all valid domain names with the domain example.com.
  • origin - An origin defines a URL with a protocol scheme, optional port number and domain name components. An example is https://host.example.com
  • effective domain - This is the extracted domain name from an origin excluding port and scheme.

Accounts

  • trust - A trust is when two Kanidm domains have a relationship to each other where accounts can be used between the domains. The domains retain their administration boundaries, but allow cross authentication.
  • replication - This is the process where two or more Kanidm servers in a domain can synchronise their database content.
  • UAT - User Authentication Token. This is a token issued by Kanidm to an account after it has authenticated.
  • SPN - Security Principal Name. This is the name of an account comprising its name and the domain name. This allows distinction between accounts with identical names over a trust boundary.

Internals

  • entity, object, entry - Any item in the database. Generally these terms are interchangeable, but internally they are referred to as Entry.
  • account - An entry that may authenticate to the server, generally allowing extended permissions and actions to be undertaken.

Access Control

  • privilege - An expression of what actions an account may perform if granted
  • target - The entries that will be affected by a privilege
  • receiver - The entries that will be able to use a privilege
  • acp - an Access Control Profile which defines a set of privileges that are granted to receivers to affect target entries.
  • role - A term used to express a group that is the receiver of an access control profile, allowing its members to affect the target entries.

Getting Started (for Developers)

Setup the Server

It's important before you start trying to write code and contribute that you understand what Kanidm does and its goals.

An important first step is to install the server so if you have not done that yet, go and try that now! 😄

Setting up your Machine

Each operating system has different steps required to configure and build Kanidm.

MacOS

A prerequisite is Apple Xcode for access to git and compiler tools. You should install this first.

You will need rustup to install a Rust toolchain.

To build the Web UI you'll need wasm-pack (cargo install wasm-pack).

SUSE / OpenSUSE

You will need to install rustup and our build dependencies with:

zypper in rustup git libudev-devel sqlite3-devel libopenssl-3-devel libselinux-devel pam-devel tpm2-0-tss-devel

You can then use rustup to complete the setup of the toolchain.

In some cases you may need to build other vendored components, or use an alternate linker. In these cases we advise you to also install:

zypper in clang lld make sccache

You should also adjust your environment with:

export RUSTC_WRAPPER=sccache
export CC="sccache /usr/bin/clang"
export CXX="sccache /usr/bin/clang++"

And add the following to a cargo config of your choice (such as ~/.cargo/config), adjusting for your cpu arch:

[target.aarch64-unknown-linux-gnu]
linker = "clang"
rustflags = [
    "-C", "link-arg=-fuse-ld=lld",
]

Fedora

You will need rustup to install a Rust toolchain.

You will also need some system libraries to build this:

systemd-devel sqlite-devel openssl-devel pam-devel

Building the Web UI requires additional packages:

perl-FindBin perl-File-Compare
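For example, all of the packages listed above can be installed with dnf:

sudo dnf install systemd-devel sqlite-devel openssl-devel pam-devel perl-FindBin perl-File-Compare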

Ubuntu

You need rustup to install a Rust toolchain.

You will also need some system libraries to build this, which can be installed by running:

sudo apt-get install libudev-dev libssl-dev pkg-config libpam0g-dev

Tested with Ubuntu 20.04 and 22.04.

Windows

Kani Warning NOTICE
Our support for Windows is still in development, so you may encounter some compilation or build issues.

You need rustup to install a Rust toolchain.

An easy way to grab the dependencies is to install vcpkg.

This is how it works in the automated build:

  1. Enable use of installed packages for the user system-wide:

    vcpkg integrate install
    
  2. Install the openssl dependency, which compiles it from source. This downloads all sorts of dependencies, including perl for the build.

    vcpkg install openssl:x64-windows-static-md
    

There's a powershell script in the root directory of the repository which, in concert with openssl, will generate a config file and certs for testing.

Getting the Source Code

Get Involved

To get started, you'll need to fork or branch, and we'll merge based on pull requests.

Kanidm is (largely) a monorepo. This can be checked out with:

git clone https://github.com/kanidm/kanidm.git
cd kanidm

Other supporting projects can be found on the project GitHub.

If you are forking, then fork in GitHub and then add your remote.

git remote add myfork git@github.com:<YOUR USERNAME>/kanidm.git

Select an issue (always feel free to reach out to us for advice!), and create a branch to start working:

git branch <feature-branch-name>
git checkout <feature-branch-name>
cargo test
Kani Warning IMPORTANT
Kanidm is unable to accept code that is generated by an AI for legal reasons. copilot and other tools that generate code in this way can not be used in Kanidm.

When you are ready for review (even if the feature isn't complete and you just want some advice):

  1. Run the test suite: cargo test
  2. Ensure rust formatting standards are followed: cargo fmt --check
  3. Try following the suggestions from clippy, after running cargo clippy. This is not a blocker on us accepting your code!
  4. Then commit your changes:
git commit -m 'Commit message' change_file.rs ...
git push <myfork> <feature-branch-name>

If you receive advice or make further changes, just keep committing to the branch, and pushing to your branch. When we are happy with the code, we'll merge in GitHub, meaning you can now clean up your branch.

git checkout master
git pull
git branch -D <feature-branch-name>

Rebasing

If you are asked to rebase your change, follow these steps:

git checkout master
git pull
git checkout <feature-branch-name>
git rebase master

Then be sure to fix any merge issues or other comments as they arise. If you have issues, you can always stop and reset with:

git rebase --abort

Building the Book

You'll need mdbook and the extensions to build the book:

cargo install mdbook mdbook-mermaid mdbook-template

To build it:

make book

Or to run a local webserver:

cd book
mdbook serve

Designs

See the "Design Documents" section of this book.

Rust Documentation

A list of links to the library documentation is at kanidm.com/documentation.

Advanced

Minimum Supported Rust Version

The MSRV is specified in the package Cargo.toml files.

We tend to be quite proactive in updating this to recent Rust versions, so we are open to increasing this value if required!

Build Profiles

Build profiles allow us to change the operation of Kanidm during its compilation for development or release on various platforms. By default the "developer" profile is used that assumes the correct relative paths within the monorepo.

Setting different developer profiles while building is done by setting the environment variable KANIDM_BUILD_PROFILE to the bare filename of one of the TOML files in /profiles.

For example, this will set the CPU flags to "none" and the location for the Web UI files to /usr/share/kanidm/ui/pkg:

KANIDM_BUILD_PROFILE=release_suse_generic cargo build --release --bin kanidmd

Building the Web UI

NOTE: There is a pre-packaged version of the Web UI at /server/web_ui/pkg/, which can be used directly. This means you don't need to build the Web UI yourself.

The Web UI uses Rust WebAssembly rather than Javascript. To build this you need to set up the environment:

cargo install wasm-pack

Then you are able to build the UI:

cd server/web_ui/
./build_wasm_dev.sh

To build for release, run build_wasm.sh, or make webui from the project root.

The "developer" profile for kanidmd will automatically use the pkg output in this folder.

Development Server for Interactive Testing

Especially if you wish to develop the Web UI, the ability to run the server from the source tree is critical.

Once you have the source code, you need encryption certificates to use with the server, because without certificates, authentication will fail.

We recommend using Let's Encrypt, but if this is not possible, kanidmd will create self-signed certificates in /tmp/kanidm.

You can now build and run the server with the commands below. It will use a database in /tmp/kanidm/kanidm.db.

Start the server

cd server/daemon
./run_insecure_dev_server.sh

While the server is running, you can use the admin socket to generate an admin password:

./run_insecure_dev_server.sh recover-account admin

Record the password above.

In a new terminal, you can now build and run the client tools with:

cargo run --bin kanidm -- --help
cargo run --bin kanidm -- login -H https://localhost:8443 -D anonymous -C /tmp/kanidm/ca.pem
cargo run --bin kanidm -- self whoami -H https://localhost:8443 -D anonymous -C /tmp/kanidm/ca.pem

cargo run --bin kanidm -- login -H https://localhost:8443 -D admin -C /tmp/kanidm/ca.pem
cargo run --bin kanidm -- self whoami -H https://localhost:8443 -D admin -C /tmp/kanidm/ca.pem

You may find it easier to modify ~/.config/kanidm per the book client tools section for extended administration locally.
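A minimal sketch of such a configuration for this development setup (check the client tools chapter for the authoritative option names):

uri = "https://localhost:8443"
ca_path = "/tmp/kanidm/ca.pem"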

Raw actions

Kani Warning NOTICE
It's not recommended to use these tools outside of extremely complex or advanced development requirements. These are a last resort!

The server has a low-level stateful API you can use for more complex or advanced tasks on large numbers of entries at once. Some examples are below, but generally we advise you to use the APIs or CLI tools. These are very handy to "unbreak" something if you make a mistake however!

# Create from json (group or account)
kanidm raw create -H https://localhost:8443 -C ../insecure/ca.pem -D admin example.create.account.json
kanidm raw create  -H https://localhost:8443 -C ../insecure/ca.pem -D idm_admin example.create.group.json

# Apply a json stateful modification to all entries matching a filter
kanidm raw modify -H https://localhost:8443 -C ../insecure/ca.pem -D admin '{"or": [ {"eq": ["name", "idm_person_account_create_priv"]}, {"eq": ["name", "idm_service_account_create_priv"]}, {"eq": ["name", "idm_account_write_priv"]}, {"eq": ["name", "idm_group_write_priv"]}, {"eq": ["name", "idm_people_write_priv"]}, {"eq": ["name", "idm_group_create_priv"]} ]}' example.modify.idm_admin.json
kanidm raw modify -H https://localhost:8443 -C ../insecure/ca.pem -D idm_admin '{"eq": ["name", "idm_admins"]}' example.modify.idm_admin.json

# Search and show the database representations
kanidm raw search -H https://localhost:8443 -C ../insecure/ca.pem -D admin '{"eq": ["name", "idm_admin"]}'

# Delete all entries matching a filter
kanidm raw delete -H https://localhost:8443 -C ../insecure/ca.pem -D idm_admin '{"eq": ["name", "test_account_delete_me"]}'

Build a Kanidm Container

Build a container with the current branch using:

make <TARGET>

Check make help for a list of valid targets.

The following environment variables control the build:

ENV variable | Definition | Default
IMAGE_BASE | Base location of the container image. | kanidm
IMAGE_VERSION | Determines the container's tag. | None
CONTAINER_TOOL_ARGS | Specify extra options for the container build tool. | None
IMAGE_ARCH | Passed to --platforms when the container is built. | linux/amd64,linux/arm64
CONTAINER_BUILD_ARGS | Override default ARG settings during the container build. | None
CONTAINER_TOOL | Use an alternative container build tool. | docker
BOOK_VERSION | Sets version used when building the documentation book. | master

Container Build Examples

Build a kanidm container using podman:

CONTAINER_TOOL=podman make build/kanidmd

Build a kanidm container and use a redis build cache:

CONTAINER_BUILD_ARGS='--build-arg "SCCACHE_REDIS=redis://redis.dev.blackhats.net.au:6379"' make build/kanidmd

Automatically Built Containers

To speed up testing across platforms, we're leveraging GitHub actions to build containers for test use.

Whenever code is merged with the master branch of Kanidm, containers are automatically built for kanidmd and radius. Sometimes they fail to build, but we'll try to keep them available.

To find information on the packages, visit the Kanidm packages page.

An example command for pulling and running the radius container is below. You'll need to authenticate with the GitHub container registry first.

docker pull ghcr.io/kanidm/radius:devel
docker run --rm -it \
    -v $(pwd)/kanidm:/data/kanidm \
    ghcr.io/kanidm/radius:devel

This assumes you have a kanidm client configuration file in the current working directory.

Testing the OpenAPI generator things

There's a script in scripts/openapi_tests which runs a few docker containers - you need to be running a local instance on port 8443 to be able to pull the JSON file for testing.

Frequently Asked Questions

This is a list of common questions that are generally raised by developers or technical users.

Why don't you use library/project X?

A critical aspect of Kanidm is the ability to test it. Generally, requests to add libraries or projects come in different forms, so I'll answer a few of them:

Is the library in Rust?

If it's not in Rust, it's not eligible for inclusion. There is a single exception today (rlm python) but it's very likely this will also be removed in the future. Keeping a single language helps with testing, but also makes the project more accessible and consistent to developers. Additionally, features exist in Rust that help to improve the quality of the project from development to production.

Is the project going to create a microservice like architecture?

If the project (such as an external OAuth/OIDC gateway, or a different DB layer) would be used in a tight-knit manner with Kanidm, then it is no longer a microservice but a monolith with multiple moving parts. This creates production fragility and issues such as:

  • Differences and difficulties in correlating log events
  • Design choices of the project not being compatible with Kanidm's model
  • Extra requirements for testing/production configuration

This last point is key. It is a critical part of Kanidm that the following works on all machines and runs every single test in the suite.

git clone https://github.com/kanidm/kanidm.git
cd kanidm
cargo test

Not only this, but it's very important for quality that running cargo test truly tests the entire stack of the application - from the database, all the way to the client utilities and other daemons communicating to a real server. Many developer choices have already been made to ensure that testing is the most important aspect of the project to ensure that every feature is high quality and reliable.

The addition of extra projects or dependencies would violate this principle and lead to a situation where it would not be possible to effectively test for all developers.

Why don't you use Raft/Etcd/MongoDB/Other to solve replication?

There are a number of reasons why these are generally not compatible. Generally these databases or technologies do solve problems, but they are not the problems in Kanidm.

CAP theorem

CAP theorem states that in a database you must choose only two of the three possible elements:

  • Consistency - All servers in a topology see the same data at all times
  • Availability - All servers in a topology can accept write operations at all times
  • Partitioning - In the case of a network separation in the topology, all systems can continue to process read operations

Many technologies like Raft or Etcd are databases that provide PC guarantees. They guarantee that they are always consistent, and can always be read in the face of partitioning, but to accept a write, they must not be experiencing a partitioning event. Generally this is achieved by electing a single node to process all operations, and re-electing a new node in the case of a partitioning event. The elections will fail if a quorum is not met, disallowing writes throughout the topology.

This doesn't work for authentication systems and global scale databases. As you introduce non-negligible network latency, the throughput of write operations in these systems decreases. This is why Google's Spanner is a PA system.

PA systems are also considered to be "eventually consistent". All nodes can provide reads and writes at all times, but during a network partitioning or after a write there is a delay for all nodes to arrive at a consistent database state. A key element is that the nodes perform a consistency operation that uses application aware rules to allow all servers to arrive at the same state without communication between the nodes.

Update Resolution

Many databases do exist that are PA, such as CouchDB or MongoDB. However, they often do not have the update resolution properties that Kanidm requires.

An example of this is that CouchDB uses object-level resolution. This means that if two servers update the same entry, the "latest write wins". An example of where this won't work for Kanidm is if one server locks an account as an admin is revoking its access, but another server updates that account's username. If the username update happened second, the lock event would be lost, creating a security risk. There are certainly cases where this resolution method is valid, but Kanidm is not one.

Another example is MongoDB. While it does attribute level resolution, it does this without the application awareness of Kanidm. For example, in Kanidm if we have an account lock based on time, we can select the latest time value to overwrite earlier ones, or we could have a counter that correctly increments/advances between the servers. However, Mongo is not aware of these rules, and it would not be able to give the experience we desire. Mongo is a very good database, it's just not the right choice for Kanidm.

Additionally, it's worth noting that most of these other databases would violate the previous desires to keep the language as Rust and may require external configuration or daemons which may not be possible to test.

How PAM/nsswitch Work

Linux and BSD clients can resolve identities from Kanidm into accounts via PAM and nsswitch.

Name Service Switch (NSS) is used to connect a computer with different data sources to resolve name-service information. By adding the nsswitch libraries to /etc/nsswitch.conf, we are telling NSS to look up password info and group identities in Kanidm:

passwd: compat kanidm
group: compat kanidm
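With the nsswitch libraries configured, you can verify that accounts and groups resolve using standard getent queries:

getent passwd <account name>
getent group <group name>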

When a service like sudo, sshd, su, etc. wants to authenticate someone, it opens the pam.d config of that service, then performs authentication according to the modules defined in that pam.d config. For example, if you run ls -al /etc/pam.d /usr/etc/pam.d in SUSE, you can see the services and their respective pam.d configs.
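As a sketch of what this looks like with Kanidm in the stack, a service's pam.d config may contain lines like the following. The exact module arguments and file layout vary by distribution; the PAM chapter of this book covers the supported configurations.

# Illustrative only - consult the PAM chapter for your distribution.
auth        sufficient    pam_kanidm.so ignore_unknown_user
account     sufficient    pam_kanidm.so ignore_unknown_user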

Troubleshooting builds

WASM Build failures due to "Error: Not able to find or install a local wasm-bindgen."

This seems to relate to a version mismatch error in wasm-pack as seen in this thread in the wasm-pack repository.

Try reinstalling wasm-bindgen-cli by running the following (the --force is important):

cargo install --force wasm-bindgen-cli

Or reinstalling wasm-pack similarly.

If that doesn't work, try running the build with the RUST_LOG=debug environment variable to investigate further.

Access Profiles Rework 2022

Access controls are critical for a project like Kanidm to determine who can access what on other entries. Our access controls have to be dynamic and flexible as administrators will want to define their own access controls. In almost every call in the server, they are consulted to determine if the action can be carried out. We also supply default access controls so that out of the box we are a complete and useful IDM.

The original design of the access control system was intended to satisfy our need for flexibility, but we have begun to discover a number of limitations. The design, which incorporates filter queries, makes access controls hard to administer because we have not often talked publicly about the filter language and how it works internally. Because of the use of filters it is hard to see which access controls will apply to an entry, making it hard to audit without actually calling the ACP subsystem. Currently the access control system also has a large impact on performance, accounting for nearly 35% of the time taken in a search operation.

Additionally, the default access controls that we supply have started to run into limits and rough cases due to changes as we have improved features. Some of this was due to limited design with use cases in mind during development.

To resolve this, a number of coordinating features need to be implemented to improve the situation. These features will be documented first, and the use cases second, with each use case linking to the features that satisfy it.

Required Features to Satisfy

Refactor of default access controls

The current default privileges will need to be refactored to improve separation of privilege and improved delegation of finer access rights.

Access profiles target specifiers instead of filters

Access profiles should target a list of groups for who the access profile applies to, and who receives the access it is granting.

Alternately an access profile could target "self" so that self-update rules can still be expressed.

An access profile could target an oauth2 definition for the purpose of allowing reads to members of a set of scopes that can access the service.

The access profile receiver would be group based only. This allows specifying that "X group of members can write self" meaning that any member of that group can write to themself and only themself.

In the future we could also create different target/receiver specifiers to allow other extended management and delegation scenarios. This improves the situation, making things more flexible than the current filter system. It also may allow filters to be simplified to remove the SELF uuid resolve step in some cases.

Filter based groups

These are groups whose members are dynamically allocated based on a filter query. This allows a similar level of dynamic group management as we have currently with access profiles, but with the additional ability for them to be used outside of the access control context. This is the "bridge" allowing us to move from filter based access controls to "group" targeted.

A risk of filter based groups is "infinite churn" because of recursion. This can occur if you had a rule such as "and not memberof = self" on a dynamic group. Because of this, filters on dynamic groups may not use "memberof" unless they are internally provided by the kanidm project so that we can vet these rules as correct and without creating infinite recursion scenarios.

Access rules extracted to ACI entries on targets

Access control profiles are an excellent way to administer access where you can specify who has access to what, but they make the reverse query, "who has access to this specific entity", harder to answer. Since this is needed for both search and auditing, keeping access profiles specified in the current manner but using them to generate ACE rules on the target entry will allow the search and audit paths to answer the question of "who has access to this entity" much faster.

Sudo Mode

A flag should exist on a session defining "sudo" mode which requires a special account policy membership OR a re-authentication to enable. This sudo flag is a time window on a session token which can allow/disallow certain behaviours. It would be necessary for all write paths to have access to this value.

Account Policy

Account policy defines rules on accounts and what they can or can't do with regard to properties and authentication. This is required for sudo mode so that a group of accounts can be "always in sudo" mode and this enforces rules on session expiry.

Access Control Use Cases

Default Roles / Separation of Privilege

By default we attempt to separate privileges so that "no single account" has complete authority over the system.

Satisfied by:

  • Refactor of default access controls
  • Filter based groups
  • Sudo Mode

System Admin

This role, also called "admins", is responsible for managing Kanidm as a service. It does NOT manage users or accounts.

The "admins" role is responsible for managing:

  • The name of the domain
  • Configuration of the servers and replication
  • Management of external integrations (oauth2)

Service Account Admin

The role would be called "sa_admins" and would be responsible for top level management of service accounts, and for delegating authority for service account administration to the users who manage them.

  • Create service accounts
  • Delegate service account management to owners groups
  • Migrate service accounts to persons

The service account admin is capable of migrating service accounts to persons as it is "yielding" control of the entity, rather than an idm admin "taking" the entity which may have security impacts.

Service Desk

This role manages a subset of persons. The helpdesk roles are precluded from modification of "higher privilege" roles like service account, identity and system admins. This is due to potential privilege escalation attacks.

  • Can create credential reset links
  • Can lock and unlock accounts and manage their expiry.

Idm Admin

This role manages identities, or more specifically person accounts. In addition it is a "high privilege" service desk role and can manage high privilege users as well.

  • Create persons
  • Modify and manage persons
  • All roles of service desk for all persons

Self Write / Write Privilege

Currently write privileges are always available to any account post-authentication. Writes should only be available after an extra "challenge" or "sudo" style authentication, and only for a limited time window. The write window can be extended during the session. This allows extremely long lived sessions, in contrast to the current short session life. It also makes it safer to provide higher levels of privilege to persons, since these rights are behind a re-authentication event.

Some accounts should always be considered able to write, and these accounts should have limited authentication sessions as a result of this.

Satisfied by:

  • Access profiles target specifiers instead of filters
  • Sudo Mode

Oauth2 Service Read (Nice to Have)

For ux/ui integration, being able to list oauth2 applications that are accessible to the user would be a good feature. To limit "who" can see the oauth2 applications that an account can access, we need a way to "allow read" by proxy of the related users of the oauth2 service. This will require access controls to be able to interpret the oauth2 config and provide rights based on that.

Satisfied by:

  • Access profiles target specifiers instead of filters

Administration

Access controls should be easier to manage and administer, and should be group based rather than filter based. This will make it easier for administrators to create and define their own access rules.

  • Refactor of default access controls
  • Access profiles target specifiers instead of filters
  • Filter based groups

Service Account Access

Service accounts should be able to be "delegated" administration, where a group of users can manage a service account. This should not require administrators to create unique access controls for each service account, but a method to allow mapping of the service account to "who manages it".

  • Sudo Mode
  • Account Policy
  • Access profiles target specifiers instead of filters
  • Refactor of default access controls

Auditing of Access

It should be easier to audit who has access to what by inspecting the entry to view what can access it.

  • Access rules extracted to ACI entries on targets
  • Access profiles target specifiers instead of filters

Access Profiles

Access Profiles (ACPs) are a way of expressing the set of actions which accounts are permitted to perform on database records (objects) in the system.

As a result, there are specific requirements to what these can control and how they are expressed.

Access profiles define an action of allow or deny: deny has priority over allow and will override an allow even when both apply. They should only be created by system access profiles because certain changes must be denied.
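As a sketch of that precedence (illustrative types only, not the server's real implementation), resolution of the applicable profiles could look like:

#[derive(Clone, Copy, PartialEq)]
enum AcpAction {
    Allow,
    Deny,
}

// Deny has priority: one applicable deny overrides any number of allows.
// In this sketch the absence of any applicable profile is also treated as an
// implicit deny (an assumption of the example, not a statement of the
// server's behaviour).
fn resolve(applicable: &[AcpAction]) -> AcpAction {
    if applicable.iter().any(|a| *a == AcpAction::Deny) {
        AcpAction::Deny
    } else if applicable.iter().any(|a| *a == AcpAction::Allow) {
        AcpAction::Allow
    } else {
        AcpAction::Deny
    }
}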

Access profiles are stored as entries and are dynamically loaded into a structure that is more efficient for use at runtime. This is similar to how schema and its transactions are implemented.

Search Requirements

A search access profile must be able to limit:

  1. the content of a search request and its scope.
  2. the set of data returned from the objects visible.

An example:

Alice should only be able to search for objects where the class is person and the object is a memberOf the group called "visible".

Alice should only be able to see the attribute displayName for those users (not their legalName), and their public email.

Worded a bit differently. You need permission over the scope of entries, you need to be able to read the attribute to filter on it, and you need to be able to read the attribute to receive it in the result entry.

If Alice searches for (&(name=william)(secretdata=x)), we should not allow this to proceed because Alice doesn't have the rights to read secret data, so they should not be allowed to filter on it. How does this work with two overlapping ACPs? For example: one that allows read of name and description to class = group, and one that allows name to user. We don't want to say (&(name=x)(description=foo)) and it to be allowed, because we don't know the target class of the filter. Do we "unmatch" all users because they have no access to the filter components? (Could be done by inverting and putting in an AndNot of the non-matchable overlaps). Or do we just filter our description from the users returned (But that implies they DID match, which is a disclosure).

More concrete:

search {
    action: allow
    targetscope: Eq("class", "group")
    targetattr: name
    targetattr: description
}

search {
    action: allow
    targetscope: Eq("class", "user")
    targetattr: name
}

SearchRequest {
    ...
    filter: And: {
        Pres("name"),
        Pres("description"),
    }
}

A potential defense is:

acp class group: Pres(name) and Pres(desc) both in target attr, allow
acp class user: Pres(name) allow, Pres(desc) deny. Invert and Append

So the filter now is:

And: {
    AndNot: {
        Eq("class", "user")
    },
    And: {
        Pres("name"),
        Pres("description"),
    },
}

This would now only allow access to the name and description of the class group.

If we extend this to a third, this would work. A more complex example:

search {
    action: allow
    targetscope: Eq("class", "group")
    targetattr: name
    targetattr: description
}

search {
    action: allow
    targetscope: Eq("class", "user")
    targetattr: name
}

search {
    action: allow
    targetscope: And(Eq("class", "user"), Eq("name", "william"))
    targetattr: description
}

Now we have a single user where we can read description. So the compiled filter from above becomes:

And: {
    AndNot: {
        Eq("class", "user")
    },
    And: {
        Pres("name"),
        Pres("description"),
    },
}

This would now be invalid, first, because we would see that class=user and william has no name so that would be excluded also. We also may not even have "class=user" in the second ACP, so we can't use subset filter matching to merge the two.

As a result, I think the only possible valid solution is to perform the initial filter, then determine on the candidates if we could have valid access to filter on all required attributes. i.e. this means even with an index lookup, we are still required to perform some filter application on the candidates.

I think this will mean that on a possible candidate, we have to apply all ACPs, then create a union of the resulting targetattrs, and then compare that set against the set of attributes in the filter.

This will be slow on large candidate sets (potentially), but could be sped up with parallelism, caching or other methods. However, in the same step, we can also apply the step of extracting only the allowed read target attrs, so this is a valuable exercise.
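A rough sketch of that per-candidate check (hypothetical types, not the server's entry or filter structures): union the target attributes of every ACP whose targetscope matched the candidate, then require the attributes referenced by the filter to be a subset of that union.

use std::collections::HashSet;

struct SearchAcp {
    // The targetattr values granted by this profile.
    target_attrs: HashSet<String>,
}

// `matching_acps` are the search ACPs whose targetscope matched this
// candidate entry for the requesting identity. The filter may only be
// applied to the candidate if every attribute it references is readable
// under at least one of those ACPs.
fn filter_allowed(matching_acps: &[SearchAcp], filter_attrs: &HashSet<String>) -> bool {
    let allowed: HashSet<&String> = matching_acps
        .iter()
        .flat_map(|acp| acp.target_attrs.iter())
        .collect();
    filter_attrs.iter().all(|a| allowed.contains(a))
}

The same union of allowed attributes can then be reused when extracting the allowed read target attrs from the result entries, as noted above.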

Delete Requirements

A delete profile must contain the content and scope of a delete.

An example:

Alice should only be able to delete objects where the memberOf is purgeable, and where they are not marked as protected.

Create Requirements

A create profile defines the following limits to what objects can be created, through the combination of filters and attributes.

An example:

Alice should only be able to create objects where the class is group, and can only name the group, but they cannot add members to the group.

An example of a content requirement could be something like "the value of an attribute must pass a regular expression filter". This could limit a user to creating a group of any name, except where the group's name contains "admin". This is a contrived example which is also possible with filtering, but more complex requirements are possible.

For example, we want to be able to limit the classes that someone could create on an object because classes often are used in security rules.

Modify Requirements

A modify profile defines the following limits:

  • a filter for which objects can be modified,
  • a set of attributes which can be modified.

A modify profile defines a limit on the modlist actions.

For example: you may only be allowed to ensure presence of a value. (Modify allowing purge, not-present, and presence).

Content requirements (see Create Requirements) are out of scope at the moment.

An example:

Alice should only be able to modify a user's password if that user is a member of the students group.

Note: modify does not imply read of the attribute. Care should be taken that we don't disclose the current value in any error messages if the operation fails.

Targeting Requirements

The target of an access profile should be a filter defining the objects that this applies to.

The filter limiting what the profiles act on requires a single special operation: the concept of "targeting self".

For example: we could define a rule that says "members of group X are allowed self-write to the mobilePhoneNumber attribute".

An extension to the filter code could allow an extra filter enum of self, that would allow this to operate correctly, and would consume the entry in the event as the target of "Self". This would be best implemented as a compilation of self -> eq(uuid, self.uuid).
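A minimal sketch of that compilation step, using an illustrative filter enum rather than the server's real filter type:

#[derive(Clone)]
enum Filter {
    Eq(String, String),
    And(Vec<Filter>),
    Or(Vec<Filter>),
    // Placeholder that is only valid before compilation.
    SelfUuid,
}

// Resolve the "self" placeholder against the entry performing the event, so
// that the rest of the filter pipeline only ever sees concrete terms.
fn compile_self(f: Filter, event_uuid: &str) -> Filter {
    match f {
        Filter::SelfUuid => Filter::Eq("uuid".to_string(), event_uuid.to_string()),
        Filter::And(inner) => {
            Filter::And(inner.into_iter().map(|i| compile_self(i, event_uuid)).collect())
        }
        Filter::Or(inner) => {
            Filter::Or(inner.into_iter().map(|i| compile_self(i, event_uuid)).collect())
        }
        other => other,
    }
}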

Implementation Details

CHANGE: Receiver should be a group, and should be single value/multivalue? Can only be a group.

Example profiles:

search {
    action: allow
    receiver: Eq("memberof", "admins")
    targetscope: Pres("class")
    targetattr: legalName
    targetattr: displayName
    description: Allow admins to read all users names
}

search {
    action: allow
    receiver: Self
    targetscope: Self
    targetattr: homeAddress
    description: Allow everyone to read only their own homeAddress
}

delete {
    action: allow
    receiver: Or(Eq("memberof", "admins"), Eq("memberof", "servicedesk"))
    targetscope: Eq("memberof", "tempaccount")
    description: Allow admins or servicedesk to delete any member of "temp accounts".
}

// This difference in targetscope behaviour could be justification to change the keyword here
// to prevent confusion.
create {
    action: allow
    receiver: Eq("name", "alice")
    targetscope: And(Eq("class", "person"), Eq("location", "AU"))
    createattr: location
    createattr: legalName
    createattr: mail
    createclass: person
    createclass: object
    description: Allow alice to make new persons, only with class person+object, and only set
        the attributes mail, location and legalName. The created object must conform to targetscope
}

modify {
    action: allow
    receiver: Eq("name", "claire")
    targetscope: And(Eq("class", "group"), Eq("name", "admins"))
    presentattr: member
    description: Allow claire to promote people as members of the admins group.
}

modify {
    action: allow
    receiver: Eq("name", "claire")
    targetscope: And(Eq("class", "person"), Eq("memberof", "students"))
    presentattr: sshkeys
    presentattr: class
    targetclass: unixuser
    description: Allow claire to modify persons in the students group, and to grant them the
        class of unixuser (only this class can be granted!). Subsequently, she may then give
        the sshkeys values as a modification.
}

modify {
    action: allow
    receiver: Eq("name", "alice")
    targetscope: Eq("memberof", "students")
    removedattr: sshkeys
    description: Allow alice to purge or remove sshkeys from members of the students group,
        but not add new ones
}

modify {
    action: allow
    receiver: Eq("name", "alice")
    targetscope: Eq("memberof", "students")
    removedattr: sshkeys
    presentattr: sshkeys
    description: Allow alice full control over the ssh keys attribute on members of students.
}

// This may not be valid: Perhaps if <*>attr: is on modify/create, then targetclass must
// be set, else class is considered empty.
//
// This profile could in fact be an invalid example, because presentattr: class is set, but not
// targetclass, so nothing could be granted.
modify {
    action: allow
    receiver: Eq("name", "alice")
    targetscope: Eq("memberof", "students")
    presentattr: class
    description: Allow alice to grant any class to members of students.
}

Formalised Schema

A complete schema would be:

Attributes

| Name                   | Single/Multi | Type              | Description                                        |
| ---------------------- | ------------ | ----------------- | -------------------------------------------------- |
| acp_allow              | single value | bool              |                                                    |
| acp_enable             | single value | bool              | This ACP is enabled                                |
| acp_receiver           | single value | filter            | ???                                                |
| acp_targetscope        | single value | filter            | ???                                                |
| acp_search_attr        | multi value  | utf8 case insense | A list of attributes that can be searched.         |
| acp_create_class       | multi value  | utf8 case insense | Object classes in which an object can be created.  |
| acp_create_attr        | multi value  | utf8 case insense | Attribute Entries that can be created.             |
| acp_modify_removedattr | multi value  | utf8 case insense | Modify if removed?                                 |
| acp_modify_presentattr | multi value  | utf8 case insense | ???                                                |
| acp_modify_class       | multi value  | utf8 case insense | ???                                                |

Classes

| Name                   | Must Have                                                           | May Have                 |
| ---------------------- | ------------------------------------------------------------------- | ------------------------ |
| access_control_profile | [acp_receiver, acp_targetscope]                                      | [description, acp_allow] |
| access_control_search  | [acp_search_attr]                                                    |                          |
| access_control_delete  |                                                                      |                          |
| access_control_modify  | [acp_modify_removedattr, acp_modify_presentattr, acp_modify_class]   |                          |
| access_control_create  | [acp_create_class, acp_create_attr]                                  |                          |

Important: empty sets really mean empty sets!

The ACP code will assert that both access_control_profile and at least one of the search/delete/modify/create classes exist on an ACP. An important factor of this design is the ability to compose multiple ACP types into a single entry, allowing a combined create/delete/modify to exist! However, each one must still list its respective actions to allow proper granularity.

"Search" Application

The set of access controls is checked, and the set where receiver matches the current identified user is collected. These then are added to the users requested search as:

And(<User Search Request>, Or(<Set of Search Profile Filters>))

In this manner, the search security is easily applied: if an entry does not conform to at least one of the required search profile filters, the Or term fails, the outer And condition is not satisfied, and the entry is not returned.
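Sketched with an illustrative filter enum (not the server's real one), the wrapping is a simple composition of the user's request with the matched profile scopes:

enum Filter {
    Pres(String),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

// Wrap the user's search request so that an entry can only be a result if it
// also matches at least one search profile targetscope that applies to the
// requesting identity. If no profile applies, the Or has no terms and
// nothing can match.
fn apply_search_access(user_filter: Filter, profile_scopes: Vec<Filter>) -> Filter {
    Filter::And(vec![user_filter, Filter::Or(profile_scopes)])
}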

Once complete, in the translation of the entry -> proto_entry, each access control and its allowed set of attrs has to be checked to determine what of that entry can be displayed. Consider there are three entries: A, B, C. An ACI exists that allows read of "name" on A and B, and another allows read of "mail" on B and C. The correct behaviour is then:

A: name
B: name, mail
C: mail

So this means that the entry -> proto entry part is likely the most expensive part of the access control operation, but also one of the most important. It may be possible to compile to some kind of faster method, but initially a simple version is needed.
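A simple version of that reduction might look like this sketch (hypothetical types): for each entry, take the union of allowed attributes across every ACP whose targetscope matched it, and strip everything else before converting to the protocol form.

use std::collections::{BTreeMap, HashSet};

// Stand-in for an entry: attribute name -> values.
type Entry = BTreeMap<String, Vec<String>>;

// `allowed_attrs` is the union of targetattr values from the ACPs that
// matched this entry for this identity (e.g. {"name"} for A, {"name", "mail"}
// for B, and {"mail"} for C in the example above).
fn reduce_entry(entry: &Entry, allowed_attrs: &HashSet<String>) -> Entry {
    entry
        .iter()
        .filter(|(attr, _)| allowed_attrs.contains(*attr))
        .map(|(attr, vals)| (attr.clone(), vals.clone()))
        .collect()
}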

"Delete" Application

Delete is similar to search, however there is the risk that the user may say something like:

Pres("class").

Were we to approach this like search, this would then have "everything the identified user is allowed to delete, is deleted". A consideration here is that Pres("class") would delete "all" objects in the directory, but with the access control present, it would limit the deletion to the set of allowed deletes.

In a sense this is a correct behaviour - they were allowed to delete everything they asked to delete. However, in another it's not valid: the request was broad and they were not allowed access to delete everything they requested.

The possible abuse vector here is that an attacker could then use delete requests to enumerate the existence of entries in the database that they do not have access to. This requires someone to have the delete privilege which in itself is very high level of access, so this risk may be minimal.

So the choices are:

  1. Treat it like search and allow the user to delete what they are allowed to delete, but ignore other objects
  2. Deny the request because their delete was too broad, and they must specify a valid deletion request.

Option #2 seems more correct because the delete request is an explicit request, not a request where you want partial results. Imagine someone wants to delete users A and B at the same time, but only has access to A. They want this request to fail so they KNOW B was not deleted, rather than it succeed and have B still exist with a partial delete status.

However, a possible issue is that Option #2 means that a delete request of And(Eq(attr, allowed_attribute), Eq(attr, denied)), which is rejected may indicate presence of the denied attribute. So option #1 may help in preventing a security risk of information disclosure.

This is also a concern for modification, where the modification attempt may or may not fail depending on the entries and if you can/can't see them.

IDEA: You can only delete/modify within the read scope you have. If you can't read it (based on the read rules of search), you can't delete it. This is in addition to the filter rules of the delete applying as well. So performing a delete of Pres(class), will only delete in your read scope and will never disclose if you are denied access.
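A sketch of that idea, with hypothetical helper names standing in for the real search and access machinery:

struct Identity;
struct Entry;
struct Filter;

// Hypothetical stand-ins for this sketch only.
fn impersonation_search(_ident: &Identity, _filter: &Filter) -> Vec<Entry> {
    Vec::new() // a read-scope bounded search as the requesting identity
}
fn delete_access_allows(_ident: &Identity, _entry: &Entry) -> bool {
    false // evaluate the delete access rules against this entry
}

// The delete filter is first evaluated inside the requester's read scope, so
// entries they cannot read never enter the candidate set and their existence
// is not disclosed. The delete access rules then apply on top of that.
fn delete_candidates(ident: &Identity, filter: &Filter) -> Vec<Entry> {
    impersonation_search(ident, filter)
        .into_iter()
        .filter(|e| delete_access_allows(ident, e))
        .collect()
}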

"Create" Application

Create seems like the easiest to apply. Ensure that only the attributes in createattr are in the createevent, ensure the classes only contain the set in createclass, then finally apply filter_no_index to the entry. If all of this passes, the create is allowed.

A key point is that there is no union of create ACIs - the WHOLE ACI must pass, not parts of multiple. This means if one control "allows creating a group with member" and another "allows creating a user with name", creating a group with name is not allowed - despite your ability to create an entry with name, its classes don't match. This way, the administrator of the service can define create controls with specific intent for how they will be used, without the risk of two controls causing unintended effects (users that are also groups, or allowing invalid values).

An important consideration is how to handle overlapping ACI. If two ACI could match the create should we enforce both conditions are upheld? Or only a single upheld ACI allows the create?

In some cases it may not be possible to satisfy both, and that would block creates. The intent of the access profile is that "something like this CAN" be created, so I believe that provided only a single control passes, the create should be allowed.
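A sketch of that per-profile check (hypothetical types): a single ACI must authorise the whole create on its own, and the create proceeds if any one applicable profile passes in full.

use std::collections::HashSet;

struct CreateAcp {
    create_attrs: HashSet<String>,
    create_classes: HashSet<String>,
}

// Every attribute and every class of the new entry must be covered by this
// single profile; partial coverage from several profiles is never combined.
fn acp_permits_create(
    acp: &CreateAcp,
    entry_attrs: &HashSet<String>,
    entry_classes: &HashSet<String>,
) -> bool {
    entry_attrs.is_subset(&acp.create_attrs) && entry_classes.is_subset(&acp.create_classes)
}

// The create is allowed when at least one applicable profile permits it in
// full. (The targetscope filter check is elided from this sketch.)
fn create_allowed(
    acps: &[CreateAcp],
    entry_attrs: &HashSet<String>,
    entry_classes: &HashSet<String>,
) -> bool {
    acps.iter()
        .any(|acp| acp_permits_create(acp, entry_attrs, entry_classes))
}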

"Modify" Application

Modify is similar to Create, however we specifically filter on the modlist action of present, removed or purged. The rules of create still apply: provided all requirements of the modify are permitted, it is allowed once at least one profile allows the change.

A key difference is that if the modify ACP lists multiple presentattr types, the modify request is still valid if it only modifies one of those attributes. i.e. we say presentattr: name, email, but we only attempt to modify email.
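Sketched with hypothetical types, the check compares the modlist actions against the attribute sets a single profile grants, mirroring the create rules:

use std::collections::HashSet;

enum ModAction {
    Present(String), // attribute being asserted
    Removed(String), // attribute being removed or purged
}

struct ModifyAcp {
    present_attrs: HashSet<String>,
    removed_attrs: HashSet<String>,
}

// Every action in the modlist must be covered by this one profile. Listing
// more attributes in the profile than the request touches is fine, since
// only the requested attributes are checked.
fn acp_permits_modify(acp: &ModifyAcp, modlist: &[ModAction]) -> bool {
    modlist.iter().all(|m| match m {
        ModAction::Present(attr) => acp.present_attrs.contains(attr),
        ModAction::Removed(attr) => acp.removed_attrs.contains(attr),
    })
}

// As with create, the change is allowed once at least one applicable profile
// permits the whole modlist.
fn modify_allowed(acps: &[ModifyAcp], modlist: &[ModAction]) -> bool {
    acps.iter().any(|acp| acp_permits_modify(acp, modlist))
}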

Considerations

  • When should access controls be applied? During an operation, we only validate schema after pre* Plugin application, so likely it has to be "at that point", to ensure schema-based validity of the entries that are allowed to be changed.
  • Self filter keyword should compile to eq("uuid", "...."). When do we do this and how?
  • memberof could take name or uuid, we need to be able to resolve this correctly, but this is likely an issue in memberof which needs to be addressed, ie memberof uuid vs memberof attr.
  • Content controls in create and modify will be important to get right to avoid the security issues of LDAP access controls. Given that class has special importance, it's only right to give it extra consideration in these controls.
  • In the future when recyclebin is added, a re-animation access profile should be created allowing revival of entries given certain conditions of the entry we are attempting to revive. A service-desk user should not be able to revive a deleted high-privilege user.

Access Control Defaults

  • Do we need some kind of permission atoms to allow certain tasks?

Use Cases:

  • User sign-up portal (need service account that can create users and do cred reset)

  • Role for service account generation.

  • Remote backup - this account should be able to trigger and retrieve a backup

  • Groups should be able to be changed by a managing group (managed by)

  • IP limits on accounts?

  • Users need to not be able to see other users.

    • Means the user can't read member attr, but can see groups + group info.
  • Anonymous needs to be able to be blocked more easily.

  • Enable disable self-mail write

  • Enable disable self-name-change

To achieve

  • IP access limits
  • Managed By rules
  • Better group specification syntax (not filters)

Domain Admin

graph LR

DomainAdmin("Domain Admin") --> AccessControlAdmin("Access Control Admin")
DomainAdmin("Domain Admin") --> AccountPolicyAdmin("Account Policy Admin")
DomainAdmin("Domain Admin") --> DomainConfigAdmin("Domain Config Admin")
DomainAdmin("Domain Admin") --> HPGroupAdmin("HP Group Admin")
DomainAdmin("Domain Admin") --> SchemaAdmin("Schema Admin")
DomainAdmin("Domain Admin") --> SyncAccountAdmin("Sync Account Admin")

IDM Admin

graph LR

IdmAdmin("IDM Admin") --> GroupAdmin("Group Admin")
IdmAdmin("IDM Admin") --> PersonAdmin("Person Admin")
IdmAdmin("IDM Admin") --> PersonPIIModify("Person PII Modify")
IdmAdmin("IDM Admin") --> PersonReadNoPII("Person Read No PII")
IdmAdmin("IDM Admin") --> PosixAccountIncludesCredMod("POSIX Account - [Includes Cred Mod]")
IdmAdmin("IDM Admin") --> RadiusAccountModify("Radius Account Modify")

Integration Admin

graph LR

IntegrationAdmin("Integration Admin") --> Oauth2Admin("Oauth2 Admin")
IntegrationAdmin("Integration Admin") --> PosixAccountConsumer("POSIX Account Consumer")
IntegrationAdmin("Integration Admin") --> RadiusServiceAdmin("Radius Service Admin")

Help Desk

graph LR

HelpDesk("Help Desk") --> PersonCredentialModify("Person Credential Modify")
HelpDesk("Help Desk") --> PersonReadNoPII("Person Read No PII")

Account "Self"

graph LR

SelfMailModify("Self Mail Modify") --> |"Modifies"| Self
SelfRead("Self Read") --> |"Read"| Self
SelfModify("Self Modify") --> |"Writes Secrets"| Self
SelfNameModify("Self Name Modify") --> |"Modifies"| Self

Duplicated for Service Accounts, HP persons, HP service Accounts.

graph LR

PersonOnBoard("Person On Board") --> |"Creates"| Persons("Persons")
PersonAdmin("Person Admin") --> |"Creates Deletes"| Persons("Persons")
PersonPIIModify --> |"Reads Modifies"| Persons
PersonPIIModify("Person PII Modify") -.-> |"Member of"| PersonAdmin
PersonCredentialModify("Person Credential Modify") -.-> |"Member of"| PersonAdmin
PersonCredentialModify("Person Credential Modify") -.-> |"Member of"| PersonOnBoard
PersonCredentialModify --> |"Reads Modifies"| Persons
PersonCredentialModify --> |"Reads"| PersonReadNoPII("Person Read No PII")
PersonAdmin --> PersonReadWithPII("Person Read - With PII")
PersonReadWithPII --> PersonReadNoPII
PersonReadNoPII --> |"Reads"| Persons
PosixAccountIncludesCredMod --> |"Extends (Add Posix Account)"| Persons

Domain and Schema

graph LR

DomainConfigAdmin("Domain Configuration Admin") --> |"Modifies Reads"| Domain
DomainConfigAdmin("Domain Configuration Admin") --> |"Modifies Reads"| System
SyncAccountAdmin("Sync Account Admin") --> |"Creates Modifies Deletes"| SyncAccounts("Sync Accounts")
SchemaAdmin("Schema Admin") --> |"Creates Modifies"| Schema("Schema")
AccessControlAdmin("Access Control Admin") --> |"Creates Modifies Deletes"| AccessControls("Access Controls")

High-Priv and Groups

graph LR

GroupAdmin("Group Admin") --> |"Create Modify Delete"| Groups("Groups")
AccountPolicyAdmin("Account Policy Admin") --> |"Modifies Extends"| Groups("Groups")
GroupAdmin --> |"Modify Delete"| HPGroups("HP Groups")
GroupAdmin --> |"Add Members"| HPGroup("HP Group")

HPGroupAdmin("HP Group Admin") --> HPGroup
GroupAdmin -.-> |"Inherits"| HPGroupAdmin

OAuth2 Specific

graph LR

Oauth2Admin("Oauth2 Admin") --> |"Creates Modifies Delegates"| Oauth2RS("Oauth2 RS")
ScopedMember("Scoped Member") --> |"Reads"| Oauth2RS

POSIX-Specific

graph LR

PosixAccountConsumer("POSIX Account Consumer") --> |"Reads Auths"| PosixAccounts("Posix Accounts")

Radius

graph LR

RadiusServiceAdmin("Radius Service Admin") --> |"Adds Members"| RadiusService("Radius Service")
RadiusService --> |"Reads Secrets"| RadiusAccounts("Radius Accounts")
RadiusAccountModify("Radius Account Modify") --> |"Writes Secrets"| RadiusAccounts

Recycle Bin Admin

graph LR

RecycleBinAdmin("Recycle Bin Admin") --> |"Modifies Reads Revives"| RecycledEntries("Recycled Entries")

Architectural Overview

Kanidm has a number of components and layers that make it up. As this project is continually evolving, if you have questions or notice discrepancies with this document please contact William (Firstyear) at any time.

Tools

Kanidm Tools are a set of command line clients that are intended to help administrators deploy, interact with, and support a Kanidm server installation. These tools may also be used for servers or machines to authenticate and identify users. This is the "human interaction" part of the server from a CLI perspective.

Clients

The kanidm client is a reference implementation of the client library that others may consume or interact with to communicate with a Kanidm server instance. The tools above use this client library for all of their actions. This library is intended to encapsulate some high level logic as an abstraction over the REST API.

Proto

The kanidm proto is a set of structures that are used by the REST and raw APIs for HTTP communication. These are intended to be a reference implementation of the on-the-wire protocol, but importantly these are also how the server represents its communication. This makes it the authoritative source of protocol layouts with regard to REST or raw communication.

Kanidmd (main server)

Kanidmd is intended to have minimal (thin) client tools, where the server itself contains most logic for operations, transformations, and routing of requests to their relevant datatypes. As a result, the kanidmd section is the largest component of the project as it implements nearly everything required for IDM functionality to exist.

Search

Search is the "hard worker" of the server, intended to be a fast path with minimal overhead so that clients can acquire data as quickly as possible. The server follows the below pattern.

Search flow diagram

(1) All incoming requests are from a client on the left. These are either REST requests, or a structured protocol request via the raw interface. It's interesting to note the raw request is almost identical to the queryserver event types - whereas for REST requests we have to generate request messages that can become events.

The frontend uses a webserver with a thread-pool to process and decode network I/O operations concurrently. This then sends asynchronous messages to a worker (actor) pool for handling.

(2) These search messages in the actors are transformed into "events" - a self-contained structure containing all relevant data related to the operation at hand. This may be the event origin (a user or internal), the requested filter (query), and perhaps even a list of attributes requested. These events are designed to ensure correctness. When a search message is transformed to a search event, it is checked by the schema to ensure that the request is valid and can be satisfied securely.

As these workers are in a thread pool, it's important that these are concurrent and do not lock or block - this concurrency is key to high performance and safety. It's also worth noting that this is the level where read transactions are created and committed - all operations are transactionally protected from an early stage to guarantee consistency of the operations.

(3) When the event is known to be consistent, it is then handed to the queryserver - the query server begins a process of steps on the event to apply it and determine the results for the request. This process involves further validation of the query, association of metadata to the query for the backend, and then submission of the high-level query to the backend.

(4) The backend takes the request and begins the low-level processing to actually determine a candidate set. The first step is query optimisation, to ensure we apply the query in the most efficient manner. Once optimised, we then use the query to query indexes and create a potential candidate set of identifiers for matching entries (5). Once we have this candidate id set, we then retrieve the relevant entries as our result candidate set (6) and return them (7) to the backend.

(8) The backend now deserialises the database's candidate entries into a higher level and structured (and strongly typed) format that the query server knows how to operate on. These are then sent back to the query server.

(9) The query server now applies access controls over what you can / can't see. This happens in two phases. The first is to determine "which candidate entries you have the rights to query and view" and the second is to determine "which attributes of each entry you have the right to perceive". This separation exists so that other parts of the server can impersonate users and conduct searches on their behalf, but still internally operate on the full entry without access controls limiting the scope of attributes they can view.

(10) From the reduced set of entries (i.e. with access controls applied), we can then transform each entry into its protocol forms - where we transform each strong type into a string representation for simpler processing for clients. These protoentries are returned to the front end.

(11) Finally, the protoentries are now sent to the client in response to their request.

Write

The write path is similar to the search path, but has some subtle differences that are worth paying attention to.

write flow diagram

(1), (2) Like search, all client operations come from the REST or raw apis, and are transformed or generated into messages. These messages are sent to a single write worker. There is only a single write worker due to the use of copy-on-write structures in the server, limiting us to a single writer, but allowing search transactions to proceed in parallel without blocking.

(3) From the worker, the relevant event is created. This may be a "Create", "Modify" or "Delete" event. The query server handles these slightly differently. In the create path, we take the set of entries you wish to create as our candidate set. In modify or delete, we perform an impersonation search, and use the set of entries within your read bounds to generate the candidate set. This candidate set will now be used for the remainder of the writing operation.

It is at this point, we assert access controls over the candidate set and the changes you wish to make. If you are not within rights to perform these operations the event returns an error.

(4) The entries are now sent to the pre-operation plugins for the relevant operation type. This allows transformation of the candidate entries beyond the scope of your access controls, and maintains some elements of data consistency. For example, one plugin prevents creation of system protected types, while another ensures that uuid exists on every entry.

(5) These transformed entries are now returned to the query server.

(6) The backend is sent the list of entries for writing. Indexes are generated (7) as required based on the new or modified entries, and the entries themselves are written (8) into the core db tables. This operation returns a result (9) to the backend, which is then filtered up to the query server (10).

(11) Provided all operations to this point have been successful, we now apply post write plugins which may enforce or generate different properties in the transaction. This is similar to the pre plugins, but allows different operations. For example, a post plugin ensures uuid reference types are consistent and valid across the set of changes in the database. The most critical is memberof, which generates reverse reference links from entries to their group memberships, enabling fast RBAC operations. These are done as post plugins because at this point internal searches can now yield and see the modified entries that we have just added to the indexes and datatables, which is important for consistency (and simplicity), especially when you consider batched operations.

(12) Finally the result is returned up (13) through (14) the layers (15) to the client to inform them of the success (or failure) of the operation.

IDM

TBD

Radius

The radius components are intended to be minimal to support a common set of radius operations in a container image that is simple to configure. If you require a custom configuration you should use the python tools here and configure your own radius instance as required.

The Authentication Flow

  1. Client sends an init request. This can be either:
    1. AuthStep::Init which just includes the username, or
    2. AuthStep::Init2 which can request a "privileged" session
  2. The server responds with a list of authentication methods. (AuthState::Choose(Vec<AuthAllowed>))
  3. Client requests auth with a method (AuthStep::Begin(AuthMech))
  4. Server responds with an acknowledgement (AuthState::Continue(Vec<AuthAllowed>)). This is so the challenge can be included in the response, for Passkeys or other challenge-response methods.
    • If required, this challenge/response continues in a loop until the requirements are satisfied - for example, TOTP + Password.
  5. The result is returned, either:
    • Success, with the User Auth Token as a String.
    • Denied, with a reason as a String.
sequenceDiagram;
    autonumber
    participant Client
    participant Kanidm
    
    Note over Client: "I'm Ferris and I want to start auth!"
    Client ->> Kanidm: AuthStep::Init(username)
    Note over Kanidm: "You can use the following methods"
    Kanidm ->> Client: AuthState::Choose(Vec<AuthAllowed>)

    loop Authentication Checks
        Note over Client: I want to use this mechanism
        Client->>Kanidm: AuthStep::Begin(AuthMech)
        Note over Kanidm: Ok, you can do that.
        Kanidm->>Client: AuthState::Continue(Vec<AuthAllowed>)
        Note over Client: Here is my credential
        Client->>Kanidm: AuthStep::Cred(AuthCredential)
        Note over Kanidm: Kanidm validates the Credential,<br /> and if more methods are required,<br /> return them.
        Kanidm->>Client: AuthState::Continue(Vec<AuthAllowed>)
        Note over Client, Kanidm: If there's no more credentials required, break the loop.

    end

    Note over Client,Kanidm: If Successful, return the auth token
    Kanidm->>Client: AuthState::Success(String Token)

    Note over Client,Kanidm: If Failed, return that and a message why.
    Kanidm-xClient: AuthState::Denied(String Reason)

Domain Join - Machine Accounts

There are a number of features we have been considering that will require us to finally give in and support machine accounts, also known as domain joining.

Feature Requirements

Limiting Unix Password Auth

Currently unix password authentication is targeted as the method for sudo. Initial access to the machine should come from ssh keys (and in future, ctap2).

In order to maintain compatibility with LDAP style authentication, we allow "anonymous hosts" to retrieve ssh public keys, and then perform sudo authentication.

This has the obvious caveat that anyone can stand up a machine that trusts a Kanidm instance. This presents a double edged sword:

  • By configuring a machine to authenticate via Kanidm, there is full trust in the authentication decisions Kanidm makes.
  • Users of Kanidm may be tricked into accessing a machine that is not managed by their IT or other central authority.

To prevent this, UNIX authentication should be configurable to prevent usage from unregistered machines. This will require the machine to present machine authentication credentials simultaneously with the user's credentials.

A potential change is removing the current unix password auth mechanism as a whole. Instead the user's auth token would contain a TPM bound credential that only the domain joined machine's TPM could access and use.

Requesting Cryptographic Credentials

When a user logs in to a machine, it may be required that they can use that authentication to identify themselves to other systems. When a user authenticates with credentials such as ssh-keys, it's not possible to use these to request other forwardable credentials - and ssh agent forwarding only allows forwarding of ssh credentials, not other types of credentials that may be needed.

In this case when a user authenticates with SSH, since they're using a trusted machine, Kanidm can request short-term and limited credentials on the user's behalf.

An example is that we could dynamically request TLS certificates or Kerberos credentials.

Normally with ssh in this manner, everything has to use kerberos. This would force users to kinit on their machine to ssh and forward their credentials to the next machine. This causes friction since configuring kerberos on machines is an exercise in frustration, and with BYOD it gets even worse. In addition when using ssh with an ssh key, the only viable kinit mechanism is password or password + totp once the user has logged in. This is because pkcs11 can't be forwarded over ssh, nor can CTAP2, limiting kinit to weaker authentication mechanisms.

Security Considerations

  • Anonymous joins should not be allowed or permitted.
  • Join tokens need to be able to be revoked (causing related machines to request re-enrollment) or expired (related machines can continue to function)
  • Join tokens must be auditable.
  • Private keys SHOULD be stored in a TPM, or at least a software HSM with a secured unlock key.
  • The use of the private key must prevent replay attacks

Overview

Since the machine would now be an entity requiring authentication, we need to have a process to establish and maintain this trust relationship.

  1. A join token is created by a user who is authorised to perform domain joins.
  2. The machine is audited for a known trust state. This process may vary from site to site. A future improvement could be that the join token can only release on certain TPM PCR values.
  3. The join token is yielded to the Kanidm UNIX daemon which submits its signing key to the Kanidm server.
  4. The kanidm server verifies the submission and creates a machine account.
  5. The kanidm unix daemon now uses its signing key to sign a challenge that is submitted with all requests to the kanidm server.

Extra

  1. Machines should be able to "re-join" with an alternate join token, moving their machine account join token relationship.
  2. Machines must be able to self-enroll newer keys which may have stronger cryptographic requirements.

Details

Join Token Creation

Join tokens are persisted in the database allowing tracing back to the usage of the token.

Every machine that is joined by that token will relate back to that token. This allows auditing of which token was used to join which machine.

Machines may re-enroll with an alternate token.

The join token should be signed. The JWK pub key should be available at a known HTTPS uri so that the client can use it to validate the join token and its content. This may allow policy to be embedded into the join token for the client to self-adhere to in the join process.

Machine Auditing

The machine should be audited to be in a secure state. It's not yet clear how to proceed here, but we should consider using TPM PCRs with secure boot to measure this and validate the machine state.

One possible way to achieve this could be with full disk encryption that is bound to secure boot and TPM PCRs. Kanidm-unixd could validate the same PCR's to start operating. The challenge here would be updates of the expected PCR values during a system update. Alternately, Kanidm could "assume" that if started, then the FDE must have passed and attestation of health "is out of scope" for us.

Public Key Submission

The private key should be generated and stored in a TPM/HSM. If possible, we should also submit attestation of this.

The submission of the public key should prevent replays, and should sign either a nonce or the current time. The current time must be valid to within a number of seconds. The nonce must be created by the server.

The machine must submit its public key, the time value and the signature. This should accompany the join token.

If the signature is valid, and the join token is correct, then the machine is joined and has a machine account created. The machine account is linked to the join token.
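A sketch of the server-side verification, with hypothetical names and only the time-window variant of the replay protection shown (a server-issued nonce would replace the timestamp check):

use std::time::{Duration, SystemTime};

// Hypothetical stand-ins for this sketch only.
struct JoinRequest {
    public_key: Vec<u8>,
    signed_at: SystemTime,
    signature: Vec<u8>,
    join_token: String,
}

fn signature_valid(_req: &JoinRequest) -> bool {
    true // stand-in: verify `signature` over (public_key, signed_at) with `public_key`
}

fn join_token_valid(_token: &str) -> bool {
    true // stand-in: look the token up and check it isn't revoked or expired
}

// Accept the join only when the token is known, the signature verifies, and
// the signed timestamp is within the allowed skew window. Timestamps from
// the future are rejected outright in this sketch.
fn accept_join(req: &JoinRequest, now: SystemTime, max_skew: Duration) -> bool {
    let in_window = match now.duration_since(req.signed_at) {
        Ok(age) => age <= max_skew,
        Err(_) => false,
    };
    in_window && join_token_valid(&req.join_token) && signature_valid(req)
}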

Machine Account

The machine account is a new form of account, similar to a service account. It should identify the machine, its hostname, and other properties. It should also contain the machine's public key id.

When the machine requests certain APIs from Kanidm, it should submit signed requests that include the current time. The kid is used to find the machine account that is submitting the request. This validates the identity of the caller, and allows the action to proceed.

Elevation of Privilege Inside User Sessions

To improve user experience, we need to allow long lived sessions in browsers. This is especially important for a single sign on system: users tend to be associated one-to-one with devices, and longer lived sessions give them a smoother experience.

However, we also don't want user sessions to have unbound write permissions for the entire (possibly unlimited) duration of their session.

Prior art for this is GitHub, which has unbounded sessions on machines and requests a re-authentication when a modifying or sensitive action is to occur.

For us to implement this will require some changes to how we manage sessions.

Session Issuance

  • ISSUE: Sessions are issued identically for service-accounts and persons

  • CHANGE: service-accounts require a hard/short session expiry limit and always have elevated permissions

  • CHANGE: persons require no session expiry and must request elevation for privs.

  • ISSUE: Sessions currently indicate all read-write types as the same access scope type.

  • CHANGE: Split sessions to show rwalways, rwcapable, rwactive

  • ISSUE: Sessions currently are recorded identically between service-accounts, persons, and api tokens

  • CHANGE: Change the session storage types to have unique session types for these ✅

  • ISSUE: Access Scope types are confused by api session using the same types.

  • CHANGE: Use access scope only as the end result of current effective permission calculation and not as a method to convert to anything else. ✅

    AccessScope { ReadOnly, ReadWrite, Synchronise }

    // Bound by token expiry
    ApiTokenScope { ReadOnly, ReadWrite, Synchronise }

    UatTokenScope {
        ReadOnly,
        // Want to avoid "read write" here to prevent dev confusion.
        PrivilegeCapable,
        PrivilegeActive { expiry },
        ReadWrite,
    }

    SessionScope { Ro, RwAlways, PrivCapable }

    ApiTokenScope { RO, RW, Sync }

    AuthSession:
      if service account -> rw always, bound expiry
      if person -> priv capable, unbound expiry
         - Should we have a "trust the machine" flag to limit expiry though?
         - can we do other types of cryptographic session binding?

Session Validation

  • CHANGE: Session with PrivCapable indicates that re-auth can be performed.
  • CHANGE: Improve how Uat/Api Token scopes become Access Scopes
  • CHANGE: Remove all AccessScope into other types. ✅

Session Re-Authentication

  • Must be performed by the same credential that issued the session originally

    • This is now stored in the session metadata itself.
    • Does it need to be in the cred-id?
  • CHANGE: Store the cred id in UAT so that a replica can process the operation in a replication sync failure?

    • This would rely on re-writing the session.
  • CHANGE: Should we record in the session when priv-escalations are performed?

Misc

  • CHANGE: Compact/shrink UAT size if possible.

Diagram

                                                                                                    Set                                               
                                                                         ┌───────────────────────PrivActive────────────────────┐                      
                                                                         │                         + Exp                       │                      
                                                                         │                                                     │                      
                              ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐        │                      .───────────.         ┌────────────────┐              
                                                                         │   ┌────────────────▶( If Priv Cap )───────▶│Re-Auth-Allowed │              
                              │                                 │        │   │                  `───────────'         └────────────────┘              
                                   DB Content                    ┌ ─ ─ ─ ┼ ─ ┼ ─ ─ ─ ─ ─ ─ ─ ─                                                        
┌───────────────────┐         │                                 │    JWT │   │                │                                                       
│                   │                                            │       ▼   │                                                                        
│    AuthSession    │         │         ┌──────────────┐        │    ┌──────────────┐         │                                                       
│                   │                   │SessionScope  │         │   │UatScope      │                                                                 
│  Service Account  │         │         │- RO          │        │    │- RO          │         │                                                       
│     -> RWAlways   │──────────────────▶│- RW          │─────────┼──▶│- RW          │──────────────────────────┐                                      
│                   │         │         │- PrivCapable │        │    │- PrivCapable │         │                │                                      
│      Person       │                   └──────────────┘         │   │- PrivActive  │                          │                                      
│     -> PrivCap    │         │                                 │    └──────────────┘         │                │                                      
│                   │                                            │                                             │                                      
└───────────────────┘         │                                 │                             │                ▼                                      
                                                                 │                                     ┌──────────────┐                               
                              │                                 │                             │        │AccessScope   │              ┌───────────────┐
                                                                 │                                     │- RO          │              │               │
                              │                                 │                             │        │- RW          │───────────▶  │Access Controls│
                                                                 │                                     │- Sync        │              │               │
 ┌───────────────────┐        │       ┌─────────────────┐       │     ┌──────────────┐        │        └──────────────┘              └───────────────┘
 │                   │                │ApiSessionScope  │        │    │ApiTokenScope │                         ▲                                      
 │ Create API Token  │        │       │- RO             │       │     │- RO          │        │                │                                      
 │                   │───────────────▶│- RW             │────────┼───▶│- RW          │─────────────────────────┘                                      
 │Access Based On Req│        │       │- Sync           │       │     │- Sync        │        │                                                       
 │                   │                └─────────────────┘        │    │              │                                                                
 └───────────────────┘        │                                 │     └──────────────┘        │                                                       
                                                                 │                                                                                    
                              │                                 │                             │                                                       
                               ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─

TODO:

  1. Remove the ident-only access scope, it's useless! ✅
  2. Split tokens to have a dedicated session type separate to uat sessions. ✅
  3. Change uat session access scope recording to match service-account vs person intent.
  4. Change UAT session issuance to have the uat purpose reflect the readwrite or readwrite-capable nature of the session, based on auth-type that was used.
  5. Based on auth-type, limit or unlimit expiry to match the intent of the session.

Oauth2 Refresh Tokens

Kanidm authentication sessions were originally implemented with short session times (1 hour) due to the lack of privilege separation in tokens. Now that privilege separation has been implemented, session lengths have been extended to 8 hours, with possible increases in the future.

However, this leaves us with an issue with oauth2 - oauth2 access tokens are considered valid until their expiry, and we should not issue tokens with a validity of 8 hours or longer, since that would give rogue users a long window of usage of the token before they were forced to re-auth. It also means that if an account must be forcefully terminated, the user would retain access to applications for up to 8 hours or more.

To prevent this, we need oauth2 tokens to "check in" periodically to re-affirm their session validity.

This is performed with access tokens and refresh tokens. The access token has a short lifespan (proposed 15 minutes) and must be refreshed with Kanidm which can check the true session validity and if the session has been revoked. This creates a short window for revocation to propagate to oauth2 applications since each oauth2 application must periodically check in to keep their access token alive.

Risks

Refresh tokens are presented to the relying server where they receive an access token and an optional new refresh token. Because of this, it could be possible to present a refresh token multiple times to proliferate extra refresh and access tokens away from the system. Preventing this is important to limit where the tokens are used and to allow them to be monitored and revoked effectively.

In addition, old refresh tokens should not be usable once exchanged; they should be usable "at most once". If this is not enforced, then old refresh tokens can be used to gain access to sessions even if the associated access token expired many hours ago and its refresh token was already used.

This is supported by draft oauth security topics section 2.2.2 and draft oauth security topics refresh token protection.

Refresh tokens must only be used by the client application associated. Kanidm strictly enforces this already with our client authorisation checks. This is discussed in rfc6749 section 10.4.

Design

      ┌─────────────────────────────────────────┐
      │Kanidm                                   │
      │                                         │
      │ ┌─────────┐                ┌─────────┐  │
      │ │         │                │         │  │
      │ │         │                │         │  │
      │ │ Session │  3. Update     │ Session │  │
      │ │  NIB 1  │─────NIB───────▶│  NIB 2  │  │
      │ │         │                │         │  │
      │ │         │                │         │  │
      │ │         │                │         │  │
      │ └─────────┘                └─────────┘  │
      │   │                           │         │
      └───┼───────────────────────────┼─────────┘
     ┌────┘             ▲        ┌────┘          
     │                  │        │               
     │                  │        │               
1. Issued               │   4. Issued            
     │                  │        │               
     │                  │        │               
     │                  │        │               
     ▼                  │        ▼               
 ┌───────┐              │    ┌───────┐           
 │       │              │    │       │           
 │Access │              │    │Access │           
 │   +   │              │    │   +   │           
 │Refresh│──2. Refresh──┘    │Refresh│           
 │ IAT 1 │                   │ IAT 2 │           
 │       │                   │       │           
 └───────┘                   └───────┘

In this design we associate a "not issued before" (NIB) timestamp with our sessions. For a refresh token to be valid for issuance, the refresh token's IAT must be greater than or equal to the NIB.

In this example, were the refresh token with IAT 1 reused after the second token was issued, this condition would fail as the NIB has advanced to 2. Since IAT 1 is not greater than or equal to NIB 2, the refresh token must previously have been used for an access token exchange.
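The check itself is tiny; sketched with illustrative field names:

// A refresh token may only be exchanged if it was issued at or after the
// session's "not issued before" mark.
fn refresh_permitted(refresh_iat: u64, session_nib: u64) -> bool {
    refresh_iat >= session_nib
}

// On a successful exchange the session's NIB advances to the IAT of the newly
// issued token, so the previous refresh token can no longer be replayed.
fn advance_nib(session_nib: &mut u64, new_iat: u64) {
    *session_nib = new_iat;
}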

In a replicated environment this system is also stable and correct even if a session update is missed.

                                          2.                                                       
              ┌───────────────────────Replicate────────────────┐                                   
              │                                                │                                   
              │                                                │                                   
      ┌───────┼─────────────────────────────────┐       ┌──────┼──────────────────────────────────┐
      │Kanidm │                                 │       │Kanidm│                                  │
      │       │                                 │       │      ▼                                  │
      │ ┌─────────┐                ┌─────────┐  │       │ ┌─────────┐                 ┌─────────┐ │
      │ │         │                │         │  │       │ │         │                 │         │ │
      │ │         │                │         │  │       │ │         │                 │         │ │
      │ │ Session │  4. Update     │ Session │  │       │ │ Session │   7. Update     │ Session │ │
      │ │  NIB 1  │─────NIB───────▶│  NIB 2  │  │       │ │  NIB 1  │ ─────NIB───────▶│  NIB 3  │ │
      │ │         │                │         │  │       │ │         │                 │         │ │
      │ │         │                │         │  │       │ │         │                 │         │ │
      │ │         │                │         │  │       │ │         │                 │         │ │
      │ └─────────┘                └─────────┘  │       │ └─────────┘                 └─────────┘ │
      │   │                           │         │       │      ▲                        │         │
      └───┼───────────────────────────┼─────────┘       └──────┼────────────────────────┼─────────┘
     ┌────┘             ▲        ┌────┘                        │                   ┌────┘          
     │                  │        │                             │                   │               
     │                  │        │                             │                   │               
1. Issued               │   5. Issued                          │              8. Issued            
     │                  │        │                             │                   │               
     │                  │        │                             │                   │               
     │                  │        │                             │                   │               
     ▼                  │        ▼                             │                   ▼               
 ┌───────┐              │    ┌───────┐                         │               ┌───────┐           
 │       │              │    │       │                         │               │       │           
 │Access │              │    │Access │                         │               │Access │           
 │   +   │              │    │   +   │                         │               │   +   │           
 │Refresh│──3. Refresh──┘    │Refresh│                         │               │Refresh│           
 │ IAT 1 │                   │ IAT 2 │─────6. Refresh──────────┘               │ IAT 3 │           
 │       │                   │       │                                         │       │           
 └───────┘                   └───────┘                                         └───────┘

In this example, we can see that the replication of the session with NIB 1 happens to the second Kanidm server, but the replication of the session with NIB 2 has not occurred yet. If the token that was later issued with IAT 2 was presented to the second server, it would still be valid and able to refresh since IAT 2 is greater than or equal to NIB 1. This would also prompt the session to advance to NIB 3, such that when replication resumed, the session with NIB 3 would take precedence over the former NIB 2 session.

While this allows a short window in which a former access token could be used on the second replica, this infrastructure sits behind load balancers and outside an attacker's influence, which makes the window difficult to exploit for very little gain.

Attack Detection

Draft OAuth security topics section 4.14.2 specifically calls out that when refresh token reuse is detected, all tokens of the session should be cancelled so that a new authorisation code flow is initiated.

Inactive Refresh Tokens

Similarly, draft OAuth security topics section 4.14.2 also discusses that inactive tokens should be invalidated after a period of time. From the view of the refresh token, this is enforced by an internal exp field in the encrypted refresh token.

From the server's side, we will require a "not after" parameter that is updated on token activity. This will also require inactive session cleanup in the server, which can be extended into the session consistency plugin that already exists.

Since the act of refreshing a token implies activity, we do not require other signalling mechanisms.
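As a rough illustration of how the NIB check and the inactivity window might combine on the server side, consider the sketch below. The type and field names are hypothetical and do not reflect the actual Kanidm implementation.

use std::time::{Duration, SystemTime};

// Hypothetical view of the server-side session record.
struct RefreshSession {
    not_issued_before: SystemTime, // NIB: the newest refresh issuance we know of
    not_after: SystemTime,         // pushed forward on each observed activity
}

fn refresh_permitted(session: &RefreshSession, token_iat: SystemTime, now: SystemTime) -> bool {
    // Tokens issued before the session NIB were superseded by a later refresh,
    // so presenting one indicates reuse/replay and must be rejected.
    if token_iat < session.not_issued_before {
        return false;
    }
    // Sessions inactive past the "not after" point are invalid.
    if now > session.not_after {
        return false;
    }
    true
}

fn record_activity(session: &mut RefreshSession, now: SystemTime, window: Duration) {
    // A successful refresh counts as activity and extends the inactivity window.
    session.not_after = now + window;
}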

Questions

Currently with authorisation code grants and sessions, we issue these with the sessions recorded in an async manner. For consistency, I believe the same should be true here, but is there a concern with the refresh being issued and then a slight delay before it's recorded? Given the nature of our future replication, we already have to consider the async/eventual nature of things, so this doesn't impact that further, and may just add client latency in the update process.

However, we also don't want a situation where our async/delayed action queues become too full or overworked. Maybe queue monitoring/backlog issues are a separate problem though.

Replication Coordinator Design

Many other IDM systems configure replication on each node of the topology. This means that the administrator is responsible for ensuring all nodes are connected properly and that agreements are bidirectional. It also requires administrators to configure and monitor each node individually, which adds a significant barrier to "stateless" configurations.

In Kanidm we want to avoid this - we want replication to be coordinated to make deployment of replicas as easy as possible for new sites.

Kanidm Replication Coordinator

The intent of the replication coordinator (KRC) is to allow nodes to subscribe to the KRC, which configures the state of replication across the topology.

1. Out of band -                ┌────────────────┐
 issue KRC ca + ────────────────┤                │
   Client JWT.                  │                │
        │       ┌──────────────▶│                │──────────────────────┐
        │       │2. HTTPS       │     Kanidm     │                      │
        │     JWT in Bearer     │  Replication   │            5. Issue repl config
        │  Request repl config  │  Coordinator   │             with partner public
        │  Send self signed ID  │                │                     key
        │       │  cert         │                │                      │
        │       │     ┌─────────│                │◀────────┐            │
        │       │     │         │                │       4. HTTPS       │
        │       │     │         └────────────────┘    JWT in Bearer     │
        │       │   3. Issue                       Request repl config  │
        │       │  repl config                     Send self signed ID  │
        │       │     │                                    cert         │
        │       │     │                                    │            │
        │       │     │                                    │            │
        │       │     │                                    │            │
        │       │     │                                    │            │
        ▼       │     ▼                                    │            ▼
       ┌────────────────┐                                ┌─┴──────────────┐
       │                │                                │                │
       │                │                                │                │
       │                │       5. mTLS with self        │                │
       │                │──────────signed cert──────────▶│                │
       │ Kanidm Server  │      Perform replication       │ Kanidm Server  │
       │     (node)     │                                │     (node)     │
       │                │                                │                │
       │                │                                │                │
       │                │                                │                │
       │                │                                │                │
       └────────────────┘                                └────────────────┘

The KRC issues configuration tokens. These are JWTs signed by the KRC.

A configuration token is not unique to a node. It can be copied between many nodes. This allows stateless deployments where nodes can be spun up and provided their replication config.

The node is provided with the KRC TLS CA, and a configuration token.

When configured, the node contacts the KRC with its configuration token as bearer authentication. The KRC uses this to determine and issue a replication configuration. Because the configuration token is signed by the KRC, an attacker cannot forge a configuration token to fraudulently subscribe a kanidm node. Because the KRC is contacted over TLS, the node gains strong assurance of the KRC's legitimacy through TLS certificate validation and pinning.

The KRC must be able to revoke replication configuration tokens in case of a token disclosure.

The node sends its KRC token, server UUID, and server repl public key to the KRC.

The configuration token defines the replication group identifier of that node. The KRC uses the configuration token and the server's UUID to assign replication metadata to the node. The KRC issues a replication configuration to the node.

The replication configuration defines the nodes that the server should connect to, as well as providing the public keys that are required for that node to perform replication. These are elaborated on in node configuration.
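To make the shape of this exchange concrete, a rough Rust-flavoured sketch of the check-in data and the issued configuration might look like the following. All of these names are illustrative assumptions, not the actual Kanidm types or wire format.

// Hypothetical sketch of what a node submits to the KRC.
struct NodeCheckin {
    config_token: String,  // the KRC-issued JWT, presented as a bearer token
    server_uuid: String,   // this node's server uuid
    repl_cert_pem: String, // this node's self-signed replication identity certificate
}

// Hypothetical sketch of the configuration the KRC issues in response.
struct IssuedReplConfig {
    partners: Vec<ReplPartner>, // the nodes this server should replicate with
}

struct ReplPartner {
    url: String,              // the partner's direct replication url (not its client origin)
    partner_cert_pem: String, // the partner's certificate, pinned for the mTLS connection
}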

Kanidm Node Configuration

There are some limited cases where an administrator may wish to manually define replication configuration for their deployments. In these cases the admin can manually configure replication parameters in the Kanidm configuration.

A kanidm node for replication requires either:

  • The URL to the KRC
  • The KRC CA cert
  • A KRC-issued configuration JWT

OR

  • A replication configuration map

A replication configuration map contains a set of agreements and their direction of operation.

All replicas require:

  • The direct URL that other nodes can reach them on (this is NOT the origin of the server!)

Pull mode

This is the standard and preferred mode. For each node to pull from, the map contains:

  • The URL of the node's replication endpoint.
  • The self-signed node certificate to be pinned for the connection.
  • Whether an automatic refresh should be carried out if a refresh required message is received.

Push mode

This mode is only available in manual configurations, and should only be used as a last resort.

  • The URL of the node's replication endpoint.
  • The self-signed node certificate to be pinned for the connection.
  • Whether the node should be force-refreshed on the next cycle if a refresh required message would be sent.

Worked examples

Manual configuration

There are two nodes, A and B.

The administrator configures each kanidm server with its replication URL:

[replication]
node_url = https://private.name.of.node

The administrator extracts each node's replication certificate with the kanidmd binary's admin features. This will reflect the node_url in the certificate:

kanidmd replication get-certificate

For each node, a replication configuration is created in JSON. For A pulling from B:

[
  { "pull":
    {
      "url": "https://node-b.private-name",
      "publiccert": "pem certificate from B",
      "automatic_refresh": false
    }
  },
  { "allow-pull":
    {
      "clientcert": "pem certificate from B"
    }
  }
]

For B pulling from A:

[
  { "pull":
    {
      "url": "https://node-a.private-name",
      "publiccert": "pem certificate from A",
      "automatic_refresh": true
    }
  },
  { "allow-pull":
    {
      "clientcert": "pem certificate from A"
    }
  }
]

Notice that automatic refresh only goes from A -> B and not the other way around. This allows one server to be "authoritative".

TODO: The node configuration will also need to list nodes that can do certain tasks. An example of these tasks is that to prevent "update storms" a limited set of nodes should be responsible for recycling and tombstoning of entries. These should be defined as tasks in the replication configuration, so that the KRC can later issue out which nodes are responsible for those processes.

These are analogous to the AD FSMO roles, but I think we need a different name for them. Single Node Origin Task? Single Node Operation Runner? Yes I'm trying to make silly acronyms.

KRC Configuration

Still not fully sure about the KRC config yet. More thinking needed!

The KRC is configured with its URL and certificates.

[krc_config]
origin = https://krc.example.com
tls_chain = /path/to/tls/chain
tls_key = /path/to/tls/key

The KRC is also configured with replication groups.

  [origin_nodes]
  # This group never auto refreshes - they are authoritative.
  mesh = full

  [replicas_syd]
  # Every node has two links inside of this group.
  mesh = 2
  # at least 2 nodes in this group link externally.
  linkcount = 2
  linkto = [ "origin_nodes" ]

  [replicas_bne]
  # Every node has one link inside of this group.
  mesh = 1
  # at least 1 node in this group link externally.
  linkcount = 1
  linkto = [ "origin_nodes" ]

This would yield the following arrangement.

                      ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                        origin_nodes                       │
                      │
                          ┌────────┐         ┌────────┐    │
                      │   │        │         │        │
                          │   O1   │◀───────▶│   O2   │    │
                      │   │        │         │        │
                          └────────┘◀───┬───▶└────────┘    │
                      │        ▲        │         ▲
                               │        │         │        │
                      │        │        │         │
                               ▼        │         ▼        │
                      │   ┌────────┐◀───┴───▶┌────────┐
                          │        │         │        │    │
                      │   │   O3   │◀───────▶│   O4   │◀─────────────────────────────┐
                          │        │         │        │    │                         │
                      │   └────────┘         └────────┘                              │
                               ▲                  ▲        │                         │
                      └ ─ ─ ─ ─│─ ─ ─ ─ ─ ─ ─ ─ ─ ┼ ─ ─ ─ ─                          │
                               │                  │                                  │
                               │                  │                                  │
                               │                  │                                  │
                            ┌──┘                  │                                  │
                            │                     │                                  │
                            │                     │                                  │
┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┼ ─ ─ ─ ─             │      ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┼ ─ ─ ─ ─
  replicas_bne              │        │            │        replicas_syd              │        │
│                           │                     │      │                           │
    ┌────────┐         ┌────────┐    │            │          ┌────────┐         ┌────────┐    │
│   │        │         │        │                 │      │   │        │         │        │
    │   B1   │◀───────▶│   B2   │    │            └──────────│   S1   │◀───────▶│   S2   │    │
│   │        │         │        │                        │   │        │         │        │
    └────────┘         └────────┘    │                       └────────┘         └────────┘    │
│                           ▲                            │        ▲                  ▲
                            │        │                            │                  │        │
│                           │                            │        │                  │
                            ▼        │                            ▼                  ▼        │
│   ┌────────┐         ┌────────┐                        │   ┌────────┐         ┌────────┐
    │        │         │        │    │                       │        │         │        │    │
│   │   B3   │◀───────▶│   B4   │                        │   │   S3   │◀───────▶│   S4   │
    │        │         │        │    │                       │        │         │        │    │
│   └────────┘         └────────┘                        │   └────────┘         └────────┘
                                     │                                                        │
└ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─                    └ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─

!!! TBD - How to remove / decommission nodes?

I think origin nodes are persistent and must be manually defined. Will this require configuration of their server uuid in the config?

Auto-node groups need to check in periodically, and missed check-ins need to be handled.

Check-ins need to send the RUV? This would allow the KRC to detect nodes that are stale.

If a node misses check-ins for a certain period, should it be removed from the KRC's knowledge?

R/O nodes could be removed after x days of failed check-ins without much consequence.

For R/W nodes, on the other hand, it's a bit trickier to know whether they should be automatically removed.

Or is deletion of nodes a manual cleanup step that triggers clean-ruv?

Should replication maps have "priorities" to make the topology a tree, so that if nodes are offline it can auto-re-route? Should they have multiple paths? We want to avoid excess links/loops/disconnections of nodes.

I think some more thought is needed here. Possibly a node state machine.

I think for R/O nodes, we need to define how writes will pass through. I can see a possibility like:

                                No direct line
       ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ of sight─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐

       │                                                               ▼
┌────────────┐                 ┌─────────────┐────OOB Write────▶┌─────────────┐
│            │                 │ Remote Kani │                  │             │
│   Client   │─────Write──────▶│   Server    │                  │    Main     │
│            │                 │             │                  │             │
└────────────┘                 └─────────────┘◀───Replication───└─────────────┘

This could potentially even have some filtering rules about what's allowed to proxy writes through. Generally though, I think that R/O will need a lot more thought; for now I want to focus on simple cases like a pair or a group of four replicas. 😅

Requirements

  • Cryptographic (key) only authentication
  • Node to Node specific authentication
  • Scheduling of replication
  • Multiple nodes
  • Direction of traffic?
  • Use of self-signed issued certs for nodes.
  • Nodes must reject if incoming clients have the same certs.

Replication Design and Notes

Replication is a critical feature in an IDM system, especially when deployed at major sites and businesses. It allows for horizontal scaling of system read and write capacity, improves fault tolerance (hardware, power, network, environmental), and can improve client latency (by positioning replicas near clients).

Replication Background

Replication is a directed graph model, where each node (server) and directed edge (replication agreement) form a graph (topology). As the topology and direction can be seen, nodes of the graph can be classified based on their data transit properties.

NOTE: Historically many replication systems used the terms "master" and "slave". This has a number of negative cultural connotations, and is not used by this project.

Read-Write server

This is a server that is fully writable. It accepts external client writes, and these writes are propagated to the topology. Many read-write servers can be in a topology and written to in parallel.

Transport Hub

This is a server that is not writeable by clients, but can accept incoming replicated writes and then propagates these to other servers. All servers that are directly after this server in the topology must not be read-write, as writes may not propagate back from the transport hub, i.e. the following is invalid:

RW 1 ---> HUB <--- RW 2

Note the replication direction in this, and that changes into HUB will not propagate back to RW 1 or RW 2.

Read-Only server

Also called a read-only replica, or in AD an RODC. This is a server that only accepts incoming replicated changes, and has no outbound replication agreements.

Replication systems are dictated by the CAP theorem, which states that of "consistency, availability and partition tolerance" you may only have two of the three at any time.

Consistency

This is the property that a write to a server is guaranteed to be consistent and acknowledged to all servers in the replication topology. A change happens on all nodes or it does not happen at all, and clients contacting any server will always see the latest data.

Availability

This is the property that every request will receive a non-error response without the guarantee that the data is "up to date".

Partition Tolerance

This is the property that your topology will continue to provide functional services (generally reads) in the face of a partition, where nodes are unable to communicate.

Almost all systems expect partition tolerance, so the choice becomes between consistency and availability. These create a series of tradeoffs. Choosing consistency normally comes at significantly reduced write throughput, due to the need for a majority of nodes to acknowledge changes. However, it avoids the need for complex conflict resolution systems. It also means that clients can be in a situation where they can't write; for IDM this would mean that new sessions could not be created, or that accounts could not be locked for security reasons.

Kanidm has chosen availability, as the needs of IDM dictate that we always function even in the face of partitions and other failures. This comes at the cost of needing to manage conflict resolution. This AP selection is often called "eventually consistent", as nodes will converge to an identical state over time.

Replication Phases

There are two phases of replication

  1. Refresh

This is when the content of a node is completely removed, and has the content of another node applied to replace it. No conflicts or issues can occur in this, as the refreshed node is now a "perfect clone" of the source node.

  2. Incremental

This is when differentials of changes are sent between nodes in the topology. By sending small diffs, it saves bandwidth between nodes and allows changes on all nodes to be merged and combined with other nodes. It is the handling of these incremental updates that can create conflicts in the data of the server.

Ordering of Writes - Change Identifiers

Rather than using an external coordinator to determine consistency, time is used to determine ordering of events. This allows any server to create a total-ordering of all events as though every write had occurred on a single server. This is how all nodes in replication will "arrive" at the same conclusion about data state, without the need for communication.

In order for time to be used in this fashion, it is important that the clock in use is always advancing and never stepping backwards. If a clock was to go backwards, it would cause an event on one node to be written in a different order than the way that other servers will apply the same writes. This creates data corruption.

As an aside, there are replication systems in use today that do not use always-advancing clocks, which can allow data corruption to seep in.

In addition it's also important that if an event happens at exactly the same moment on two nodes (down to the nanosecond) that a way of breaking the tie exists. This is why each server has an internal uuid, where the server uuid is used to order events if the timestamps are identical.

These points in time are represented by a change identifier (CID) that contains the time of the event and the server uuid that performed the event. In addition, every write transaction of the server records the current time of the transaction, and if a subsequent transaction starts with a "time in the past", the time is "dragged forward" to one nanosecond after the former transaction. This means CIDs always advance - and never go backwards.
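A minimal Rust-flavoured sketch of a change identifier ordered this way, including the "dragged forward" behaviour, might look like the following (the names and types are illustrative, not the actual Kanidm implementation):

use std::time::Duration;

// A change identifier: ordered first by timestamp, then by server uuid to
// break ties when two events occur in the same nanosecond.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Cid {
    ts: Duration,      // time since an agreed epoch, nanosecond precision
    server_uuid: u128, // tie breaker for identical timestamps
}

// Issue the CID for a new write transaction. If the wall clock has not
// advanced past the previous transaction, drag the time forward by one
// nanosecond so that CIDs never go backwards.
fn next_cid(previous: Cid, wall_clock: Duration, server_uuid: u128) -> Cid {
    let ts = if wall_clock > previous.ts {
        wall_clock
    } else {
        previous.ts + Duration::from_nanos(1)
    };
    Cid { ts, server_uuid }
}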

Conflict Resolution

Despite the ability to order writes by time, consistency is not a property that we can guarantee in an AP system. We must be able to handle the possibility of inconsistent data, and have correct methods to bring all nodes into a consistent state through cross communication. These consistency errors are called conflicts. There are multiple types of conflict that can occur in a system like Kanidm.

Entry Conflicts

This is when the UUID of an entry is duplicated on a separate node. For example, two entries with UUID=A are created at the same time on two separate nodes. During replication one of these two entries will persist and the other will become conflicted.

Attribute Conflicts

When entries are updated on two nodes at the same time, the changes between the entries need to be merged. If the same attribute is updated on two nodes at the same time, the differences need to be reconciled. There are three common levels of resolution used for this. Let's consider an entry such as:

# Node A
attr_a: 1
attr_b: 2
attr_c: 3

# Node B
attr_b: 1
attr_c: 2
attr_d: 3

  • Object Level

In object level resolution, the entry that was written last wins. The whole content of the last-written entry is used, and the earlier write is lost.

In our example, if node B was the last write the entry would resolve as:

# OL Resolution
attr_b: 1
attr_c: 2
attr_d: 3

  • Attribute Level

In attribute level resolution, the time of update for each attribute is tracked. If an attribute was written later, the content of that attribute wins over the other entry's copy.

For example, if attr_b was written last on node B and attr_c was written last on node A, then the entry would resolve to:

# AL Resolution
attr_a: 1  <- from node A
attr_b: 1  <- from node B
attr_c: 3  <- from node A
attr_d: 3  <- from node B

  • Value Level

In value level resolution, the values of each attribute are tracked for changes. This allows values to be merged, depending on the type of attribute. This is the most "consistent" way to create an AP system, but it's also the most complex to implement, generally requiring a changelog of entry states and differentials for sequential reapplication.

Using this, our entries would resolve to:

# VL Resolution
attr_a: 1
attr_b: 1, 2
attr_c: 2, 3
attr_d: 3

Each of these strategies has pros and cons. In Kanidm we have used a modified attribute level strategy, where individual attributes can internally perform value level resolution if needed. This allows fast and simple replication, while still providing the best properties of value level resolution in the limited cases that require it.
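As an illustration of the attribute level approach, a simplified merge over per-attribute write times could look like the sketch below. The real implementation works over the entry change state described in the following sections; the names and the use of a bare integer for the CID are assumptions for brevity.

use std::collections::BTreeMap;

// (value, cid of last write) per attribute; a u64 stands in for a full CID here.
type Attrs = BTreeMap<String, (String, u64)>;

// Merge two copies of the same entry attribute-by-attribute, keeping
// whichever write carries the higher CID.
fn merge_attribute_level(a: &Attrs, b: &Attrs) -> Attrs {
    let mut out = a.clone();
    for (name, (value, cid)) in b {
        let take_b = match out.get(name) {
            None => true,                                  // attribute only exists on b
            Some((_, existing_cid)) => cid > existing_cid, // b's write is newer
        };
        if take_b {
            out.insert(name.clone(), (value.clone(), *cid));
        }
    }
    out
}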

Schema Conflicts

When an entry is updated on two nodes at once, it may be possible that the updates on each node individually are valid, but when combined create an inconsistent entry that is not valid with respect to the schema of the server.

Plugin Conflicts

Kanidm has a number of "plugins" that can enforce logical rules in the database, such as referential integrity and attribute uniqueness. When these rules are violated due to incremental updates, the plugins can sometimes repair the data. However, where this is not possible, entries may become conflicts.

Tracking Writes - Change State

To track these writes, each entry contains a hidden internal structure called a change state. The change state tracks when the entry was created, when any attribute was written to, and when the entry was deleted.

The change state reflects the lifecycle of the entry. It can either be:

  • Live
  • Tombstoned

A live entry is capable of being modified and written to. It is the "normal" state of an entry in the database. A live entry contains an "origin time" or "created at" timestamp. This allows unique identification of the entry when combined with the uuid of the entry itself.

A tombstoned entry is a "one way street". This represents that the entry at this uuid is deleted. The tombstone propagates between all nodes of the topology, and after a tombstone window has passed, is reaped by all nodes internally.

A live entry also contains a map of change times. This records the maximum CID at which each attribute of the entry was last updated. Consider an entry like:

attr_a: 1
attr_b: 2
attr_c: 3
uuid:   X

This entry's changestate would show:

Live {
  at: { server_uuid: A, cid: 1 },
  attrs: {
    attr_a: cid = 1
    attr_b: cid = 1
    attr_c: cid = 2
  }
}

This shows us that the entry was created on server A at cid time 1. At creation, attr_a and attr_b were created, since they share the same cid as the entry.

attr_c was either updated or created after this - we can't tell whether it existed at cid 1; we only know that a write of some kind occurred at cid 2.
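A rough Rust-flavoured equivalent of this structure might look like the following (illustrative only, not the actual internal types):

use std::collections::BTreeMap;

type Cid = (u64, u128); // (timestamp, server uuid), as discussed in the CID section

// Hypothetical sketch of the per-entry change state.
enum ChangeState {
    Live {
        // The "origin time" / creation CID of this entry.
        at: Cid,
        // The newest CID at which each attribute was written.
        attrs: BTreeMap<String, Cid>,
    },
    // The entry has been deleted; only the deletion CID remains.
    Tombstone { at: Cid },
}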

Resolving Conflicts

With knowledge of the change state structure we can now demonstrate how the lower level entry and attribute conflicts are detected and managed in Kanidm.

Entry

An entry conflict occurs when two servers create an entry with the same UUID at the same time. This would be shown as:

        Server A            Server B
Time 0: create entry X
Time 1:                     create entry X
Time 2:       <-- incremental --
Time 3:        -- incremental -->

We can add in our entry change state for liveness here.

Time 0: create entry X cid { time: 0, server: A }
Time 1:                     create entry X cid { time: 1, server: B }
Time 2:       <-- incremental --
Time 3:        -- incremental -->

When the incremental occurs at time point 2, server A would consider these on a timeline as:

Time 0: create entry X cid { time: 0, server: A }
Time 1: create entry X cid { time: 1, server: B }

When viewed like this, we can see that if the second create had been performed on the same server, it would have been rejected as a duplicate entry. With replication enabled, this means that the latter entry will be moved to the conflict state instead.

The same process occurs with the same result when the reverse incremental operation occurs to server B, where it receives the entry with the earlier creation from A. It will order the events and "conflict" its local copy of the entry.
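Reduced to its essence, the decision is a comparison of the two creation ("at") CIDs; for example (hypothetical names, with the CID simplified to a tuple):

// CIDs simplified to (timestamp, server uuid); tuples compare
// lexicographically, matching the ordering described earlier.
type Cid = (u64, u128);

enum EntryOutcome {
    KeepLocal,     // the local copy was created first and remains live
    ConflictLocal, // the local copy was created later and becomes the conflict
}

// When an incoming entry shares a UUID with a local live entry but has a
// different creation CID, the copy created later moves to the conflict state.
fn resolve_entry_create(local_at: Cid, incoming_at: Cid) -> EntryOutcome {
    if local_at <= incoming_at {
        EntryOutcome::KeepLocal
    } else {
        EntryOutcome::ConflictLocal
    }
}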

Attribute

An attribute conflict occurs when two servers modify the same attribute of the same entry before an incremental replication occurs.

        Server A            Server B
Time 0: create entry X
Time 1:        -- incremental -->
Time 2: modify entry X
Time 3:                     modify entry X
Time 4:       <-- incremental --
Time 5:        -- incremental -->

During an incremental operation, a modification to a live entry is allowed to apply provided the entry's UUID and AT match the server's metadata. This gives the server assurance that the entry is not in a conflict state, and that the change was applied to the same entry. Were the AT values not the same, the entry conflict process would be applied.

We can expand the metadata of the modifications to help understand the process here for the attribute.

        Server A            Server B
Time 0: create entry X
Time 1:        -- incremental -->
Time 2:                     modify entry X attr A cid { time: 2, server: B }
Time 3: modify entry X attr A cid { time: 3, server: A }
Time 4:       <-- incremental --
Time 5:        -- incremental -->

When the incremental is sent in time 4 from B to A, since the modification of the attribute is earlier than the content of A, the incoming attribute state is discarded. (A future version of Kanidm may preserve the data instead).

At time 5, when the incremental returns from A to B, the higher cid causes the value of attr A to be replaced with the content from server A.

This allows all servers to correctly order and merge changes between nodes.

Schema

An unlikely but possible scenario is a set of modifications that create incompatible entry states with regard to schema. For example:

        Server A            Server B
Time 0: create group X
Time 1:        -- incremental -->
Time 2: modify group X into person X
Time 3:                     modify group X attr member
Time 4:       <-- incremental --
Time 5:        -- incremental -->

It is rare (if not impossible) that an entry would be morphed in place from a group to a person - from one class to a fundamentally different class. But the possibility exists, so we must account for it.

What would occur here is that the attribute 'member' would be applied to a person, which is invalid for the kanidm schema. The entry would therefore be moved into a conflict state, since logically it is not valid for directory operations (even if the attribute and entry level replication requirements for consistency have been met).

Plugin

Finally, plugins allow enforcement of rules above schema. An example is attribute uniqueness. Consider the following operations.

        Server A            Server B
Time 0: create entry X      create entry Y
Time 1:        -- incremental -->
Time 2:       <-- incremental --
Time 3: modify entry X attr name = A
Time 4:                     modify entry Y attr name = A
Time 5:       <-- incremental --
Time 6:        -- incremental -->

Here each entry is valid per the entry, attribute and schema rules. However, name is a unique attribute and cannot have duplicates. This is the most likely scenario for conflicts to occur, since users can rename themselves at any time.

In this scenario, during incremental replication both entry X and entry Y would be moved to the conflict state. This is because the name attribute may have been updated multiple times, or between incremental operations, meaning that neither server can reliably determine whether X or Y is valid at this point in time, or with respect to future replications.

Incremental Replication

To this point, we have described "refresh" as a full clone of data between servers. This is easy to understand, and works as you may expect. The full set of all entries and their changestates are sent from a supplier to a consumer, replacing all database content on the consumer.

Incremental replication, however, requires knowledge of the state of both the consumer and the supplier, to determine the difference of entries between the pair.

To achieve this, each server tracks a replication update vector (RUV) that describes the range of changes it holds, organised per server that originated the change. For example, the RUV on server B may contain:

|-----|----------|----------|
|     | s_uuid A | s_uuid B |
|-----|----------|----------|
| min | T4       | T6       |
|-----|----------|----------|
| max | T8       | T16      |
|-----|----------|----------|

This shows that server B contains the set of data ranging from server A at time 4 and server B at time 6 to the latest values of server A at time 8 and server B at time 16.

During incremental replication, the consumer sends its RUV to the supplier. The supplier calculates the difference between the consumer RUV and the supplier RUV. For example:

Server A RUV                   Server B RUV
|-----|----------|----------|  |-----|----------|----------|
|     | s_uuid A | s_uuid B |  |     | s_uuid A | s_uuid B |
|-----|----------|----------|  |-----|----------|----------|
| min | T4       | T6       |  | min | T4       | T6       |
|-----|----------|----------|  |-----|----------|----------|
| max | T10      | T16      |  | max | T8       | T20      |
|-----|----------|----------|  |-----|----------|----------|

If A were the supplier and B the consumer, when comparing these RUVs server A would determine that B requires the changes A {T9, T10}. Since B is ahead of A with respect to the server B changes, server A would not supply those ranges. In the reverse direction, B would supply B {T17 -> T20}.

If there were multiple servers, this allows replicas to proxy changes.

Server A RUV                              Server B RUV
|-----|----------|----------|----------|  |-----|----------|----------|----------|
|     | s_uuid A | s_uuid B | s_uuid C |  |     | s_uuid A | s_uuid B | s_uuid C |
|-----|----------|----------|----------|  |-----|----------|----------|----------|
| min | T4       | T6       | T5       |  | min | T4       | T6       | T4       |
|-----|----------|----------|----------|  |-----|----------|----------|----------|
| max | T10      | T16      | T13      |  | max | T8       | T20      | T8       |
|-----|----------|----------|----------|  |-----|----------|----------|----------|

In this example, if A were supplying to B, then A would supply A {T9, T10} and C {T9 -> T13}. This allows the replication to avoid full connection (where every node must contact every other node).
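A simplified sketch of that supplier-side calculation over the RUVs (with CID times reduced to plain integers and trimming/minimums ignored; the names are illustrative, not the actual Kanidm implementation):

use std::collections::BTreeMap;

// Replication update vector: for each originating server uuid, the
// (min, max) CID times held by this node. Times are simplified to u64.
type Ruv = BTreeMap<u128, (u64, u64)>;

// Ranges the supplier must send: (origin server, after this time, up to this time).
fn changes_to_supply(supplier: &Ruv, consumer: &Ruv) -> Vec<(u128, u64, u64)> {
    let mut out = Vec::new();
    for (s_uuid, (_s_min, s_max)) in supplier {
        match consumer.get(s_uuid) {
            // Consumer is behind for this origin: send everything after its max.
            Some((_c_min, c_max)) if c_max < s_max => out.push((*s_uuid, *c_max, *s_max)),
            // Consumer has never seen this origin: send the full range we hold.
            None => out.push((*s_uuid, 0, *s_max)),
            // Consumer is equal or ahead for this origin: nothing to send.
            _ => {}
        }
    }
    out
}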

In order to select the entries for supply, the database maintains an index of the entries affected by each cid. This allows range requests to be made, efficiently selecting which entries were affected within any range of cids.

After an incremental replication is applied, the RUV is updated to reflect the application of these differences.

Lagging / Advanced Consumers

Replication relies on each node periodically communicating for incremental updates. This is because of deletes. A delete occurs when a Live entry becomes a Tombstone, and the tombstone is replicated over the live entry. Tombstones are then reaped by each node individually once the replication delay window has passed.

This delay window is there to allow every node the chance to have the tombstone replicated to it, so that all nodes will delete the tombstone at a similar time.

Once the delay window passes, the RUV is trimmed. This moves the RUV minimum.

We now need to consider the reason for this trimming process. Let's use these RUVs:

Server A RUV                   Server B RUV
|-----|----------|----------|  |-----|----------|----------|
|     | s_uuid A | s_uuid B |  |     | s_uuid A | s_uuid B |
|-----|----------|----------|  |-----|----------|----------|
| min | T10      | T6       |  | min | T4       | T9       |
|-----|----------|----------|  |-----|----------|----------|
| max | T15      | T16      |  | max | T8       | T20      |
|-----|----------|----------|  |-----|----------|----------|

The RUV for A on A does not overlap the range of the RUV for A on B (A min 10, B max 8).

This means that a tombstone could have been created at T9 and then reaped. This would mean that B would not have perceived that delete and then the entry would become a zombie - back from the dead, risen again, escaping the grave, breaking the tombstone. This could have security consequences especially if the entry was a group providing access or a user who was needing to be deleted.

To prevent this, we denote server B as lagging, since it is too old. We denote A as advanced, since it has newer data that cannot be applied to B.

This will "freeze" B, where data will not be supplied to B, nor will data from B be accepted by other nodes. This is to prevent the risk of data corruption / zombies.
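Reduced to its simplest form, that lagging/advanced check is a per-origin comparison between the supplier's retained minimum and the consumer's maximum (again with times simplified to plain integers; illustrative only):

// For changes originating at one server: replication from supplier to
// consumer is only safe if the supplier's retained minimum does not exceed
// what the consumer has already seen. Otherwise a reaped tombstone may have
// been missed, and the consumer is treated as lagging (and the supplier as
// advanced).
fn replication_safe(supplier_min: u64, consumer_max: u64) -> bool {
    // Example from above: supplier A retains its own changes from T10 onward,
    // while consumer B has only seen A's changes up to T8 - not safe.
    supplier_min <= consumer_max
}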

There is some harm in extending the RUV trim / tombstone reaping window. The window could be expanded even to values as long as years; however, it would increase the risk of conflicting changes, where nodes that are segregated for extended periods have been accepting changes that may conflict with the other side of the topology.

REST Interface

Kani Warning Note!
This is a work in progress and not all endpoints have perfect schema definitions, but they're all covered!

We're generating an OpenAPI specification file and Swagger interface using utoipa.

The Swagger UI is available at /docs/swagger-ui on your server (ie, if your origin is https://example.com:8443, visit https://example.com:8443/docs/swagger-ui).

The OpenAPI schema is similarly available at /docs/v1/openapi.json.

You can download the schema file using kanidm api download-schema <filename> - it defaults to ./kanidm-openapi.json.

Kanidm Python Module

So far it includes:

  • asyncio methods for all calls, leveraging aiohttp
  • every class and function is fully python typed (test by running make test/pykanidm/mypy)
  • test coverage for 95% of code, and most of the missing bit is just when you break things
  • loading configuration files into nice models using pydantic
  • basic password authentication
  • pulling RADIUS tokens

TODO: a lot of things.

Setting up your dev environment.

Setting up a dev environment can be a little complex because of the mono-repo.

  1. Install poetry: python -m pip install poetry. This is what we use to manage the packages, and allows you to set up virtual python environments more easily.
  2. Build the base environment. From within the pykanidm directory, run poetry install. This'll set up a virtual environment and install all the required packages (and development-related ones).
  3. Start editing!

Most IDEs will be happier if you open the kanidm_rlm_python or pykanidm directories as the base you are working from, rather than the kanidm repository root, so they can auto-load integrations etc.

Building the documentation

To build a static copy of the docs, run:

make docs/pykanidm/build

You can also run a local live server by running:

make docs/pykanidm/serve

This'll expose a web server at http://localhost:8000.

RADIUS Module Development

Setting up a dev environment has some extra complexity due to the mono-repo design.

  1. Install poetry: python -m pip install poetry. This is what we use to manage the packages, and allows you to set up virtual python environments more easily.
  2. Build the base environment. From within the kanidm_rlm_python directory, run: poetry install
  3. Install the kanidm python library: poetry run python -m pip install ../pykanidm
  4. Start editing!

Most IDEs will be happier if you open the kanidm_rlm_python or pykanidm directories as the base you are working from, rather than the kanidm repository root, so they can auto-load integrations etc.

Running a test RADIUS container

From the root directory of the Kanidm repository:

  1. Build the container - this'll give you a container image called kanidm/radius with the tag devel:
make build/radiusd
  2. Once the process has completed, check the container exists in your docker environment:
➜ docker image ls kanidm/radius
REPOSITORY      TAG       IMAGE ID       CREATED              SIZE
kanidm/radius   devel     5dabe894134c   About a minute ago   622MB

Note: If you're just looking to play with a pre-built container, images are also automatically built based on the development branch and available at ghcr.io/kanidm/radius:devel

  3. Generate some self-signed certificates by running the script - just hit enter on all the prompts if you don't want to customise them. This'll put the files in /tmp/kanidm:
./insecure_generate_tls.sh
  4. Run the container:
cd kanidm_rlm_python && ./run_radius_container.sh

You can pass the following environment variables to run_radius_container.sh to set other options:

  • IMAGE: an alternative image such as ghcr.io/kanidm/radius:devel
  • CONFIG_FILE: mount your own config file

For example:

IMAGE=ghcr.io/kanidm/radius:devel \
    CONFIG_FILE=~/.config/kanidm \
    ./run_radius_container.sh

Testing authentication

Authentication can be tested through the client.localhost Network Access Server (NAS) configuration with:

docker exec -i -t radiusd radtest \
    <username> badpassword \
    127.0.0.1 10 testing123

docker exec -i -t radiusd radtest \
    <username> <radius show_secret value here> \
    127.0.0.1 10 testing123

Packaging

Packages are known to exist for the following distributions:

To ease packaging for your distribution, the Makefile has targets for sets of binary outputs.

|----------------------|------------------------------|
| Target               | Description                  |
|----------------------|------------------------------|
| release/kanidm       | Kanidm's CLI                 |
| release/kanidmd      | The server daemon            |
| release/kanidm-ssh   | SSH-related utilities        |
| release/kanidm-unixd | UNIX tools, PAM/NSS modules  |
|----------------------|------------------------------|

Debian / Ubuntu Packaging

Building packages

This currently happens in Docker; here are some instructions for doing it for Ubuntu:

  1. Start in the root directory of the repository.
  2. Run ./platform/debian/ubuntu_docker_builder.sh. This'll start a container, mounting the repository in ~/kanidm/ and installing dependencies via ./scripts/install_ubuntu_dependencies.sh.
  3. Building packages uses make; get a list of available targets by running make -f ./platform/debian/Makefile help.
  4. So if you wanted to build the package for the Kanidm CLI, run make -f ./platform/debian/Makefile debs/kanidm.
  5. The package will be copied into the target directory of the repository on the docker host - not just in the container.

Adding a package

There's a set of default configuration files in packaging/; if you want to add a package definition, add a folder with the package name, and the files in there will be copied over the top of the ones from packaging/ at build time.

You'll need two custom files at minimum:

  • control - a file containing information about the package.
  • rules - a makefile doing all the build steps.

There are a lot of other files that can go into a .deb; some handy ones are:

|----------|---------------------------------------------------------------------------|
| Filename | What it does                                                              |
|----------|---------------------------------------------------------------------------|
| preinst  | Runs before installation occurs                                           |
| postrm   | Runs after removal happens                                                |
| prerm    | Runs before removal happens - handy to shut down services.                |
| postinst | Runs after installation occurs - we're using that to show notes to users  |
|----------|---------------------------------------------------------------------------|