.. meta::
:description: Documentation on how to set up and use containerized AI models in Konfuzio.
.. _Containerized AI models:
# Containerized AI models in Konfuzio
Konfuzio version `released-2024-08-11_09-56-15` introduces containerized AI models, built on the open-source BentoML framework. While this change is transparent from a user's point of view, developers and on-premise users should be aware of its differences and implications.
## Motivation
Compared to pickle files, containerization offers several advantages when deploying your AI models:
- **Isolation**: AI models are run in separate containers, ensuring that they do not interfere with each other or with the Konfuzio environment.
- **Scalability**: Containers can be easily scaled up or down, depending on the load.
- **Portability**: Containers bundle their dependencies and their original project metadata, ensuring that they can be moved across environments without issues.
- **Consistency**: The AI model is run in the same environment it was trained in, ensuring that the results are consistent.
- **Security**: Containers can be run in a secure environment, with access controls and other security measures in place.
- **Resource management**: Containers can be run with resource limits, ensuring that they do not consume too many resources.
Containerized AI models can be run either with standard Docker Compose setups or in a Kubernetes cluster. Some parts of this documentation are specific to Kubernetes, but the general principles apply to both setups.
See the [Konfuzio blog](https://konfuzio.com/en/containerization/) for a more detailed overview.
## Overview of the Bento format
### Archives
AI models trained with an SDK version higher than 0.3.12 on a Konfuzio Server version higher than `released-2024-08-11_09-56-15` can be saved as a `.bento` file instead of a `.pkl` file. This is a compressed archive that internally still contains the `.pkl` file, together with other files:
- **bento.yaml**: metadata for the bento archive.
- **openapi.yaml**: machine parsable API specification for the AI service.
- **env/docker/**: Dockerfile and entrypoint to build the AI model into a Docker image.
- **env/python/**: Python version and dependency versions.
- **models/**: Contains the Python pickle file(s) for the AI model.
- **src/categories_and_labels_data.json5**: Metadata for the project this AI was trained on.
- **src/extraction/**: Python code and utility functions that expose the AI model as a REST API service.
When uploading or downloading AI models between instances or environments, `.bento` files are now the default format.
You can find more information about the creation and serving of `.bento` files from the Konfuzio SDK in the [Konfuzio SDK documentation](https://dev.konfuzio.com/sdk/explanations.html#containerization-of-ais).
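For example, a `.bento` archive can be imported into a local [BentoML](https://docs.bentoml.com/en/latest/) store and inspected with the BentoML CLI (the file name and tag below are illustrative):
```
# Import the archive into the local Bento store:
bentoml import ./mymodel.bento

# List available Bentos and show the metadata of a specific one:
bentoml list
bentoml get mymodel:latest
```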
### Containers
`.bento` archives include a generated `Dockerfile` that is used to create a Docker image from the AI model.
*Kubernetes only:* The [Konfuzio AI Deployment Operator](#operator) automatically builds `.bento` archives into Docker containers, but this operation can also be done locally from the `.bento` file by running `bentoml containerize` (see [docs](https://docs.bentoml.com/en/latest/guides/containerization.html)).
Image storage (such as a Docker registry) is treated similarly to a cache layer: Bento images are built on demand from `.bento` files, but the Konfuzio server will try pulling a pre-built image from the registry (for Kubernetes deployments) or reusing an already built one (for Docker deployments) before falling back to building the image from the `.bento` file.
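As a sketch, building and running an image locally could look like this (the tag is illustrative; BentoML services listen on port 3000 by default):
```
# Build a Docker image from a Bento in the local store:
bentoml containerize mymodel:latest

# Run the resulting image and expose the service port:
docker run --rm -p 3000:3000 mymodel:latest
```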
### Services
Since the Konfuzio server and the AI models no longer share a runtime or code, communication between them happens via REST API, with predictable JSON requests and responses.
A full specification of available input/output interfaces for Konfuzio AI models can be found in [our documentation](https://dev.konfuzio.com/sdk/sourcecode.html#ai-containerization).
Custom AI models running arbitrary code (even non-Python code) can be created as long as they expose an HTTP service that conforms to one of the interfaces supported by Konfuzio.
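For illustration, a request to a locally served extraction model might look like the following; the exact endpoint name and JSON schema depend on which of the supported interfaces the model implements, so check the specification linked above:
```
# Hypothetical request; adjust the endpoint and payload to the model's interface:
curl -X POST http://localhost:3000/extract \
  -H "Content-Type: application/json" \
  -d '{"text": "Invoice no. 123 ..."}'
```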
### Operator
*Kubernetes only:* The Konfuzio AI Deployment Operator is a Kubernetes operator that manages the lifecycle of BentoML deployments in a Kubernetes cluster. It is responsible for creating, updating, and deleting BentoML deployments, as well as monitoring their health and status.
It's composed of:
- **KonfuzioAIDeployment**: a custom resource definition (CRD) that defines the desired state of a BentoML deployment.
- **Controller**: a pod that watches for changes to KonfuzioAIDeployment resources and takes action to ensure that the actual state of the deployment matches the desired state.
- **Builder**: a component that builds the Docker image on demand for the deployment from the `.bento` file.
- **BuildKit Daemon**: a component that runs the build process in a separate container, allowing for better resource management and isolation.
When a `KonfuzioAIDeployment` resource is created, the operator will:
1. Check if the Docker image for the deployment already exists in the registry.
2. If it does not exist, it will build the image from the `.bento` file using the BuildKit daemon, and push it to the registry.
3. Once the image is available, the operator will create a Kubernetes deployment for the BentoML service, using the image and the configuration specified in the KonfuzioAIDeployment resource.
4. The operator will monitor the health and status of the deployment, and take action if it becomes unhealthy or fails.
5. If the deployment is deleted, usually because of inactivity, the operator will remove the associated Kubernetes resources.
6. If the deployment is updated, the operator will stop the old deployment and create a new one with the updated configuration.
In a Kubernetes setup, the Konfuzio server will communicate with the operator to create and manage the deployments. The operator will handle the details of building and deploying the Docker images.
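If you have access to the cluster, the operator's resources can be inspected with standard `kubectl` commands (the resource and namespace names below are illustrative and depend on your installation):
```
# List and inspect KonfuzioAIDeployment resources:
kubectl get konfuzioaideployments -n konfuzio
kubectl describe konfuzioaideployment mymodel -n konfuzio
```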
The following diagram shows the architecture of the Konfuzio AI Deployment Operator:
.. mermaid::
flowchart TB
classDef server stroke-width:2px,font-weight:bold
classDef container stroke-width:1px,shape:subprocess
classDef process stroke-width:1px
classDef decision stroke-width:1px,shape:diamond
classDef database stroke-width:1px,shape:cylinder
classDef endpoint stroke-width:1px,shape:rect
subgraph Konfuzio["Konfuzio Server"]
Z["Document"]@{ shape: doc } --> A
A[("Konfuzio Server")]:::server --> A1
A1{"AI deployment
exists in cluster?"}:::decision
end
subgraph Kubernetes["Kubernetes Cluster"]
B["KonfuzioAIDeployment
of mymodel.bento"]:::process
B1["Operator Instance"]@{ shape: cylinder }
C["Operator detects
new deployment"]:::process --> D
D{"mymodel.bento
exists in registry?"}:::decision
subgraph Builder["Builder"]
E["Builder Instance"]@{ shape: cylinder } --> F1
F1["Download .bento file"]:::process --> F2
F2["Build Docker image"]:::process --> F3
F3["Push image"]:::process
end
G["Operator
deploys AI Model"]:::process --> H
H["Pull image from
Docker registry"]:::process --> I
subgraph AI["AI deployment"]
I["Pod with AI Model"]@{ shape: cylinder }
J["Receive HTTP request
with document data"]:::process --> K
K["AI
extraction
process"]@{ shape: extract } --> L
L["Return AI predictions
as JSON response"]:::process
end
N{"More documents
to process?"}:::decision
O["Wait for timeout"]@{ shape: delay }
M["Resource
Termination"]@{ shape: double-circle }
end
subgraph Registry["Container Registry"]
REGISTRY[("Docker Registry")]:::database
end
A1 -->|"No"| B
A1 -->|"Yes"| I
B --> C
B1 -.- C
B1 -.- G
D -->|"Yes"| G
D -->|"No"| E
F3 --> REGISTRY
F3 --> G
REGISTRY --> H
I --> J
L --> A
L --> N
N -->|"Yes"| Z
N -->|"No"| O
O --> M
A --> J
## Setup
Containerized functionality is currently experimental and can be enabled per project by an administrator in the Superuser Project section of the interface. The default mode for both building and serving is "Python Object" (same as before); to enable "Containerized" mode, additional setup is required (see the following sections).
"Containerized" mode can be enabled separately for building and serving AI models:
| Build mode | Serve mode | Notes |
| ------------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| Python Object | Python Object | Same as before: training generates a pickle file which is used directly when using the AI model. |
| Containerized | Containerized | Training generates a .bento file which is containerized on demand when using the AI model. |
| Containerized | Python Object | Training generates a .bento file; when using the AI model, the pickle file contained inside the .bento is used directly, mimicking previous behavior. |
| Python Object | Containerized | Not supported. |
In addition, you can use the following environment variables to determine which mode *new* projects should use by default:
- `NEW_PROJECT_DEFAULT_BUILD_MODE`
- `NEW_PROJECT_DEFAULT_SERVE_MODE`
For both of these variables, `0` (default) is Python Object mode, and `1` is Containerized mode. **These environment variables will not change the settings of existing projects.**
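For example, to make all newly created projects default to Containerized mode for both building and serving:
```
NEW_PROJECT_DEFAULT_BUILD_MODE=1
NEW_PROJECT_DEFAULT_SERVE_MODE=1
```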
### Additional steps for Kubernetes setup
*Kubernetes only:* If you're planning to use Kubernetes to serve your AI models, you will need to set up the Konfuzio AI Deployment Operator. This service is responsible for building and serving the AI models, and can be installed in the same Kubernetes cluster as the Konfuzio server or in a different one.
The operator is installed through the Konfuzio server helm chart, and can be enabled by setting the `enableOperator` value to `true` in the `values.yaml` file. This will install the operator in the same namespace as the Konfuzio server.
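As a sketch, assuming an existing Helm release (the release and chart names below are illustrative; use the ones from your installation):
```
# Enable the operator while keeping all other values unchanged:
helm upgrade konfuzio konfuzio/konfuzio-server \
  --reuse-values \
  --set enableOperator=true
```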
### Build mode
You can either connect Konfuzio server to Docker and build your Bento archives locally, or defer the building to a remote service.
#### Local build
**Recommended for Docker and Docker Compose setups.**
These environment settings should be added:
```
BENTO_CONTAINER_BUILD_MODE=local
DOCKER_HOST="/var/run/docker.sock" # replace with path to docker socket
```
Built images will be stored inside the specified Docker instance. No image registry is required. Note that if workers run on multiple machines, each machine will have a separate build cache.
When using a containerized AI model, the Konfuzio server will:
- Check if a built Docker image exists already in the Docker instance from `DOCKER_HOST`;
- If it does not exist, it will build an image for the AI model using the Docker instance from `DOCKER_HOST`.
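Since built images are stored in the Docker instance pointed to by `DOCKER_HOST`, you can inspect the local build cache with standard Docker commands, for example:
```
# List images known to the Docker instance used for builds:
docker -H unix:///var/run/docker.sock images
```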
#### Remote build
**Recommended for Kubernetes setups.**
These environment settings should be added:
```
BENTO_CONTAINER_BUILD_MODE=remote
```
Additionally, you should configure the [Konfuzio AI Deployment Operator](#operator). The Konfuzio server will automatically use the operator to build and serve the images.
### Serve mode
You can either connect Konfuzio server to Docker and use it to serve the AI models, or use a Kubernetes cluster to serve them remotely, even in a separate environment.
#### Local serve
**Recommended for Docker and Docker Compose setups.**
These environment settings should be added:
```
BENTO_SERVING_MODE=local
BENTO_SERVING_SHUTDOWN_TIMEOUT=0
DOCKER_HOST="/var/run/docker.sock" # replace with path to docker socket
DOCKER_NETWORK=mynetwork # optional, connects the served containers to the specified network
```
Note that whichever path you use for `DOCKER_HOST` must be mounted into the Konfuzio container from the Docker host, and that this gives the Konfuzio container permission to start and stop Docker containers on the host.
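For example, when starting the Konfuzio server container with plain Docker, the socket can be mounted like this (the image name and env file path are illustrative):
```
docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --env-file /path/to/env/file \
  my-konfuzio-server-image
```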
When using a containerized AI model, the Konfuzio server will:
- Check whether a container running this AI model already exists (if `BENTO_SERVING_SHUTDOWN_TIMEOUT > 0`), and reuse it if this is the case;
- If a container does not exist already, it will start one in the Docker instance from `DOCKER_HOST`;
- It will communicate with the Docker instance via HTTP;
- Once everything is done, it will shut down the container (if `BENTO_SERVING_SHUTDOWN_TIMEOUT == 0`).
#### Remote serve
**Recommended for Kubernetes setups.**
These environment settings should be added:
```
BENTO_SERVING_MODE=remote
BENTO_CONTAINER_BUILD_MODE=remote
BENTO_SERVING_REMOTE_KUBE_CONFIG=/home/.kube/config # path to the kubeconfig file for a remote Kubernetes cluster; if not set, the "in cluster" configuration is used
```
Additionally, you should configure the [Konfuzio AI Deployment Operator](#operator). The Konfuzio server will automatically use the operator to build and serve the images.
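Before enabling this, you can verify that the kubeconfig referenced by `BENTO_SERVING_REMOTE_KUBE_CONFIG` can actually reach the target cluster:
```
# Should list the nodes of the remote cluster:
kubectl --kubeconfig /home/.kube/config get nodes
```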
##### Additional Kubernetes settings
The "AI model service settings" field, present in the Superuser AI Models section, can be used to pass additional options to the [spec field of the KonfuzioAIDeployment](#operator). For example, you can configure your AI model to use more pods or resources by setting it to something like this:
```
{
"autoscaling": {
"maxReplicas": 4,
"minReplicas": 2,
"metrics": [
{
"type": "Resource",
"resource": {
"name": "cpu",
"target": {
"type": "Utilization",
"averageUtilization": 60
}
}
}
]
},
"resources": {
"limits": {
"memory": "8Gi"
},
"requests": {
"memory": "1Gi",
"cpu": "200m"
}
}
}
```
#### Strategies for starting and shutting down containers
In both local and remote serve mode, a strategy can be chosen at the AI model level to determine when and whether the relevant container is shut down:
- **Shuts down after timeout** (`TIMEOUT`): follows the `BENTO_SERVING_SHUTDOWN_TIMEOUT` environment variable, which determines how long the container should be kept alive after the last request, unless `BENTO_SERVING_SHUTDOWN_TIMEOUT` is 0, in which case it is the same as `IMMEDIATE`.
- **Shuts down immediately after processing** (`IMMEDIATE`): the container is shut down immediately after the request is processed, regardless of `BENTO_SERVING_SHUTDOWN_TIMEOUT` setting. This means that if there are *n* concurrent extractions on the same AI model, *n* independent containers will start up and shut down once the extraction is done. This is functionally equivalent to the previous behavior of the Konfuzio server in Python Object mode.
- **Always online** (`ALWAYS_ON`): the container will not be shut down after the request is processed, and will be kept alive indefinitely.
The Konfuzio server always checks if a container exists before starting a new one for `TIMEOUT` and `ALWAYS_ON`, and will reuse it if it does. `IMMEDIATE` will always start a new container.
*Kubernetes only:* The number of containers for the same AI model can be controlled by the `maxReplicas` and `minReplicas` fields in the `autoscaling` section of the [AI model service settings](#additional-kubernetes-settings).
Changing the strategy setting on an AI model, making it inactive, or deleting it, will stop all running containers for that AI model.
#### Custom service URL
If you have a service conforming to the Konfuzio specification that resides outside the Konfuzio server environment (e.g. a different cloud provider, or even your own computer), you can bypass the Konfuzio deployment mechanism altogether:
- Create an AI model via the Superuser AI Models interface.
- Set the _AI model service URL_ field to the **publicly accessible** URL of the deployed Konfuzio AI model.
The Konfuzio instance will make a POST request to this service, according to the specified request type, and process the response data as it would for a local/remote service.
##### Access restrictions in custom services
To prevent unauthorized access to your custom service running outside of Konfuzio, you can use a combination of different strategies:
1. **Firewall your service**. Using a reverse proxy on top of it, or a private network, you can restrict access to the IP(s) of the Konfuzio server, and deny all other requests.
2. **Implement custom authentication**. By default, [project credentials](https://help.konfuzio.com/modules/projects/index.html?highlight=credentials#project-credentials) defined in the project settings are passed as HTTP headers in the POST calls made to the service, with the addition of an `ENV_` prefix. For example, if you define a project credential called `AUTHENTICATION` with a value of `mypassword`, the Konfuzio server will add `ENV_AUTHENTICATION: mypassword` to the headers of all extraction requests to the service. You can then parse these headers directly in your AI service (if using a custom one) or in your reverse proxy to implement custom logic for authenticating these calls.
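For example, you can simulate the request the Konfuzio server would make in order to test your authentication logic (the URL, endpoint, and payload below are illustrative):
```
curl -X POST https://my-ai-service.example.com/extract \
  -H "Content-Type: application/json" \
  -H "ENV_AUTHENTICATION: mypassword" \
  -d '{"text": "..."}'
```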
### Cron jobs
A cron job initiated by Celery will check every minute for containers that need to be removed (if `BENTO_SERVING_SHUTDOWN_TIMEOUT > 0`). Make sure that the Celery Beat worker is enabled on your Konfuzio setup.
### Project metadata
AI models from the Konfuzio platform get saved with a `categories_and_labels_data.json5` file that includes the project metadata the AI model was originally trained on. This file can be used when moving the AI model across environments to ensure that the destination environment contains all the necessary structure for the AI model to work correctly.
This information gets parsed and saved automatically when you upload a `.bento` file manually to a Konfuzio instance. In the settings of the destination Project, you can enable the _Create labels and label sets_ option to automatically create missing labels and label sets according to the AI's metadata.
When this option is enabled, existing Labels and Label Sets will be matched with the ones in the AI's metadata, and ones that do not exist will automatically be created and associated with the relevant Category/Label Set. Matching happens by name: renamed Labels and Label Sets will _not_ be matched and will be recreated instead. Moreover, information about types and whether Labels can appear multiple times will not be changed if already existing.
The flowchart below shows how the project metadata is used to create the necessary structure for the AI model to work correctly.
.. mermaid::
flowchart TD
A[Bento build]
B[categories_and_labels_data.json5
created
information comes from the Project instance of the pickle file]
C[Resave with bento]
C-->B
A-->B
B-->D
D[Project metadata is stored
on the AI model in the db]
E[Extract a Document
with Bento via API]
E-->I
I-->I1
I-->I2
I1-->F
I2-->L
I[create_labels_and_templates?
]
I1[True]
I2[False]
F[create_structure_from_schema
is called to generate mappings
from the AI Project metadata
to the current Project state from db]
G[mappings
are stored
on the AI model in the db]
H[mappings
are used by
convert_response_to_annotations
to resolve Label/Label Set IDs]
L[IDs of Labels/Label Sets between the current project and the project in the AI model match, so no mapping needed]
D-->F
F-->G
G-->H
## Upgrade strategies
If you have an existing Konfuzio installation, when upgrading to `released-2024-08-11_09-56-15` nothing will change by default, and the server will keep using pickle files for building and serving.
However, when upgrading to VERSION, which uses Python 3.11 instead of Python 3.8, models that are not containerized and still use pickle files will stop working. To prevent this, you can follow the strategies outlined below.
We recommend familiarizing yourself with the new concepts and starting to generate new AI models using Bento today. The next sections outline some possible strategies to ensure a smooth upgrade. You can also [contact us](https://konfuzio.com/support/) if you wish to discuss additional upgrade support and/or custom migration strategies.
### Enable Containerized build mode
You can start generating Bento files for new AI models without additional setup, by changing the [build mode](#build-mode) to Containerized and keeping the [serve mode](#serve-mode) as Python Object. Bento archives will be generated, but only the pickle file inside of them will be used, mirroring previous behavior.
Once you've done the additional [setup](#setup) necessary to run containerized AIs, you can change your serve mode to Containerized, and containers will be generated from the previously created Bento archives.
### Convert existing AI models
The `save_with_bento` command line utility can be used to convert existing pickles into containerized AI models, preparing them for future upgrades. Run `python manage.py save_with_bento --help` for additional information.
#### Containerize Python 3.8 models when Konfuzio server already runs Python 3.11
If your Konfuzio server runs Python 3.11 and you have AI models that were trained with Python 3.8 and not containerized prior to the upgrade, you can use a Docker command running a Python 3.8 instance of the Konfuzio server to containerize them:
```
docker run \
  --env-file /path/to/env/file \
  git.konfuzio.com:5050/konfuzio/text-annotation/master:released-2025-02-20_10-34-28 \
  python manage.py save_with_bento
```
This will spin up a temporary Konfuzio server running Python 3.8, convert the AI models to Bento, and save them back to the database. Ensure that the env file contains the necessary environment variables for the Konfuzio server to run (i.e., the same variables you currently use to run the Konfuzio server).
### Leave legacy AI models behind
For some very old (pre-Konfuzio SDK v0.2) models, conversion is not possible. In these cases, training new AI models (with Containerized build mode enabled) is the recommended option.
## Additional resources
- [BentoML documentation](https://docs.bentoml.com/en/latest/)
- [Konfuzio SDK documentation for AI containerization](https://dev.konfuzio.com/sdk/explanations.html#containerization-of-ais)
- [Konfuzio SDK documentation for supported schemas](https://dev.konfuzio.com/sdk/sourcecode.html#ai-containerization)
- [Konfuzio environment variables documentation](https://dev.konfuzio.com/web/on_premises.html#environment-variables-for-konfuzio-server)
- [Konfuzio blog post on containerization](https://konfuzio.com/en/containerization/)
- [Konfuzio support](https://konfuzio.com/support/)