Containerized AI models in Konfuzio

Konfuzio version released-2024-08-11_09-56-15 introduces containerized AI models, thanks to the underlying open source BentoML framework. While this change is transparent from a user's point of view, developers and on-premise users should be aware of its differences and implications.

Motivation

Compared to pickle files, containerization offers several advantages when deploying your AI models:

  • Isolation: AI models are run in separate containers, ensuring that they do not interfere with each other or with the Konfuzio environment.

  • Scalability: Containers can be easily scaled up or down, depending on the load.

  • Portability: Containers bundle their dependencies and their original project metadata, ensuring that they can be moved across environments without issues.

  • Consistency: The AI model is run in the same environment it was trained in, ensuring that the results are consistent.

  • Security: Containers can be run in a secure environment, with access controls and other security measures in place.

  • Resource management: Containers can be run with resource limits, ensuring that they do not consume too many resources.

Containerized AI models can be run either with standard Docker Compose setups or in a Kubernetes cluster. Some parts of this documentation are specific to Kubernetes setups, but the general principles apply to both.

See the Konfuzio blog for a more detailed overview.

Overview of the Bento format

Archives

AI models trained with an SDK version higher than 0.3.12 on a Konfuzio Server version higher than released-2024-08-11_09-56-15 can be saved as a .bento file instead of a .pkl file. This is a compressed archive that internally still includes the .pkl file, together with other files:

  • bento.yaml: metadata for the bento archive.

  • openapi.yaml: machine-parsable API specification for the AI service.

  • env/docker/: Dockerfile and entrypoint to build the AI model into a Docker image.

  • env/python/: Python version and dependency versions.

  • models/: Contains the Python pickle file(s) for the AI model.

  • src/categories_and_labels_data.json5: Metadata for the project this AI was trained on.

  • src/extraction/: Python code and utility functions that expose the AI model as a REST API service.

When uploading or downloading AI models between instances or environments, .bento files are now the default.

You can find more information about the creation and serving of .bento files from the Konfuzio SDK in the Konfuzio SDK documentation.
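As an illustration, the layout above can be inspected directly from an exported archive. This is a minimal sketch, assuming the archive is a (possibly compressed) tar file named mymodel.bento:

tar -tf mymodel.bento               # list the archive contents (tar auto-detects compression)
mkdir mymodel && tar -xf mymodel.bento -C mymodel
cat mymodel/bento.yaml              # archive metadata
cat mymodel/openapi.yaml            # API specification for the AI service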

Containers

.bento archives include a generated Dockerfile that is used to create a Docker image from the AI model.

Kubernetes only: The Konfuzio AI Deployment Operator automatically builds .bento archives into Docker containers, but this operation can also be done locally starting from the .bento file, running bentoml containerize (see docs).
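For example, a local containerization might look like the following sketch; the archive name and the resulting name:tag are placeholders:

bentoml import ./mymodel.bento       # register the archive in the local bento store
bentoml list                         # find the imported bento's name:tag
bentoml containerize mymodel:latest  # build a Docker image from it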

Image storage (such as a Docker registry) is treated like a cache layer: Bento images are built on demand from .bento files, but the Konfuzio server will first try pulling a pre-built image from the registry (for Kubernetes deployments) or reusing an already built one (for Docker deployments) before building from the .bento file.

Services

Since the Konfuzio server and the AI models no longer share a runtime or code, communication between them happens via REST API, with predictable JSON requests and responses.

A full specification of available input/output interfaces for Konfuzio AI models can be found in our documentation.

Custom AI models running arbitrary code (even non-Python code) can be created as long as they expose an HTTP service that conforms to one of the interfaces supported by Konfuzio.
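As a rough sketch, a call to such a service could look like the following; the host, port (3000 is BentoML's default), endpoint name and payload fields here are placeholders — the authoritative routes and schemas are in the service's openapi.yaml and in the interface documentation linked above:

# Hypothetical request; consult openapi.yaml for the real endpoint and schema.
curl -X POST http://localhost:3000/extract \
  -H "Content-Type: application/json" \
  -d '{"text": "Invoice No. 42 ..."}'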

Operator

Kubernetes only: The Konfuzio AI Deployment Operator is a Kubernetes operator that manages the lifecycle of BentoML deployments in a Kubernetes cluster. It is responsible for creating, updating, and deleting BentoML deployments, as well as monitoring their health and status.

It’s composed of:

  • KonfuzioAIDeployment: a custom resource definition (CRD) that defines the desired state of a BentoML deployment (see the sketch after this list).

  • Controller: a pod that watches for changes to KonfuzioAIDeployment resources and takes action to ensure that the actual state of the deployment matches the desired state.

  • Builder: a component that builds the Docker image on demand for the deployment from the .bento file.

  • Buildkit Daemon: a component that runs the build process in a separate container, allowing for better resource management and isolation.
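The following is an illustrative sketch only: the apiVersion and the exact spec fields are assumptions, except for the autoscaling options, which mirror the “Additional Kubernetes settings” example further below:

kubectl apply -f - <<'EOF'
# Hypothetical manifest; check the CRD installed with the operator
# for the real apiVersion and required fields.
apiVersion: konfuzio.ai/v1            # placeholder group/version
kind: KonfuzioAIDeployment
metadata:
  name: mymodel
spec:
  bento: mymodel.bento                # placeholder reference to the .bento archive
  autoscaling:
    minReplicas: 1
    maxReplicas: 2
EOF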

When a KonfuzioAIDeployment resource is created, the operator will:

  1. Check if the Docker image for the deployment already exists in the registry.

  2. If it does not exist, it will build the image from the .bento file using the buildkit daemon, and push it to the registry.

  3. Once the image is available, the operator will create a Kubernetes deployment for the BentoML service, using the image and the configuration specified in the KonfuzioAIDeployment resource.

  4. The operator will monitor the health and status of the deployment, and take action if it becomes unhealthy or fails.

  5. If the deployment is deleted, usually because of inactivity, the operator will remove the associated Kubernetes resources.

  6. If the deployment is updated, the operator will stop the old deployment and create a new one with the updated configuration.

In a Kubernetes setup, the Konfuzio server will communicate with the operator to create and manage the deployments. The operator will handle the details of building and deploying the Docker images.

The following diagram shows the architecture of the Konfuzio AI Deployment Operator:

.. mermaid::

flowchart TB
  classDef server stroke-width:2px,font-weight:bold
  classDef container stroke-width:1px,shape:subprocess
  classDef process stroke-width:1px
  classDef decision stroke-width:1px,shape:diamond
  classDef database stroke-width:1px,shape:cylinder
  classDef endpoint stroke-width:1px,shape:rect

  subgraph Konfuzio["Konfuzio Server"]
      Z["Document"]@{ shape: doc } --> A
      A[("Konfuzio Server")]:::server --> A1
      A1{"AI deployment<br>exists in cluster?"}:::decision
  end

  subgraph Kubernetes["Kubernetes Cluster"]
      B["KonfuzioAIDeployment<br>of mymodel.bento"]:::process
      B1["Operator Instance"]@{ shape: cylinder }
      C["Operator detects<br>new deployment"]:::process --> D
      D{"mymodel.bento<br>exists in registry?"}:::decision

      subgraph Builder["Builder"]
          E["Builder Instance"]@{ shape: cylinder } --> F1
          F1["Download .bento file"]:::process --> F2
          F2["Build Docker image"]:::process --> F3
          F3["Push image"]:::process
      end

      G["Operator<br>deploys AI Model"]:::process --> H
      H["Pull image from<br>Docker registry"]:::process --> I

      subgraph AI["AI deployment"]
          I["Pod with AI Model"]@{ shape: cylinder }
          J["Receive HTTP request<br>with document data"]:::process --> K
          K["AI<br>extraction<br>process"]@{ shape: extract } --> L
          L["Return AI predictions<br>as JSON response"]:::process
      end

      N{"More documents<br>to process?"}:::decision
      O["Wait for timeout"]@{ shape: delay }
      M["Resource<br>Termination"]@{ shape: double-circle }
  end

  subgraph Registry["Container Registry"]
      REGISTRY[("Docker Registry")]:::database
  end

  A1 -->|"No"| B
  A1 -->|"Yes"| I
  B --> C
  B1 -.- C
  B1 -.- G
  D -->|"Yes"| G
  D -->|"No"| E
  F3 --> REGISTRY
  F3 --> G
  REGISTRY --> H
  I --> J
  L --> A
  L --> N
  N -->|"Yes"| Z
  N -->|"No"| O
  O --> M
  A --> J

Setup

Containerized functionality is currently experimental and can be enabled per project by an administrator in the Superuser Project section of the interface. The default mode for both building and serving is “Python Object” (same as before); to enable “Containerized” mode, additional setup is required (see the following sections).

“Containerized” mode can be enabled separately for building and serving AI models:

Build mode      Serve mode      Notes
Python Object   Python Object   Same as before: training generates a pickle file which is used directly when using the AI model.
Containerized   Containerized   Training generates a .bento file which is containerized on demand when using the AI model.
Containerized   Python Object   Training generates a .bento file; when using the AI model, the pickle file contained inside the .bento is used directly, mimicking previous behavior.
Python Object   Containerized   Not supported.

In addition, you can use the following environment variables to determine which mode new projects should use by default:

  • NEW_PROJECT_DEFAULT_BUILD_MODE

  • NEW_PROJECT_DEFAULT_SERVE_MODE

For both of these variables, 0 (default) is Python Object mode, and 1 is Containerized mode. These environment variables will not change the settings of existing projects.
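For example, to make new projects default to Containerized mode for both building and serving:

NEW_PROJECT_DEFAULT_BUILD_MODE=1
NEW_PROJECT_DEFAULT_SERVE_MODE=1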

Additional steps for Kubernetes setup

Kubernetes only: If you’re planning to use Kubernetes to serve your AI models, you will need to set up the Konfuzio AI Deployment Operator. This service is responsible for building and serving the AI models, and can be installed in the same Kubernetes cluster as the Konfuzio server or in a different one.

The operator is installed through the Konfuzio server helm chart, and can be enabled by setting the enableOperator value to true in the values.yaml file. This will install the operator in the same namespace as the Konfuzio server.
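For example, assuming an existing installation managed with Helm (the release and chart names are placeholders for your setup):

# In values.yaml:
#   enableOperator: true
# then roll out the change:
helm upgrade konfuzio <konfuzio-chart> -f values.yaml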

Build mode

You can either connect Konfuzio server to Docker and build your Bento archives locally, or defer the building to a remote service.

Local build

Recommended for Docker and Docker Compose setups.

These environment settings should be added:

BENTO_CONTAINER_BUILD_MODE=local
DOCKER_HOST="/var/run/docker.sock"  # replace with path to docker socket

Built images will be stored inside the specified Docker instance. No image registry is required. Note that this means that with workers running on multiple machines, each machine will have a separate build cache.

When using a containerized AI model, the Konfuzio server will:

  • Check if a built Docker image exists already in the Docker instance from DOCKER_HOST;

  • If it does not exist, it will build an image for the AI model using the Docker instance from DOCKER_HOST.

Remote build

Recommended for Kubernetes setups.

These environment settings should be added:

BENTO_CONTAINER_BUILD_MODE=remote

Additionally, you should configure the Konfuzio AI Deployment Operator. The Konfuzio server will automatically use the operator to build and serve the images.

Serve mode

You can either connect Konfuzio server to Docker and use it to serve the AI models, or use a Kubernetes cluster to serve them remotely, even in a separate environment.

Local serve

Recommended for Docker and Docker Compose setups.

These environment settings should be added:

BENTO_SERVING_MODE=local
BENTO_SERVING_SHUTDOWN_TIMEOUT=0
DOCKER_HOST="/var/run/docker.sock"  # replace with path to docker socket
DOCKER_NETWORK=mynetwork  # optional, connects the served containers to the specified network

Note that whichever path you use for DOCKER_HOST should be mounted inside the Konfuzio container from the Docker host, and that this gives the Konfuzio container permission to start up and shut down Docker containers on the host.
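As a sketch of what this looks like with plain Docker (the server image name is a placeholder):

# The host's Docker socket is mounted into the Konfuzio container so the
# server can start and stop AI model containers on the host.
docker run -d \
  -e BENTO_SERVING_MODE=local \
  -e DOCKER_HOST=/var/run/docker.sock \
  -v /var/run/docker.sock:/var/run/docker.sock \
  <konfuzio-server-image>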

When using a containerized AI model, the Konfuzio server will:

  • Check whether a container running this AI model already exists (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0), and reuse it if this is the case;

  • If a container does not exist already, it will start it in the Docker instance from DOCKER_HOST;

  • It will communicate with the running container via HTTP;

  • Once everything is done, it will shut down the container (if BENTO_SERVING_SHUTDOWN_TIMEOUT == 0).

Remote serve

Recommended for Kubernetes setups.

These environment settings should be added:

BENTO_SERVING_MODE=remote
BENTO_CONTAINER_BUILD_MODE=remote
BENTO_SERVING_REMOTE_KUBE_CONFIG=/home/.kube/config  # you can set this to the path to the kubeconfig file for a remote Kubernetes cluster. If not set, we will use the "in cluster" configuration

Additionally, you should configure the Konfuzio AI Deployment Operator. The Konfuzio server will automatically use the operator to build and serve the images.

Additional Kubernetes settings

The “AI model service settings” field, present in the Superuser AI Models section, can be used to pass additional options to the spec field of the KonfuzioAIDeployment. For example, you can configure your AI model to use more pods or resources by setting it to something like this:

{
  "autoscaling": {
    "maxReplicas": 4,
    "minReplicas": 2,
    "metrics": [
      {
        "type": "Resource",
        "resource": {
          "name": "cpu",
          "target": {
            "type": "Utilization",
            "averageUtilization": 60
          }
        }
      }
    ]
  },
  "resources": {
    "limits": {
      "memory": "8Gi"
    },
    "requests": {
      "memory": "1Gi",
      "cpu": "200m"
    }
  }
}

Strategies for starting and shutting down containers

In both local and remote serve, a strategy can be chosen at the AI model level to determine when and whether the relevant container is shut down:

  • Shuts down after timeout (TIMEOUT): follows the BENTO_SERVING_SHUTDOWN_TIMEOUT environment variable, which determines how long the container should be kept alive after the last request, unless BENTO_SERVING_SHUTDOWN_TIMEOUT is 0, in which case it is the same as IMMEDIATE.

  • Shuts down immediately after processing (IMMEDIATE): the container is shut down immediately after the request is processed, regardless of BENTO_SERVING_SHUTDOWN_TIMEOUT setting. This means that if there are n concurrent extractions on the same AI model, n independent containers will start up and shut down once the extraction is done. This is functionally equivalent to the previous behavior of the Konfuzio server in Python Object mode.

  • Always online (ALWAYS_ON): the container will not be shut down after the request is processed, and will be kept alive indefinitely.

The Konfuzio server always checks if a container exists before starting a new one for TIMEOUT and ALWAYS_ON, and will reuse it if it does. IMMEDIATE will always start a new container.
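For example, for TIMEOUT-strategy models you might configure a keep-alive window like this (the unit shown is an assumption; 0 disables the window and makes the behavior equivalent to IMMEDIATE):

BENTO_SERVING_SHUTDOWN_TIMEOUT=600   # keep-alive window after the last request (assumed to be seconds)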

(Kubernetes only) The number of containers for the same AI model can be controlled by the maxReplicas and minReplicas fields in the autoscaling section of the AI model service settings.

Changing the strategy setting on an AI model, making it inactive, or deleting it will stop all running containers for that AI model.

Custom service URL

If you have a service conforming to the Konfuzio specification that resides outside the Konfuzio server environment (e.g. a different cloud provider, or even your own computer), you can bypass the Konfuzio deployment mechanism altogether:

  • Create an AI model via the Superuser AI Models interface.

  • Set the AI model service URL field to the publicly accessible URL of the deployed Konfuzio AI model.

The Konfuzio instance will make a POST request to this service, according to the specified request type, and process the response data as it would for a local/remote service.

Access restrictions in custom services

To prevent unauthorized access to your custom service running outside of Konfuzio, you can use a combination of different strategies:

  1. Firewall your service. Using a reverse proxy on top of it, or a private network, you can restrict access to the IP(s) of the Konfuzio server, and deny all other requests.

  2. Implement custom authentication. By default, project credentials defined in the project settings are passed as HTTP headers in the POST calls made to the service, with the addition of an ENV_ prefix. For example, if you define a project credential called AUTHENTICATION with the value mypassword, the Konfuzio server will add ENV_AUTHENTICATION: mypassword to the headers of all extraction requests to the service. You can then parse these headers directly in your AI service (if using a custom one) or in your reverse proxy to implement custom logic for authenticating these calls, as illustrated below.
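For example, a request arriving at your custom service would carry the credential roughly like this (the URL, endpoint and payload are placeholders), so a reverse proxy or the service itself can reject calls lacking the expected header:

curl -X POST https://my-ai-service.example.com/extract \
  -H "ENV_AUTHENTICATION: mypassword" \
  -H "Content-Type: application/json" \
  -d '{"text": "..."}'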

Cron jobs

A cron job initiated by Celery will check every minute for containers that need to be removed (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0). Make sure that the Celery Beat worker is enabled on your Konfuzio setup.

Project metadata

AI models from the Konfuzio platform get saved with a categories_and_labels_data.json5 file that includes the project metadata the AI model was originally trained on. This file can be used when moving the AI model across environments to ensure that the destination environment contains all the necessary structure for the AI model to work correctly.

This information gets parsed and saved automatically when you upload a .bento file manually on a Konfuzio instance. On the settings for the destination Project, you can enable the Create labels and label sets option to automatically create missing labels and label sets according to the AI’s metadata.

When this option is enabled, existing Labels and Label Sets will be matched with the ones in the AI’s metadata, and ones that do not exist will automatically be created and associated with the relevant Category/Label Set. The matching happens by name: renamed Labels and Label Sets will not be matched and will be recreated instead. Moreover, information about types and whether Labels can appear multiple times will not be changed if it already exists.

The flowchart below shows how the project metadata is used to create the necessary structure for the AI model to work correctly.

.. mermaid::

flowchart TD
  A[Bento build]
  C[Resave with bento]
  B[categories_and_labels_data.json<br>created<br><small>information comes from the Project instance of the pickle file</small>]
  A-->B
  C-->B
  B-->D
  D[Project metadata is stored<br>on the AI model in the db]
  E[Extract a Document<br>with Bento via API]
  I[<code>create_labels_and_templates?</code>]
  I1[True]
  I2[False]
  E-->I
  I-->I1
  I-->I2
  I1-->F
  I2-->L
  F[<code>create_structure_from_schema</code><br>is called to generate <code>mappings</code><br>from the AI Project metadata<br>to the current Project state from db]
  G[<code>mappings</code> are stored<br>on the AI model in the db]
  H[<code>mappings</code> are used by<br><code>convert_response_to_annotations</code><br>to resolve Label/Label Set IDs]
  L[IDs of Labels/Label Sets between the current project and the project in the AI model match, so no mapping needed]
  D-->F
  F-->G
  G-->H

Upgrade strategies

If you have an existing Konfuzio installation, when upgrading to VERSION nothing will change by default, and the server will keep using pickle files for building and serving.

However, Konfuzio server will eventually stop supporting older Python versions, at which point these models will become unusable.

We recommend familiarizing yourself with the new concepts and starting to generate new AI models using Bento today. In the next sections, we outline some possible strategies to ensure a smooth upgrade. You can also contact us if you wish to discuss additional upgrade support and/or custom migration strategies.

Enable Containerized build mode

You can start generating Bento files for new AI models without additional setup, by changing the build mode to Containerized and keeping the serve mode as Python Object. Bento archives will be generated, but only the pickle file inside of them will be used, mirroring previous behavior.

Once you’ve done the additional setup necessary to run containerized AIs, you can change your serve mode to Containerized, and containers will be generated from the previously created Bento archives.

Convert existing AI models

The save_with_bento command line utility can be used to convert existing pickles into Containerized AI models, future-proofing them for upcoming upgrades. Run python manage.py save_with_bento --help for additional information.

Leave legacy AI models behind

For some very old (pre-Konfuzio SDK v0.2) models, conversion is not possible. In these cases, training new AI models (with Containerized build mode enabled) is the recommended option.

Additional resources