Dockerized AI models in Konfuzio

Note

This section describes an upcoming feature that is not yet available in the current version of Konfuzio. Please refer to the changelog for the latest updates.

Konfuzio version released-2024-08-11_09-56-15 introduces dockerized AI models, built on the open-source BentoML framework. Although this change is transparent to end users, developers and on-premise users should be aware of how it differs from the previous behavior and what it implies for their setup.

Dockerized AI models are currently available as beta functionality for Extraction AIs only. Dockerized Categorization AIs and Splitting AIs are coming later this year.

Motivation

Model containerization offers several advantages when deploying your AI models. See the Konfuzio blog for a more detailed overview.

Overview of the Bento format

Archives

Training an AI model with an SDK version higher than 0.3.12 and a Konfuzio Server version higher than released-2024-08-11_09-56-15 can produce a .bento file instead of a .pkl file. This is a compressed archive that still includes the .pkl file internally, together with other files:

  • bento.yaml: metadata for the bento archive.

  • openapi.yaml: machine parsable API specification for the AI service.

  • env/docker/: Dockerfile and entrypoint to build the AI model into a Docker image.

  • env/python/: Python version and dependency versions.

  • models/: Contains the Python pickle file(s) for the AI model.

  • src/categories_and_labels_data.json5: Metadata for the project this AI was trained on.

  • src/extraction/: Python code and utility functions that expose the AI model as a REST API service.

When uploading or downloading AI models between instances or environments, .bento files are now the default.

You can find more information about the creation and serving of .bento files from the Konfuzio SDK in the Konfuzio SDK documentation.
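The archive layout listed above can be inspected without loading the model. The following is a minimal sketch, assuming the .bento file is a tar archive (possibly compressed) that Python's standard tarfile module can open; the function name list_bento_members is hypothetical and not part of the Konfuzio SDK.

```python
import tarfile
from typing import List


def list_bento_members(path: str) -> List[str]:
    """Return the sorted names of all regular files inside a .bento archive.

    Assumption: the archive is a tar file; mode "r:*" lets tarfile
    detect the compression (none, gzip, bzip2, xz) automatically.
    """
    with tarfile.open(path, mode="r:*") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())


if __name__ == "__main__":
    import sys

    # Usage: python list_bento.py my_model.bento
    for name in list_bento_members(sys.argv[1]):
        print(name)
```

If the layout matches the list above, the output should include entries such as bento.yaml and files under models/ and src/.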

Containers

.bento archives include a generated Dockerfile that is used to create a Docker image from the AI model.

The Konfuzio server (or Yatai server, depending on the setup) automatically builds .bento archives into Docker images. You can also do this locally from the .bento file by running bentoml containerize (see docs).

Image storage (such as a Docker registry) is treated similarly to a cache layer: Bento images are built on demand from .bento files, but Konfuzio server will try pulling a pre-built image from the registry (for Kubernetes deployments) or reusing an already built one (in case of Docker deployments) before resorting to building a .bento.
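The cache-like lookup described above can be sketched as a small function. This is an illustration only, not the server's code: resolve_image, find_prebuilt and build_from_bento are hypothetical names, and the two callables stand in for "pull from the registry or reuse a local image" and "build from the .bento file".

```python
from typing import Callable, Optional, Tuple


def resolve_image(
    tag: str,
    find_prebuilt: Callable[[str], Optional[str]],
    build_from_bento: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (image_reference, origin) for an AI model image.

    A pre-built image is preferred; building from the .bento file
    is only the fallback, as with a cache miss.
    """
    image = find_prebuilt(tag)
    if image is not None:
        return image, "reused"
    return build_from_bento(tag), "built"


if __name__ == "__main__":
    # Hypothetical registry content, for demonstration only.
    registry = {"invoice-ai:1": "registry.example/invoice-ai:1"}
    print(resolve_image("invoice-ai:1", registry.get, lambda t: "local/" + t))
    print(resolve_image("invoice-ai:2", registry.get, lambda t: "local/" + t))
```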

Services

Since the Konfuzio server and the AI models no longer share a runtime or code, communication between them happens via REST API, with predictable JSON requests and predictable JSON responses.

A full specification of available input/output interfaces for Konfuzio AI models can be found in our documentation.

Custom AI models running arbitrary code (even non-Python code) can be created as long as they expose an HTTP service that conforms to one of the interfaces supported by Konfuzio.

Setup

Bento functionality is currently experimental and can be enabled per-project by an administrator in the Superuser Project section of the interface. The default mode for both building and serving is “Pickle” (same as before); to enable “Bento” mode additional setup is required (see the following sections).

“Bento” mode can be enabled separately for building and serving AI models:

  • Build mode Pickle, serve mode Pickle: same as before. Training generates a pickle file, which is used directly when using the AI model.

  • Build mode Bento, serve mode Bento: training generates a .bento file, which is containerized on demand when using the AI model.

  • Build mode Bento, serve mode Pickle: training generates a .bento file; when using the AI model, the pickle file contained inside the .bento is used directly, mimicking previous behavior.

  • Build mode Pickle, serve mode Bento: not supported.

In addition, you can use the following environment variables to determine which mode new projects should use by default:

  • NEW_PROJECT_DEFAULT_BUILD_MODE

  • NEW_PROJECT_DEFAULT_SERVE_MODE

For both of these variables, 0 (default) is pickle mode, and 1 is Bento mode. These environment variables will not change the settings of existing projects.
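As a rough illustration of how these two variables and the supported mode combinations fit together, the sketch below reads both values and rejects the unsupported pairing (pickle build with Bento serve). The function default_modes and the choice to raise an error are assumptions made for the example; the server's actual handling may differ.

```python
import os
from typing import Mapping, Optional, Tuple

PICKLE, BENTO = "pickle", "bento"
_ENV_VALUES = {"0": PICKLE, "1": BENTO}  # 0 (default) is pickle, 1 is Bento


def default_modes(environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (build_mode, serve_mode) for new projects from the environment."""
    env = os.environ if environ is None else environ
    build = _ENV_VALUES[env.get("NEW_PROJECT_DEFAULT_BUILD_MODE", "0")]
    serve = _ENV_VALUES[env.get("NEW_PROJECT_DEFAULT_SERVE_MODE", "0")]
    if build == PICKLE and serve == BENTO:
        # Serving in Bento mode requires a .bento archive from the build step.
        raise ValueError("build mode 'pickle' with serve mode 'bento' is not supported")
    return build, serve


if __name__ == "__main__":
    print(default_modes({"NEW_PROJECT_DEFAULT_BUILD_MODE": "1"}))
```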

Additional steps for Kubernetes setup

If you’re planning to use Kubernetes to serve your AI models, you will need to set up a Yatai service. This service is responsible for building and serving the AI models, and can be installed in the same Kubernetes cluster as the Konfuzio server or in a different one.

Please follow the instructions for installing yatai, yatai-image-builder and yatai-deployment. Once installation is finished and you’ve created a Yatai user, create an API token from the Yatai web interface: it will be needed in later steps.

Build mode

You can either connect Konfuzio server to Docker and build your Bento archives locally, or defer the building to a remote service.

Local build

Recommended for Docker and Docker Compose setups.

These environment settings should be added:

BENTO_CONTAINER_BUILD_MODE=local
DOCKER_HOST="/var/run/docker.sock"  # replace with path to docker socket

Built images will be stored inside the specified Docker instance. No image registry is required. Note that this means that with workers running on multiple machines, each machine will have a separate build cache.

When using a dockerized AI model, the Konfuzio server will:

  • Check if a built Docker image exists already in the Docker instance from DOCKER_HOST;

  • If it does not exist, it will build an image for the AI model using the Docker instance from DOCKER_HOST.

Remote build

Recommended for Kubernetes setups.

These environment settings should be added:

BENTO_CONTAINER_BUILD_MODE=remote

Additionally, you should configure a Yatai service. Once you’ve done so, you can use the provided values in your values.yaml in the Konfuzio k8s charts:

remoteYatai:
  enabled: true
  envs:
    YATAI_EMAIL: ""
    YATAI_API_TOKEN: ""
    YATAI_ENDPOINT: ""

When using a dockerized AI model, the Konfuzio server will:

  • Check if a built Docker image exists already in the Yatai service, and if so, pull it;

  • If it does not exist, it will send the .bento file to the Yatai service, which will build it on demand.

Serve mode

You can either connect Konfuzio server to Docker and use it to serve the AI models, or use a Kubernetes cluster to serve them remotely, even in a separate environment.

Local serve

Recommended for Docker and Docker Compose setups.

These environment settings should be added:

BENTO_SERVING_MODE=local
BENTO_SERVING_SHUTDOWN_TIMEOUT=0
DOCKER_HOST="/var/run/docker.sock"  # replace with path to docker socket
DOCKER_NETWORK=mynetwork  # optional, connects the served containers to the specified network

Note that whichever path you use for DOCKER_HOST should be mounted inside the Konfuzio container from the Docker host, and that this gives permissions to the Konfuzio container to start up and shut down Docker containers on the host.

When using a dockerized AI model, the Konfuzio server will:

  • Check that a container running this AI model exists already (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0), and reuse it if this is the case;

  • If a container does not exist already, it will start it in the Docker instance from DOCKER_HOST;

  • It will communicate with the Docker instance via HTTP;

  • Once everything is done, it will shut down the container (if BENTO_SERVING_SHUTDOWN_TIMEOUT == 0).
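The lifecycle above can be summarized in a few lines of code. This is a simplified sketch, not the server implementation: FakeDocker is an in-memory stand-in for the Docker instance, and call_model is a hypothetical callable representing the HTTP request to the served model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Set, Tuple


@dataclass
class FakeDocker:
    """In-memory stand-in for the Docker instance behind DOCKER_HOST."""

    running: Set[str] = field(default_factory=set)

    def start(self, name: str) -> None:
        self.running.add(name)

    def stop(self, name: str) -> None:
        self.running.discard(name)


def run_with_container(
    docker: FakeDocker,
    name: str,
    shutdown_timeout: int,
    call_model: Callable[[str], Any],
) -> Tuple[Any, bool]:
    """Run one request against a model container; return (result, reused)."""
    # A running container is only reused when a shutdown timeout is set.
    reused = shutdown_timeout > 0 and name in docker.running
    if not reused:
        docker.start(name)
    try:
        result = call_model(name)
    finally:
        # With a timeout of 0 the container is shut down right after use.
        if shutdown_timeout == 0:
            docker.stop(name)
    return result, reused
```

With a timeout of 0 every request pays the container start-up cost; a positive timeout keeps the container available for later requests.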

Remote serve

Recommended for Kubernetes setups.

These environment settings should be added:

BENTO_SERVING_MODE=remote
BENTO_SERVING_REMOTE_KUBE_CONFIG=/home/.kube/config  # you can set this to the path to the kubeconfig file for a remote Kubernetes cluster. If not set, we will use the "in cluster" configuration

Additionally, you should configure a Yatai service. Once you’ve done so, you can use the provided values in your values.yaml in the Konfuzio k8s charts:

remoteYatai:
  enabled: true
  envs:
    YATAI_EMAIL: ""
    YATAI_API_TOKEN: ""
    YATAI_ENDPOINT: ""

When using a dockerized AI model, the Konfuzio server will:

  • Check that a BentoDeployment running this AI model exists already (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0), and reuse it if this is the case;

  • If a BentoDeployment does not exist already, it will create its object in the Kubernetes cluster;

  • It will communicate with the pod serving the AI via HTTP;

  • Once everything is done, it will remove the BentoDeployment (if BENTO_SERVING_SHUTDOWN_TIMEOUT == 0).

Additional Kubernetes settings

The “AI model service settings” field, present in the Superuser AI Models section, can be used to pass additional options to the spec field of the BentoDeployment. For example, you can configure your AI model to use more pods or resources by setting it to something like this:

{
  "autoscaling": {
    "maxReplicas": 4,
    "minReplicas": 2,
    "metrics": [
      {
        "type": "Resource",
        "resource": {
          "name": "cpu",
          "target": {
            "type": "Utilization",
            "averageUtilization": 60
          }
        }
      }
    ]
  }
}

Custom service URL

If you have a service conforming to the Konfuzio specification that resides outside the Konfuzio server environment (e.g. with a different cloud provider, or even on your own computer), you can bypass the Konfuzio deployment mechanism altogether:

  • Create an AI model via the Superuser AI Models interface.

  • Set the AI model service URL field to the publicly accessible URL of the deployed Konfuzio AI model.

The Konfuzio instance will make a POST request to this service, according to the specified request type, and process the response data as it would for a locally or remotely served AI model.

Access restrictions in custom services

To prevent unauthorized access to your custom service running outside of Konfuzio, you can use a combination of different strategies:

  1. Firewall your service. Using a reverse proxy on top of it, or a private network, you can restrict access to the IP(s) of the Konfuzio server, and deny all other requests.

  2. Implement custom authentication. By default, project credentials defined in the project settings are passed as HTTP headers in the POST calls made to the service, with the addition of an ENV_ prefix. For example, if you define a project credential called AUTHENTICATION with value of mypassword, the Konfuzio server will add ENV_AUTHENTICATION: mypassword to the headers of all extraction requests to the service. You can then parse these headers directly in your AI service (if using a custom one) or in your reverse proxy to implement custom logic for authenticating these calls.
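The header check from the second strategy could look like the sketch below on the service side. It only uses the example names from the text (ENV_AUTHENTICATION and mypassword); the function is_authorized is hypothetical and would be called from whatever web framework or proxy hook your service uses.

```python
import hmac
from typing import Mapping


def is_authorized(
    headers: Mapping[str, str],
    expected: str,
    header_name: str = "ENV_AUTHENTICATION",
) -> bool:
    """Return True if the request carries the expected project credential.

    HTTP header names are case-insensitive, so the lookup is normalized.
    hmac.compare_digest avoids leaking information through timing.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    supplied = lowered.get(header_name.lower(), "")
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    print(is_authorized({"ENV_AUTHENTICATION": "mypassword"}, "mypassword"))
    print(is_authorized({}, "mypassword"))
```

In a real deployment the expected value would come from a secret store or environment variable rather than being written into the code.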

Cron jobs

A cron job initiated by Celery will check every minute for containers that need to be removed (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0). Make sure that the Celery Beat worker is enabled on your Konfuzio setup.
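The selection step of such a cleanup job might look like the following sketch. It assumes that the timeout is expressed in seconds and counted from the last time a container was used; neither detail is stated above, and containers_to_remove is a hypothetical helper, not the actual Celery task.

```python
from typing import List, Mapping


def containers_to_remove(
    last_used: Mapping[str, float],
    now: float,
    shutdown_timeout: float,
) -> List[str]:
    """Return names of containers idle for at least shutdown_timeout.

    Assumption: last_used maps container name to a timestamp in seconds,
    and shutdown_timeout is in seconds. With a timeout of 0 containers are
    stopped right after use, so the periodic job has nothing to do.
    """
    if shutdown_timeout <= 0:
        return []
    return sorted(
        name for name, used_at in last_used.items() if now - used_at >= shutdown_timeout
    )
```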

Project metadata

AI models from the Konfuzio platform get saved with a categories_and_labels_data.json5 file that includes the project metadata the AI model was originally trained on. This file can be used when moving the AI model across environments to ensure that the destination environment contains all the necessary structure for the AI model to work correctly.

This information gets parsed and saved automatically when you upload a .bento file manually on a Konfuzio instance. On the settings for the destination Project, you can enable the Create labels and label sets option to automatically create missing labels and label sets according to the AI’s metadata.

When this option is enabled, existing Labels and Label Sets will be matched with the ones in the AI’s metadata, and any that do not exist will automatically be created and associated with the relevant Category/Label Set. Matching happens by name: renamed Labels and Label Sets will not be matched and will be recreated instead. Moreover, for Labels that already exist, the data type and the setting for whether a Label can appear multiple times will not be changed.
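The name-based matching can be illustrated with a short sketch. The dictionary layout (a "name" key per label) is a made-up simplification for the example and does not reflect the real structure of categories_and_labels_data.json5; plan_label_sync is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple

Label = Dict[str, object]


def plan_label_sync(
    existing_names: Iterable[str],
    metadata_labels: List[Label],
) -> Tuple[List[Label], List[Label]]:
    """Split metadata labels into (matched, to_create) by exact name.

    Matched labels are left untouched; only missing ones are created.
    A label renamed in the destination project no longer matches by name,
    so it ends up in to_create.
    """
    existing = set(existing_names)
    matched = [label for label in metadata_labels if label["name"] in existing]
    to_create = [label for label in metadata_labels if label["name"] not in existing]
    return matched, to_create


if __name__ == "__main__":
    meta = [{"name": "Invoice Number"}, {"name": "Total"}]
    print(plan_label_sync(["Total", "Invoice No."], meta))
```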

Upgrade strategies

If you have an existing Konfuzio installation, when upgrading to VERSION nothing will change by default, and the server will keep using pickle files for building and serving.

However, a future Konfuzio server release will stop supporting older Python versions, at which point pickle-based models that depend on those versions will become unusable.

We recommend familiarizing yourself with the new concepts and starting to generate new AI models using Bento today. The next sections outline some possible strategies to ensure a smooth upgrade. You can also contact us if you wish to discuss additional upgrade support and/or custom migration strategies.

Enable Bento build mode

You can start generating Bento files for new AI models without additional setup, by changing the build mode to Bento and keeping the serve mode as pickle. Bento archives will be generated, but only the pickle file inside of them will be used, mirroring previous behavior.

Once you’ve done the additional setup necessary to run dockerized AIs, you can change your serve mode to Bento and containers will be generated from the previously created Bento archives.

Convert existing AI models

The save_with_bento command line utility can be used to convert existing pickles into Bento AI models, so that they keep working after future upgrades. Run python manage.py save_with_bento --help for additional information.

Leave legacy AI models behind

For some very old (pre-Konfuzio SDK v0.2) models, conversion is not possible. In these cases, training new AI models (with Bento build mode enabled) is the recommended option.

Additional resources