Containerized AI models in Konfuzio
Konfuzio version released-2024-08-11_09-56-15 introduces containerized AI models, thanks to the underlying open source BentoML framework. While this change is transparent from a user's point of view, developers and on-premise users should be aware of the differences and implications it brings.
Motivation
Compared to pickle files, containerization offers several advantages when deploying your AI models:
Isolation: AI models are run in separate containers, ensuring that they do not interfere with each other or with the Konfuzio environment.
Scalability: Containers can be easily scaled up or down, depending on the load.
Portability: Containers bundle their dependencies and their original project metadata, ensuring that they can be moved across environments without issues.
Consistency: The AI model is run in the same environment it was trained in, ensuring that the results are consistent.
Security: Containers can be run in a secure environment, with access controls and other security measures in place.
Resource management: Containers can be run with resource limits, ensuring that they do not consume too many resources.
Containerized AI models can be run either with standard Docker Compose setups or in a Kubernetes cluster. Some parts of this documentation will be specific to Kubernetes setups, but the general principles apply to both setups.
See the Konfuzio blog for a more detailed overview.
Overview of the Bento format
Archives
AI models created with an SDK version higher than 0.3.12 and a Konfuzio Server version higher than released-2024-08-11_09-56-15 can be saved as a .bento file instead of a .pkl file. This is a compressed archive that internally still includes the .pkl file, together with other files:
bento.yaml: metadata for the bento archive.
openapi.yaml: machine parsable API specification for the AI service.
env/docker/: Dockerfile and entrypoint to build the AI model into a Docker image.
env/python/: Python version and dependency versions.
models/: Contains the Python pickle file(s) for the AI model.
src/categories_and_labels_data.json5: Metadata for the project this AI was trained on.
src/extraction/: Python code and utility functions that expose the AI model as a REST API service.
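Since a .bento archive is a regular compressed file, you can inspect its contents locally. A minimal sketch, assuming the archive is a gzip-compressed tarball (BentoML's default export format) and a hypothetical file name:

```
# List the files inside a .bento archive (hypothetical file name);
# GNU tar auto-detects the compression when listing.
tar -tf mymodel.bento
```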
When uploading or downloading AI models between instances or environments, .bento files are now the default.
You can find more information about the creation and serving of .bento files from the Konfuzio SDK in the Konfuzio SDK documentation.
Containers
.bento archives include a generated Dockerfile that is used to create a Docker image from the AI model.
Kubernetes only: The Konfuzio AI Deployment Operator automatically builds .bento archives into Docker containers, but this operation can also be done locally starting from the .bento file by running bentoml containerize (see docs).
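For example, a local build could look like the following; the file name and tag are placeholders (after importing, use the tag reported by bentoml list):

```
# Import the archive into the local bento store, then build a Docker image from it.
bentoml import ./mymodel.bento
bentoml containerize mymodel:latest
```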
Image storage (such as a Docker registry) is treated similarly to a cache layer: Bento images are built on demand from .bento files, but the Konfuzio server will try pulling a pre-built image from the registry (for Kubernetes deployments) or reusing an already built one (in the case of Docker deployments) before resorting to building the image from the .bento file.
Services
Since the Konfuzio server and the AI models no longer share a runtime or code, communication between them happens via REST API, with predictable JSON requests and predictable JSON responses.
A full specification of available input/output interfaces for Konfuzio AI models can be found in our documentation.
Custom AI models running arbitrary code (even non-Python code) can be created as long as they expose an HTTP service that conforms to one of the interfaces supported by Konfuzio.
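As an illustration, you could issue a request directly to a locally served model. The port, endpoint, and payload below are placeholders; the authoritative request/response schema is the interface specification linked above:

```
# Hypothetical request against a locally running AI model container.
curl -X POST http://localhost:3000/extract \
  -H "Content-Type: application/json" \
  -d '{"text": "Invoice No. 42 ..."}'
```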
Operator
Kubernetes only: The Konfuzio AI Deployment Operator is a Kubernetes operator that manages the lifecycle of BentoML deployments in a Kubernetes cluster. It is responsible for creating, updating, and deleting BentoML deployments, as well as monitoring their health and status.
It’s composed of:
KonfuzioAIDeployment: a custom resource definition (CRD) that defines the desired state of a BentoML deployment.
Controller: a pod that watches for changes to KonfuzioAIDeployment resources and takes action to ensure that the actual state of the deployment matches the desired state.
Builder: a component that builds the Docker image on demand for the deployment from the .bento file.
Buildkit Daemon: a component that runs the build process in a separate container, allowing for better resource management and isolation.
When a KonfuzioAIDeployment resource is created, the operator will:
Check if the Docker image for the deployment already exists in the registry.
If it does not exist, it will build the image from the .bento file using the Buildkit daemon, and push it to the registry.
Once the image is available, the operator will create a Kubernetes deployment for the BentoML service, using the image and the configuration specified in the KonfuzioAIDeployment resource.
The operator will monitor the health and status of the deployment, and take action if it becomes unhealthy or fails.
If the deployment is deleted, usually because of inactivity, the operator will remove the associated Kubernetes resources.
If the deployment is updated, the operator will stop the old deployment and create a new one with the updated configuration.
In a Kubernetes setup, the Konfuzio server will communicate with the operator to create and manage the deployments. The operator will handle the details of building and deploying the Docker images.
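For debugging, you can inspect the operator's resources with kubectl. The lowercase plural resource name and the namespace below are assumptions derived from the CRD name:

```
# Assumed resource and namespace names, based on the KonfuzioAIDeployment CRD.
kubectl get konfuzioaideployments -n konfuzio
kubectl describe konfuzioaideployment mymodel -n konfuzio
```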
The following diagram shows the architecture of the Konfuzio AI Deployment Operator:
.. mermaid::
flowchart TB
classDef server stroke-width:2px,font-weight:bold
classDef container stroke-width:1px,shape:subprocess
classDef process stroke-width:1px
classDef decision stroke-width:1px,shape:diamond
classDef database stroke-width:1px,shape:cylinder
classDef endpoint stroke-width:1px,shape:rect
subgraph Konfuzio["Konfuzio Server"]
Z["Document"]@{ shape: doc } --> A
A[("Konfuzio Server")]:::server --> A1
A1{"AI deployment<br>exists in cluster?"}:::decision
end
subgraph Kubernetes["Kubernetes Cluster"]
B["KonfuzioAIDeployment<br>of mymodel.bento"]:::process
B1["Operator Instance"]@{ shape: cylinder }
C["Operator detects<br>new deployment"]:::process --> D
D{"mymodel.bento<br>exists in registry?"}:::decision
subgraph Builder["Builder"]
E["Builder Instance"]@{ shape: cylinder } --> F1
F1["Download .bento file"]:::process --> F2
F2["Build Docker image"]:::process --> F3
F3["Push image"]:::process
end
G["Operator<br>deploys AI Model"]:::process --> H
H["Pull image from<br>Docker registry"]:::process --> I
subgraph AI["AI deployment"]
I["Pod with AI Model"]@{ shape: cylinder }
J["Receive HTTP request<br>with document data"]:::process --> K
K["AI<br>extraction<br>process"]@{ shape: extract } --> L
L["Return AI predictions<br>as JSON response"]:::process
end
N{"More documents<br>to process?"}:::decision
O["Wait for timeout"]@{ shape: delay }
M["Resource<br>Termination"]@{ shape: double-circle }
end
subgraph Registry["Container Registry"]
REGISTRY[("Docker Registry")]:::database
end
A1 -->|"No"| B
A1 -->|"Yes"| I
B --> C
B1 -.- C
B1 -.- G
D -->|"Yes"| G
D -->|"No"| E
F3 --> REGISTRY
F3 --> G
REGISTRY --> H
I --> J
L --> A
L --> N
N -->|"Yes"| Z
N -->|"No"| O
O --> M
A --> J
Setup
Containerized functionality is currently experimental and can be enabled per project by an administrator in the Superuser Project section of the interface. The default mode for both building and serving is “Python Object” (same as before); to enable “Containerized” mode, additional setup is required (see the following sections).
“Containerized” mode can be enabled separately for building and serving AI models:
| Build mode | Serve mode | Notes |
|---|---|---|
| Python Object | Python Object | Same as before: training generates a pickle file which is used directly when using the AI model. |
| Containerized | Containerized | Training generates a .bento file which is containerized on demand when using the AI model. |
| Containerized | Python Object | Training generates a .bento file; when using the AI model, the pickle file contained inside the .bento is used directly, mimicking previous behavior. |
| Python Object | Containerized | Not supported. |
In addition, you can use the following environment variables to determine which mode new projects should use by default:
NEW_PROJECT_DEFAULT_BUILD_MODE
NEW_PROJECT_DEFAULT_SERVE_MODE
For both of these variables, 0 (default) is Python Object mode, and 1 is Containerized mode. These environment variables will not change the settings of existing projects.
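For example, to make new projects generate .bento archives while still serving them from the embedded pickle (a supported combination, see the table above):

```
NEW_PROJECT_DEFAULT_BUILD_MODE=1 # new projects build .bento archives
NEW_PROJECT_DEFAULT_SERVE_MODE=0 # new projects are still served as Python objects
```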
Additional steps for Kubernetes setup
Kubernetes only: If you’re planning to use Kubernetes to serve your AI models, you will need to set up the Konfuzio AI Deployment Operator. This service is responsible for building and serving the AI models, and can be installed in the same Kubernetes cluster as the Konfuzio server or in a different one.
The operator is installed through the Konfuzio server Helm chart, and can be enabled by setting the enableOperator value to true in the values.yaml file. This will install the operator in the same namespace as the Konfuzio server.
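A minimal sketch of enabling it at install/upgrade time; the release and chart names are placeholders, only the enableOperator value is documented:

```
helm upgrade konfuzio konfuzio/konfuzio-server --set enableOperator=true
```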
Build mode
You can either connect Konfuzio server to Docker and build your Bento archives locally, or defer the building to a remote service.
Local build
Recommended for Docker and Docker Compose setups.
These environment settings should be added:
BENTO_CONTAINER_BUILD_MODE=local
DOCKER_HOST="/var/run/docker.sock" # replace with path to docker socket
Built images will be stored inside the specified Docker instance. No image registry is required. Note that this means that with workers running on multiple machines, each machine will have a separate build cache.
When using a containerized AI model, the Konfuzio server will:
Check if a built Docker image already exists in the Docker instance from DOCKER_HOST;
If it does not exist, it will build an image for the AI model using the Docker instance from DOCKER_HOST.
Remote build
Recommended for Kubernetes setups.
These environment settings should be added:
BENTO_CONTAINER_BUILD_MODE=remote
Additionally, you should configure the Konfuzio AI Deployment Operator. The Konfuzio server will automatically use the operator to build and serve the images.
Serve mode
You can either connect Konfuzio server to Docker and use it to serve the AI models, or use a Kubernetes cluster to serve them remotely, even in a separate environment.
Local serve
Recommended for Docker and Docker Compose setups.
These environment settings should be added:
BENTO_SERVING_MODE=local
BENTO_SERVING_SHUTDOWN_TIMEOUT=0
DOCKER_HOST="/var/run/docker.sock" # replace with path to docker socket
DOCKER_NETWORK=mynetwork # optional, connects the served containers to the specified network
Note that whichever path you use for DOCKER_HOST should be mounted inside the Konfuzio container from the Docker host, and that this gives the Konfuzio container permission to start up and shut down Docker containers on the host.
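For instance, with a plain docker run invocation, the socket mount could look like this (the image name is a placeholder, and your usual Konfuzio options are omitted):

```
docker run \
  -e BENTO_SERVING_MODE=local \
  -e DOCKER_HOST=/var/run/docker.sock \
  -v /var/run/docker.sock:/var/run/docker.sock \
  konfuzio-server
```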
When using a containerized AI model, the Konfuzio server will:
Check that a container running this AI model already exists (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0), and reuse it if this is the case;
If a container does not exist already, it will start one in the Docker instance from DOCKER_HOST;
It will communicate with the container via HTTP;
Once everything is done, it will shut down the container (if BENTO_SERVING_SHUTDOWN_TIMEOUT == 0).
Remote serve
Recommended for Kubernetes setups.
These environment settings should be added:
BENTO_SERVING_MODE=remote
BENTO_SERVING_REMOTE_KUBE_CONFIG=/home/.kube/config # you can set this to the path to the kubeconfig file for a remote Kubernetes cluster. If not set, we will use the "in cluster" configuration
Additionally, you should configure the Konfuzio AI Deployment Operator. The Konfuzio server will automatically use the operator to build and serve the images.
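To verify that the configured kubeconfig actually grants access to the target cluster, a quick sanity check might be:

```
# Uses the same kubeconfig path configured above.
kubectl --kubeconfig /home/.kube/config get nodes
```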
Additional Kubernetes settings
The “AI model service settings” field, present in the Superuser AI Models section, can be used to pass additional options to the spec field of the KonfuzioAIDeployment. For example, you can configure your AI model to use more pods or resources by setting it to something like this:
{
"autoscaling": {
"maxReplicas": 4,
"minReplicas": 2,
"metrics": [
{
"type": "Resource",
"resource": {
"name": "cpu",
"target": {
"type": "Utilization",
"averageUtilization": 60
}
}
}
]
},
"resources": {
"limits": {
"memory": "8Gi"
},
"requests": {
"memory": "1Gi",
"cpu": "200m"
}
}
}
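Note that the autoscaling block above has the same shape as the metrics specification of a Kubernetes autoscaling/v2 HorizontalPodAutoscaler, and resources matches standard container resource requests and limits, so the Kubernetes documentation for those objects can be used as a reference for valid values.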
Strategies for starting and shutting down containers
In both local and remote serve, a strategy can be chosen at the AI model level to determine when and whether the relevant container is shut down:
Shuts down after timeout (TIMEOUT): follows the BENTO_SERVING_SHUTDOWN_TIMEOUT environment variable, which determines how long the container should be kept alive after the last request, unless BENTO_SERVING_SHUTDOWN_TIMEOUT is 0, in which case it is the same as IMMEDIATE.
Shuts down immediately after processing (IMMEDIATE): the container is shut down immediately after the request is processed, regardless of the BENTO_SERVING_SHUTDOWN_TIMEOUT setting. This means that if there are n concurrent extractions on the same AI model, n independent containers will start up and shut down once the extraction is done. This is functionally equivalent to the previous behavior of the Konfuzio server in Python Object mode.
Always online (ALWAYS_ON): the container will not be shut down after the request is processed, and will be kept alive indefinitely.
The Konfuzio server always checks if a container exists before starting a new one for TIMEOUT and ALWAYS_ON, and will reuse it if it does. IMMEDIATE will always start a new container.
(Kubernetes only) The number of containers for the same AI model can be controlled by the maxReplicas and minReplicas fields in the autoscaling section of the AI model service settings.
Changing the strategy setting on an AI model, making it inactive, or deleting it, will stop all running containers for that AI model.
Custom service URL
If you have a service conforming to the Konfuzio specification that resides outside the Konfuzio server environment (e.g. a different cloud provider, or even your own computer), you can bypass the Konfuzio deployment mechanism altogether:
Create an AI model via the Superuser AI Models interface.
Set the AI model service URL field to the publicly accessible URL of the deployed Konfuzio AI model.
The Konfuzio instance will make a POST request to this service, according to the specified request type, and process the response data as it would for a local/remote service.
Access restrictions in custom services
To prevent unauthorized access to your custom service running outside of Konfuzio, you can use a combination of different strategies:
Firewall your service. Using a reverse proxy on top of it, or a private network, you can restrict access to the IP(s) of the Konfuzio server, and deny all other requests.
Implement custom authentication. By default, project credentials defined in the project settings are passed as HTTP headers in the POST calls made to the service, with the addition of an ENV_ prefix. For example, if you define a project credential called AUTHENTICATION with a value of mypassword, the Konfuzio server will add ENV_AUTHENTICATION: mypassword to the headers of all extraction requests to the service. You can then parse these headers directly in your AI service (if using a custom one) or in your reverse proxy to implement custom logic for authenticating these calls.
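For example, you can simulate the header the Konfuzio server would send in order to test your service's authentication logic; the URL and payload file are placeholders:

```
# Simulates the ENV_-prefixed header added by the Konfuzio server.
curl -X POST https://my-custom-ai.example.com/extract \
  -H "ENV_AUTHENTICATION: mypassword" \
  -H "Content-Type: application/json" \
  -d @sample_request.json
```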
Cron jobs
A cron job initiated by Celery will check every minute for containers that need to be removed (if BENTO_SERVING_SHUTDOWN_TIMEOUT > 0). Make sure that the Celery Beat worker is enabled on your Konfuzio setup.
Project metadata
AI models from the Konfuzio platform get saved with a categories_and_labels_data.json5 file that includes the project metadata the AI model was originally trained on. This file can be used when moving the AI model across environments to ensure that the destination environment contains all the necessary structure for the AI model to work correctly.
This information gets parsed and saved automatically when you upload a .bento file manually on a Konfuzio instance. On the settings for the destination Project, you can enable the Create labels and label sets option to automatically create missing labels and label sets according to the AI’s metadata.
When this option is enabled, existing Labels and Label Sets will be matched with the ones in the AI’s metadata information, and ones that do not exist will automatically be created and associated to the relevant Category/Label Set. The matching happens via name: renamed Labels and Label Sets will not be matched and will be recreated instead. Moreover, information about types and whether Labels can appear multiple times will not be changed if already existing.
The flowchart below shows how the project metadata is used to create the necessary structure for the AI model to work correctly.
Upgrade strategies
If you have an existing Konfuzio installation, nothing will change by default when upgrading to this version: the server will keep using pickle files for building and serving.
However, the Konfuzio server will eventually stop supporting older Python versions, at which point these models will become unusable.
We recommend familiarizing yourself with the new concepts and starting to generate new AI models using Bento today. In the next sections we outline some possible strategies to ensure a smooth upgrade. You can also contact us if you wish to discuss additional upgrade support and/or custom migration strategies.
Enable Containerized build mode
You can start generating Bento files for new AI models without additional setup, by changing the build mode to Containerized and keeping the serve mode as Python Object. Bento archives will be generated, but only the pickle file inside of them will be used, mirroring previous behavior.
Once you’ve done the additional setup necessary to run containerized AIs, you can change your serve mode to Containerized, and containers will be generated from the previously created Bento archives.
Convert existing AI models
The save_with_bento command line utility can be used to convert existing pickles into Containerized AI models, which makes them future-proof for upcoming upgrades. Run python manage.py save_with_bento --help for additional information.
Leave legacy AI models behind
For some very old (pre-Konfuzio SDK v0.2) models, conversion is not possible. In these cases, training new AI models (with Containerized build mode enabled) is the recommended option.