{ "cells": [ { "cell_type": "markdown", "id": "c1fe8523", "metadata": {}, "source": [ "## Object Detection\n", "\n", "---\n", "\n", "**Prerequisites:**\n", "\n", "- General understanding of object detection and training of neural networks.\n", "- A COCO formatted object detection dataset.\n", "\n", "**Difficulty:** Hard\n", "\n", "**Goal:** Train, test and optimize an object detection model for a production use case based on a COCO formatted dataset.\n", "\n", "---\n", "\n", "### Environment\n", "To get up and running quickly, you can create a new Google Colab notebook to follow along. \\\n", "\"Create\n", "\n", "As an alternative, you can install and run the code locally or in an environment of your choice.\n", "\n", "### Introduction\n", "\n", "This tutorial provides a step by step guide on how to train an object detection model. 🕵 \\\n", "The focus here is on Document structures but you can use the code to train on a dataset of different domains as well. \n", "\n", "![object-detection-example](object_detection_example.png)\n", "\n", "### Overview 🌐\n", "\n", "We will use the state of the art object detection model [YOLO-NAS](https://konfuzio.com/en/yolo-nas-object-detection-model/). \\\n", "It was developed to incorporate high speed and accuracy, which makes it a very good fit for production use cases. For the training of the model we will use the [`super-gradients`](https://github.com/Deci-AI/super-gradients) library, which is provided by the creators of YOLO-NAS. 😎\n", "\n", "The use case for which we train the model is checkbox detection in form Documents. ☑ \\\n", "You can train the model for another use case, as long as you stick to the COCO dataset format.\n", "\n", "At the end we will export the trained model, so that you can use it in another environment without the training library dependencies.\n", "\n", "Here is an example of what the model will be capable of after training. The color cyan stands for detected checked boxes and the color green for detected empty boxes, while the numbers indicate the detection confidence. 😃\n", "\n", "![Example of checkbox detection - 1](checkbox_example_handwritten_1.png)\n", "\n", "### Dependency installation 💿\n", "\n", "For training and testing the model, the following dependencies are needed." ] }, { "cell_type": "code", "execution_count": null, "id": "452f7c76", "metadata": { "lines_to_next_cell": 0, "tags": [ "remove-output" ] }, "outputs": [], "source": [ "%%bash\n", "pip install -q super-gradients\n", "pip install -q pycocotools\n", "pip install -q onnx\n", "pip install -q onnxruntime" ] }, { "cell_type": "markdown", "id": "1d785145", "metadata": {}, "source": [ "Due to the later export of the model into the ONNX format, the model can be tested and deployed with [ONNX Runtime](https://onnxruntime.ai/docs/) and therefore `super-gradients` and `pycocotools` dependencies are not needed for production.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c3bc4ea1", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%%bash\n", "# This is needed for the Development Center build pipeline\n", "pip install opencv-python-headless==4.8.1.78" ] }, { "cell_type": "markdown", "id": "e9872425", "metadata": { "lines_to_next_cell": 0 }, "source": [ "### Imports 🔽\n", "\n", "The following imports are needed to train, export and test the model." 
] }, { "cell_type": "code", "execution_count": null, "id": "1eb3b7b7", "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# General imports\n", "from pathlib import Path\n", "import datetime\n", "\n", "# Data imports\n", "from super_gradients.training.datasets.detection_datasets.coco_format_detection import COCOFormatDetectionDataset\n", "from super_gradients.training.utils.collate_fn.crowd_detection_collate_fn import CrowdDetectionCollateFN\n", "from super_gradients.training.transforms.transforms import DetectionMosaic, DetectionRandomAffine, DetectionHSV, \\\n", " DetectionHorizontalFlip, DetectionVerticalFlip, DetectionPaddedRescale, DetectionStandardize, DetectionTargetsFormatTransform\n", "from super_gradients.training import dataloaders\n", "from super_gradients.training.datasets.datasets_utils import worker_init_reset_seed\n", "\n", "# Training imports\n", "from super_gradients.training import Trainer\n", "from super_gradients.common.object_names import Models\n", "from super_gradients.training import models\n", "from super_gradients.training.losses import PPYoloELoss\n", "from super_gradients.training.metrics import DetectionMetrics_050\n", "from super_gradients.training.utils.distributed_training_utils import setup_device\n", "from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback\n", "\n", "# Test and export imports\n", "import torch\n", "import torchvision\n", "import onnx\n", "import numpy as np\n", "from PIL import Image\n", "from onnxruntime import InferenceSession" ] }, { "cell_type": "markdown", "id": "8d10c814", "metadata": {}, "source": [ "### Hyperparameter setting ⚙\n", "\n", "First we define an experiment name for the current selection of hyperparameters. This will ensure that once we iterate on different hyperparameters, the experiments are getting saved into separate directories based on their time stamp.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "0065a790", "metadata": {}, "outputs": [], "source": [ "# experiment name definition\n", "t = datetime.datetime.now()\n", "EXPERIEMENT_NAME = f\"{t.year}-{t.month}-{t.day}-{t.hour}-{t.minute}-checkbox-detector\"" ] }, { "cell_type": "markdown", "id": "489703f7", "metadata": {}, "source": [ "We define a set of hyperparameters for the model, the training and the data. You can change them and run multiple training runs and see what works best for you. For that, it is recommended to use some train tracking tool like [wandb](https://wandb.ai/site) or [mlflow](https://mlflow.org/).\n", "\n", "Here is a quick break down of the parameters and why we use them." ] }, { "cell_type": "markdown", "id": "bb09fae9", "metadata": {}, "source": [ "| Name | Value | Description & Purpose | |\n", "| -------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | --- |\n", "| MODEL_NAME | \"YOLO_NAS_S\" | Model architecture variant. You can choose between `YOLO_NAS_S`, `YOLO_NAS_M` and `YOLO_NAS_L`. The model architecture differs by its \"size\", S is the **s**mallest while L is the **l**argest version to choose. The choice will effect the latency and performance of the model. 
[See here](https://docs.deci.ai/super-gradients/latest/YOLONAS.html) for an overview of the available variants. | |\n",
"| SIZE | 1280 | Image input size of the model (`SIZE`x`SIZE`). The usual input size `SIZE` for the model is `640`, which works fine for normally sized objects. We use `1280` as input size for the detection of really small checkboxes; feel free to change it back for your own data. | |\n",
"| WARMUP_INITIAL_LR | 5e-4 | Learning rate that the training starts with. | |\n",
"| INITIAL_LR | 1e-3 | Learning rate after `LR_WARMUP_EPOCHS`. | |\n",
"| COSINE_FINAL_LR_RATIO | 0.01 | Ratio of the learning rate at the end of training with respect to the initial learning rate. | |\n",
"| ZERO_WEIGHT_DECAY_ON_BIAS_AND_BN | True | Sets weight decay on bias and batch norm to zero. | |\n",
"| LR_WARMUP_EPOCHS | 1 | Number of warm-up epochs. | |\n",
"| OPTIMIZER_WEIGHT_DECAY | 1e-4 | Weight decay of the optimizer. | |\n",
"| EMA | True | Use [Exponential Moving Average (EMA)](https://docs.deci.ai/super-gradients/latest/documentation/source/EMA.html). | |\n",
"| EMA_DECAY | 0.9999 | Decay for EMA. | |\n",
"| MAX_EPOCHS | 20 | Number of training epochs. | |\n",
"| BATCH_SIZE_TRAIN | 2 | Batch size used for training. | |\n",
"| BATCH_SIZE_TEST | 6 | Batch size used for validation. | |\n",
"| MIXED_PRECISION | True | Use [Automatic Mixed Precision (AMP)](https://docs.deci.ai/super-gradients/latest/documentation/source/average_mixed_precision.html) for improved training. | |\n",
"| IGNORE_EMPTY_ANNOTATIONS | True | If there are training samples without any annotations, they are not used for training. | |\n"
] }, { "cell_type": "code", "execution_count": null, "id": "3e3712c2", "metadata": {}, "outputs": [], "source": [
"# Model params\n", "MODEL_NAME = \"YOLO_NAS_S\"\n", "SIZE = 1280\n", "\n",
"# Training params\n", "WARMUP_INITIAL_LR = 5e-4\n", "INITIAL_LR = 1e-3\n", "COSINE_FINAL_LR_RATIO = 0.01\n", "ZERO_WEIGHT_DECAY_ON_BIAS_AND_BN = True\n", "LR_WARMUP_EPOCHS = 1\n", "OPTIMIZER_WEIGHT_DECAY = 1e-4\n", "EMA = True\n", "EMA_DECAY = 0.9999\n", "MAX_EPOCHS = 20\n", "BATCH_SIZE_TRAIN = 2\n", "BATCH_SIZE_TEST = 6\n", "MIXED_PRECISION = True\n", "\n",
"# Data params\n", "IGNORE_EMPTY_ANNOTATIONS = True" ] }, { "cell_type": "markdown", "id": "2e8275a0", "metadata": {}, "source": [
"### Dataset and dataloader 🔃\n", "\n",
"We need to define which data the model should be trained on. For this tutorial to work, the dataset needs to be formatted in the widely used [COCO format](https://cocodataset.org/#format-data).\n",
"Once you have your data ready, you need to adapt the following lines so that the path to your dataset is set correctly." ] }, { "cell_type": "code", "execution_count": null, "id": "59023b17", "metadata": {}, "outputs": [], "source": [
"# Base path to dataset\n", "BASE_PATH = Path(\".\").absolute() # modify accordingly\n", "\n",
"# Path to specific dataset\n", "TRAIN_FOLDER = \"dataset\"\n", "TRAIN_ANNOTATION_FILE = \"train.json\"\n", "\n",
"TEST_FOLDER = \"dataset\"\n", "TEST_ANNOTATION_FILE = \"val.json\"" ] }, { "cell_type": "markdown", "id": "5d450a44", "metadata": {}, "source": [
"Then let's convert the provided information into paths for the dataloader and check if everything is correct."
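, "\n",
"\n",
"In addition to the path checks below, you can optionally peek into the annotation file itself with `pycocotools` (installed above). A minimal sketch, assuming the variables defined in the previous cell:\n",
"\n",
"```python\n",
"from pycocotools.coco import COCO\n",
"\n",
"# quick look at the training annotations: image/annotation counts and category names\n",
"coco = COCO(str(BASE_PATH / TRAIN_FOLDER / TRAIN_ANNOTATION_FILE))\n",
"print(len(coco.imgs), 'images,', len(coco.anns), 'annotations')\n",
"print('categories:', [c['name'] for c in coco.cats.values()])\n",
"```"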
] }, { "cell_type": "code", "execution_count": null, "id": "251fea86", "metadata": {}, "outputs": [], "source": [ "# train path\n", "train_path = BASE_PATH / TRAIN_FOLDER\n", "train_img_path = train_path / \"images\"\n", "train_ann_path = train_path / TRAIN_ANNOTATION_FILE\n", "\n", "# test path\n", "test_path = BASE_PATH / TEST_FOLDER\n", "test_img_path = test_path / \"images\"\n", "test_ann_path = test_path / TEST_ANNOTATION_FILE\n", "\n", "# checks\n", "assert train_path.exists(), f\"Train path {train_path} does not exist\"\n", "assert train_img_path.exists(), f\"Train image path {train_img_path} does not exist\"\n", "assert train_ann_path.exists(), f\"Train annotation path {train_ann_path} does not exist\"\n", "assert test_path.exists(), f\"Train path {test_path} does not exist\"\n", "assert test_img_path.exists(), f\"Train image path {test_img_path} does not exist\"\n", "assert test_ann_path.exists(), f\"Train annotation path {test_ann_path} does not exist\"" ] }, { "cell_type": "markdown", "id": "591a3a32", "metadata": {}, "source": [ "After we have the data and the path definition, we instantiate the dataset. Most important to point out here are the transforms, which contain image transformation and augmentations of the images before they are passed to the model.\n", "We **recommend not to change** the `DetectionPaddedRescale`, `DetectionStandardize`, `DetectionTargetsFormatTransform` transforms, because the later implementation of the exported model depends on them. \\\n", "However you can of course adapt and change the augmentation transforms `DetectionMosaic`, `DetectionRandomAffine`, `DetectionHSV`, `DetectionHorizontalFlip`, `DetectionVerticalFlip`, according to your use case. The chosen augmentations here are meant to work for the task of object detection for Document structures, specifically checkboxes." ] }, { "cell_type": "code", "execution_count": null, "id": "13801cba", "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# train dataset\n", "trainset = COCOFormatDetectionDataset(data_dir=str(train_path),\n", " images_dir=\"\",\n", " json_annotation_file=str(train_ann_path),\n", " input_dim=None,\n", " ignore_empty_annotations=IGNORE_EMPTY_ANNOTATIONS,\n", " transforms=[\n", " DetectionMosaic(prob=1., input_dim=(SIZE, SIZE)),\n", " DetectionRandomAffine(degrees=0.5, scales=(0.9, 1.1), shear=0.0,\n", " target_size=(SIZE, SIZE),\n", " filter_box_candidates=False, border_value=114),\n", " DetectionHSV(prob=1., hgain=1, vgain=6, sgain=6),\n", " DetectionHorizontalFlip(prob=0.5),\n", " DetectionVerticalFlip(prob=0.5),\n", " DetectionPaddedRescale(input_dim=(SIZE, SIZE)),\n", " DetectionStandardize(max_value=255),\n", " DetectionTargetsFormatTransform(input_dim=(SIZE, SIZE),\n", " output_format=\"LABEL_CXCYWH\")\n", " ])\n", "\n", "# validation dataset\n", "valset = COCOFormatDetectionDataset(data_dir=str(test_path),\n", " images_dir=\"\",\n", " json_annotation_file=str(test_ann_path),\n", " input_dim=None,\n", " ignore_empty_annotations=False,\n", " transforms=[\n", " DetectionPaddedRescale(input_dim=(SIZE, SIZE)),\n", " DetectionStandardize(max_value=255),\n", " DetectionTargetsFormatTransform(input_dim=(SIZE, SIZE),\n", " output_format=\"LABEL_CXCYWH\")\n", " ]\n", " )" ] }, { "cell_type": "markdown", "id": "c639d490", "metadata": {}, "source": [ "Based on the dataset we infer the number of object classes the model should detect." 
] }, { "cell_type": "code", "execution_count": null, "id": "782d2819", "metadata": {}, "outputs": [], "source": [ "# get number of classes from dataset\n", "num_classes = len(trainset.classes)" ] }, { "cell_type": "markdown", "id": "3de5101e", "metadata": {}, "source": [ "And as a last step we instantiate the dataloader. Pay attention to the `min_samples` with value of `512`. This parameter forces the training, to use at least the `min_samples` number of images for each epoch. This is useful for smaller datasets, as we have it for this showcase. " ] }, { "cell_type": "code", "execution_count": null, "id": "a17c33c8", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# define train and validation loader for konfuzio testing pipeline (minimal batch size and epochs)\n", "\n", "train_loader = dataloaders.get(dataset=trainset, dataloader_params={\n", " \"shuffle\": True,\n", " \"batch_size\": 1,\n", " \"drop_last\": False,\n", " \"pin_memory\": True,\n", " \"collate_fn\": CrowdDetectionCollateFN(),\n", " \"worker_init_fn\": worker_init_reset_seed,\n", " \"min_samples\": 1\n", "})\n", "\n", "valid_loader = dataloaders.get(dataset=valset, dataloader_params={\n", " \"shuffle\": False,\n", " \"batch_size\": 1,\n", " \"num_workers\": 1,\n", " \"drop_last\": False,\n", " \"pin_memory\": True,\n", " \"collate_fn\": CrowdDetectionCollateFN(),\n", " \"worker_init_fn\": worker_init_reset_seed\n", "})\n", "\n", "MAX_EPOCHS = 1 # overwrite for test pipeline" ] }, { "cell_type": "code", "execution_count": null, "id": "b26cf13f", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "# train dataloader\n", "train_loader = dataloaders.get(dataset=trainset, dataloader_params={\n", " \"shuffle\": True,\n", " \"batch_size\": BATCH_SIZE_TRAIN,\n", " \"drop_last\": False,\n", " \"pin_memory\": True,\n", " \"collate_fn\": CrowdDetectionCollateFN(),\n", " \"worker_init_fn\": worker_init_reset_seed,\n", " \"min_samples\": 512\n", "})\n", "\n", "#validation dataloader\n", "valid_loader = dataloaders.get(dataset=valset, dataloader_params={\n", " \"shuffle\": False,\n", " \"batch_size\": BATCH_SIZE_TEST,\n", " \"num_workers\": 2,\n", " \"drop_last\": False,\n", " \"pin_memory\": True,\n", " \"collate_fn\": CrowdDetectionCollateFN(),\n", " \"worker_init_fn\": worker_init_reset_seed\n", "})" ] }, { "cell_type": "markdown", "id": "67e47c8b", "metadata": {}, "source": [ "### Training 🏋\n", "\n", "Now we prepare the model and training parameters based on the hyperparameters we have chosen previously." 
] }, { "cell_type": "code", "execution_count": null, "id": "7324d423", "metadata": {}, "outputs": [], "source": [ "# training parameter definition for trainer\n", "train_params = {\n", " \"warmup_initial_lr\": WARMUP_INITIAL_LR,\n", " \"initial_lr\": INITIAL_LR,\n", " \"lr_mode\": \"cosine\",\n", " \"cosine_final_lr_ratio\": COSINE_FINAL_LR_RATIO,\n", " \"optimizer\": \"AdamW\",\n", " \"zero_weight_decay_on_bias_and_bn\": ZERO_WEIGHT_DECAY_ON_BIAS_AND_BN,\n", " \"lr_warmup_epochs\": LR_WARMUP_EPOCHS,\n", " \"warmup_mode\": \"linear_epoch_step\",\n", " \"optimizer_params\": {\"weight_decay\": OPTIMIZER_WEIGHT_DECAY},\n", " \"ema\": EMA,\n", " \"ema_params\": {\"decay\": EMA_DECAY, \"decay_type\": \"threshold\"},\n", " \"max_epochs\": MAX_EPOCHS,\n", " \"mixed_precision\": MIXED_PRECISION,\n", " \"loss\": PPYoloELoss(use_static_assigner=False, num_classes=num_classes, reg_max=16),\n", " \"valid_metrics_list\": [\n", " DetectionMetrics_050(score_thres=0.1, num_cls=num_classes, normalize_targets=True,\n", " post_prediction_callback=PPYoloEPostPredictionCallback(score_threshold=0.01,\n", " nms_top_k=1000,\n", " max_predictions=300,\n", " nms_threshold=0.7))],\n", " \"metric_to_watch\": 'F1@0.50',\n", " }" ] }, { "cell_type": "markdown", "id": "c9a4a059", "metadata": {}, "source": [ "Definition which model type should be used." ] }, { "cell_type": "code", "execution_count": null, "id": "7dc4d275", "metadata": {}, "outputs": [], "source": [ "# model selection\n", "if MODEL_NAME==\"YOLO_NAS_S\":\n", " model = Models.YOLO_NAS_S\n", "elif MODEL_NAME==\"YOLO_NAS_M\":\n", " model = Models.YOLO_NAS_M\n", "elif MODEL_NAME==\"YOLO_NAS_L\":\n", " model = Models.YOLO_NAS_L" ] }, { "cell_type": "markdown", "id": "94486562", "metadata": {}, "source": [ "Define what device should be used for training. " ] }, { "cell_type": "code", "execution_count": null, "id": "483738d9", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "# select cpu as device for konfuzio test pipeline\n", "setup_device(device=\"cpu\", num_gpus = 0)" ] }, { "cell_type": "code", "execution_count": null, "id": "5559134c", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "# device selection\n", "setup_device(device=\"cuda\")" ] }, { "cell_type": "markdown", "id": "641cb3d7", "metadata": {}, "source": [ "> **Tip** 👍 \\\n", "> Use a GPU for training, processing on CPU will be prohibitively expensive." ] }, { "cell_type": "markdown", "id": "5a4796a9", "metadata": {}, "source": [ "Now we are all set and ready for training! 🚀 \\\n", "So let's define the trainer, the model and start training.." ] }, { "cell_type": "code", "execution_count": null, "id": "df47588a", "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# define trainer\n", "trainer = Trainer(experiment_name=EXPERIEMENT_NAME, ckpt_root_dir=\"./checkpoints_dir\")\n", "\n", "# define model\n", "yolo_model = models.get(model, num_classes=num_classes, pretrained_weights=None)\n", "\n", "# start training\n", "trainer.train(model=yolo_model, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)" ] }, { "cell_type": "markdown", "id": "23e04d0b", "metadata": {}, "source": [ "> **Important**❗ \\\n", "> If you want to use the model commercially, you need to ensure that the model for training is not initialized with pretrained weights from `super-gradients`. This is the case if you set `pretrained_weights=None` during instantiation of the super-gradients model." 
] }, { "cell_type": "markdown", "id": "457dff02", "metadata": {}, "source": [ "### Export and saving ➡\n", "\n", "After the training has finished, we want to export the saved model into the ONNX format, mainly to be independent of the training library and so that we have our model in a universal format which can be deployed on different platforms easily.\n", "\n", "For that we first set some general parameters like batch size during production and which model we want to use (`latest`, `best`)." ] }, { "cell_type": "code", "execution_count": null, "id": "1826a5d5", "metadata": {}, "outputs": [], "source": [ "# export parameters\n", "BATCH_SIZE = 1\n", "CHANNELS = 3\n", "WEIGHTS_FILE = \"ckpt_best.pth\"\n", "EXPORT_NAME = \"yolo_model.onnx\"" ] }, { "cell_type": "markdown", "id": "1eecdfa3", "metadata": {}, "source": [ "Then we get the path to the checkpoint, which was saved during training." ] }, { "cell_type": "code", "execution_count": null, "id": "7a26f1c2", "metadata": {}, "outputs": [], "source": [ "# get path to trained model\n", "checkpoint_dir = Path(trainer.sg_logger._local_dir).absolute()\n", "checkpoint_path = checkpoint_dir / WEIGHTS_FILE\n", "assert checkpoint_path.exists(), f\"No checkpoint file found in {checkpoint_path}. Check if the train run was successful.\"" ] }, { "cell_type": "markdown", "id": "08eec92a", "metadata": {}, "source": [ "Now we can load and export the model to ONNX." ] }, { "cell_type": "code", "execution_count": null, "id": "aee46c0a", "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# load the trained model\n", "yolo_model = models.get(model, num_classes=num_classes, checkpoint_path=str(checkpoint_path))\n", "yolo_model.to(trainer.device)\n", "\n", "# define dummy input\n", "dummy_input = torch.randn(\n", " BATCH_SIZE, CHANNELS, SIZE, SIZE, device=trainer.device\n", ")\n", "\n", "# define input and output names\n", "input_names = [\"input\"]\n", "output_names = [\"output\"]\n", "\n", "# export the model to ONNX format\n", "torch.onnx.export(\n", " yolo_model,\n", " dummy_input,\n", " EXPORT_NAME,\n", " verbose=False,\n", " input_names=input_names,\n", " output_names=output_names,\n", ")\n", "assert Path(EXPORT_NAME).exists(), \"\\nModel export was not successful.\\n\"\n", "\n", "# check the ONNX model\n", "model_onnx = onnx.load(EXPORT_NAME)\n", "onnx.checker.check_model(model_onnx)\n", "\n", "print(\"\\nModel exported to ONNX format.\\n\")" ] }, { "cell_type": "markdown", "id": "b0ffe331", "metadata": {}, "source": [ "### Load and test 🏁\n", "\n", "To use the model in the ONNX format without `super-gradients` we need to define the pre-processing, especially the transforms used during training, the model session and the post-processing like [thresholding](https://data-intelligence.hashnode.dev/threshold-function-explained) and [non-maximum-suppression](https://learnopencv.com/non-maximum-suppression-theory-and-implementation-in-pytorch/) ourselves. This is not needed for testing and experimenting with the model but comes in handy for using the model in production, due to less dependencies, more control as well as optimization in model size and runtime." 
] }, { "cell_type": "code", "execution_count": null, "id": "911f5023", "metadata": {}, "outputs": [], "source": [ "class Detector:\n", " \"\"\"Detect checkboxes in images using a pre-trained model.\"\"\"\n", " def __init__(self, onnx_path, input_shape, num_classes, threshold=0.7):\n", " \"\"\"Initialize the CheckboxDetector with a pre-trained model and default parameter.\"\"\"\n", " self.session = InferenceSession(onnx_path)\n", " self.input_shape = input_shape\n", " self.threshold = threshold\n", " self.num_classes = num_classes \n", "\n", " def __call__(self, image):\n", " \"\"\"Run model inference and pre/post processing.\"\"\"\n", " input_image = self._preprocess(image)\n", " outputs = self.session.run(None, {\"input\": input_image})\n", " cls_conf, bboxes = self._postprocess(outputs, image.size)\n", " return cls_conf, bboxes\n", "\n", " def _threshold(self, cls_conf, bboxes):\n", " \"\"\"Filter detections based on confidence threshold.\"\"\"\n", " idx = np.argwhere(cls_conf > self.threshold)\n", " cls_conf = cls_conf[idx[:, 0]]\n", " bboxes = bboxes[idx[:, 0]]\n", " return cls_conf, bboxes\n", "\n", " def _nms(self, cls_conf, bboxes):\n", " \"\"\"Apply Non-Maximum-Suppression to detections.\"\"\"\n", " indices = torchvision.ops.nms(\n", " torch.from_numpy(bboxes),\n", " torch.from_numpy(cls_conf.max(1)),\n", " iou_threshold=0.5,\n", " ).numpy()\n", " cls_conf = cls_conf[indices]\n", " bboxes = bboxes[indices]\n", " return cls_conf, bboxes\n", "\n", " def _rescale(self, image, output_shape):\n", " \"\"\"Rescale image to a specified output shape.\"\"\"\n", " height, width = image.shape[:2]\n", " scale_factor = min(output_shape[0] / height, output_shape[1] / width)\n", " if scale_factor != 1.0:\n", " new_height, new_width = (\n", " round(height * scale_factor),\n", " round(width * scale_factor),\n", " )\n", " image = Image.fromarray(image)\n", " image = image.resize((new_width, new_height), Image.LANCZOS)\n", " image = np.array(image)\n", " return image\n", "\n", " def _bottom_right_pad(self, image, output_shape, pad_value = (114, 114, 114)):\n", " \"\"\"Pad image on the bottom and right to reach the output shape.\"\"\"\n", " height, width = image.shape[:2]\n", " pad_height = output_shape[0] - height\n", " pad_width = output_shape[1] - width\n", "\n", " pad_h = (0, pad_height) # top=0, bottom=pad_height\n", " pad_w = (0, pad_width) # left=0, right=pad_width\n", "\n", " constant_values = ((pad_value, pad_value), (pad_value, pad_value), (0, 0))\n", " constant_values = np.array(constant_values, dtype=np.object_)\n", "\n", " padding_values = (pad_h, pad_w, (0, 0))\n", " processed_image = np.pad(\n", " image,\n", " pad_width=padding_values,\n", " mode=\"constant\",\n", " constant_values=constant_values,\n", " )\n", "\n", " return processed_image\n", "\n", " def _permute(self, image, permutation = (2, 0, 1)):\n", " \"\"\"Permute the image channels.\"\"\"\n", " processed_image = np.ascontiguousarray(image.transpose(permutation))\n", " return processed_image\n", "\n", " def _standardize(self, image, max_value=255):\n", " \"\"\"Standardize the pixel values of image.\"\"\"\n", " processed_image = (image / max_value).astype(np.float32)\n", " return processed_image\n", "\n", " def _preprocess(self, image):\n", " \"\"\"Preprocesses image with all transforms as during training before passing it to the model.\"\"\"\n", " if image.mode == \"P\":\n", " image = image.convert(\"RGB\")\n", " image = np.array(image)[\n", " :, :, ::-1\n", " ] # convert to np and BGR as during training\n", " image = 
self._rescale(image, output_shape=self.input_shape)\n", " image = self._bottom_right_pad(\n", " image, output_shape=self.input_shape, pad_value=(114, 114, 114)\n", " )\n", " image = self._permute(image, permutation=(2, 0, 1))\n", " image = self._standardize(image, max_value=255)\n", " image = image[np.newaxis, ...] # add batch dimension\n", " return image\n", "\n", " def _postprocess(self, outputs, image_shape):\n", " \"\"\"Postprocesses the model's outputs to obtain final detections.\"\"\"\n", " bboxes = outputs[0][0,:,:]\n", " cls_conf = outputs[1][0,:,:]\n", " cls_conf, bboxes = self._threshold(cls_conf, bboxes)\n", "\n", " if len(cls_conf) > 1:\n", " cls_conf, bboxes = self._nms(cls_conf, bboxes)\n", " #Define and apply scale for the bounding boxes to the original image size\n", " scaler = max(\n", " (\n", " image_shape[1] / self.input_shape[1],\n", " image_shape[0] / self.input_shape[0],\n", " )\n", " )\n", " bboxes *= scaler\n", " bboxes = np.array(\n", " [(int(b[0]), int(b[1]), int(b[2]), int(b[3])) for b in bboxes]\n", " )\n", " return cls_conf, bboxes" ] }, { "cell_type": "markdown", "id": "a452fefb", "metadata": {}, "source": [ "Now we can instantiate a new model/detector based on the above class and the saved ONNX model. 🤖" ] }, { "cell_type": "code", "execution_count": null, "id": "36176b33", "metadata": {}, "outputs": [], "source": [ "# instantiate detector\n", "detector = Detector(onnx_path=EXPORT_NAME, input_shape=(SIZE,SIZE), num_classes=num_classes, threshold=0.7)" ] }, { "cell_type": "markdown", "id": "b4ee27c0", "metadata": {}, "source": [ "To test the checkbox detector we first load an image. 🖼" ] }, { "cell_type": "code", "execution_count": null, "id": "8d03014a", "metadata": {}, "outputs": [], "source": [ "#load test image\n", "sample_img_path = Path(\"./test_image.png\")\n", "assert sample_img_path.exists(), f\"Image file with path {sample_img_path} not found.\"\n", "sample_img = Image.open(str(sample_img_path), mode='r')" ] }, { "cell_type": "markdown", "id": "7c12ee17", "metadata": {}, "source": [ "Then we run inference and apply some pre-processing for the visualization. 🎯" ] }, { "cell_type": "code", "execution_count": null, "id": "e6be6828", "metadata": {}, "outputs": [], "source": [ "# run inference\n", "cls_conf, bboxes = detector(sample_img)\n", "checked = [True if c[0] > c[1] else False for c in cls_conf]\n", "score = cls_conf.max(1)" ] }, { "cell_type": "markdown", "id": "d031c0f8", "metadata": {}, "source": [ "The used visualization function is as follows. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "f103fa74", "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# vizualization\n", "import matplotlib.pyplot as plt\n", "import copy\n", "#%matplotlib inline # comment in if you run in Google Colab\n", "import matplotlib as mpl\n", "from PIL import Image, ImageDraw\n", "mpl.rcParams['figure.dpi']= 600\n", "\n", "colors = [(0,1,0), (0,1,1)]\n", "colorst = [(1,1,0), (1,0,1)]\n", "\n", "def plot_results(pil_img, scores, labels, boxes, name=None):\n", " plt.figure(figsize=(2,1))\n", " fig, ax = plt.subplots(1,2)\n", " ax[0].axis('off')\n", " ax[0].imshow(copy.deepcopy(pil_img))\n", " for score, label, (xmin, ymin, xmax, ymax) in zip(scores, labels, boxes):\n", " ax[1].add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,\n", " fill=False, color=colors[int(label)], linewidth=1))\n", " text = f'{score:0.2f}'\n", " ax[1].text(xmin, ymin, text, fontsize=2,\n", " bbox=dict(alpha=0.0))\n", "\n", " ax[1].axis('off')\n", " ax[1].imshow(pil_img)\n", "\n", " #fig.show() # comment in if you run in Google Colab\n", " fig.savefig(f'{name}')" ] }, { "cell_type": "markdown", "id": "9aabbbb0", "metadata": {}, "source": [ "Finally we run the visualization based on the detectors output. 😃" ] }, { "cell_type": "code", "execution_count": null, "id": "99bf886e", "metadata": { "tags": [ "remove-output" ] }, "outputs": [], "source": [ "# show result\n", "plot_results(sample_img, score, checked, bboxes, \"Example of checkbox detection\")" ] }, { "cell_type": "markdown", "id": "ae7f16d4", "metadata": {}, "source": [ "![Example of checkbox detection - 2](checkbox_example_computer_filled.png)\n", "\n", "### Use Case 🔥\n", "\n", "Now that we have a trained object detection model, which is capable of detecting checkboxes, let's briefly discuss the usecase in form Documents. \\\n", "The information that matters is not just the checkbox and if it is checked or unchecked, but also the related information. The related information (Annotations in blue) can be detected with the default Extraction AI of Konfuzio and then be mapped to its according checkbox by a refined overall distance calculation and the [Hungarian algorithm](https://medium.com/@riya.tendulkar/the-assignment-problem-using-hungarian-algorithm-4f105729af18) (pink line).\n", "\n", "![Example of use case - 1](checkbox_example_use_case.png)" ] }, { "cell_type": "markdown", "id": "0e9c202c", "metadata": {}, "source": [ "### Conclusion 💭\n", "In this tutorial, we have trained, optimized and tested the object detection model YOLO-NAS on a COCO dataset. 
Below is the full code to accomplish this task:" ] }, { "cell_type": "markdown", "id": "8c8a40ca", "metadata": { "lines_to_next_cell": 0 }, "source": [ "**Dependency installation**" ] }, { "cell_type": "code", "execution_count": null, "id": "3801be5b", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "%%bash\n", "pip install -q super-gradients\n", "pip install -q pycocotools\n", "pip install -q onnx\n", "pip install -q onnxruntime" ] }, { "cell_type": "code", "execution_count": null, "id": "17742a64", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "# General imports\n", "from pathlib import Path\n", "import datetime\n", "\n", "# Data imports\n", "from super_gradients.training.datasets.detection_datasets.coco_format_detection import COCOFormatDetectionDataset\n", "from super_gradients.training.utils.collate_fn.crowd_detection_collate_fn import CrowdDetectionCollateFN\n", "from super_gradients.training.transforms.transforms import DetectionMosaic, DetectionRandomAffine, DetectionHSV, \\\n", " DetectionHorizontalFlip, DetectionVerticalFlip, DetectionPaddedRescale, DetectionStandardize, DetectionTargetsFormatTransform\n", "from super_gradients.training import dataloaders\n", "from super_gradients.training.datasets.datasets_utils import worker_init_reset_seed\n", "\n", "# Training imports\n", "from super_gradients.training import Trainer\n", "from super_gradients.common.object_names import Models\n", "from super_gradients.training import models\n", "from super_gradients.training.losses import PPYoloELoss\n", "from super_gradients.training.metrics import DetectionMetrics_050\n", "from super_gradients.training.utils.distributed_training_utils import setup_device\n", "from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback\n", "\n", "# Test and export imports\n", "import torch\n", "import torchvision\n", "import onnx\n", "import numpy as np\n", "from PIL import Image\n", "from onnxruntime import InferenceSession\n", "\n", "# experiment name definition\n", "t = datetime.datetime.now()\n", "EXPERIEMENT_NAME = f\"{t.year}-{t.month}-{t.day}-{t.hour}-{t.minute}-checkbox-detector\"\n", "\n", "# Model params\n", "MODEL_NAME = \"YOLO_NAS_S\"\n", "SIZE = 1280\n", "\n", "# Training params\n", "WARMUP_INITIAL_LR = 5e-4\n", "INITIAL_LR = 1e-3\n", "COSINE_FINAL_LR_RATIO = 0.01\n", "ZERO_WEIGHT_DECAY_ON_BIAS_AND_BN = True\n", "LR_WARMUP_EPOCHS = 1\n", "OPTIMIZER_WEIGHT_DECAY = 1e-4\n", "EMA = True\n", "EMA_DECAY = 0.9999\n", "MAX_EPOCHS = 20\n", "BATCH_SIZE_TRAIN = 2\n", "BATCH_SIZE_TEST = 6\n", "MIXED_PRECISION = True\n", "\n", "# Data params\n", "IGNORE_EMPTY_ANNOTATIONS = True\n", "\n", "# Base path to dataset\n", "BASE_PATH= Path(\".\").absolute() # modify accordingly\n", "\n", "# Path to specific dataset\n", "TRAIN_FOLDER = \"dataset\"\n", "TRAIN_ANNOTATION_FILE = \"train.json\"\n", "\n", "TEST_FOLDER = \"dataset\"\n", "TEST_ANNOTATION_FILE = \"val.json\"\n", "\n", "# train path\n", "train_path = BASE_PATH / TRAIN_FOLDER\n", "train_img_path = train_path / \"images\"\n", "train_ann_path = train_path / TRAIN_ANNOTATION_FILE\n", "\n", "# test path\n", "test_path = BASE_PATH / TEST_FOLDER\n", "test_img_path = test_path / \"images\"\n", "test_ann_path = test_path / TEST_ANNOTATION_FILE\n", "\n", "# checks\n", "assert train_path.exists(), f\"Train path {train_path} does not exist\"\n", "assert train_img_path.exists(), f\"Train image path {train_img_path} does not exist\"\n", "assert 
train_ann_path.exists(), f\"Train annotation path {train_ann_path} does not exist\"\n", "assert test_path.exists(), f\"Train path {test_path} does not exist\"\n", "assert test_img_path.exists(), f\"Train image path {test_img_path} does not exist\"\n", "assert test_ann_path.exists(), f\"Train annotation path {test_ann_path} does not exist\"\n", "\n", "# train dataset\n", "trainset = COCOFormatDetectionDataset(data_dir=str(train_path),\n", " images_dir=\"\",\n", " json_annotation_file=str(train_ann_path),\n", " input_dim=None,\n", " ignore_empty_annotations=IGNORE_EMPTY_ANNOTATIONS,\n", " transforms=[\n", " DetectionMosaic(prob=1., input_dim=(SIZE, SIZE)),\n", " DetectionRandomAffine(degrees=0.5, scales=(0.9, 1.1), shear=0.0,\n", " target_size=(SIZE, SIZE),\n", " filter_box_candidates=False, border_value=114),\n", " DetectionHSV(prob=1., hgain=1, vgain=6, sgain=6),\n", " DetectionHorizontalFlip(prob=0.5),\n", " DetectionVerticalFlip(prob=0.5),\n", " DetectionPaddedRescale(input_dim=(SIZE, SIZE)),\n", " DetectionStandardize(max_value=255),\n", " DetectionTargetsFormatTransform(input_dim=(SIZE, SIZE),\n", " output_format=\"LABEL_CXCYWH\")\n", " ])\n", "\n", "# validation dataset\n", "valset = COCOFormatDetectionDataset(data_dir=str(test_path),\n", " images_dir=\"\",\n", " json_annotation_file=str(test_ann_path),\n", " input_dim=None,\n", " ignore_empty_annotations=False,\n", " transforms=[\n", " DetectionPaddedRescale(input_dim=(SIZE, SIZE)),\n", " DetectionStandardize(max_value=255),\n", " DetectionTargetsFormatTransform(input_dim=(SIZE, SIZE),\n", " output_format=\"LABEL_CXCYWH\")\n", " ]\n", " )\n", "\n", "# get number of classes from dataset\n", "num_classes = len(trainset.classes)\n", "\n", "# train dataloader\n", "train_loader = dataloaders.get(dataset=trainset, dataloader_params={\n", " \"shuffle\": True,\n", " \"batch_size\": BATCH_SIZE_TRAIN,\n", " \"drop_last\": False,\n", " \"pin_memory\": True,\n", " \"collate_fn\": CrowdDetectionCollateFN(),\n", " \"worker_init_fn\": worker_init_reset_seed,\n", " \"min_samples\": 512\n", "})\n", "\n", "#validation dataloader\n", "valid_loader = dataloaders.get(dataset=valset, dataloader_params={\n", " \"shuffle\": False,\n", " \"batch_size\": BATCH_SIZE_TEST,\n", " \"num_workers\": 2,\n", " \"drop_last\": False,\n", " \"pin_memory\": True,\n", " \"collate_fn\": CrowdDetectionCollateFN(),\n", " \"worker_init_fn\": worker_init_reset_seed\n", "})\n", "\n", "# training parameter definition for trainer\n", "train_params = {\n", " \"warmup_initial_lr\": WARMUP_INITIAL_LR,\n", " \"initial_lr\": INITIAL_LR,\n", " \"lr_mode\": \"cosine\",\n", " \"cosine_final_lr_ratio\": COSINE_FINAL_LR_RATIO,\n", " \"optimizer\": \"AdamW\",\n", " \"zero_weight_decay_on_bias_and_bn\": ZERO_WEIGHT_DECAY_ON_BIAS_AND_BN,\n", " \"lr_warmup_epochs\": LR_WARMUP_EPOCHS,\n", " \"warmup_mode\": \"linear_epoch_step\",\n", " \"optimizer_params\": {\"weight_decay\": OPTIMIZER_WEIGHT_DECAY},\n", " \"ema\": EMA,\n", " \"ema_params\": {\"decay\": EMA_DECAY, \"decay_type\": \"threshold\"},\n", " \"max_epochs\": MAX_EPOCHS,\n", " \"mixed_precision\": MIXED_PRECISION,\n", " \"loss\": PPYoloELoss(use_static_assigner=False, num_classes=num_classes, reg_max=16),\n", " \"valid_metrics_list\": [\n", " DetectionMetrics_050(score_thres=0.1, num_cls=num_classes, normalize_targets=True,\n", " post_prediction_callback=PPYoloEPostPredictionCallback(score_threshold=0.01,\n", " nms_top_k=1000,\n", " max_predictions=300,\n", " nms_threshold=0.7))],\n", " \"metric_to_watch\": 'F1@0.50',\n", " }\n", "\n", 
"# model selection\n", "if MODEL_NAME==\"YOLO_NAS_S\":\n", " model = Models.YOLO_NAS_S\n", "elif MODEL_NAME==\"YOLO_NAS_M\":\n", " model = Models.YOLO_NAS_M\n", "elif MODEL_NAME==\"YOLO_NAS_L\":\n", " model = Models.YOLO_NAS_L\n", "\n", "# device selection\n", "setup_device(device=\"cuda\")\n", "\n", "# define trainer\n", "trainer = Trainer(experiment_name=EXPERIEMENT_NAME, ckpt_root_dir=\"./checkpoints_dir\")\n", "\n", "# define model\n", "yolo_model = models.get(model, num_classes=num_classes, pretrained_weights=None)\n", "\n", "# start training\n", "trainer.train(model=yolo_model, training_params=train_params, train_loader=train_loader, valid_loader=valid_loader)\n", "\n", "# export parameters\n", "BATCH_SIZE = 1\n", "CHANNELS = 3\n", "WEIGHTS_FILE = \"ckpt_best.pth\"\n", "EXPORT_NAME = \"yolo_model.onnx\"\n", "\n", "# get path to trained model\n", "checkpoint_dir = Path(trainer.sg_logger._local_dir).absolute()\n", "checkpoint_path = checkpoint_dir / WEIGHTS_FILE\n", "assert checkpoint_path.exists(), f\"No checkpoint file found in {checkpoint_path}. Check if the train run was successful.\"\n", "\n", "# load the trained model\n", "yolo_model = models.get(model, num_classes=num_classes, checkpoint_path=str(checkpoint_path))\n", "yolo_model.to(trainer.device)\n", "\n", "# define dummy input\n", "dummy_input = torch.randn(\n", " BATCH_SIZE, CHANNELS, SIZE, SIZE, device=trainer.device\n", ")\n", "\n", "# define input and output names\n", "input_names = [\"input\"]\n", "output_names = [\"output\"]\n", "\n", "# export the model to ONNX format\n", "torch.onnx.export(\n", " yolo_model,\n", " dummy_input,\n", " EXPORT_NAME,\n", " verbose=False,\n", " input_names=input_names,\n", " output_names=output_names,\n", ")\n", "assert Path(EXPORT_NAME).exists(), \"\\nModel export was not successful.\\n\"\n", "\n", "# check the ONNX model\n", "model_onnx = onnx.load(EXPORT_NAME)\n", "onnx.checker.check_model(model_onnx)\n", "\n", "print(\"\\nModel exported to ONNX format.\\n\")\n", "\n", "class Detector:\n", " \"\"\"Detect checkboxes in images using a pre-trained model.\"\"\"\n", " def __init__(self, onnx_path, input_shape, num_classes, threshold=0.7):\n", " \"\"\"Initialize the CheckboxDetector with a pre-trained model and default parameter.\"\"\"\n", " self.session = InferenceSession(onnx_path)\n", " self.input_shape = input_shape\n", " self.threshold = threshold\n", " self.num_classes = num_classes \n", "\n", " def __call__(self, image):\n", " \"\"\"Run model inference and pre/post processing.\"\"\"\n", " input_image = self._preprocess(image)\n", " outputs = self.session.run(None, {\"input\": input_image})\n", " cls_conf, bboxes = self._postprocess(outputs, image.size)\n", " return cls_conf, bboxes\n", "\n", " def _threshold(self, cls_conf, bboxes):\n", " \"\"\"Filter detections based on confidence threshold.\"\"\"\n", " idx = np.argwhere(cls_conf > self.threshold)\n", " cls_conf = cls_conf[idx[:, 0]]\n", " bboxes = bboxes[idx[:, 0]]\n", " return cls_conf, bboxes\n", "\n", " def _nms(self, cls_conf, bboxes):\n", " \"\"\"Apply Non-Maximum-Suppression to detections.\"\"\"\n", " indices = torchvision.ops.nms(\n", " torch.from_numpy(bboxes),\n", " torch.from_numpy(cls_conf.max(1)),\n", " iou_threshold=0.5,\n", " ).numpy()\n", " cls_conf = cls_conf[indices]\n", " bboxes = bboxes[indices]\n", " return cls_conf, bboxes\n", "\n", " def _rescale(self, image, output_shape):\n", " \"\"\"Rescale image to a specified output shape.\"\"\"\n", " height, width = image.shape[:2]\n", " scale_factor = min(output_shape[0] / 
height, output_shape[1] / width)\n", " if scale_factor != 1.0:\n", " new_height, new_width = (\n", " round(height * scale_factor),\n", " round(width * scale_factor),\n", " )\n", " image = Image.fromarray(image)\n", " image = image.resize((new_width, new_height), Image.LANCZOS)\n", " image = np.array(image)\n", " return image\n", "\n", " def _bottom_right_pad(self, image, output_shape, pad_value = (114, 114, 114)):\n", " \"\"\"Pad image on the bottom and right to reach the output shape.\"\"\"\n", " height, width = image.shape[:2]\n", " pad_height = output_shape[0] - height\n", " pad_width = output_shape[1] - width\n", "\n", " pad_h = (0, pad_height) # top=0, bottom=pad_height\n", " pad_w = (0, pad_width) # left=0, right=pad_width\n", "\n", " constant_values = ((pad_value, pad_value), (pad_value, pad_value), (0, 0))\n", " constant_values = np.array(constant_values, dtype=np.object_)\n", "\n", " padding_values = (pad_h, pad_w, (0, 0))\n", " processed_image = np.pad(\n", " image,\n", " pad_width=padding_values,\n", " mode=\"constant\",\n", " constant_values=constant_values,\n", " )\n", "\n", " return processed_image\n", "\n", " def _permute(self, image, permutation = (2, 0, 1)):\n", " \"\"\"Permute the image channels.\"\"\"\n", " processed_image = np.ascontiguousarray(image.transpose(permutation))\n", " return processed_image\n", "\n", " def _standardize(self, image, max_value=255):\n", " \"\"\"Standardize the pixel values of image.\"\"\"\n", " processed_image = (image / max_value).astype(np.float32)\n", " return processed_image\n", "\n", " def _preprocess(self, image):\n", " \"\"\"Preprocesses image with all transforms as during training before passing it to the model.\"\"\"\n", " if image.mode == \"P\":\n", " image = image.convert(\"RGB\")\n", " image = np.array(image)[\n", " :, :, ::-1\n", " ] # convert to np and BGR as during training\n", " image = self._rescale(image, output_shape=self.input_shape)\n", " image = self._bottom_right_pad(\n", " image, output_shape=self.input_shape, pad_value=(114, 114, 114)\n", " )\n", " image = self._permute(image, permutation=(2, 0, 1))\n", " image = self._standardize(image, max_value=255)\n", " image = image[np.newaxis, ...] 
# add batch dimension\n", " return image\n", "\n", " def _postprocess(self, outputs, image_shape):\n", " \"\"\"Postprocesses the model's outputs to obtain final detections.\"\"\"\n", " bboxes = outputs[0][0,:,:]\n", " cls_conf = outputs[1][0,:,:]\n", " cls_conf, bboxes = self._threshold(cls_conf, bboxes)\n", "\n", " if len(cls_conf) > 1:\n", " cls_conf, bboxes = self._nms(cls_conf, bboxes)\n", " #Define and apply scale for the bounding boxes to the original image size\n", " scaler = max(\n", " (\n", " image_shape[1] / self.input_shape[1],\n", " image_shape[0] / self.input_shape[0],\n", " )\n", " )\n", " bboxes *= scaler\n", " bboxes = np.array(\n", " [(int(b[0]), int(b[1]), int(b[2]), int(b[3])) for b in bboxes]\n", " )\n", " return cls_conf, bboxes\n", "\n", "# instantiate detector\n", "detector = Detector(onnx_path=EXPORT_NAME, input_shape=(SIZE,SIZE), num_classes=num_classes, threshold=0.7)\n", "\n", "#load test image\n", "sample_img_path = Path(\"./test_image.png\")\n", "assert sample_img_path.exists(), f\"Image file with path {sample_img_path} not found.\"\n", "sample_img = Image.open(str(sample_img_path), mode='r')\n", "\n", "# run inference\n", "cls_conf, bboxes = detector(sample_img)\n", "checked = [True if c[0] > c[1] else False for c in cls_conf]\n", "score = cls_conf.max(1)\n", "\n", "# vizualization\n", "import matplotlib.pyplot as plt\n", "import copy\n", "%matplotlib inline\n", "import matplotlib as mpl\n", "from PIL import Image, ImageDraw\n", "mpl.rcParams['figure.dpi']= 600\n", "\n", "colors = [(0,1,0), (0,1,1)]\n", "colorst = [(1,1,0), (1,0,1)]\n", "\n", "def plot_results(pil_img, scores, labels, boxes, name=None):\n", " plt.figure(figsize=(2,1))\n", " fig, ax = plt.subplots(1,2)\n", " ax[0].axis('off')\n", " ax[0].imshow(copy.deepcopy(pil_img))\n", " for score, label, (xmin, ymin, xmax, ymax) in zip(scores, labels, boxes):\n", " ax[1].add_patch(plt.Rectangle((xmin, ymin), xmax - xmin, ymax - ymin,\n", " fill=False, color=colors[int(label)], linewidth=1))\n", " text = f'{score:0.2f}'\n", " ax[1].text(xmin, ymin, text, fontsize=2,\n", " bbox=dict(alpha=0.0))\n", "\n", " ax[1].axis('off')\n", " ax[1].imshow(pil_img)\n", "\n", " #fig.show() # comment in if you run in Google Colab\n", " fig.savefig(f'{name}')\n", "\n", "# show result\n", "plot_results(sample_img, score, checked, bboxes, \"Example of checkbox detection\")" ] }, { "cell_type": "markdown", "id": "c05aa941", "metadata": {}, "source": [ "### What's next? ⏭\n", "\n", "- [Learn how to upload a custom AI](https://dev.konfuzio.com/sdk/tutorials/upload-your-ai/index.html)\n", "- [Get to know how to create any custom Extraction AI](https://dev.konfuzio.com/sdk/tutorials/information_extraction/index.html#train-a-custom-date-extraction-ai)" ] }, { "cell_type": "markdown", "id": "24056418", "metadata": { "lines_to_next_cell": 0 }, "source": [ "" ] }, { "cell_type": "code", "execution_count": null, "id": "bb931c1c", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import subprocess\n", "\n", "subprocess.run([\"rm\", \"./Example of checkbox detection.png\"])\n", "subprocess.run([\"rm\", EXPORT_NAME])\n", "subprocess.run([\"rm\", \"-r\", \"./checkpoints_dir\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }