{ "cells": [ { "cell_type": "markdown", "id": "cb32f634", "metadata": {}, "source": [ "## Barcode scanner\n", "\n", "---\n", "\n", "**Prerequisites:** \n", "\n", "- Data Layer concepts of Konfuzio: Project, Annotation, Document, Span, Bbox\n", "- AI Layer concepts of Konfuzio: Information Extraction\n", "- `zxing-cpp` installed\n", "\n", "**Difficulty:** Medium\n", "\n", "**Goal:** Create a custom Extraction AI that can extract barcodes from Documents.\n", "\n", "---\n", "\n", "### Environment\n", "You need to install the Konfuzio SDK before diving into the tutorial. \\\n", "To get up and running quickly, you can use our Colab Quick Start notebook.\n", "\n", "Alternatively, you can follow the [installation section](../get_started.html#install-sdk) to install and initialize the Konfuzio SDK locally or in an environment of your choice.\n", "\n", "### Introduction\n", "\n", "In this tutorial, we'll walk through the creation of a Barcode Extraction AI using the `zxing-cpp` library. We'll implement a `BarcodeAnnotation` class, write the Extraction AI logic, and integrate the AI with the SDK. This AI will be able to detect barcodes in Documents and generate bounding box Annotations for the detected barcodes.\n", "\n", "The final result on the [DVUI](https://dev.konfuzio.com/dvui/index.html#what-is-the-konfuzio-document-validation-ui) looks like this:\n", "\n", "![Result](barcode_scanner_example.png)\n", "\n", "### Setting up the BarcodeAnnotation class\n", "\n", "The first step is to create a `BarcodeAnnotation` class that inherits from [Annotation](https://dev.konfuzio.com/sdk/sourcecode.html?highlight=annotation#annotation). 
This is needed because the Annotation class is based on Spans, and its Bounding Boxes are computed from these Spans.\n", "\n", "In our case, we want to use custom Bounding Boxes computed with the `zxing-cpp` library, so we override the `bboxes` property of the Annotation class to return them. The Server and the DVUI later use this property to annotate the barcodes in the Document.\n", "\n", "The difference between Span-based `bboxes` (dashed-line boxes) and custom `bboxes` (yellow box) is illustrated in the following image:\n", "\n", "![Example](barcode_example.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "3a5cfceb", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "from typing import Dict, List\n", "from konfuzio_sdk.data import Annotation\n", "\n", "\n", "class BarcodeAnnotation(Annotation):\n", "    def __init__(self, *args, **kwargs):\n", "        super().__init__(*args, **kwargs)\n", "        # Bboxes detected by zxing-cpp, returned instead of the Span-based ones\n", "        self.custom_bboxes = kwargs.get(\"custom_bboxes\", [])\n", "\n", "    @property\n", "    def bboxes(self) -> List[Dict]:\n", "        return self.custom_bboxes" ] }, { "cell_type": "markdown", "id": "90a2d8aa", "metadata": {}, "source": [ "### Define the BarcodeExtractionAI class\n", "\n", "The second step is to create a custom [Extraction AI](https://dev.konfuzio.com/sdk/sourcecode.html#extraction-ai) class that uses the `BarcodeAnnotation` class. First, we'll define a method to extract Bounding Boxes and barcode text from an image using the `zxing-cpp` library.\n", "Inside this method, we loop through the results returned by `zxingcpp` and extract the raw bounding boxes, which are then transformed into Bboxes in the Konfuzio SDK's native format." 
] }, { "cell_type": "code", "execution_count": null, "id": "b332d145", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ], "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "def get_bboxes_from_image(self, image, page_index):\n", "    from zxingcpp import read_barcodes\n", "\n", "    bboxes_list = []\n", "    barcodes_lib_results = read_barcodes(image)\n", "\n", "    for result in barcodes_lib_results:\n", "        # the position string holds the corner points as XxY pairs;\n", "        # index 1 is read as the top-right corner, the last one as the bottom-left\n", "        position = str(result.position).rstrip(\"\\x00\")\n", "        barcode_text_value = str(result.text).rstrip(\"\\x00\")\n", "        top_right = position.split()[1].split(\"x\")\n", "        bottom_left = position.split()[-1].split(\"x\")\n", "        bbox_dict = self.get_bbox_dict_from_coords(\n", "            top_right, bottom_left, page_index, image, barcode_text_value\n", "        )\n", "        bboxes_list.append(bbox_dict)\n", "    return bboxes_list" ] }, { "cell_type": "markdown", "id": "68b243c6", "metadata": {}, "source": [ "Next, we'll implement the method that creates the Bbox dictionary from the output of `zxing-cpp`. Inside this method, we use the corner coordinates to build the Bbox dictionaries that are later used to position the `BarcodeAnnotation` instances." 
] }, { "cell_type": "code", "execution_count": null, "id": "d4e4864f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "def get_bbox_dict_from_coords(\n", "    self, top_right, bottom_left, page_index, image, barcode_text_value\n", "):\n", "    x1 = int(top_right[0])\n", "    y1 = int(top_right[1])\n", "    x0 = int(bottom_left[0])\n", "    y0 = int(bottom_left[1])\n", "\n", "    # top and bottom keep the image coordinates, measured from the top edge\n", "    top = y1\n", "    bottom = y0\n", "\n", "    # y0 and y1 are flipped so that they are measured from the bottom edge\n", "    temp_y0 = image.height - y0\n", "    temp_y1 = image.height - y1\n", "    y0 = temp_y0\n", "    y1 = temp_y1\n", "\n", "    bbox_dict = {\n", "        \"x0\": x0,\n", "        \"x1\": x1,\n", "        \"y0\": y0,\n", "        \"y1\": y1,\n", "        \"top\": top,\n", "        \"bottom\": bottom,\n", "        \"page_index\": page_index,\n", "        \"offset_string\": barcode_text_value,\n", "        \"custom_offset_string\": True,\n", "    }\n", "    return bbox_dict" ] }, { "cell_type": "markdown", "id": "83efd2a6", "metadata": {}, "source": [ "In case the machine the AI runs on does not have the necessary dependencies installed, we'll implement a method that installs the `zxing-cpp` library." ] }, { "cell_type": "code", "execution_count": null, "id": "d5a0ae7f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "def install_dependencies(self):\n", "    try:\n", "        import subprocess\n", "\n", "        package_name = \"zxing-cpp\"\n", "        subprocess.check_call([\"pip\", \"install\", package_name])\n", "        print(f\"The package {package_name} is ready to be used.\")\n", "    except Exception as error:\n", "        raise Exception(\n", "            \"An error occurred while installing the zxing-cpp library. Please install it manually.\"\n", "        ) from error" ] }, { "cell_type": "markdown", "id": "e3498d98", "metadata": {}, "source": [ "Lastly, we'll create a method that checks whether the Extraction AI is ready. The Server needs this to know when to start the Extraction process." 
] }, { "cell_type": "code", "execution_count": null, "id": "076c0cde", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "def check_is_ready(self) -> bool:\n", "    try:\n", "        self.install_dependencies()\n", "        import zxingcpp\n", "        return True\n", "    except Exception:\n", "        return False" ] }, { "cell_type": "markdown", "id": "85726855", "metadata": {}, "source": [ "For the complete code of the class, scroll down to the Conclusion.\n", "\n", "### Save the Barcode scanner\n", "\n", "Let's create the main script that saves our Extraction AI so it can be used to process Documents in Konfuzio. For that, we need to define the Project the AI will be used with and then save the Extraction AI as a compressed [pickle](https://docs.python.org/3/library/pickle.html) file that we will upload to the Server later." ] }, { "cell_type": "code", "execution_count": null, "id": "16b6aaae", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "from konfuzio_sdk.data import Project\n", "\n", "project = Project(id_=YOUR_PROJECT_ID, update=True, strict_data_validation=False)\n", "barcode_extraction_ai = BarcodeExtractionAI(category=project.categories[0])\n", "pickle_model_path = barcode_extraction_ai.save()" ] }, { "cell_type": "markdown", "id": "b4fa116d", "metadata": {}, "source": [ "This is an example of what the output of this Barcode scanner looks like on the DVUI:\n", "\n", "![Output](barcode_scanner_example.png)\n", "\n", "### Conclusion\n", "In this tutorial, we have walked through the steps for building the Barcode scanner AI. 
Below is the full code to accomplish this task:" ] }, { "cell_type": "code", "execution_count": null, "id": "02320f80", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "YOUR_PROJECT_ID = 46" ] }, { "cell_type": "code", "execution_count": null, "id": "36c0508d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-output" ], "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "from typing import Dict, List\n", "from konfuzio_sdk.trainer.information_extraction import AbstractExtractionAI\n", "from konfuzio_sdk.data import Document, Category, Annotation, AnnotationSet, Project\n", "from konfuzio_sdk.tokenizer.regex import WhitespaceTokenizer\n", "\n", "class BarcodeAnnotation(Annotation):\n", "    def __init__(self, *args, **kwargs):\n", "        super().__init__(*args, **kwargs)\n", "        self.custom_bboxes = kwargs.get(\"custom_bboxes\", [])\n", "\n", "    @property\n", "    def bboxes(self) -> List[Dict]:\n", "        return self.custom_bboxes\n", "\n", "\n", "class BarcodeExtractionAI(AbstractExtractionAI):\n", "    \"\"\"\n", "    A wrapper to extract barcodes from Documents using the zxing-cpp library.\n", "    \"\"\"\n", "\n", "    # you must set this to True if your AI requires page images\n", "    requires_images = True\n", "\n", "    def __init__(self, category: Category, *args, **kwargs):\n", "        super().__init__(category)\n", "        self.tokenizer = WhitespaceTokenizer()\n", "\n", "    def fit(self):\n", "        # no training is needed since the zxing-cpp library can be used directly for extraction\n", "        pass\n", "\n", "    def extract(self, document: Document) -> Document:\n", "        self.check_is_ready()\n", "        result_document = super().extract(document)\n", "        result_document._text = \"this should be a long text or at least twice the number of barcodes in the document\"\n", "        barcode_label = self.project.get_label_by_name(\"Barcode\")\n", "        barcode_label_set = 
self.project.get_label_set_by_name(\"Barcodes Set\")\n", "        barcode_annotation_set = AnnotationSet(\n", "            document=result_document, label_set=barcode_label_set\n", "        )\n", "        for page_index, page in enumerate(document.pages()):\n", "            page_width = page.width\n", "            page_height = page.height\n", "            image = page.get_image(update=True)\n", "            image = image.convert(\"RGB\")\n", "            image = image.resize((int(page_width), int(page_height)))\n", "            page_bboxes_list = self.get_bboxes_from_image(image, page_index)\n", "            for bbox_index, bbox_dict in enumerate(page_bboxes_list):\n", "                _ = BarcodeAnnotation(\n", "                    document=result_document,\n", "                    annotation_set=barcode_annotation_set,\n", "                    spans=[],\n", "                    start_offset=bbox_index + 1,\n", "                    end_offset=bbox_index + 2,\n", "                    label=barcode_label,\n", "                    label_set=barcode_label_set,\n", "                    confidence=1.0,\n", "                    bboxes=None,\n", "                    custom_bboxes=[bbox_dict],\n", "                )\n", "\n", "        return result_document\n", "\n", "    def get_bboxes_from_image(self, image, page_index):\n", "        from zxingcpp import read_barcodes\n", "        bboxes_list = []\n", "        barcodes_lib_results = read_barcodes(image)\n", "        for result in barcodes_lib_results:\n", "            position = str(result.position).rstrip(\"\\x00\")\n", "            barcode_text_value = str(result.text).rstrip(\"\\x00\")\n", "            top_right = position.split()[1].split(\"x\")\n", "            bottom_left = position.split()[-1].split(\"x\")\n", "            bbox_dict = self.get_bbox_dict_from_coords(\n", "                top_right, bottom_left, page_index, image, barcode_text_value\n", "            )\n", "            bboxes_list.append(bbox_dict)\n", "        return bboxes_list\n", "\n", "    def get_bbox_dict_from_coords(\n", "        self, top_right, bottom_left, page_index, image, barcode_text_value\n", "    ):\n", "        x1 = int(top_right[0])\n", "        y1 = int(top_right[1])\n", "        x0 = int(bottom_left[0])\n", "        y0 = int(bottom_left[1])\n", "        top = y1\n", "        bottom = y0\n", "        temp_y0 = image.height - y0\n", "        temp_y1 = image.height - y1\n", "        y0 = temp_y0\n", "        y1 = temp_y1\n", "        bbox_dict = {\n", "            \"x0\": x0,\n", "            
\"x1\": x1,\n", "            \"y0\": y0,\n", "            \"y1\": y1,\n", "            \"top\": top,\n", "            \"bottom\": bottom,\n", "            \"page_index\": page_index,\n", "            \"offset_string\": barcode_text_value,\n", "            \"custom_offset_string\": True,\n", "        }\n", "        return bbox_dict\n", "\n", "    def install_dependencies(self):\n", "        try:\n", "            import subprocess\n", "            package_name = \"zxing-cpp\"\n", "            subprocess.check_call(\n", "                [\"pip\", \"install\", package_name])\n", "            print(f\"The package {package_name} is ready to be used.\")\n", "        except Exception as error:\n", "            raise Exception(\n", "                \"An error occurred while installing the zxing-cpp library. Please install it manually.\"\n", "            ) from error\n", "\n", "    def check_is_ready(self) -> bool:\n", "        try:\n", "            self.install_dependencies()\n", "            import zxingcpp\n", "            return True\n", "        except Exception:\n", "            return False\n", "\n", "project = Project(id_=YOUR_PROJECT_ID, strict_data_validation=False)\n", "barcode_extraction_ai = BarcodeExtractionAI(category=project.categories[0])\n", "pickle_model_path = barcode_extraction_ai.save()" ] }, { "cell_type": "markdown", "id": "b611f509", "metadata": {}, "source": [ "### What's next?\n", "\n", "- [Learn how to upload a custom AI](https://dev.konfuzio.com/sdk/tutorials/upload-your-ai/index.html)\n", "- [Get to know how to create any custom Extraction AI](https://dev.konfuzio.com/sdk/tutorials/information_extraction/index.html#train-a-custom-date-extraction-ai)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }