{ "cells": [ { "cell_type": "markdown", "id": "eebc6816", "metadata": { "lines_to_next_cell": 0 }, "source": [ "## Create, change and delete Annotations using the wrapped API call\n", "\n", "---\n", "\n", "**Prerequisites:**\n", "- Data Layer concepts of Konfuzio: Project, Document, Annotation, Span, Bounding Box, Label, Label Set, Annotation Set\n", "- Understanding of REST API concepts\n", "\n", "**Difficulty:** Medium\n", "\n", "**Goal:** Explain how to create different types of Annotations (textual, visual) using the methods from the SDK listed in\n", "`konfuzio_sdk.api`, and how to change or delete them.\n", "\n", "---\n", "\n", "### Environment\n", "You need to install the Konfuzio SDK before diving into the tutorial. \\\n", "To get up and running quickly, you can use our Colab Quick Start notebook.\n", "\n", "As an alternative, you can follow the [installation section](../get_started.html#install-sdk) to install and initialize the Konfuzio SDK locally or in an environment of your choice.\n", "\n", "### Introduction\n", "\n", "There are several ways to create an Annotation in a Document: using the SmartView or DVUI on Konfuzio's app or an \n", "on-prem installation, via the `Annotation` class in the SDK, or via a direct call to the API endpoint \n", "`api/v3/annotations`. 
The Server documentation already provides [instructions](https://dev.konfuzio.com/web/api-v3.html#create-an-annotation) \n", "on creating an Annotation via a POST request using `curl`; in this tutorial, we explain how to make this \n", "request using the methods from `konfuzio_sdk.api`, which serves as a wrapper around the calls to the API.\n", "\n", "Let's start with the necessary imports:" ] }, { "cell_type": "code", "execution_count": null, "id": "a4f10f9f", "metadata": { "lines_to_next_cell": 0, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import logging\n", "from konfuzio_sdk.api import restore_snapshot\n", "from konfuzio_sdk.data import Project\n", "\n", "logging.getLogger(\"konfuzio_sdk\").setLevel(logging.ERROR)\n", "YOUR_PROJECT_ID = restore_snapshot(snapshot_id=65)\n", "project = Project(id_=YOUR_PROJECT_ID)\n", "original_document_text = Project(id_=46).get_document_by_id(44823).text\n", "YOUR_DOCUMENT_ID = [document for document in project.documents if document.text == original_document_text][0].id_\n", "YOUR_LABEL_ID = project.get_label_by_name('Steuer-Brutto').id_\n", "YOUR_LABEL_SET_ID = project.get_label_set_by_name('Lohnabrechnung').id_\n", "NEW_LABEL_ID = YOUR_LABEL_ID\n", "\n", "project.get_document_by_id(YOUR_DOCUMENT_ID).get_bbox()" ] }, { "cell_type": "code", "execution_count": null, "id": "6b0e5b41", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "from konfuzio_sdk.api import post_document_annotation, delete_document_annotation\n", "from konfuzio_sdk.data import Span, Project" ] }, { "cell_type": "markdown", "id": "f5ab9935", "metadata": { "lines_to_next_cell": 0 }, "source": [ "To create any Annotation, you need to pass several arguments to `post_document_annotation`:\n", "\n", "- `document_id`: ID of the Document in which the Annotation is created\n", "- `label_id`: ID of the Label assigned to the Annotation\n", "- `spans`: Spans or Bounding Box coordinates representing the position of the Annotation in the 
Document. \n", "- either `label_set_id` or `annotation_set_id`: provide the ID of a Label Set if you want to create the Annotation within a new \n", "Annotation Set, or provide the ID of an existing Annotation Set if you want to add the Annotation to it without creating a\n", "new Annotation Set.\n", "\n", "### Creating a textual Annotation\n", "\n", "To create an Annotation that is based on existing text of a Document, let's first define the test Document and the \n", "Span that will be passed as the `spans` argument. You can define one or more Spans." ] }, { "cell_type": "code", "execution_count": null, "id": "1f5aac0a", "metadata": {}, "outputs": [], "source": [ "project = Project(id_=YOUR_PROJECT_ID)\n", "test_document = project.get_document_by_id(YOUR_DOCUMENT_ID)\n", "spans = [Span(document=test_document, start_offset=3067, end_offset=3074)]" ] }, { "cell_type": "markdown", "id": "2274e36e", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Next, let's specify the arguments for a POST request that creates the Annotation and send it to the server. We want to create\n", "an Annotation within a new Annotation Set, so we specify `label_set_id`." 
] }, { "cell_type": "code", "execution_count": null, "id": "50a22903", "metadata": { "lines_to_next_cell": 0 }, "outputs": [], "source": [ "response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=spans, label_id=YOUR_LABEL_ID, confidence=100.0,\n", "                                    label_set_id=YOUR_LABEL_SET_ID)" ] }, { "cell_type": "markdown", "id": "bd310c03", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Let's check that the Annotation was created successfully and that its Span coincides with the one we defined above:" ] }, { "cell_type": "code", "execution_count": null, "id": "00dce24c", "metadata": { "lines_to_next_cell": 0 }, "outputs": [], "source": [ "response = json.loads(response.text)\n", "print(response['span'])" ] }, { "cell_type": "code", "execution_count": null, "id": "008fd760", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "negative_id = delete_document_annotation(response['id'])\n", "assert delete_document_annotation(negative_id, delete_from_database=True).status_code == 204" ] }, { "cell_type": "markdown", "id": "1a657a80", "metadata": { "lines_to_next_cell": 0 }, "source": [ "### Creating a visual Annotation\n", "\n", "To create an Annotation that is based on Bounding Boxes' coordinates, let's create a list of coordinate dictionaries that will\n", "be passed as the `spans` argument. You can define one or more Bounding Boxes. Note that you don't need to specify \n", "offsets; only the `page_index` and the coordinates are needed." 
] }, { "cell_type": "code", "execution_count": null, "id": "25259eaa", "metadata": { "lines_to_next_cell": 0, "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "original_document_text = Project(id_=46).get_document_by_id(44834).text\n", "YOUR_DOCUMENT_ID = [document for document in project.documents if document.text == original_document_text][0].id_\n", "YOUR_LABEL_ID = project.get_label_by_name('Bezeichnung').id_" ] }, { "cell_type": "code", "execution_count": null, "id": "42fc2ef7", "metadata": { "lines_to_next_cell": 0 }, "outputs": [], "source": [ "bboxes = [\n", "    {'page_index': 0, 'x0': 457, 'x1': 480, 'y0': 290, 'y1': 303},\n", "    {'page_index': 0, 'x0': 452.16, 'x1': 482.64, 'y0': 306, 'y1': 313},\n", "]" ] }, { "cell_type": "markdown", "id": "d93b5ed0", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Next, we specify arguments for a POST request to create an Annotation and send it to the server. We want to create\n", "an Annotation within a new Annotation Set, so we specify `label_set_id`." 
] }, { "cell_type": "code", "execution_count": null, "id": "adf8d5f5", "metadata": { "lines_to_next_cell": 0 }, "outputs": [], "source": [ "response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=bboxes, label_id=YOUR_LABEL_ID, confidence=100.0,\n", "                                    label_set_id=YOUR_LABEL_SET_ID)" ] }, { "cell_type": "markdown", "id": "e2cdb386", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Let's check that the Annotation was created successfully:" ] }, { "cell_type": "code", "execution_count": null, "id": "877dbc11", "metadata": { "lines_to_next_cell": 0 }, "outputs": [], "source": [ "response = json.loads(response.text)\n", "print(response['span'])" ] }, { "cell_type": "code", "execution_count": null, "id": "3c0fb956", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "assert delete_document_annotation(response['id'], delete_from_database=True)\n", "YOUR_ANNOTATION_ID = test_document.annotations(label=project.get_label_by_id(NEW_LABEL_ID))[0].id_" ] }, { "cell_type": "markdown", "id": "9a0f8b2f", "metadata": { "lines_to_next_cell": 0 }, "source": [ "### Change an Annotation\n", "To update the details of an Annotation, use the `change_document_annotation` method from `konfuzio_sdk.api`. You can update \n", "the Label, the Label Set, the Annotation Set, the `is_correct` and `revised` statuses, the list of Spans, and the selection \n", "Bounding Box." 
] }, { "cell_type": "code", "execution_count": null, "id": "7470b3fb", "metadata": { "lines_to_next_cell": 0 }, "outputs": [], "source": [ "from konfuzio_sdk.api import change_document_annotation\n", "\n", "response = change_document_annotation(annotation_id=YOUR_ANNOTATION_ID, label=NEW_LABEL_ID)" ] }, { "cell_type": "markdown", "id": "bfbad6d2", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Let's check that the Annotation's Label was changed successfully:" ] }, { "cell_type": "code", "execution_count": null, "id": "0390b409", "metadata": {}, "outputs": [], "source": [ "print(response.json()['label'])" ] }, { "cell_type": "markdown", "id": "62190f46", "metadata": { "lines_to_next_cell": 0 }, "source": [ "### Delete an Annotation\n", "To delete an Annotation, use the `delete_document_annotation` method from `konfuzio_sdk.api`. This method runs in two modes:\n", "soft deletion (the Annotation stays in the database but is removed from the approved Annotations shown in the Document, \n", "and a negative Annotation is created instead) and hard deletion (the Annotation is permanently removed from the database). For AI \n", "training purposes, we recommend setting `delete_from_database` to `False` if you don't want to remove an Annotation \n", "permanently." 
] }, { "cell_type": "code", "execution_count": null, "id": "536976a2", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "from konfuzio_sdk.api import delete_document_annotation\n", "\n", "# soft-delete and create a negative Annotation\n", "negative_id = delete_document_annotation(annotation_id=YOUR_ANNOTATION_ID)\n", "# hard-delete and remove a negative Annotation from DB permanently\n", "assert delete_document_annotation(negative_id, delete_from_database=True).status_code == 204" ] }, { "cell_type": "markdown", "id": "4f71614b", "metadata": { "lines_to_next_cell": 0 }, "source": [ "### Conclusion\n", "\n", "In this tutorial, we have explained how to create different types of Annotations, and how to change and delete them, using the Konfuzio SDK's native wrappers\n", "around the API calls to the server. Here is the full code for the tutorial:" ] }, { "cell_type": "code", "execution_count": null, "id": "f4756380", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "import json\n", "\n", "from konfuzio_sdk.api import post_document_annotation, delete_document_annotation, change_document_annotation\n", "from konfuzio_sdk.data import Span, Project\n", "\n", "test_document = Project(id_=YOUR_PROJECT_ID).get_document_by_id(YOUR_DOCUMENT_ID)\n", "spans = [Span(document=test_document, start_offset=3067, end_offset=3074)]\n", "response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=spans, label_id=YOUR_LABEL_ID, confidence=100.0,\n", "                                    label_set_id=YOUR_LABEL_SET_ID)\n", "response = json.loads(response.text)\n", "print(response['span'])\n", "\n", "bboxes = [\n", "    {'page_index': 0, 'x0': 457, 'x1': 480, 'y0': 290, 'y1': 303},\n", "    {'page_index': 0, 'x0': 452.16, 'x1': 482.64, 'y0': 306, 'y1': 313},\n", "]\n", "response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=bboxes, label_id=YOUR_LABEL_ID, confidence=100.0,\n", "                                    label_set_id=YOUR_LABEL_SET_ID)\n", "response = json.loads(response.text)\n", 
"print(response['span'])\n", "\n", "response = change_document_annotation(annotation_id=YOUR_ANNOTATION_ID, label=NEW_LABEL_ID)\n", "print(response.json()['label'])\n", "\n", "# soft-delete and create a negative Annotation\n", "negative_id = delete_document_annotation(annotation_id=YOUR_ANNOTATION_ID)\n", "# hard-delete and remove a negative Annotation from DB permanently\n", "assert delete_document_annotation(negative_id, delete_from_database=True).status_code == 204" ] }, { "cell_type": "markdown", "id": "afaadd7e", "metadata": {}, "source": [ "### What's next?\n", "\n", "- [Learn more about Konfuzio's REST API and its possibilities](https://dev.konfuzio.com/web/api-v3.html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }