Create, change and delete Annotations using the wrapped API call
Prerequisites:
Data Layer concepts of Konfuzio: Project, Document, Annotation, Span, Bounding Box, Label, Label Set, Annotation Set
Understanding of concepts of REST API
Difficulty: Medium
Goal: Explain how to create different types of Annotations (textual, visual) using the methods from the SDK listed in konfuzio_sdk.api, and how to change or delete them.
Environment
You need to install the Konfuzio SDK before diving into the tutorial.
To get up and running quickly, you can use our Colab Quick Start notebook.
As an alternative, you can follow the installation section to install and initialize the Konfuzio SDK locally or in an environment of your choice.
Introduction
There are several ways to create an Annotation in a Document: using the SmartView or DVUI in Konfuzio’s app or an on-prem installation, via the Annotation class in the SDK, or via a direct call to the API endpoint api/v3/annotations. The Server documentation already provides instructions for creating an Annotation via a POST request using curl; in this tutorial, we explain how to make this request using the methods from konfuzio_sdk.api, which serves as a wrapper around the calls to the API.
Let’s start by making the necessary imports:
import json
from konfuzio_sdk.api import post_document_annotation, delete_document_annotation
from konfuzio_sdk.data import Span, Project
To create any Annotation, it is necessary to provide several fields to post_document_annotation:
document_id: ID of the Document in which the Annotation is created
label: ID of the Label assigned to the Annotation
spans: Spans or coordinates of Bounding Boxes representing the position of the Annotation in the Document
either label_set or annotation_set: provide the ID of a Label Set if you want to create the Annotation within a new Annotation Set, or provide the ID of an existing Annotation Set if you want to add the Annotation to it without creating a new Annotation Set
In the calls below, the Label and Label Set IDs are passed as the keyword arguments label_id and label_set_id.
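The either/or rule for the Label Set and Annotation Set IDs can be checked before any request is sent. The following is a minimal sketch, not part of the SDK: build_annotation_kwargs is a hypothetical helper, the keyword name annotation_set_id is assumed by analogy with label_set_id, and treating the two IDs as mutually exclusive is a reading of the "either ... or" wording above.

```python
from typing import Any, Dict, List, Optional


def build_annotation_kwargs(
    document_id: int,
    label_id: int,
    spans: List[Any],
    label_set_id: Optional[int] = None,
    annotation_set_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect the arguments for one create call (hypothetical helper, not part of the SDK)."""
    # Exactly one of the two IDs is expected: a Label Set to open a new
    # Annotation Set, or an existing Annotation Set to add the Annotation to.
    if (label_set_id is None) == (annotation_set_id is None):
        raise ValueError('Provide exactly one of label_set_id or annotation_set_id.')
    if not spans:
        raise ValueError('At least one Span or Bounding Box is required.')

    kwargs: Dict[str, Any] = {'document_id': document_id, 'label_id': label_id, 'spans': spans}
    if label_set_id is not None:
        kwargs['label_set_id'] = label_set_id
    else:
        # Assumed keyword name, chosen by analogy with label_set_id.
        kwargs['annotation_set_id'] = annotation_set_id
    return kwargs


# Placeholder IDs; real values come from your own Project.
kwargs = build_annotation_kwargs(document_id=1, label_id=2, spans=[{'page_index': 0}], label_set_id=3)
print(kwargs)
```

The resulting dictionary could then be unpacked into the call, for example post_document_annotation(**kwargs), once real IDs are filled in.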
Creating a textual Annotation
To create an Annotation that is based on existing text of a Document, let’s first define the test Document and the Span that will be passed as the spans argument. You can define one or more Spans.
project = Project(id_=YOUR_PROJECT_ID)
test_document = project.get_document_by_id(YOUR_DOCUMENT_ID)
spans = [Span(document=test_document, start_offset=3067, end_offset=3074)]
Next, let’s specify arguments for a POST request that creates Annotations and send it to the server. We want to create an Annotation within a new Annotation Set, so we specify label_set_id.
response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=spans, label_id=YOUR_LABEL_ID, confidence=100.0,
label_set_id=YOUR_LABEL_SET_ID)
Let’s check that the Annotation has been created successfully and that its Span coincides with the one we defined above:
response = json.loads(response.text)
print(response['span'])
[{'x0': 300.48, 'y0': 232.833, 'x1': 320.16, 'y1': 239.833, 'page_index': 0, 'start_offset': 3067, 'end_offset': 3074, 'offset_string': 'Lohnart', 'offset_string_original': 'Lohnart'}]
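The comparison between the requested and the returned Span can also be done programmatically. The sketch below works on an already parsed response and uses a payload copied from the output above, trimmed to the fields it needs; span_offsets is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, List, Tuple


def span_offsets(annotation: Dict[str, Any]) -> List[Tuple[int, int, str]]:
    """Return (start_offset, end_offset, text) for each Span in a parsed response."""
    return [(s['start_offset'], s['end_offset'], s['offset_string']) for s in annotation['span']]


# Sample payload copied from the output above, trimmed to the fields used here.
parsed = {
    'span': [
        {'page_index': 0, 'start_offset': 3067, 'end_offset': 3074, 'offset_string': 'Lohnart'}
    ]
}

# Offsets of the Span that was passed to post_document_annotation.
requested = [(3067, 3074)]

returned = span_offsets(parsed)
matches = [(start, end) for start, end, _ in returned] == requested
print(returned, matches)
```

With a live response, parsed would be the dictionary obtained from json.loads(response.text).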
Creating a visual Annotation
To create an Annotation that is based on Bounding Boxes’ coordinates, let’s create a list of coordinate dictionaries that will be passed as the spans argument. You can define one or more Bounding Boxes. Note that you don’t need to specify offsets; besides the coordinates, only the page_index is needed.
bboxes = [
    {'page_index': 0, 'x0': 457, 'x1': 480, 'y0': 290, 'y1': 303},
    {'page_index': 0, 'x0': 452.16, 'x1': 482.64, 'y0': 306, 'y1': 313},
]
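Before sending Bounding Boxes to the server, it can help to check each dictionary locally. This is a minimal sketch under stated assumptions: bbox_problems is a hypothetical helper, and the rules it applies (all five keys present, a non-negative page_index, x0 < x1 and y0 < y1) are inferred from the example boxes in this tutorial rather than taken from the API specification.

```python
from typing import Any, Dict, List

# Keys used by the example boxes in this tutorial.
REQUIRED_KEYS = ('page_index', 'x0', 'x1', 'y0', 'y1')


def bbox_problems(bboxes: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for i, box in enumerate(bboxes):
        missing = [key for key in REQUIRED_KEYS if key not in box]
        if missing:
            problems.append(f'box {i}: missing keys {missing}')
            continue  # the remaining checks need all keys
        if box['page_index'] < 0:
            problems.append(f'box {i}: negative page_index')
        if not box['x0'] < box['x1']:
            problems.append(f'box {i}: x0 must be smaller than x1')
        if not box['y0'] < box['y1']:
            problems.append(f'box {i}: y0 must be smaller than y1')
    return problems


example_bboxes = [
    {'page_index': 0, 'x0': 457, 'x1': 480, 'y0': 290, 'y1': 303},
    {'page_index': 0, 'x0': 452.16, 'x1': 482.64, 'y0': 306, 'y1': 313},
]
print(bbox_problems(example_bboxes))
```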
Next, we specify arguments for a POST request to create an Annotation and send it to the server. We want to create an Annotation within a new Annotation Set, so we specify label_set_id.
response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=bboxes, label_id=YOUR_LABEL_ID, confidence=100.0,
label_set_id=YOUR_LABEL_SET_ID)
Let’s check if an Annotation has been created successfully:
response = json.loads(response.text)
print(response['span'])
[{'x0': 452.16, 'y0': 306.035, 'x1': 482.64, 'y1': 312.035, 'page_index': 0, 'start_offset': 2514, 'end_offset': 2525, 'offset_string': 'PV-Beitrag®', 'offset_string_original': 'PV-Beitrag®'}, {'x0': 462.12, 'y0': 292.593, 'x1': 479.04, 'y1': 300.593, 'page_index': 0, 'start_offset': 2619, 'end_offset': 2622, 'offset_string': '825', 'offset_string_original': '825'}]
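In the sample output above, the returned Spans come back in a different order than the requested boxes, and their coordinates lie inside the requested areas. The sketch below checks that containment for the values shown in this tutorial; contains is a hypothetical helper, and the containment itself is an observation about this particular output, not a documented guarantee of the API.

```python
from typing import Any, Dict


def contains(outer: Dict[str, Any], inner: Dict[str, Any]) -> bool:
    """True if inner lies on the same page and within the rectangle of outer."""
    return (
        outer['page_index'] == inner['page_index']
        and outer['x0'] <= inner['x0']
        and inner['x1'] <= outer['x1']
        and outer['y0'] <= inner['y0']
        and inner['y1'] <= outer['y1']
    )


# Boxes sent in the request (from the tutorial).
requested = [
    {'page_index': 0, 'x0': 457, 'x1': 480, 'y0': 290, 'y1': 303},
    {'page_index': 0, 'x0': 452.16, 'x1': 482.64, 'y0': 306, 'y1': 313},
]
# Coordinates copied from the output above, trimmed to the fields used here.
returned = [
    {'page_index': 0, 'x0': 452.16, 'y0': 306.035, 'x1': 482.64, 'y1': 312.035},
    {'page_index': 0, 'x0': 462.12, 'y0': 292.593, 'x1': 479.04, 'y1': 300.593},
]

# Returned Spans that do not fall inside any requested box.
unmatched = [r for r in returned if not any(contains(q, r) for q in requested)]
print(unmatched)
```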
Change an Annotation
To update details of an Annotation, use the change_document_annotation method from konfuzio_sdk.api. You can specify a Label, a Label Set, an Annotation Set, the is_correct and revised statuses, a Span list and a selection Bbox to be updated to new values.
from konfuzio_sdk.api import change_document_annotation
response = change_document_annotation(annotation_id=YOUR_ANNOTATION_ID, label=NEW_LABEL_ID)
Let’s check that the Annotation’s Label was changed successfully:
print(response.json()['label'])
{'id': 299898, 'name': 'Steuer-Brutto', 'has_multiple_top_candidates': False, 'data_type': 'Text', 'threshold': 0.1}
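When only some details need to change, one option is to collect just those fields and leave the rest out of the call. The sketch below is limited to the three names that appear in this tutorial (label, is_correct, revised); build_changes is a hypothetical helper, and passing is_correct and revised as keyword arguments is an assumption based on how they are named above.

```python
from typing import Any, Dict, Optional


def build_changes(
    label: Optional[int] = None,
    is_correct: Optional[bool] = None,
    revised: Optional[bool] = None,
) -> Dict[str, Any]:
    """Keep only the fields that were explicitly set (hypothetical helper)."""
    candidates = {'label': label, 'is_correct': is_correct, 'revised': revised}
    # None means "leave unchanged"; False is a real value and is kept.
    changes = {key: value for key, value in candidates.items() if value is not None}
    if not changes:
        raise ValueError('Nothing to change: set at least one field.')
    return changes


# 299898 is the Label ID shown in the sample output above.
changes = build_changes(label=299898, is_correct=False)
print(changes)
```

The dictionary could then be unpacked into the call from this tutorial, for example change_document_annotation(annotation_id=YOUR_ANNOTATION_ID, **changes).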
Delete an Annotation
To delete an Annotation, use the delete_document_annotation method from konfuzio_sdk.api. This method runs in two modes: soft deletion (does not delete the Annotation from the database; it only removes it from the approved Annotations shown in the Document and creates a negative Annotation instead) and hard deletion (deletes the Annotation permanently from the database). For AI training purposes, we recommend setting delete_from_database to False if you don’t want to remove an Annotation permanently.
from konfuzio_sdk.api import delete_document_annotation
# soft-delete and create a negative Annotation
negative_id = delete_document_annotation(annotation_id=YOUR_ANNOTATION_ID)
# hard-delete and remove a negative Annotation from DB permanently
assert delete_document_annotation(negative_id, delete_from_database=True).status_code == 204
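Because hard deletion cannot be undone, one possible safeguard is a thin wrapper that soft-deletes by default and asks for explicit confirmation before deleting from the database. This is a sketch only: remove_annotation and fake_delete are hypothetical names, and fake_delete stands in for delete_document_annotation so the example runs without a server.

```python
from typing import Any, Callable, List, Tuple


def remove_annotation(
    annotation_id: int,
    delete_fn: Callable[..., Any],
    permanently: bool = False,
    confirm: bool = False,
) -> Any:
    """Soft-delete by default; require confirm=True before a hard delete."""
    if permanently and not confirm:
        raise ValueError('Hard deletion is permanent; pass confirm=True to proceed.')
    return delete_fn(annotation_id, delete_from_database=permanently)


# Records every call made to the stand-in below.
calls: List[Tuple[int, bool]] = []


def fake_delete(annotation_id: int, delete_from_database: bool = False) -> str:
    """Stand-in for delete_document_annotation; it only records the call."""
    calls.append((annotation_id, delete_from_database))
    return 'hard' if delete_from_database else 'soft'


print(remove_annotation(42, fake_delete))
print(remove_annotation(42, fake_delete, permanently=True, confirm=True))
```

In real use, delete_document_annotation would be passed as delete_fn in place of the stand-in.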
Conclusion
In this tutorial, we have explained how to create different types of Annotations, and how to change and delete them, using the Konfuzio SDK’s wrappers around the API calls to the server. Here is the full code for the tutorial:
import json
from konfuzio_sdk.api import post_document_annotation, delete_document_annotation, change_document_annotation
from konfuzio_sdk.data import Span, Project
test_document = Project(id_=YOUR_PROJECT_ID).get_document_by_id(YOUR_DOCUMENT_ID)
spans = [Span(document=test_document, start_offset=3067, end_offset=3074)]
response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=spans, label_id=YOUR_LABEL_ID, confidence=100.0,
label_set_id=YOUR_LABEL_SET_ID)
response = json.loads(response.text)
print(response['span'])
bboxes = [
    {'page_index': 0, 'x0': 457, 'x1': 480, 'y0': 290, 'y1': 303},
    {'page_index': 0, 'x0': 452.16, 'x1': 482.64, 'y0': 306, 'y1': 313},
]
response = post_document_annotation(document_id=YOUR_DOCUMENT_ID, spans=bboxes, label_id=YOUR_LABEL_ID, confidence=100.0,
label_set_id=YOUR_LABEL_SET_ID)
response = json.loads(response.text)
print(response['span'])
response = change_document_annotation(annotation_id=YOUR_ANNOTATION_ID, label=NEW_LABEL_ID)
print(response.json()['label'])
# soft-delete and create a negative Annotation
negative_id = delete_document_annotation(annotation_id=YOUR_ANNOTATION_ID)
# hard-delete and remove a negative Annotation from DB permanently
assert delete_document_annotation(negative_id, delete_from_database=True).status_code == 204