Set the Category manually


Prerequisites:

  • Data Layer concepts of Konfuzio SDK: Project, Category, Document

Difficulty: Easy

Goal: Learn how to set, change and remove the Category of a Document and its Pages manually.


Environment

You need to install the Konfuzio SDK before diving into the tutorial.
To get up and running quickly, you can use our Colab Quick Start notebook.

Alternatively, follow the installation section to install and initialize the Konfuzio SDK locally or in an environment of your choice.

Introduction

When creating a new Document, the first step is to assign a Category to it. In this tutorial you will find out how to do it manually.

You can initialize a Document with a specific Category:

from konfuzio_sdk.data import Project, Document

project = Project(id_=YOUR_PROJECT_ID)
my_category = project.get_category_by_id(YOUR_CATEGORY_ID)
my_document = Document(text="My text.", project=project, category=my_category)
assert my_document.category == my_category
print(my_document.category)
Category: Lohnabrechnung (63)

You can also use Document.set_category to set a Document’s Category after it has been initialized. A Category set this way is treated as manually revised by a human, so category_is_revised becomes True.

Note: set_category can change a Document’s Category only if the current Category is no_category. Trying to replace any other Category causes an error, so first reset the Document with set_category(None).

document = project.get_document_by_id(YOUR_DOCUMENT_ID)
document.set_category(None)
assert document.category == project.no_category
document.set_category(my_category)
assert document.category == my_category
assert document.category_is_revised is True
print(document.category)
Category: Lohnabrechnung (63)
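
The reset-then-set sequence above can be wrapped in a small helper. This is only a sketch: reassign_category is a hypothetical name, not part of the Konfuzio SDK, and it assumes the Document behaves as described in this tutorial (set_category(None) resets it to no_category).

```python
def reassign_category(document, new_category):
    """Reset a Document to no_category, then assign new_category.

    Hypothetical helper, not part of the SDK. It relies only on
    set_category() and the category attribute shown in this tutorial.
    """
    # Passing None resets the Document to the Project's no_category,
    # the only state from which a different Category can be set.
    document.set_category(None)
    document.set_category(new_category)
    return document.category
```

Calling reassign_category(document, my_category) should leave the Document in the same state as the two explicit set_category calls shown above.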

The Category of each Page is also changed to the Category set for the Document.

for page in document.pages():
    assert page.category == my_category
    print(page.category)
Category: Lohnabrechnung (63)
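
If you prefer a report over a failing assert, the same check can be written as a function that returns the Pages that do not match. This is a sketch: pages_with_other_category is a hypothetical helper name, and it uses only document.pages(), page.category and document.category as they appear in this tutorial.

```python
def pages_with_other_category(document):
    """Return the Pages whose Category differs from the Document's Category.

    Hypothetical helper, not part of the SDK. An empty list means every
    Page carries the same Category as the Document.
    """
    return [page for page in document.pages() if page.category != document.category]
```

After document.set_category(my_category), the returned list is expected to be empty.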

If a Document is initialized without a Category, it is automatically assigned the Project’s no_category (NO_CATEGORY). You can set another Category manually later.
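
To find the Documents that still need a Category, you can compare each one against the Project’s no_category. The following is a minimal sketch: documents_without_category is a hypothetical helper name, and the documents argument is assumed to be any iterable of Documents you have already loaded.

```python
def documents_without_category(documents, no_category):
    """Return the Documents whose Category is still no_category.

    Hypothetical helper, not part of the SDK. `documents` is any iterable
    of objects with a `category` attribute; `no_category` is the value to
    compare against, for example project.no_category.
    """
    return [document for document in documents if document.category == no_category]
```

For example, documents_without_category([my_document, document], project.no_category) returns the subset of those two Documents that has not been categorized yet.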

Conclusion

In this tutorial, we walked you through the steps of manually setting and changing the Category of a Document and its Pages. Below is the full code to accomplish this task:

from konfuzio_sdk.data import Project, Document

# Load the Project and the Category to assign.
project = Project(id_=YOUR_PROJECT_ID)
my_category = project.get_category_by_id(YOUR_CATEGORY_ID)

# Initialize a new Document with a Category.
my_document = Document(text="My text.", project=project, category=my_category)
assert my_document.category == my_category

# Reset an existing Document to no_category, then set the new Category.
document = project.get_document_by_id(YOUR_DOCUMENT_ID)
document.set_category(None)
assert document.category == project.no_category
document.set_category(my_category)
assert document.category == my_category
assert document.category_is_revised is True

# Each Page now carries the Document's Category.
for page in document.pages():
    assert page.category == my_category

What’s next?