{ "cells": [ { "cell_type": "markdown", "id": "265c3aca", "metadata": {}, "source": [ "## Set the Category manually\n", "\n", "---\n", "\n", "**Prerequisites:** \n", "- Data Layer concepts of Konfuzio SDK: Project, Category, Document\n", "\n", "**Difficulty:** Easy\n", "\n", "**Goal:** Learn how to manually set, change, and remove the Category of a Document and its Pages.\n", "\n", "---\n", "\n", "### Environment\n", "You need to install the Konfuzio SDK before diving into the tutorial. \\\n", "To get up and running quickly, you can use our Colab Quick Start notebook.\n", "\n", "Alternatively, you can follow the [installation section](../get_started.html#install-sdk) to install and initialize the Konfuzio SDK locally or in an environment of your choice.\n", "\n", "### Introduction\n", "\n", "When creating a new Document, the first step is to assign a Category to it. In this tutorial, you will learn how to do this manually.\n", "\n", "You can initialize a Document with a specific Category:" ] }, { "cell_type": "code", "execution_count": null, "id": "0a0bcf9d", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import logging\n", "import konfuzio_sdk\n", "\n", "logging.getLogger(\"konfuzio_sdk\").setLevel(logging.ERROR)\n", "\n", "YOUR_PROJECT_ID = 46\n", "YOUR_CATEGORY_ID = 63\n", "YOUR_DOCUMENT_ID = 44865" ] }, { "cell_type": "code", "execution_count": null, "id": "ff7b0b1d", "metadata": {}, "outputs": [], "source": [ "from konfuzio_sdk.data import Project, Document\n", "\n", "project = Project(id_=YOUR_PROJECT_ID)\n", "my_category = project.get_category_by_id(YOUR_CATEGORY_ID)\n", "my_document = Document(text=\"My text.\", project=project, category=my_category)\n", "assert my_document.category == my_category\n", "print(my_document.category)" ] }, { "cell_type": "markdown", "id": "43b1ef09", "metadata": {}, "source": [ "You can also use `Document.set_category` to set a Document’s Category after it has been initialized. 
The Category is then treated as manually revised by a human.\n", "\n", "*Note:* A Document’s Category can be changed via `set_category` only if its Category has first been set to `no_category`. Otherwise, attempting to change the Category causes an error." ] }, { "cell_type": "code", "execution_count": null, "id": "7b9c42c1", "metadata": {}, "outputs": [], "source": [ "document = project.get_document_by_id(YOUR_DOCUMENT_ID)\n", "document.set_category(None)\n", "assert document.category == project.no_category\n", "document.set_category(my_category)\n", "assert document.category == my_category\n", "assert document.category_is_revised is True\n", "print(document.category)" ] }, { "cell_type": "markdown", "id": "dabc091a", "metadata": {}, "source": [ "The Category of each Page is also changed to the Category set for the Document." ] }, { "cell_type": "code", "execution_count": null, "id": "d1f7e2fd", "metadata": {}, "outputs": [], "source": [ "for page in document.pages():\n", "    assert page.category == my_category\n", "    print(page.category)" ] }, { "cell_type": "markdown", "id": "1883cfad", "metadata": {}, "source": [ "If a Document is initialized without a Category, its Category is automatically set to NO_CATEGORY. Another Category can be set manually later." ] }, { "cell_type": "markdown", "id": "874f5e34", "metadata": {}, "source": [ "### Conclusion\n", "In this tutorial, we walked you through the steps of manually setting and changing the Category of a Document and its Pages. 
Below is the full code to accomplish this task:" ] }, { "cell_type": "code", "execution_count": null, "id": "f23d40db", "metadata": { "tags": [ "skip-execution" ] }, "outputs": [], "source": [ "from konfuzio_sdk.data import Project, Document\n", "\n", "# Load the Project and the Category to assign.\n", "project = Project(id_=YOUR_PROJECT_ID)\n", "my_category = project.get_category_by_id(YOUR_CATEGORY_ID)\n", "\n", "# Initialize a new Document with a specific Category.\n", "my_document = Document(text=\"My text.\", project=project, category=my_category)\n", "assert my_document.category == my_category\n", "\n", "# Reset an existing Document to no_category, then set the Category manually.\n", "document = project.get_document_by_id(YOUR_DOCUMENT_ID)\n", "document.set_category(None)\n", "assert document.category == project.no_category\n", "document.set_category(my_category)\n", "assert document.category == my_category\n", "assert document.category_is_revised is True\n", "\n", "# Each Page now has the same Category as the Document.\n", "for page in document.pages():\n", "    assert page.category == my_category" ] }, { "cell_type": "markdown", "id": "2960e6ad", "metadata": {}, "source": [ "### What's next?\n", "\n", "- [Learn how to categorize Documents automatically using Categorization AI](https://dev.konfuzio.com/sdk/tutorials/document_categorization/index.html)\n", "- [Create your own custom Categorization AI](https://dev.konfuzio.com/sdk/tutorials/create-custom-categorization-ai/index.html)" ] } ], "metadata": { "kernelspec": { "display_name": "konfuzio", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }