{ "cells": [ { "cell_type": "markdown", "id": "30b32967", "metadata": {}, "source": [ "## Evaluate a Splitting AI\n", "\n", "---\n", "\n", "**Prerequisites:** \n", "\n", "- Data Layer concepts of Konfuzio: Document, Project, Category, Page\n", "- AI Layer concepts of Konfuzio: File Splitting\n", "- Understanding of metrics for evaluating an AI's performance\n", "\n", "**Difficulty:** Medium\n", "\n", "**Goal:** Introduce the `FileSplittingEvaluation` class and explain how to measure Splitting AIs' performances using it.\n", "\n", "---\n", "\n", "### Environment\n", "You need to install the Konfuzio SDK before diving into the tutorial. \\\n", "To get up and running quickly, you can use our Colab Quick Start notebook. \\\n", "\"Open\n", "\n", "As an alternative you can follow the [installation section](../get_started.html#install-sdk) to install and initialize the Konfuzio SDK locally or on an environment of your choice.\n", "\n", "### Introduction\n", "\n", "`FileSplittingEvaluation` class can be used to evaluate performance of Splitting AIs, returning a \n", "set of metrics that includes precision, recall, F1 measure, True Positives, False Positives, True Negatives, and False \n", "Negatives. \n", "\n", "The class's methods `calculate()` and `calculate_by_category()` are run at initialization. The class receives two lists \n", "of Documents as an input – first list consists of ground-truth Documents where all first Pages are marked as such, \n", "second is of Documents on Pages of which File Splitting Model ran a prediction of them being first or non-first. \n", "\n", "### FileSplittingEvaluation class\n", "\n", "Let's initialize the class:" ] }, { "cell_type": "code", "execution_count": null, "id": "76ead829", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ], "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "evaluation = FileSplittingEvaluation(\n", " ground_truth_documents=YOUR_GROUND_TRUTH_LIST, prediction_documents=YOUR_PREDICTION_LIST\n", " )" ] }, { "cell_type": "markdown", "id": "dfb0067d", "metadata": { "lines_to_next_cell": 0 }, "source": [ "The class compares each pair of Pages. If a Page is labeled as first and the model also predicted it as first, it is \n", "considered a True Positive. If a Page is labeled as first but the model predicted it as non-first, it is considered a \n", "False Negative. If a Page is labeled as non-first but the model predicted it as first, it is considered a False \n", "Positive. If a Page is labeled as non-first and the model also predicted it as non-first, it is considered a True \n", "Negative.\n", "\n", "| | predicted correctly | predicted incorrectly |\n", "| ------ | ------ | ------ |\n", "| first Page | TP | FN |\n", "| non-first Page | TN | FP |\n", "\n", "### Example of evaluation input and output \n", "\n", "Suppose in our test dataset we have 2 Documents of 2 Categories: one 3-paged, consisting of a single file (-> it has \n", "only one ground-truth first Page) of a first Category, and one 5-paged, consisting of three files: two 2-paged and one \n", "1-paged (-> it has three ground-truth first Pages), of a second Category.\n", "\n", "![Document 1](document_example_1.png)\n", "\n", "_First document_\n", "\n", "![Document 2](document_example_2.png)\n", "\n", "_Second document_\n", "\n", "Let's create these mock Documents. 
{ "cell_type": "markdown", "id": "b7d2e8f1", "metadata": {}, "source": [ "### Example of evaluation input and output \n", "\n", "Suppose our test dataset contains 2 Documents of 2 different Categories: the first, of Category 1, is 3-paged and \n", "consists of a single file, so it has only one ground-truth first Page; the second, of Category 2, is 5-paged and \n", "consists of three files (two 2-paged and one 1-paged), so it has three ground-truth first Pages.\n", "\n", "![Document 1](document_example_1.png)\n", "\n", "_First document_\n", "\n", "![Document 2](document_example_2.png)\n", "\n", "_Second document_\n", "\n", "Let's create these mock Documents. This example builds the Documents from scratch, without uploading a file; if you \n", "have already uploaded your Documents to the Konfuzio Server, you can retrieve and use them instead." ] },
{ "cell_type": "code", "execution_count": null, "id": "70a4fb9f", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "from konfuzio_sdk.samples import LocalTextProject\n", "YOUR_PROJECT = LocalTextProject()\n", "YOUR_CATEGORY_1 = YOUR_PROJECT.get_category_by_id(3)\n", "YOUR_CATEGORY_2 = YOUR_PROJECT.get_category_by_id(4)" ] },
{ "cell_type": "code", "execution_count": null, "id": "a061507a", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-output" ] }, "outputs": [], "source": [ "from konfuzio_sdk.data import Document, Page\n", "from konfuzio_sdk.evaluate import FileSplittingEvaluation, EvaluationCalculator\n", "from konfuzio_sdk.trainer.file_splitting import SplittingAI\n", "\n", "text_1 = \"Hi all,\\nI like bread.\\nI hope to get everything done soon.\\nHave you seen it?\"\n", "document_1 = Document(id_=20, project=YOUR_PROJECT, category=YOUR_CATEGORY_1, text=text_1, dataset_status=3)\n", "# document_1 is a single file, so only its Page 1 is a ground-truth first Page.\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_1, start_offset=0, end_offset=21, number=1, copy_of_id=29\n", ")\n", "\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_1, start_offset=22, end_offset=57, number=2, copy_of_id=30\n", ")\n", "\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_1, start_offset=58, end_offset=75, number=3, copy_of_id=31\n", ")\n", "\n", "text_2 = \"Evening,\\nthank you for coming.\\nI like fish.\\nI need it.\\nEvening.\"\n", "document_2 = Document(id_=21, project=YOUR_PROJECT, category=YOUR_CATEGORY_2, text=text_2, dataset_status=3)\n", "# document_2 consists of three files, so its Pages 1, 3 and 5 are ground-truth first Pages.\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=0, end_offset=8, number=1, copy_of_id=32\n", ")\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=9, end_offset=30, number=2, copy_of_id=33\n", ")\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=31, end_offset=43, number=3, copy_of_id=34\n", ")\n", "_.is_first_page = True\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=44, end_offset=54, number=4, copy_of_id=35\n", ")\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=55, end_offset=63, number=5, copy_of_id=36\n", ")\n", "_.is_first_page = True" ] },
{ "cell_type": "markdown", "id": "f656f173", "metadata": {}, "source": [ "### Running Evaluation on predicted Documents' Pages\n", "\n", "We need to pass two lists of Documents into the `FileSplittingEvaluation` class, so we first run each Page of the \n", "Documents through the model's prediction.\n", "\n", "Let's say the evaluation gave good results, with only one first Page predicted as non-first and all other Pages \n", "predicted correctly. The evaluation could then be implemented as follows:" ] },
{ "cell_type": "code", "execution_count": null, "id": "2b4de59f", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import logging\n", "from konfuzio_sdk.trainer.file_splitting import ContextAwareFileSplittingModel\n", "from konfuzio_sdk.tokenizer.regex import ConnectedTextTokenizer\n", "YOUR_MODEL = ContextAwareFileSplittingModel(categories=[YOUR_CATEGORY_1, YOUR_CATEGORY_2], tokenizer=ConnectedTextTokenizer())\n", "YOUR_MODEL.fit(allow_empty_categories=True)\n", "logging.getLogger(\"konfuzio_sdk\").setLevel(logging.CRITICAL)" ] },
{ "cell_type": "code", "execution_count": null, "id": "8483ac4b", "metadata": { "editable": true, "slideshow": { "slide_type": "" } }, "outputs": [], "source": [ "splitting_ai = SplittingAI(YOUR_MODEL)\n", "pred_1: Document = splitting_ai.propose_split_documents(document_1, return_pages=True)[0]\n", "pred_2: Document = splitting_ai.propose_split_documents(document_2, return_pages=True)[0]\n", "\n", "evaluation = FileSplittingEvaluation(\n", "    ground_truth_documents=[document_1, document_2], prediction_documents=[pred_1, pred_2]\n", ")\n", "print('True positives: ' + str(evaluation.tp()))\n", "print('True negatives: ' + str(evaluation.tn()))\n", "print('False positives: ' + str(evaluation.fp()))\n", "print('False negatives: ' + str(evaluation.fn()))\n", "print('Precision: ' + str(evaluation.precision()))\n", "print('Recall: ' + str(evaluation.recall()))\n", "print('F1 score: ' + str(evaluation.f1()))" ] },
{ "cell_type": "markdown", "id": "0e39c2d4", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Our results can be summarized in the following table:\n", "\n", "| TPs | TNs | FPs | FNs | precision | recall | F1 |\n", "|-----|-----|-----|-----|-----------|--------|------|\n", "| 3 | 4 | 0 | 1 | 1 | 0.75 | 0.86 |" ] },
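{ "cell_type": "markdown", "id": "c4e5d6a7", "metadata": {}, "source": [ "As a quick sanity check, these values follow from the counts via the standard formulas. The snippet below is plain \n", "Python, independent of the SDK, and only reproduces the arithmetic:" ] },
{ "cell_type": "code", "execution_count": null, "id": "c4e5d6a8", "metadata": { "tags": [ "skip-execution", "nbval-skip" ] }, "outputs": [], "source": [ "# Standard definitions, applied to the counts from the table above.\n", "tp, tn, fp, fn = 3, 4, 0, 1\n", "precision = tp / (tp + fp)  # 3 / 3 = 1.0\n", "recall = tp / (tp + fn)  # 3 / 4 = 0.75\n", "f1 = 2 * precision * recall / (precision + recall)  # 1.5 / 1.75, about 0.857\n", "print(precision, recall, round(f1, 2))" ] },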
{ "cell_type": "markdown", "id": "d8f1a2b3", "metadata": {}, "source": [ "If we want to see evaluation results by Category, the Evaluation can be queried like this:" ] },
{ "cell_type": "code", "execution_count": null, "id": "5b418510", "metadata": { "editable": true, "slideshow": { "slide_type": "" } }, "outputs": [], "source": [ "print('True positives for Category 1: ' + str(evaluation.tp(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.tp(search=YOUR_CATEGORY_2)))\n", "print('True negatives for Category 1: ' + str(evaluation.tn(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.tn(search=YOUR_CATEGORY_2)))\n", "print('False positives for Category 1: ' + str(evaluation.fp(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.fp(search=YOUR_CATEGORY_2)))\n", "print('False negatives for Category 1: ' + str(evaluation.fn(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.fn(search=YOUR_CATEGORY_2)))\n", "print('Precision for Category 1: ' + str(evaluation.precision(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.precision(search=YOUR_CATEGORY_2)))\n", "print('Recall for Category 1: ' + str(evaluation.recall(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.recall(search=YOUR_CATEGORY_2)))\n", "print('F1 score for Category 1: ' + str(evaluation.f1(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.f1(search=YOUR_CATEGORY_2)))" ] },
{ "cell_type": "markdown", "id": "08f0fac4", "metadata": {}, "source": [ "The output can be summarized in the following table:\n", "\n", "| Category | TPs | TNs | FPs | FNs | precision | recall | F1 |\n", "|------------|-----|-----|-----|-----|-----------|--------|-----|\n", "| Category 1 | 1 | 2 | 0 | 0 | 1 | 1 | 1 |\n", "| Category 2 | 2 | 2 | 0 | 1 | 1 | 0.67 | 0.8 |\n", "\n", "To log metrics after evaluation, you can call the `metrics_logging` method of `EvaluationCalculator` (you need to \n", "specify the metrics accordingly at the class's initialization). Example usage:" ] },
{ "cell_type": "code", "execution_count": null, "id": "5b74f2a6", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "logging.getLogger(\"konfuzio_sdk\").setLevel(logging.INFO)" ] },
{ "cell_type": "code", "execution_count": null, "id": "919d1a05", "metadata": { "editable": true, "slideshow": { "slide_type": "" } }, "outputs": [], "source": [ "EvaluationCalculator(tp=3, fp=0, fn=1, tn=4).metrics_logging()" ] },
{ "cell_type": "markdown", "id": "5729c5f3", "metadata": {}, "source": [ "### Conclusion\n", "In this tutorial, we have walked through the essential steps for evaluating the performance of a File Splitting AI using the `FileSplittingEvaluation` class. Below is the full code to accomplish this task:" ] },
{ "cell_type": "code", "execution_count": null, "id": "796c7fc6", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "skip-execution", "nbval-skip" ], "vscode": { "languageId": "plaintext" } }, "outputs": [], "source": [ "from konfuzio_sdk.data import Document, Page\n", "from konfuzio_sdk.evaluate import FileSplittingEvaluation, EvaluationCalculator\n", "from konfuzio_sdk.trainer.file_splitting import SplittingAI\n", "\n", "# YOUR_PROJECT, YOUR_CATEGORY_1, YOUR_CATEGORY_2 and YOUR_MODEL are placeholders for your own\n", "# Project, Categories and trained File Splitting Model.\n", "text_1 = \"Hi all,\\nI like bread.\\nI hope to get everything done soon.\\nHave you seen it?\"\n", "document_1 = Document(id_=20, project=YOUR_PROJECT, category=YOUR_CATEGORY_1, text=text_1, dataset_status=3)\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_1, start_offset=0, end_offset=21, number=1, copy_of_id=29\n", ")\n", "\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_1, start_offset=22, end_offset=57, number=2, copy_of_id=30\n", ")\n", "\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_1, start_offset=58, end_offset=75, number=3, copy_of_id=31\n", ")\n", "\n", "text_2 = \"Evening,\\nthank you for coming.\\nI like fish.\\nI need it.\\nEvening.\"\n", "document_2 = Document(id_=21, project=YOUR_PROJECT, category=YOUR_CATEGORY_2, text=text_2, dataset_status=3)\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=0, end_offset=8, number=1, copy_of_id=32\n", ")\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=9, end_offset=30, number=2, copy_of_id=33\n", ")\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=31, end_offset=43, number=3, copy_of_id=34\n", ")\n", "_.is_first_page = True\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=44, end_offset=54, number=4, copy_of_id=35\n", ")\n", "_ = Page(\n", "    id_=None, original_size=(320, 240), document=document_2, start_offset=55, end_offset=63, number=5, copy_of_id=36\n", ")\n", "_.is_first_page = True\n", "\n", "splitting_ai = SplittingAI(YOUR_MODEL)\n", "pred_1: Document = splitting_ai.propose_split_documents(document_1, return_pages=True)[0]\n", "pred_2: Document = splitting_ai.propose_split_documents(document_2, return_pages=True)[0]\n", "\n", "YOUR_GROUND_TRUTH_LIST = [document_1, document_2]\n", "YOUR_PREDICTION_LIST = [pred_1, pred_2]\n", "\n", "evaluation = FileSplittingEvaluation(\n", "    ground_truth_documents=YOUR_GROUND_TRUTH_LIST,\n",
"    prediction_documents=YOUR_PREDICTION_LIST\n", ")\n", "\n", "print('True positives: ' + str(evaluation.tp()))\n", "print('True negatives: ' + str(evaluation.tn()))\n", "print('False positives: ' + str(evaluation.fp()))\n", "print('False negatives: ' + str(evaluation.fn()))\n", "print('Precision: ' + str(evaluation.precision()))\n", "print('Recall: ' + str(evaluation.recall()))\n", "print('F1 score: ' + str(evaluation.f1()))\n", "\n", "print('True positives for Category 1: ' + str(evaluation.tp(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.tp(search=YOUR_CATEGORY_2)))\n", "print('True negatives for Category 1: ' + str(evaluation.tn(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.tn(search=YOUR_CATEGORY_2)))\n", "print('False positives for Category 1: ' + str(evaluation.fp(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.fp(search=YOUR_CATEGORY_2)))\n", "print('False negatives for Category 1: ' + str(evaluation.fn(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.fn(search=YOUR_CATEGORY_2)))\n", "print('Precision for Category 1: ' + str(evaluation.precision(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.precision(search=YOUR_CATEGORY_2)))\n", "print('Recall for Category 1: ' + str(evaluation.recall(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.recall(search=YOUR_CATEGORY_2)))\n", "print('F1 score for Category 1: ' + str(evaluation.f1(search=YOUR_CATEGORY_1)) + ', for Category 2: ' + str(evaluation.f1(search=YOUR_CATEGORY_2)))\n", "\n", "EvaluationCalculator(tp=3, fp=0, fn=1, tn=4).metrics_logging()" ] },
{ "cell_type": "markdown", "id": "ef232c0c", "metadata": {}, "source": [ "### What's next?\n", "\n", "- [Upload an evaluated AI to Konfuzio app or an on-prem installation](https://dev.konfuzio.com/sdk/tutorials/upload-your-ai/index.html)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }