Changelog

All notable changes in the server of app.konfuzio.com will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Calendar Versioning.

2021-10-16_13-20-12

Added

  • Improve detection of annotations which consist of multiple words

  • Date filtering for project documents API endpoint

  • Filtering of labels and label sets according to the category of a document (in the SmartView)

Fixed

  • Selection of characters in SmartView incomplete when editing an annotation

  • Dark Mode setting of browser not compatible with Konfuzio Server

  • Some case where the document list was not reloaded automatically

2021-10-07_11-42-29

Added

  • More advanced task priorities and improved worker ressource usage

  • Auto-reload of new uploaded documents

2021-09-28_09-29-43

Fixed

  • Evaluation does not complete if no test documents are specified

2021-09-24_13-53-32

Fixed

  • Incompleted evaluation

  • Formatting of the “Check your browser” page for logged out users.

2021-09-16_12-25-23

Fixed

  • Adding of categories to existing label sets

2021-09-08_16-01-46

Added

  • Migration scripts for user permissions and e-mail templates

2021-09-07_12-24-26

Added

  • Support for SMTP e-Mail backends via environment variables

Fixed

  • DOS protection prevents start of Konfuzio server

2021-09-05_20-57-31

Added

  • Autosave for any change on the document list page

  • German language support

  • Finetuning of exctraction AIs via parameters

  • New fields AI quality and data quality

  • More detailed evaluation

  • Description Field for extraction AI, label sets, categorization AI, categories

Changed

  • Rename project inviations to members

  • Rename the dataset status form “OCR Error” to “Excluded”

  • Start training per extraction AI

  • Get more insights via the document detail page

2021-08-10_17-08-11

Changed

  • Deactive adoption of template settings according to AI model if not explicitly allowed.

2021-08-10_11-19-33

Added

  • Maximum number of pages per document

Fixed

  • Slow processing of extraction tasks

  • Evaluation when multiple annotations are present

2021-07-28_18-53-12

Changed

  • Make word-based tokenizer the default for new projects

2021-07-23_09-33-20

Fixed

  • Usage of word-base tokenizer

  • Duplicated hints

2021-07-20_17-29-23

Fixed

  • Edited annotation were excluded from the training process

2021-07-15_17-29-25

Added

  • Support to reuse label sets across categories

Changed

  • Allow “rerun extraction” on test and training documents

  • Remove “project statistic csv export” as it is redundant to document csv export

  • Include evaluation for training data in the AI model evaluation report

Fixed

  • Fixed a bug where the EXIF attribute orientation corrupted the bounding boxes images

  • “accept top annotations” does not update human created annotations

2021-07-02_18-13-01

Changed

  • Rate limits for task system

2021-06-29_22-14-33

Added

  • HTTP codes to API interface

Fixed

  • Content type description for some API endpoints

2021-06-22_22-45-48

Added

  • A experimental version of a training health report

2021-06-20_15-14-31

Fixed

  • Failed retraninings for some projects

  • Increased disk usage due to an cache deletion issue

  • Filtering of project invotations according to currently selected project

  • Clarify return types in API documentation

2021-05-26_20-16-02

Added

  • Show confidence for categorization results

  • Show evaluation of categorization Ai models

  • Track version (number of retrainings) for all Ai models

  • Track project and template origin of AiModel

2021-05-24_13-42-45

Changed

  • Use business evaluation implementation from training package

  • Loading time for CSV export evaluation reduced by saving it in the database.

2021-05-18_16-37-25

Added

  • Global project switcher

  • “Top candidates” filter in SmartView

  • “Change dataset” functionality in SmartView

  • Landing page in case the user has no projects (i.e. just registered)

  • Language switcher (not enabled yet)

  • Initial support for German translations (not enabled yet)

Fixed

  • Label threshold is now limited from 0.0 to 1.0

Changed

  • New design for login/signup/reset password pages

  • Design improvements in the control panel and SmartView

  • New logo and favicon

  • API documentation has been improved with types and examples; is now based on OpenAPI 3

  • Updated frontend dependencies and tooling

Removed

  • admin_importer, copy_extraction_as_annotation and related functions have been removed

2021-05-04_12-37-16

Fixed

  • Calculation of true negative when using multiple templates.

2021-04-28_12-27-19

Added

  • Filter for top annotations in SmartView

Changed

  • Dont allow training if there are no training documents

2021-04-25_20-09-04

Added

  • Protect signup with captcha

Fixed

  • Editing of annotation if there are already declined annotations.

2021-04-19_22-32-19

Added

  • Add label creation endpoint

  • Token-based authentication for the API

2021-04-03_09-46-56

Added

  • Show Django sidebar in Smartview and template view.

Changed

  • Save extraction results in a more efficient way.

  • Show a warning if an annotation with a custom offset string is created

  • Shwo loading indicator in the smartview search

Fixed

  • Default template dropdown sometimes disabled when creating a Template

  • Rare case where the document list could not be loaded

2021-03-15_15-12-04

Added

  • Add option to accept all annotations.

2021-03-07_21-32-41

Added

Fixed

  • Typo in privacy policy

  • Confirmation message when deleting labels

  • Performance of csv export

Changed

  • Delete old unrevised annotations when rerunning AiModel.

2021-02-25_09-28-07

Added

  • Option to select tokenizer for training (ProjectAdmin)

  • Option to add training parameters (SuperuserProjectAdmin)

Changed

  • Set a documents category_template on new documents if there is only one category_template available

  • Improved delete / accept performance of annotations

Fixed

  • Count of annotations on the LabelAdmin

2021-02-15_18-56-51

Changed

  • Show category template as empty when actual empty (instead of displaying the first available template)

  • Improved Smartview performance by changing entity loading

Added

  • Project name added to SectionLabel in the AiModelAdmin

  • Assign user to documents (“Assignee”). Can be enabled in the ProjectSuperuserAdmin

  • Add status field to the AiModel (“Training”, “Failed”, “Done”)

  • Dont allow new retraining if there is a training in progress AiModel.

2021-02-13_18-18-52

Changed

  • Use annotation permalink in LabelAdmin

Fixed

  • OCR Read API did not use text embeddings when available

  • Files with misssing fonts could not be processed

  • Creation of small annotations when accepting or declining

2021-02-10_13-52-15

Added

  • Admin action for Microsoft Graph API / Planner API

Fixed

  • SuperUserDocumentAdmin performance

  • OutOfMemory errors in the categorization

2021-02-03_17-07-23

Added

  • Permalink for annotations

  • Add an additional routine to fix corrupted pds

  • Improved frontend error tracking

Fixed

  • Validation when edting an annotation

Changed

  • Renamed option ‘priority_ocr’ to ‘priority_processing’

  • Allow rerun extraction for documents with revised annotations

  • Allow deletion default templates

2021-01-26_18-07-11

Added

  • Add column ‘category’ to csv export

2021-01-20_11-17-24

Added

  • Show selection bounding boxes for automtic created annotations

2021-01-14_22-06-52

Added

  • Visual annotations: images and area can now be annotate

Fixed

  • Loading time for Smartview

2021-01-13_23-26-03

Fixed

  • Retraining now assigns AIModels to templates even if they was no before

Added

  • Add Message when doing evaluation which tells the user if test set is empty.

2021-01-12_21-13-48

Fixed

  • Google Analytics integration

  • Empty Textextraction for ParagraphExtractions

2021-01-10_18-36-49

Fixed

  • Disable link formatting by sendgrid.

2021-01-08_22-30-10

Fixed

  • Bbox calculation in ParagraphModel

  • Evaluation sometimes not running

  • Speedup annotation creating

2021-01-05_11-53-22

Changed

  • Two column Annotation selection is now possible

Added

  • ParagraphModel introduced in addition to the Extraction- & CategoryModels, this is set per project via the SuperUserDocumentAdmin.

  • Option to update the document document text, this is set per project via the SuperUserDocumentAdmin.

  • Document Segmentation API Endpoint

2020-12-22_19-04-04

Changed

  • Email Template are now managed within the application.

  • Major improvement and refactor in the underlying training package.

Fixed

  • Link to imprint on SignUp

  • Smartview when scrolling horizontally

2020-12-16_20-17-30

Added

  • Search for Smartview

Fixed

  • TemplateCreationForm does not allow to select parent template

2020-12-16_09-44-30

Added

  • Searchbar for SuperuserProjectAdmin

  • Add link to flower (task monitoring) for superusers

  • Add support for GoogleTag Manager

  • Create Support Ticket for Retraining and Invitation of new Users

Changed

  • Increase SoftTimeLimit for extraction (necessary for large documents >500 pages).

Fixed

  • Fix bbox generation fox Paragraph Annotations

  • Fixed Evaluation not triggered for new AiModels

2020-12-10_13-15-14

Added

  • Sentry error reporting for Javascript Frontend (i.e. Smartview)

  • Allow to add Project specific document CategorizationModel

Changed

  • Document Search now considers filenames and shows links to Dashhboard, Labeling and Smartview

  • Allow deletion of Labels

Fixed

  • Allow “None” as confidence for rule-base ExtractionModels

2020-12-01_21-08-32

Added

  • Proof of Concept Microsoft Graph API connection (for logged in users): app.konfuzio.com/graph

  • Button to upload demo Documents

  • SuperuserProjectAdmin added (same like previous ProjectAdmin, however only accessible for Superusers only)

  • Google Analytics Tag for app.konfuzio.com

Changed

  • Default permission Group “CanReadProject” replaced with “CanCreateReadUpdateProject”. New users can now create new Projects.

  • Project Page for “normal” user does not show technical fields like “ocr” and “text_layout” anymore.

  • Dont show file endings like ‘.pkl’ for AiModels

2020-11-26_19-43-14

Fixed

  • Missing bbox attribute in Document API (prevents retraining via training package)

  • Running of proper ExtractionModel in Multi-Document-Template project

  • Loading time for the Document page (still room for improvements)

Added

  • Slightly better Categorization model.

2020-11-20_20-05-47

Added

  • A public registration page: https://app.konfuzio.com/accounts/signup

  • A Internal registration page to create users manually and faster: https://app.konfuzio.com/register/ (you need to be logged in to see this page)

  • Users can invite new users to a project via “ProjectInvitations”

  • Password reset functionality

Fixed

  • The Smartview is much faster

  • Improved creation of Templates and additional validation logic template inconsistencies.

Changed

  • Save bbox and entity per page in order to improve performance

2020-11-09_18-04-28

Added

  • Support for more than one default Template in a project

  • Categorization for multi Template projects

  • Links to related models in the Project, AIModel, Label and Template view

  • Internal user registration form, app.konfuzio.com/register

Changed

  • AiModel belongs now to DefaultTemplates instead of project

2020-10-27_10-37-15

Changed

  • Documents are now soft-deleted. There is a hard delete option in the SuperuserDocumentAdmin.

  • AiModel are made active automatically for matching DefaultTemplates if the AIMode is better than before.

2020-10-21_08-53-42

Fixed

  • Loading time when updating a project.

2020-10-19_22-46-49

Changed

  • Increase max allowed workflow time from 90 to 180 seconds.

Fixed

  • sucess messages for ‘rerun_workflow’ admin action

  • loading time of AiModel

  • csv export

Added

  • add hocr fied to document api.

  • add a project option to hide the Smartview and Labeling tool.

2020-10-14_11-39-17

Changed

  • AIModel can be uploaded and evaluted before setting active for a project

2020-10-13_15-10-22

Added

  • Multilanguage Support (DE/EN) in the backend (actuall translation are not included yet)

Changed

  • ‘create_labels_and_templates’ is now a project option (false by default).

  • Gunicorn workers restart after 500 requests.

  • Flower dashboard is running in separated container now

Fixed

  • Fix upload_ai_model to upload files larger than 2GB

  • Loading speed for SequenceAnnotation Admin

2020-10-03_15-18-47

Fixed

  • Recover tasks in case celery worker crashes

2020-10-01_12-02-37

Fixed

  • Internet Explorer warning badge

  • ‘Not machine-readable’ was not detecting 0 as proper value for normalization.

Changed

  • Remove extraction count from AiModel admin.

  • Refactor annotation accept/delete buttons to separate components and SVG

2020-09-16_18-19-53

Added

  • Additional normalization formats

  • Sentry message if retraining is triggered.

  • Detectron (fully imlemented) and preparation for visual classification results in SuperUserDocumetAdmin

Changed

  • Dont raise sentry error if document got deleted during workflow

Fixed

2020-09-11_13-47-51

Added

  • Add sentry message if project retraining is triggered.

  • Fix cpu minute calculation.

2020-09-09_16-12-44

Added

Changed

  • Allow extractions which does not have an accuracy.

  • On the dashboard: Dont show section.position column if all extractions have the same. Dont show accuracy column if all extraction does not have one.

  • Dont show retraining webhook url (on the project detail page). Display is with **** like it is password.

2020-09-08_22-38-19

Added

  • Per-project measuring of cpu time.

  • Additional date-formats for normalization.

  • First draft of boolean-formats for normalization.

2020-09-08_09-17-00

Added

  • Document Filter added for ‘human feedback required’ and ‘100% machine readable.

  • Additional normalization formats for numbers.

  • Document Categorization Classifier added to DocumentSuperUserAdmin

Changed

  • For the document view and Smartview, rename ‘possibly incorrect’ to ‘not machine-readable’

  • For the document view and Smartview, rename ‘pending review’ to ‘require feedback’

  • For the document view, divide column NOTES into FEEDBACK REQUIRED and NOT MACHINE-READABLE

Fixed

  • Dont raise an error if ai_model predict section with a template that does not exist.

2020-09-07_17-48-22

Fixed