All notable changes in the server of app.konfuzio.com will be documented in this file.
Speed up runtime of Extraction AIs
Fix an issue which causes some Extraction AIs to crash on multipage documents.
Fix an issue that prevents the calculation of bounding boxes for small or slightly rotated characters.
Allow setting a default assignee for uploaded documents
Allow notifying users via email when they are assigned to documents
Top annotation filter in the SmartView now takes accepted Annotations into account
Error messages in case a document could not be processed are now displayed correctly
New Extraction AIs are saved in a more efficient way
Show the user who started an AI training on the detail page of an AI
Allow setting a time (in days) after which documents are automatically deleted
Allow rotating pages via API
Add thumbnail images for document pages
Links to deleted annotations now redirect to the respective document
This version uses Konfuzio Trainer in version v.0.3.15 and the Konfuzio Python SDK in version v.0.1.15.
Option to enforce running OCR even if text embeddings are present
Improved error messages in case a document cannot be processed
Option to exclude email content when using the email-integration
Option to make document accessible via public link
Beta Version of APIV3
Beta Version of new document dashboard (based on Konfuzio-Capture-Vue)
Automatic rotation of pages (#8980)
For on-premise users, PostgreSQL 10 is now the minimum version
Improved Extraction AI
On-premise containers now run as non-root and use a read-only filesystem
Improved mouse pointer in the Smartview
An issue where empty Annotation Sets could appear on Documents
An issue where conflicting annotations could be created
An issue where negative annotations were not correctly deleted (#9127)
Rare cases where OCR text included some characters multiple times
Add assignee attribute of a Document to the API
“Rerun extraction” via the user interface now also applies new annotations to training and test documents
Add option to filter for related annotation sets
Sorting of annotation sets in the csv export
Document API endpoint returning declined annotations
Added api_name to Label API
Link to documentation page
Missing translation on document list page
Evaluation did not complete for AIs with a large amount of training data
For on-premise installations, the OCR method for new projects is chosen based on the available OCR solutions.
For on-premise installations, the project import now considers declined annotations
For on-premise installations, Superusers can see the Konfuzio Server version and how many pages and documents have been processed.
Text summarization endpoint.
Categorization AI parameters in the Project view
An issue where the reload after uploading new documents does not happen
“Sentence” option to the available detection modes
An error where an invalid date in the document text stopped the training process
E-mails without an attachment have not been processed.
CSV export for ProRis by Inveos
Allow deletion of characters of an annotation without excluding it from the training process
An option to specify the category of a document when uploading it via API (and thereby skipping the categorization)
The GET document API endpoint now returns the annotation displayed in the SmartView (instead of only showing the extraction AI results)
CSV export compatible with ProRis by Inveos
Improve detection of annotations which consist of multiple words
Date filtering for project documents API endpoint
Filtering of labels and label sets according to the category of a document (in the SmartView)
Selection of characters in SmartView incomplete when editing an annotation
Dark Mode setting of browser not compatible with Konfuzio Server
Some cases where the document list was not reloaded automatically
More advanced task priorities and improved worker resource usage
Auto-reload of newly uploaded documents
Evaluation does not complete if no test documents are specified
Formatting of the “Check your browser” page for logged out users.
Adding of categories to existing label sets
Migration scripts for user permissions and e-mail templates
Support for SMTP e-mail backends via environment variables
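As an illustration of the SMTP entry above, a Django-style settings helper could read the mail backend from the environment roughly as follows. The `EMAIL_*` keys are standard Django settings; the environment variable names, the defaults, and the helper name `smtp_settings_from_env` are assumptions for this sketch and may differ from what Konfuzio Server actually reads.

```python
import os


def smtp_settings_from_env(env=None):
    """Build Django e-mail settings from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return {
        "EMAIL_BACKEND": "django.core.mail.backends.smtp.EmailBackend",
        "EMAIL_HOST": env.get("EMAIL_HOST", "localhost"),
        "EMAIL_PORT": int(env.get("EMAIL_PORT", "25")),
        "EMAIL_HOST_USER": env.get("EMAIL_HOST_USER", ""),
        "EMAIL_HOST_PASSWORD": env.get("EMAIL_HOST_PASSWORD", ""),
        # Environment values are strings, so compare text instead of using bool().
        "EMAIL_USE_TLS": env.get("EMAIL_USE_TLS", "false").lower() in ("1", "true", "yes"),
    }
```

In a settings module the result could be merged with `globals().update(smtp_settings_from_env())`.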
DOS protection prevents start of Konfuzio server
Autosave for any change on the document list page
German language support
Fine-tuning of extraction AIs via parameters
New fields AI quality and data quality
More detailed evaluation
Description Field for extraction AI, label sets, categorization AI, categories
Rename project invitations to members
Rename the dataset status from “OCR Error” to “Excluded”
Start training per extraction AI
Get more insights via the document detail page
Deactivate adoption of template settings according to AI model if not explicitly allowed.
Maximum number of pages per document
Slow processing of extraction tasks
Evaluation when multiple annotations are present
Make word-based tokenizer the default for new projects
Usage of word-based tokenizer
Edited annotations were excluded from the training process
Support to reuse label sets across categories
Allow “rerun extraction” on test and training documents
Remove “project statistic csv export” as it is redundant to document csv export
Include evaluation for training data in the AI model evaluation report
Fixed a bug where the EXIF orientation attribute corrupted the bounding box images
“accept top annotations” does not update human created annotations
Rate limits for task system
HTTP codes to API interface
Content type description for some API endpoints
An experimental version of a training health report
Failed retrainings for some projects
Increased disk usage due to a cache deletion issue
Filtering of project invitations according to currently selected project
Clarify return types in API documentation
Show confidence for categorization results
Show evaluation of categorization AI models
Track version (number of retrainings) for all AI models
Track project and template origin of AiModel
Use business evaluation implementation from training package
Loading time for CSV export evaluation reduced by saving it in the database.
Global project switcher
“Top candidates” filter in SmartView
“Change dataset” functionality in SmartView
Landing page in case the user has no projects (i.e. just registered)
Language switcher (not enabled yet)
Initial support for German translations (not enabled yet)
Label threshold is now limited from 0.0 to 1.0
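A range check of the kind described for the label threshold can be sketched as follows. This is an illustrative sketch only; the function name `validate_label_threshold` is hypothetical and not taken from the Konfuzio code base.

```python
def validate_label_threshold(value):
    """Return the threshold as a float, or raise ValueError if it is outside [0.0, 1.0]."""
    threshold = float(value)
    # The chained comparison is also False for NaN, so NaN is rejected as well.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"Label threshold must be between 0.0 and 1.0, got {threshold}")
    return threshold
```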
New design for login/signup/reset password pages
Design improvements in the control panel and SmartView
New logo and favicon
API documentation has been improved with types and examples; it is now based on OpenAPI 3
Updated frontend dependencies and tooling
copy_extraction_as_annotation and related functions have been removed
Calculation of true negative when using multiple templates.
Filter for top annotations in SmartView
Don't allow training if there are no training documents
Protect signup with captcha
Editing of annotation if there are already declined annotations.
Add label creation endpoint
Token-based authentication for the API
Show Django sidebar in Smartview and template view.
Save extraction results in a more efficient way.
Show a warning if an annotation with a custom offset string is created
Show loading indicator in the smartview search
Default template dropdown sometimes disabled when creating a Template
Rare case where the document list could not be loaded
Add option to accept all annotations.
Option to retrain project categorization model
Improved OCR settings
System check page https://app.konfuzio.com/check/
Confirmation message when deleting labels
Performance of csv export
Delete old unrevised annotations when rerunning AiModel.
Option to select tokenizer for training (ProjectAdmin)
Option to add training parameters (SuperuserProjectAdmin)
Set a document's category_template on new documents if there is only one category_template available
Improved delete / accept performance of annotations
Count of annotations on the LabelAdmin
Show category template as empty when actually empty (instead of displaying the first available template)
Improved Smartview performance by changing entity loading
Project name added to SectionLabel in the AiModelAdmin
Assign user to documents (“Assignee”). Can be enabled in the ProjectSuperuserAdmin
Add status field to the AiModel (“Training”, “Failed”, “Done”)
Don't allow a new retraining if a training is already in progress for the AiModel.
Use annotation permalink in LabelAdmin
OCR Read API did not use text embeddings when available
Files with missing fonts could not be processed
Creation of small annotations when accepting or declining
Admin action for Microsoft Graph API / Planner API
OutOfMemory errors in the categorization
Permalink for annotations
Add an additional routine to fix corrupted PDFs
Improved frontend error tracking
Validation when editing an annotation
Renamed option ‘priority_ocr’ to ‘priority_processing’
Allow rerun extraction for documents with revised annotations
Allow deletion of default templates
Add column ‘category’ to csv export
Show selection bounding boxes for automatically created annotations
Visual annotations: images and areas can now be annotated
Loading time for Smartview
Retraining now assigns AIModels to templates even if none was assigned before
Add a message during evaluation that tells the user if the test set is empty.
Google Analytics integration
Empty Textextraction for ParagraphExtractions
Disable link formatting by sendgrid.
Bbox calculation in ParagraphModel
Evaluation sometimes not running
Speed up annotation creation
Two column Annotation selection is now possible
ParagraphModel introduced in addition to the Extraction- & CategoryModels; this is set per project via the SuperUserDocumentAdmin.
Option to update the document text; this is set per project via the SuperUserDocumentAdmin.
Document Segmentation API Endpoint
Email templates are now managed within the application.
Major improvement and refactor in the underlying training package.
Link to imprint on SignUp
Smartview when scrolling horizontally
Search for Smartview
TemplateCreationForm does not allow to select parent template
Searchbar for SuperuserProjectAdmin
Add link to flower (task monitoring) for superusers
Add support for Google Tag Manager
Create Support Ticket for Retraining and Invitation of new Users
Increase SoftTimeLimit for extraction (necessary for large documents >500 pages).
Fix bbox generation for Paragraph Annotations
Fixed Evaluation not triggered for new AiModels
Allow to add Project specific document CategorizationModel
Document Search now considers filenames and shows links to Dashboard, Labeling and Smartview
Allow deletion of Labels
Allow “None” as confidence for rule-based ExtractionModels
Proof of Concept Microsoft Graph API connection (for logged in users): app.konfuzio.com/graph
Button to upload demo Documents
SuperuserProjectAdmin added (same as the previous ProjectAdmin, but accessible to Superusers only)
Google Analytics Tag for app.konfuzio.com
Default permission Group “CanReadProject” replaced with “CanCreateReadUpdateProject”. New users can now create new Projects.
Project Page for “normal” user does not show technical fields like “ocr” and “text_layout” anymore.
Don't show file endings like ‘.pkl’ for AiModels
Missing bbox attribute in Document API (prevents retraining via training package)
Running of proper ExtractionModel in Multi-Document-Template project
Loading time for the Document page (still room for improvements)
Slightly better Categorization model.
A public registration page: https://app.konfuzio.com/accounts/signup
An internal registration page to create users manually and faster:
https://app.konfuzio.com/register/ (you need to be logged in to see this page)
Users can invite new users to a project via “ProjectInvitations”
Password reset functionality
The Smartview is much faster
Improved creation of Templates and additional validation logic for template inconsistencies.
Save bbox and entity per page in order to improve performance
Support for more than one default Template in a project
Categorization for multi Template projects
Links to related models in the Project, AIModel, Label and Template view
Internal user registration form, app.konfuzio.com/register
AiModel now belongs to DefaultTemplates instead of the project
Documents are now soft-deleted. There is a hard delete option in the SuperuserDocumentAdmin.
AiModels are made active automatically for matching DefaultTemplates if the AiModel is better than before.
Loading time when updating a project.
Increase max allowed workflow time from 90 to 180 seconds.
Success messages for ‘rerun_workflow’ admin action
Loading time of AiModel
Add hocr field to document API.
Add a project option to hide the Smartview and Labeling tool.
AIModel can be uploaded and evaluated before being set active for a project
Multilanguage Support (DE/EN) in the backend (actual translations are not included yet)
‘create_labels_and_templates’ is now a project option (false by default).
Gunicorn workers restart after 500 requests.
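For reference, worker recycling of this kind is normally configured through Gunicorn's `max_requests` setting. The following `gunicorn.conf.py` fragment is a sketch; the file layout and the jitter value are assumptions and are not taken from the Konfuzio deployment.

```python
# gunicorn.conf.py (sketch)

# Restart each worker after it has handled this many requests,
# which limits the impact of slow memory leaks.
max_requests = 500

# Optional and not mentioned in the changelog: add random jitter
# so that workers do not all restart at the same moment.
max_requests_jitter = 50
```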
Flower dashboard now runs in a separate container
Fix upload_ai_model to upload files larger than 2GB
Loading speed for SequenceAnnotation Admin
Recover tasks in case celery worker crashes
Internet Explorer warning badge
‘Not machine-readable’ was not detecting 0 as a proper value for normalization.
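The changelog does not describe the cause of this bug. A common way for a zero value to be misclassified in Python is a truthiness check, since `0` is falsy. The sketch below shows the explicit `None` comparison that avoids this; the function name `is_machine_readable` is hypothetical.

```python
def is_machine_readable(normalized_value):
    """Return True when normalization produced a value, including 0 and 0.0."""
    # A truthiness check (`if normalized_value:`) would wrongly reject 0,
    # so compare against None explicitly.
    return normalized_value is not None
```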
Remove extraction count from AiModel admin.
Refactor annotation accept/delete buttons to separate components and SVG
Additional normalization formats
Sentry message if retraining is triggered.
Detectron (fully implemented) and preparation for visual classification results in SuperUserDocumentAdmin
Don't raise a Sentry error if a document is deleted during the workflow
Add sentry message if project retraining is triggered.
Fix cpu minute calculation.
Allow extractions which do not have an accuracy.
On the dashboard: Don't show the section.position column if all extractions have the same position. Don't show the accuracy column if no extraction has one.
Don't show the retraining webhook URL (on the project detail page). It is displayed as **** like a password.
Per-project measuring of cpu time.
Additional date-formats for normalization.
First draft of boolean-formats for normalization.
Document Filter added for ‘human feedback required’ and ‘100% machine readable’.
Additional normalization formats for numbers.
Document Categorization Classifier added to DocumentSuperUserAdmin
For the document view and Smartview, rename ‘possibly incorrect’ to ‘not machine-readable’
For the document view and Smartview, rename ‘pending review’ to ‘require feedback’
For the document view, divide column NOTES into FEEDBACK REQUIRED and NOT MACHINE-READABLE
Don't raise an error if the ai_model predicts a section with a template that does not exist.