
Introducing UBIAI: Easy-to-Use Text Annotation for NLP Applications

Sep 4, 2020

Whether the task is entity recognition, chatbot training, entity sentiment analysis, or text classification, annotating text to train and fine-tune a model for your own use case is crucial. Choosing an annotation tool with low UI friction and as much automation as possible is therefore of the utmost importance.

Today we introduce UBIAI, a new text annotation tool for NLP that offers an easy-to-use UI, multilingual support (including Arabic and Chinese), and auto-annotation functionality.

UBIAI Intro

Multilingual Support

UBIAI supports annotation in multiple languages, with tokenization specific to each language. For example, Arabic tokenization differs from Chinese tokenization, which in turn differs from English tokenization.

NLP data training platform free On-Premise & cloud packages

Annotated English Document


Annotated Arabic Document


Chinese Document

When creating a new project, simply specify the language and upload your documents. Your documents will then be automatically tokenized depending on the chosen language.
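To illustrate why tokenization has to depend on the language, here is a minimal Python sketch. It is not UBIAI's tokenizer: the `tokenize` function and its rules are assumptions made purely for illustration. Whitespace-delimited text is split into words and punctuation, while Chinese, which does not put spaces between words, falls back to one token per character (real tools use a dedicated word segmenter).

```python
import re

def tokenize(text: str, lang: str) -> list[str]:
    """Toy language-aware tokenizer (illustrative only, not UBIAI's)."""
    if lang == "zh":
        # Chinese has no spaces between words; as a crude fallback, emit one
        # token per non-space character. Real tools use a word segmenter.
        return [ch for ch in text if not ch.isspace()]
    # Whitespace-delimited languages: runs of word characters, or single
    # punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("UBIAI supports Arabic.", "en"))
print(tokenize("我爱北京", "zh"))
```

The same input run through the wrong branch gives unusable tokens (a whole Chinese sentence as a single token, for instance), which is why the project language has to be chosen before documents are uploaded.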

Multiple Upload Formats

Very few annotation tools offer the flexibility to upload documents in different formats. UBIAI supports several:

 

  1. TXT, PDF, HTML and DOCX
  2. JSON: you can upload a JSON file with existing entities. This is useful if you have a pre-annotated JSON file that you would like to import in order to continue annotating it
  3. CSV: you can upload a CSV file containing one document per row. This is useful for uploading documents in bulk
  4. ZIP: you can upload a ZIP file containing TXT, PDF or HTML files. This is also useful for uploading documents in bulk

Multiple Upload Formats
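As a rough sketch of how the two bulk formats could be prepared, the snippet below builds a CSV with one document per row and a ZIP with one TXT file per document, using only the Python standard library. The helper names and the file names (`doc_1.txt`, ...) are hypothetical, and whether UBIAI expects a header row in the CSV is not stated here, so none is written.

```python
import csv
import io
import zipfile

def build_csv(docs: list[str]) -> str:
    """Return CSV text with one document per row (no header row assumed)."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    for doc in docs:
        writer.writerow([doc])  # the csv module quotes embedded commas
    return buf.getvalue()

def build_zip(docs: list[str]) -> bytes:
    """Return the bytes of a ZIP archive holding one TXT file per document."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for i, doc in enumerate(docs, start=1):
            zf.writestr(f"doc_{i}.txt", doc)  # hypothetical file names
    return buf.getvalue()

docs = ["First document text.", "Second document, with a comma."]
print(build_csv(docs))
print(zipfile.ZipFile(io.BytesIO(build_zip(docs))).namelist())
```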

Intuitive UI

The annotation interface is the core of any annotation tool, as it is where annotators spend the majority of their time. A seamless, easy-to-use, low-friction interface is a must.


UBIAI provides a sleek interface with real-time auto-saving during annotation. In addition, with auto-detection enabled, the tool will search for and annotate similar words as soon as you highlight a specific word.

Pre-annotation

Dictionary
For each entity type, you can associate one or more dictionaries so that the words they contain are recognized and annotated automatically. You can either enter the dictionary entries manually or upload a CSV list containing all the words with their corresponding entity types (see example below):


CSV Dictionary
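The idea behind dictionary pre-annotation can be sketched in a few lines of Python. This is an illustration, not UBIAI's implementation: the two-column `word,entity_type` layout, the entries, and the function names are all assumptions.

```python
import csv
import io
import re

# Hypothetical dictionary CSV: one "word,entity_type" pair per row.
DICTIONARY_CSV = """Paris,LOCATION
Google,ORGANIZATION
New York,LOCATION
"""

def load_dictionary(csv_text: str) -> dict[str, str]:
    """Map each dictionary word to its entity type."""
    rows = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in rows if len(row) >= 2}

def pre_annotate(text: str, dictionary: dict[str, str]) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans for whole-word dictionary matches."""
    spans = []
    for term, label in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)

text = "Google opened an office in New York and another in Paris."
for start, end, label in pre_annotate(text, load_dictionary(DICTIONARY_CSV)):
    print(label, text[start:end])
```

Matching on word boundaries keeps a short entry from firing inside a longer word; overlapping entries are not resolved in this sketch.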

Rule Based Matching:

 

With rule-based matching, you can pre-annotate your documents instantly using a combination of tags such as Part Of Speech (POS), regular expressions, and built-in patterns (email, number, phone number, etc.). The list of all available attributes, with their descriptions, can be found in the documentation.


Rule Based Matching
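The regular-expression part of rule-based matching can be sketched as follows. The rule syntax UBIAI actually uses is described in its documentation and is not reproduced here; the `RULES` table, the patterns, and the `apply_rules` helper are assumptions for illustration, and POS-based rules are left out because they need a tagger.

```python
import re

# Hypothetical rules: entity label -> regular expression.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d{1,3}[ -]\d{3}[ -]\d{3}[ -]\d{4}"),
}

def apply_rules(text: str) -> list[tuple[int, int, str]]:
    """Return sorted (start, end, label) spans for every rule match."""
    spans = []
    for label, pattern in RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)

text = "Contact jane.doe@example.com or call +1 555-123-4567."
for start, end, label in apply_rules(text):
    print(label, text[start:end])
```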

Machine Learning Auto-Annotation:

 

To speed up the annotation process, UBIAI can auto-annotate your documents using a spaCy model. All you have to do is:

 

  1. Select the project whose annotated documents will be used as the training corpus.
  2. Select a starting model: either a blank model or the pre-trained English model en_core_web_sm.
  3. Select the training/evaluation partition of the annotated data used to train and evaluate the model.
  4. Configure the training by specifying the number of iterations (default is 10), the dropout, and the batch size.
  5. Optionally, auto-annotate your documents once the model finishes training by checking the “Annotate Your Documents after Finish Train” button.

Note: For efficient model training, it is recommended to annotate at least 10% of your total documents.

Model Training Interface
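The partition and configuration steps above can be pictured with a small sketch. It does not call UBIAI or spaCy: `split_train_eval` is a hypothetical helper, and apart from the default of 10 iterations and the model name, the values in `config` (evaluation fraction, dropout, batch size) are placeholder assumptions.

```python
import random

def split_train_eval(documents, eval_fraction=0.2, seed=0):
    """Shuffle annotated documents and split them into (train, eval) lists."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # seeded so the split is reproducible
    n_eval = max(1, round(len(docs) * eval_fraction)) if docs else 0
    return docs[n_eval:], docs[:n_eval]

# Hypothetical settings mirroring the options listed above.
config = {
    "base_model": "en_core_web_sm",  # or a blank model
    "iterations": 10,                # default mentioned above
    "dropout": 0.2,                  # assumed value
    "batch_size": 8,                 # assumed value
}

annotated = [f"doc_{i}" for i in range(10)]
train, evaluation = split_train_eval(annotated)
print(len(train), len(evaluation))
```

Keeping the evaluation documents out of training is what makes the reported scores meaningful.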

After training, UBIAI immediately evaluates the model according to the train/validation partition and displays the precision, recall, and F score for each entity.

To track model performance over time, click the “view log” button below the model name.


Model Performance
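For readers who want to see how per-entity scores of this kind are typically computed, here is a minimal sketch. It assumes exact span matching and the usual F1 definition of the F score; how UBIAI computes its figures internally is not specified here, and the function name and data layout are illustrative.

```python
from collections import Counter

def per_entity_scores(gold: set, predicted: set) -> dict:
    """gold / predicted: sets of (doc_id, start, end, label) spans.
    A predicted span counts as correct only if it matches a gold span exactly."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in predicted:
        (tp if span in gold else fp)[span[3]] += 1
    for span in gold - predicted:
        fn[span[3]] += 1
    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"precision": p, "recall": r, "f1": f}
    return scores

gold = {(0, 0, 6, "ORG"), (0, 27, 35, "LOC"), (0, 51, 56, "LOC")}
predicted = {(0, 0, 6, "ORG"), (0, 27, 35, "LOC"), (0, 40, 47, "LOC")}
for label, s in sorted(per_entity_scores(gold, predicted).items()):
    print(label, s)
```

In this example the model finds the one ORG span exactly, but for LOC it gets one span right, invents one, and misses one, so LOC precision, recall, and F1 are all 0.5.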

Multiple Export Formats:

A major limitation of existing annotation tools is the small number of export formats they support. With UBIAI, you can export your annotations in the following formats:

  1. Amazon Comprehend format (see tutorial here)
  2. JSON format
  3. SpaCy format
  4. Stanford CoreNLP format
  5. IOB format including IOB Part Of Speech (POS) and IOB Chatbot

A ZIP file containing the annotations, along with the documents used during annotation, will be downloaded; you will need to unzip it before using the annotations to train a model. Below is an example of an annotation export in the IOB POS format:


IOB POS Export
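To make the IOB scheme concrete, the sketch below converts character-level entity spans into one IOB tag per token: B- marks the first token of an entity, I- a continuation, and O a token outside any entity. The exact column layout of UBIAI's IOB exports (for example the extra POS column) may differ; the tokenizer and function names here are assumptions.

```python
import re

def tokenize_with_offsets(text: str) -> list[tuple[str, int, int]]:
    """Split text into (token, start, end) triples (simple regex tokenizer)."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def to_iob(tokens, spans):
    """tokens: (text, start, end) triples; spans: (start, end, label) triples.
    Returns one (token_text, tag) pair per token."""
    tagged = []
    for tok, t_start, t_end in tokens:
        tag = "O"
        for s_start, s_end, label in spans:
            if t_start >= s_start and t_end <= s_end:
                # First token of the span gets B-, later tokens get I-.
                tag = ("B-" if t_start == s_start else "I-") + label
                break
        tagged.append((tok, tag))
    return tagged

text = "Google opened an office in New York."
spans = [(0, 6, "ORG"), (27, 35, "LOC")]
for tok, tag in to_iob(tokenize_with_offsets(text), spans):
    print(tok, tag)
```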

Real Time Analysis:

With real-time analysis, you can test your trained model on the spot without leaving the tool. This is useful for quickly checking the performance of the model on real production text.


Real Time Analysis Entity Extraction

Team Collaboration:

Team collaboration is essential: it not only speeds up the annotation process but also mitigates annotator bias, since annotations from several people can be combined to infer the underlying truth.

UBIAI offers the option to collaborate with team members easily by creating collaborations for your projects.


Collaboration Platform
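One simple way to combine annotations from several annotators is majority voting, sketched below. This is a generic technique shown for illustration, not a description of how UBIAI resolves disagreements; the function name and the `min_agreement` parameter are assumptions.

```python
from collections import Counter

def majority_vote(annotations, min_agreement=0.5):
    """annotations: one set of (start, end, label) spans per annotator.
    Keep the spans marked by more than min_agreement of the annotators."""
    counts = Counter(span for spans in annotations for span in spans)
    threshold = len(annotations) * min_agreement
    return sorted(span for span, n in counts.items() if n > threshold)

annotator_a = {(0, 6, "ORG"), (27, 35, "LOC")}
annotator_b = {(0, 6, "ORG"), (27, 35, "ORG")}  # disagrees on the second span
annotator_c = {(0, 6, "ORG"), (27, 35, "LOC")}
print(majority_vote([annotator_a, annotator_b, annotator_c]))
```

Here the label chosen by two of the three annotators wins, so a single annotator's mislabel does not end up in the training data.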

Final Note

Our mission at UBIAI is to build easy-to-use Natural Language Processing (NLP) tools that help developers and companies try out machine learning ideas quickly and apply them to real-world problems without spending time on coding. We believe that accessible, low-cost tools will democratize NLP across the globe and enable better, more intelligent decision making. To this end, we are committed to offering the tool, with all its features, free of charge to researchers and people in academia.

 

We are constantly improving the tool and in need of beta testers. Please give it a try at https://ubiai.tools and give us your feedback!