

Comparing GLiNER with LLM Zero-Shot Labeling for Named Entity Recognition

April 18th, 2024

Named Entity Recognition (NER) plays a crucial role in many natural language processing tasks by identifying and categorizing entities such as person names, organizations, and locations within textual data. Two prominent approaches to NER, GLiNER and LLM zero-shot labeling, offer distinct methodologies for entity extraction. GLiNER relies on traditional text processing techniques and linguistic rules, while LLM zero-shot labeling leverages large pre-trained language models to perform NER without fine-tuning on specific entity types. In this comparative analysis, we examine the operational principles, applications, and advantages of both approaches to provide insight into their suitability for different use cases.

I. Understanding GLiNER

1. Definition of GLiNER and Its Operational Principles

GLiNER is a Named Entity Recognition (NER) system that relies on traditional text processing and linguistic modeling techniques. It operates by analyzing the lexical and grammatical context of a text to identify and classify named entities such as person names, organizations, and locations. The GLiNER process typically involves the use of linguistic rules and lexicons to detect entities.

2. Techniques Utilized by GLiNER:

Grammatical tagging: GLiNER uses grammatical tagging to label each word in the text with information such as its grammatical class (noun, verb, adjective, etc.), which helps it understand the syntactic structure of the text.

Dependency parsing: This technique allows GLiNER to analyze dependency relationships between words in a sentence, helping to determine which words are related to which entities.

Use of lexicons and linguistic rules: GLiNER relies on word lexicons and linguistic rules to detect named entities. The lexicons contain lists of words that match specific entities, while the linguistic rules define syntactic patterns for identifying entities in the text.

→ By combining these techniques, GLiNER achieves accurate recognition of named entities in varied textual contexts, making it a versatile tool for many natural language processing applications.
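The lexicon-and-rules idea can be sketched in a few lines of Python. This is a toy illustration only: the lexicon entries and the "title followed by capitalized words" rule below are invented for the example and are not GLiNER's actual resources.

```python
# Toy lexicon and rule resources, invented for illustration
LEXICON = {"Washington": "location", "Paris": "location", "Google": "organization"}
PERSON_TITLES = {"President", "Dr.", "Mrs."}

def detect_entities(tokens):
    entities = []
    i = 0
    while i < len(tokens):
        # Rule: a title followed by a run of capitalized words is a person
        if tokens[i] in PERSON_TITLES:
            j = i + 1
            while j < len(tokens) and tokens[j][:1].isupper():
                j += 1
            if j > i + 1:
                entities.append((" ".join(tokens[i + 1:j]), "person"))
                i = j
                continue
        # Lexicon lookup for known locations and organizations
        if tokens[i] in LEXICON:
            entities.append((tokens[i], LEXICON[tokens[i]]))
        i += 1
    return entities

print(detect_entities("President Joe Biden spoke in Washington .".split()))
# → [('Joe Biden', 'person'), ('Washington', 'location')]
```

A real system layers many such rules and much larger lexicons, but the principle is the same: entities are recognized from surface patterns and word lists rather than learned representations.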

3. Illustration through Example:

To demonstrate the practical application of GLiNER in a Named Entity Recognition (NER) context, consider the following scenario.

Scenario: A news agency wants to analyze a collection of news articles to extract named entities such as person names, locations, and organizations for further analysis and categorization.

Usage of GLiNER:

Data Preparation: The news articles are preprocessed to remove noise and irrelevant information, leaving only the text content to be analyzed.

Application of GLiNER: GLiNER is applied to the preprocessed news articles to identify and extract the named entities present in the text. This involves utilizing GLiNER's linguistic modeling techniques and rule-based approaches to recognize entities based on context and linguistic patterns.

Named Entity Extraction: Once GLiNER has processed the news articles, it generates a list of identified named entities along with their corresponding entity types (e.g., person, location, organization).

Analysis and Categorization: The extracted named entities are then analyzed and categorized based on their relevance and importance to the news agency's objectives. This may involve further processing or filtering based on specific criteria or requirements.
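The four steps above can be sketched as a small pipeline. This is a hypothetical illustration: the function names and the trivial capitalized-phrase rule standing in for GLiNER are invented for the example.

```python
import re

def preprocess(article):
    """Step 1: strip markup noise and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", article)
    return re.sub(r"\s+", " ", text).strip()

def extract_entities(text):
    """Steps 2-3: a stand-in extractor returning (entity, type) pairs."""
    # Every capitalized multi-word phrase is tagged "unknown" here; a
    # real extractor would assign person/location/organization types.
    pattern = r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+"
    return [(m.group(), "unknown") for m in re.finditer(pattern, text)]

def categorize(entities, wanted_types):
    """Step 4: keep only the entity types relevant to the objectives."""
    return [e for e in entities if e[1] in wanted_types]

article = "<p>Joe  Biden visited New York yesterday.</p>"
clean = preprocess(article)
print(extract_entities(clean))
# → [('Joe Biden', 'unknown'), ('New York', 'unknown')]
```

The value of the pipeline shape is that the extractor in the middle can be swapped (for GLiNER, or an LLM) without changing the preparation and categorization steps around it.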

Practical Demonstration

To demonstrate the practical usage of GLiNER for Named Entity Recognition (NER), we begin by installing the gliner package:

				
!pip install gliner

The code snippet below imports the GLiNER class and initializes a pre-trained model for named entity recognition. It puts the model into evaluation mode and confirms successful initialization by printing ok.

				
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
model.eval()
print("ok")

Once GLiNER is installed and loaded, we can apply it to extract named entities from a sample news article:

				
# Sample news article text
news_article = "President Joe Biden announced new measures to combat climate change in Washington, DC."

# Apply GLiNER to extract named entities
labels = ["person", "location", "organization"]
entities = model.predict_entities(news_article, labels)

# Display the extracted entities
print("Named Entities Detected:")
for entity in entities:
    print(f"{entity['text']}: {entity['label']}")

Finally, we extract these entities, demonstrating the model’s capability to predict named entities across multiple examples with consistency and reliability:

				
text = "President Joe Biden announced new measures to combat climate change in Washington, DC."
labels = ["person", "LOCATION"]
entities = model.predict_entities(text, labels, threshold=0.4)
for entity in entities:
    print(entity["text"], "=>", entity["label"])

----------
President Joe Biden => person
Washington, DC => LOCATION

Now, we delve deeper with a slightly more complex example: CV extraction. The code for this example is provided in the notebook below.

				
text = """
Successfully set up a new entity and worked closely with legal and external bodies.
Managed the recruiting process internally - interviews / 2nd interviews / trials etc.
• Wrote and delivered weekly training sessions for sales team.
SKILLS
Communication
HR Policies
• Management
Leadership
English: fluent
French: native speaker
Spanish: beginner level
• Horse riding
Going to the theatre
• Gardening
"""
labels = ["NAME", "NUMBER", "SKILLS", "LANGUAGES", "HOBBIES", "EXPERIENCE"]
entities = model.predict_entities(text, labels, threshold=0.4)
for entity in entities:
    print(entity["text"], "=>", entity["label"])

----------
BEMMA SMITH => NAME
English => LANGUAGES
French => LANGUAGES
Spanish => LANGUAGES
Horse riding => SKILLS
Going to the theatre => HOBBIES
Gardening => SKILLS

When applying the GLiNER model to this CV, we observe that many entities are not extracted.

→ In our first example, where we sought to extract two entities from a single short text, GLiNER yielded accurate and precise results. However, in the context of CV extraction, where the document has a complex structure and extensive data, attempts to extract multiple entities fail to capture them all.

4. Examples of Use Cases Where GLiNER Excels: Concrete Applications Across Different Domains

GLiNER's versatility extends to various domains, including:

Medical: In healthcare, GLiNER can swiftly extract crucial information such as drug names, medical conditions, and physician names from patient records, enhancing data analysis and medical research.

Finance: Within the financial sector, GLiNER proves invaluable for parsing large volumes of financial reports to accurately identify company names, monetary figures, and pertinent financial metrics, thereby aiding investment decisions and market analysis.

Sentiment Analysis: For sentiment analysis tasks, GLiNER facilitates the extraction of key entities from customer feedback, social media posts, and online reviews, allowing businesses to gain deeper insight into consumer sentiment and market trends.

Illustration of Specific Use Cases Where GLiNER Is Particularly Effective

GLiNER's efficacy is exemplified in numerous scenarios, including:

Social Media Monitoring: GLiNER adeptly identifies influential figures and trending topics in social media discussions, enabling organizations to track online conversations and gauge public sentiment effectively.

Geospatial Analysis: By accurately recognizing location names in news articles and other textual sources, GLiNER enables geospatial analysis to map the geographic spread of events and trends, aiding crisis response efforts and strategic planning.

Financial Data Parsing: In financial analysis, GLiNER excels at parsing complex financial documents to extract critical information such as company names, stock ticker symbols, and numerical data, streamlining financial modeling and investment analysis.

These examples underscore GLiNER's versatility and effectiveness across diverse application domains, making it a valuable tool for extracting insights from unstructured text data.

II. Understanding LLM Zero-Shot Labeling

1. Introduction to LLM Zero-Shot Labeling:

LLM (Large Language Model) zero-shot labeling is a state-of-the-art approach that harnesses the power of large pre-trained language models, such as GPT (Generative Pre-trained Transformer), to perform named entity recognition (NER) without fine-tuning on specific entity types. This methodology leverages the comprehensive language understanding of pre-trained models to recognize named entities in a “zero-shot” manner, meaning it can identify entity types not explicitly seen during training.

2. Definition of the LLM Zero-Shot Labeling Approach:

The zero-shot methodology in NER uses a pre-trained language model to predict entity labels without prior training on specific entity types. Instead of fine-tuning the model on labeled data for each entity type, zero-shot labeling enables the model to generalize its understanding of named entities based on context and linguistic patterns learned during pre-training.

This approach is particularly advantageous in scenarios where labeled data for fine-tuning is scarce or where the entity types of interest are diverse and constantly evolving. By harnessing the knowledge encoded in pre-trained language models, LLM zero-shot labeling offers a flexible and efficient solution for named entity recognition across various domains and entity types.
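In practice, zero-shot labeling with an LLM often amounts to describing the target labels in a prompt and parsing the model's reply. A minimal sketch follows; the prompt wording, helper names, and the 'entity => label' reply format are assumptions for illustration, and any instruction-following LLM can sit behind them.

```python
def build_ner_prompt(text, labels):
    # Describe the target entity types in natural language
    label_list = ", ".join(labels)
    return (
        f"Extract all named entities of the types [{label_list}] from the text below.\n"
        f"Answer with one 'entity => label' pair per line.\n\nText: {text}"
    )

def parse_response(response):
    # Turn "entity => label" lines from the model reply into tuples
    pairs = []
    for line in response.splitlines():
        if "=>" in line:
            entity, label = line.split("=>", 1)
            pairs.append((entity.strip(), label.strip()))
    return pairs

# A compliant model reply such as the one below parses into pairs:
print(parse_response("Joe Biden => person\nWashington => location"))
# → [('Joe Biden', 'person'), ('Washington', 'location')]
```

Because the labels live in the prompt rather than in the model's training objective, new entity types can be added by editing a string, which is exactly the flexibility described above.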

3. Key Features of Zero-Shot Labeling:

Flexibility: Zero-shot labeling enables models to generalize their understanding of named entities, allowing them to predict entity labels across a wide range of categories without explicit training.

Efficiency: By eliminating the need for fine-tuning on specific entity types, zero-shot labeling streamlines model development, reducing the time and resources required for training.

Versatility: Zero-shot labeling can be applied to various NER tasks, spanning different domains and languages, making it a versatile solution for diverse applications.

4. Applications of Zero-Shot Labeling:

Multi-domain NER: Zero-shot labeling is effective in multi-domain NER tasks where entity categories vary widely across contexts.

Low-resource Settings: In scenarios with limited annotated data, zero-shot labeling offers a practical way to build NER models without extensive labeled datasets.

Emerging Entity Types: Zero-shot labeling excels at recognizing emerging entity types that were not explicitly seen during model training, enabling adaptability to evolving data.

5. Illustration through Example:

The code below illustrates zero-shot named entity recognition (NER) with the Hugging Face Transformers library. This approach lets users extract named entities from text across diverse domains and languages without task-specific training, yielding valuable input for information extraction, text analysis, and natural language understanding.

				
from transformers import pipeline

# Prompt an instruction-tuned text-to-text model to label entities zero-shot
generator = pipeline("text2text-generation", model="google/flan-t5-base")

# Define the input text
text = "President Joe Biden announced new measures to combat climate change in Washington, DC."
labels = ["PERSON", "LOCATION"]

prompt = (
    f"Extract all entities of the types {labels} from the text below, "
    f"one 'entity => label' pair per line.\n\nText: {text}"
)
result = generator(prompt, max_new_tokens=64)[0]["generated_text"]
print(result)

With an instruction-following model, the reply should contain pairs such as Joe Biden => PERSON and Washington, DC => LOCATION.

Using the UBIAI Annotation Tool

To further illustrate zero-shot labeling, we can demonstrate its application using the UBIAI Annotation Tool. UBIAI is a cutting-edge solution tailored to the intricate task of Named Entity Recognition (NER) in Natural Language Processing (NLP). With its suite of auto-annotation tools, UBIAI streamlines the data annotation process essential for NER model training. The platform offers advanced features, including AI-powered auto-labeling, Optical Character Recognition (OCR) annotation for extracting text from sources such as images and PDFs, and multilingual support to cater to linguistic diversity. Its versatility extends across industries, from healthcare to finance, making it a go-to tool for NER dataset preparation. UBIAI's user-friendly interface and robust functionality ensure efficiency and accuracy, empowering data scientists and AI developers to expedite NLP model training with confidence.

To start the project, we proceed through a sequence of steps, beginning with naming the project and progressing to identifying the entities to be extracted and loading the data.



After loading the data, our approach entails systematically extracting all desired entities by prompting GPT.

In the entity list, we have the option to include a description for each label indicating what we aim to extract. Once defined, we simply click save to confirm the changes.

We then return to the annotation interface and click predict, after which all the entities are extracted.

In the CV extraction example with UBIAI, we use ChatGPT for prompting; it attempts to extract all entities from an unstructured dataset, even across various types of CVs.

III. Comparison Insights

Below are the performance metrics of the GLiNER model: it achieved moderate precision, recall, and F1 scores, indicating a capacity to accurately identify and label entities in the assessed dataset, though there remains room to improve its overall performance.

When evaluating our zero-shot labeling model, we obtained satisfactory results. The zero-shot LLM model demonstrates outstanding performance, with a precision of 0.94, recall of 0.97, and F1 score of 0.95, highlighting its accuracy in entity recognition tasks.

				
precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
f1_score = 2 * (precision * recall) / (precision + recall)

# Display model performance
print("Model Performance:")
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1_score)

----------
Model Performance:
Precision: 0.9375
Recall: 0.967741935483871
F1 Score: 0.9523809523809523
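These reported figures are internally consistent: for example, hypothetical counts of 30 true positives, 2 false positives, and 1 false negative (numbers assumed here purely to match the printed scores) reproduce them exactly.

```python
# Hypothetical confusion counts, chosen to reproduce the reported scores
true_positives, false_positives, false_negatives = 30, 2, 1

precision = true_positives / (true_positives + false_positives)  # 30/32 = 0.9375
recall = true_positives / (true_positives + false_negatives)     # 30/31 ≈ 0.9677
f1_score = 2 * (precision * recall) / (precision + recall)       # 20/21 ≈ 0.9524

print(precision, recall, f1_score)
```

The point is only that precision, recall, and F1 are tied together by these formulas; F1 is the harmonic mean of the other two, so it always falls between them.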


				
			

 

Overall, the zero-shot LLM model outperforms the GLiNER model, making it a preferred choice for organizations and researchers requiring high-performance entity labeling.

In assessing GLiNER and LLM zero-shot labeling, it is clear that GLiNER is effective on less complex examples, as evidenced by the first example.

However, its performance falters on more intricate tasks, such as CV extraction, where it proves less effective. Notably, GLiNER's reliance on traditional text processing and linguistic rules may constrain its adaptability and accuracy on diverse datasets.

 

Conversely, LLM zero-shot labeling offers notable advantages. Leveraging large pre-trained language models, it provides versatility and efficiency in NER tasks without fine-tuning on specific entity types.

This approach excels in scenarios where labeled data is limited or where entity types are evolving, providing adaptability and ease of use.

Considering these factors, particularly the prevalence of LLM zero-shot labeling in NLP applications and annotation tools, we recommend prioritizing its adoption. Its widespread usage and compatibility with various tasks make it a compelling choice for practitioners seeking streamlined and effective NER solutions.

Therefore, for optimal results and enhanced productivity, LLM zero-shot labeling emerges as the preferred option.

Conclusion

In conclusion, the comparison between GLiNER and LLM zero-shot labeling reveals two effective methodologies for Named Entity Recognition (NER). While GLiNER offers fine-grained control and customization, catering well to users with linguistic expertise and specific domain requirements, LLM zero-shot labeling provides a more intuitive and user-friendly solution, particularly beneficial in applications without coding prerequisites. By understanding the strengths and applications of both approaches, practitioners can make informed decisions to address their NER objectives effectively.
