How to Automate Job Searches Using Named Entity Recognition

Jul 24, 2020

Have you ever found a job description that precisely matches your skills and education level? What makes one job posting more relevant than another? Most of today’s job search platforms rely on matching keywords from job descriptions to your profile without grasping the semantics and meaning of each word, which reduces search efficiency. For example, let’s say that your CV has the keyword “JavaScript” in it. A plain keyword search will yield results that match that exact word, missing a significant number of job posts that do not contain the exact keyword “JavaScript” but do contain related keywords such as “Java”, “JS”, “HTML”, and “CSS”.
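To make the limitation concrete, here is a minimal sketch of exact-keyword matching (the postings and keywords are made up for illustration):

```python
# Naive exact-keyword matching: a posting is "relevant" only if it
# contains the literal resume keyword, so related terms are missed.
resume_keywords = {"javascript"}

postings = [
    "Frontend role: JavaScript and CSS required",
    "Web developer: strong JS, HTML, CSS skills",  # related, but no exact match
]

def keyword_match(posting, keywords):
    # Crude tokenization: split on whitespace, strip punctuation, lowercase.
    words = {w.strip(",.:").lower() for w in posting.split()}
    return bool(words & keywords)

matches = [p for p in postings if keyword_match(p, resume_keywords)]
# Only the first posting matches; the JS/HTML/CSS posting is missed.
```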


I especially encountered this problem when I was job hunting after graduation; I received countless irrelevant job emails from job search engines and spent hours online manually looking for more relevant jobs. Employers face a similar problem when hunting for good candidates: it is estimated that companies lose on average $14,900 on every bad hire, and nearly 74% of employers say they have hired the wrong person.


Wouldn’t it be nice to get job recommendations that precisely match the candidate profile to the job description? This is exactly what this tutorial demonstrates using Named Entity Extraction. At UBIAI, we have developed an easy-to-use text annotation tool for building the Named Entity Recognition (NER) model that does the entity extraction.

Named Entity Recognition for Entity Extraction

One way to get more relevant job recommendations is to classify words into categories (entities) such as Skills, Experience, Degree, etc., instead of searching for static keywords. Once entities are extracted from job descriptions, you can perform similarity analysis against your resume to get more relevant job recommendations. For this project, I used an NER model to extract relevant entities from job postings and from my resume. There are several NER tools available, such as Stanford NER, NLTK, and spaCy. I chose the open-source spaCy library because it is fast and has the highest accuracy, as shown in the table below:
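The similarity step can be as simple as comparing entity sets. A minimal sketch, where the entity lists are hypothetical outputs of the trained NER model:

```python
# Hypothetical NER output: entities extracted from a resume and a job posting.
resume_entities = {
    "SKILLS": {"python", "machine learning", "sql"},
    "DIPLOMA": {"master"},
}
job_entities = {
    "SKILLS": {"python", "machine learning", "spark"},
    "DIPLOMA": {"master"},
}

def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 = identical)."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Average the per-entity-type similarity as a crude relevance score.
labels = resume_entities.keys() | job_entities.keys()
score = sum(
    jaccard(resume_entities.get(l, set()), job_entities.get(l, set()))
    for l in labels
) / len(labels)
# Here: SKILLS overlap 2/4 = 0.5, DIPLOMA overlap 1.0, so score = 0.75.
```

More sophisticated measures (e.g. embedding-based similarity) are possible, but set overlap already captures the idea of matching on entities rather than raw keywords.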


NLP models accuracy and speed comparison.

Scrape Job Postings

In order to train the spaCy model to extract entities, I needed to scrape data from various company websites and use it as training material. In this project I focused on scraping engineering jobs (including hardware engineering, software research, etc.) from various tech companies. The main information I extracted from each job posting is: Job ID, Job Title, Job Type, and Job Description. Below is an example of the output file:


Scraped job descriptions in Excel.
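The scraper's output can be stored with Python's standard `csv` module; the rows below are placeholder data, not real scraped output (the article's screenshot shows Excel, which opens CSV files directly):

```python
import csv

# One row per scraped posting, mirroring the four fields above.
rows = [
    {"Job ID": "12345", "Job Title": "Hardware Engineer",
     "Job Type": "Full-time",
     "Job Description": "Design and validate PCB prototypes..."},
]

with open("job_postings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["Job ID", "Job Title", "Job Type", "Job Description"])
    writer.writeheader()
    writer.writerows(rows)
```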

Once the data (at least 200 job postings from tech companies) has been gathered, you are ready to train the model for entity extraction.

Job Model Training using Annotation Tool

I trained the model to extract four entities: DIPLOMA, DIPLOMA_MAJOR, SKILLS, and EXPERIENCE. The first task is to annotate a few hundred job postings from various companies to use as training data. This part can be laborious and time-consuming, as you have to manually annotate thousands of words and sentences. Fortunately, at UBIAI we have developed an annotation tool that simplifies and streamlines the annotation process as much as possible. The tool includes the following features:


  1. Dictionary: Automatically annotate words in the corpus using a user-defined dictionary that maps words to their corresponding entities. You have the option to input a list of words with their entity types, or to create a regular-expression pattern if your corpus contains repeating patterns such as phone numbers, names, or locations.
  2. Auto-Annotation: Automatically annotate repeated occurrences of already-annotated words in a document.
  3. Machine Learning Annotation: Continuously train an ML model to automatically annotate a document based on your previous annotations.
  4. Annotation Metrics: Visualize annotation statistics such as: 1) number of documents annotated, 2) entity distribution across documents, and 3) word distribution within each entity type across the documents. This is helpful for tracking annotation progress and spotting under-sampled or over-sampled entities.
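The Dictionary feature (item 1) can be approximated in plain Python: scan the text for dictionary terms and emit character spans with their labels, in the (start, end, label) form an NER trainer expects. This is a simplified sketch, not UBIAI's actual implementation:

```python
# Toy dictionary-based annotator.
dictionary = {"python": "SKILLS", "bachelor": "DIPLOMA",
              "computer science": "DIPLOMA_MAJOR"}

def annotate(text, dictionary):
    """Return sorted (start, end, label) spans for every dictionary hit."""
    spans, lowered = [], text.lower()
    for term, label in dictionary.items():
        start = lowered.find(term)
        while start != -1:
            spans.append((start, start + len(term), label))
            start = lowered.find(term, start + 1)
    return sorted(spans)

text = "Bachelor in Computer Science with strong Python skills."
spans = annotate(text, dictionary)
# Yields one span per dictionary term found in the sentence.
```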


I started by annotating a few dozen job postings from multiple companies as shown below (documentation on how to use the tool can be found here). If you want an accurate NLP model, it’s important to annotate documents evenly across job postings from different companies, as each company has its own job-description style.


UBIAI annotation page.

To speed up the annotation process, we use the Dictionary feature available in the tool. Words were entered into the dictionary with their associated labels, as shown in the table below:


Excel lookup table for dictionary auto-annotation

In addition to manual word input, you can use regular expressions to capture repeating patterns in documents. For example, the regex r"[0-9][+].*?[.]" annotates phrases that start with a digit [0-9] followed by “+” and end with a period “.”, such as “2+ years experience in the areas of machine learning, information retrieval, natural language processing or data mining.” The tool will then skim through all the documents and automatically annotate the matching words with their labels, speeding up the annotation process.
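You can verify the same pattern with Python's `re` module (note the plain ASCII hyphen and characters in the class):

```python
import re

# A digit, a literal "+", then lazily anything up to the first period --
# capturing phrases like "2+ years experience ... mining."
pattern = re.compile(r"[0-9][+].*?[.]")

sentence = ("2+ years experience in the areas of machine learning, "
            "information retrieval, natural language processing or data mining.")

match = pattern.search(sentence)
# match.group() is the full "2+ ... mining." span.
```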


Dictionary inputs for auto-annotation.


Auto-annotation with regex.

UBIAI also offers collaboration among team members (read here) to facilitate the annotation process.

I would recommend tracking entity statistics during the annotation process to prevent biased annotation. Below is the entity distribution across 100 documents:


Entities distribution across annotated documents.

Most of the entities in the job descriptions are skewed toward SKILLS. This is not ideal for training purposes, as you want a more homogeneous entity distribution. In this case, you should annotate more documents containing EXPERIENCE, DIPLOMA, and DIPLOMA_MAJOR entities.
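This kind of skew check is easy to reproduce over an annotation export; a minimal sketch, where the annotation format and entries are illustrative:

```python
from collections import Counter

# Hypothetical annotation export: one (text, label) pair per annotated span.
annotations = [
    ("Python", "SKILLS"), ("C++", "SKILLS"),
    ("Verilog", "SKILLS"), ("SQL", "SKILLS"),
    ("5+ years", "EXPERIENCE"), ("BS", "DIPLOMA"),
]

counts = Counter(label for _, label in annotations)
total = sum(counts.values())

# The most frequent label and its share of all annotations.
dominant = counts.most_common(1)[0]
share = dominant[1] / total
# Here SKILLS accounts for 4 of 6 spans -- a skewed distribution.
```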


A second useful feature in the tool is auto-annotation using an ML model. Here, I use the manually annotated documents as training data to create an ML model that automatically pre-annotates documents (details can be found in the documentation). The ML model supports multiple languages, including English, Spanish, and Arabic. Once you choose the language, press the “Create Model” button.


NER model creation.

The model is then initialized but not yet trained. To train the NLP model, select your desired project.

*Note: you have the option to annotate the rest of the unannotated documents once the model is trained by checking the “Annotate your document after finish train” box.


Model training settings.

Below is the result of the ML auto-annotation trained on only 100 documents. The job description used for testing came from a different company and had never been seen by the model.


ML Auto-Annotation

While the auto-annotation made a few errors on entities — e.g. missing “Msc” as a DIPLOMA — overall we got an almost 70% success rate. That is a very respectable score given that we only annotated 100 documents. To improve the success rate further, we take the auto-annotated documents and correct them manually, then re-train the model on the corrected documents together with the original ones. We iterate this process a few times until we achieve a good success rate.
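UBIAI handles the training internally, but since the underlying library here is spaCy, a hand-rolled retraining pass over corrected annotations might look like the following sketch (spaCy v3 API; the training examples and offsets are illustrative):

```python
import spacy
from spacy.training import Example

# Blank English pipeline with a fresh NER component and our four labels.
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in ("SKILLS", "DIPLOMA", "DIPLOMA_MAJOR", "EXPERIENCE"):
    ner.add_label(label)

# Corrected annotations: (text, {"entities": [(start, end, label), ...]}).
TRAIN_DATA = [
    ("5+ years of Python experience.",
     {"entities": [(0, 8, "EXPERIENCE"), (12, 18, "SKILLS")]}),
    ("BS in Computer Science required.",
     {"entities": [(0, 2, "DIPLOMA"), (6, 22, "DIPLOMA_MAJOR")]}),
]

optimizer = nlp.initialize()
for _ in range(20):  # a few passes over the (tiny) corrected corpus
    for text, annots in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annots)
        nlp.update([example], sgd=optimizer)
```

In practice you would feed in all of the corrected documents, shuffle between epochs, and hold out a test set to measure the success rate after each retraining round.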


I decided to stress-test the model by annotating jobs from a different field (unrelated to hardware or software engineering). Below is the auto-annotation result on a job description in the apparel industry. The model made more mistakes than on the engineering job descriptions. For example, it mixed up the DIPLOMA_MAJOR entity with the SKILLS entity and missed one EXPERIENCE entity. That said, it still extracted the DIPLOMA entities correctly, as well as most of the skills.


Job description from a different domain using ML auto-annotation.

Conclusion

Implementing NLP in job search queries will not only speed up the search process but also provide more accurate results, creating streamlined job/candidate matching for both employers and candidates. In this tutorial, we walked through the annotation process required to train an NER model using the new UBIAI annotation tool. We successfully extracted relevant entities — such as Skills, Experience, Diploma, and Diploma Major — with just 100 training documents. In part 2, we will leverage the power of entity similarity to get job recommendations based on entities extracted from job descriptions and applicants’ resumes. See you there!

UBIAI