Building a Knowledge Graph for Job Search using BERT Transformer
May 17, 2021
Introduction
While the natural language processing (NLP) field has been growing at an exponential rate over the last two years, thanks to the development of transformer-based models, its applications in the job search field have remained limited in scope. LinkedIn, the leading company in job search and recruitment, is a good example. While I hold a PhD in Material Science and a Master's in Physics, I receive job recommendations such as Technical Program Manager at MongoDB and a Go Developer position at Toptal, web development roles that are not relevant to my background. This feeling of irrelevance is shared by many users and is a source of great frustration.
Job seekers should have access to the best tools to help them find the perfect match for their profile without wasting time on irrelevant recommendations and manual searches…
In general, however, traditional job search engines rely on simple keyword and/or semantic similarity matching, which is usually not well suited to providing good job recommendations because it does not take into account the interlinks between entities. Furthermore, with the rise of Applicant Tracking Systems (ATS), it is of utmost importance to have field-relevant skills listed on your resume and to uncover which industry skills are becoming more pertinent. For instance, I might have extensive skills in Python programming, but a job description of interest might require knowledge of the Django framework, which is built on Python; a simple keyword search will miss that connection, as the toy example below shows.
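To see why, consider a minimal sketch (the skill sets below are invented for illustration): exact keyword overlap between a Python-centric profile and a Django-centric job posting comes up empty, even though the two are closely related.
# Toy illustration: exact keyword matching finds no overlap between a
# Python-centric resume and a Django-centric job description, even
# though Django is built on Python. (Skill sets invented for this demo.)
resume_skills = {"python", "numpy", "machine learning"}
job_skills = {"django", "rest framework", "postgresql"}
print(resume_skills & job_skills)  # set() -> no match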
In this tutorial, we will build a job recommendation and skill discovery script that takes unstructured text as input and outputs job recommendations and skill suggestions based on entities such as skills, years of experience, diploma, and diploma major. Building on my previous article, we will extract entities and relations from job descriptions using the BERT model and attempt to build a knowledge graph from skills and years of experience.

Job analysis pipeline
In order to train the NER and relation extraction models, we annotated entities and relations using the UBIAI text annotation tool. Model training was done on Google Colab, as described in my previous article.
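The extraction code below assumes the two trained pipelines are already available as the variables nlp (NER) and nlp2 (relation extraction). As a minimal sketch, assuming both were saved as spaCy pipelines after training (the model paths are placeholders), they could be loaded like this:
import spacy

# Placeholder paths; loading the relation extraction pipeline also
# requires its custom component code to be registered/importable.
nlp = spacy.load("./ner_model")   # transformer-based NER pipeline
nlp2 = spacy.load("./rel_model")  # relation extraction pipeline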
Data Extraction:
For this tutorial, I collected job descriptions related to software engineering, hardware engineering, and research from five major companies: Facebook, Google, Microsoft, IBM, and Intel. The data was stored in a CSV file.
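As a quick sketch, the CSV can be loaded with pandas; the file name here is a placeholder, while the column names match those referenced in analyze_jobs() below.
import pandas as pd

# Placeholder file name; the columns below are the ones used later on
df = pd.read_csv("./job_descriptions.csv")
print(df[["JOBID", "Title", "Location", "Link", "Category", "Description"]].head())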
In order to extract the entities and relations from the job descriptions, I created a Named Entity Recognition (NER) and relation extraction pipeline using the previously trained transformer models (for more information, check out my previous article). We will store the extracted entities in a JSON file for further analysis using the code below.
import json

def analyze(text):
    skills = []
    experience_year = []
    experience_skills = []
    diploma = []
    diploma_major = []
    for doc in nlp.pipe(text, disable=["tagger"]):
        # Collect skill entities found by the NER pipeline
        skills = [e.text for e in doc.ents if e.label_ == 'SKILLS']
        # Run the relation extraction pipeline on the NER output
        for name, proc in nlp2.pipeline:
            doc = proc(doc)
        # Keep only high-confidence relations (score >= 0.9)
        for value, rel_dict in doc._.rel.items():
            for e in doc.ents:
                for b in doc.ents:
                    if e.start == value[0] and b.start == value[1]:
                        if rel_dict['EXPERIENCE_IN'] >= 0.9:
                            experience_skills.append(b.text)
                            experience_year.append(e.text)
                        if rel_dict['DEGREE_IN'] >= 0.9:
                            diploma_major.append(b.text)
                            diploma.append(e.text)
    return skills, experience_skills, experience_year, diploma, diploma_major

def analyze_jobs(item):
    results = []
    for i, row in enumerate(item['Description']):
        try:
            skill, experience_skills, experience_year, diploma, diploma_major = analyze([row])
            results.append({'Job ID': item['JOBID'][i],
                            'Title': item['Title'][i],
                            'Location': item['Location'][i],
                            'Link': item['Link'][i],
                            'Category': item['Category'][i],
                            'document': row,
                            'skills': skill,
                            'experience skills': experience_skills,
                            'experience years': experience_year,
                            'diploma': diploma,
                            'diploma_major': diploma_major})
        except Exception:
            continue
    # Write all extracted entities and relations to a JSON file
    with open('./path_to_job_descriptions', 'w', encoding='utf-8') as file:
        json.dump(results, file, ensure_ascii=False)

# df is the DataFrame of job descriptions loaded from the CSV file
analyze_jobs(df)
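To sanity-check the output, the JSON file can be read back for the downstream knowledge-graph step; this sketch reuses the placeholder path from above.
import json

# Read the extracted entities back (same placeholder path as above)
with open('./path_to_job_descriptions', encoding='utf-8') as f:
    jobs = json.load(f)
# e.g., inspect the skills extracted from the first job description
print(jobs[0]['skills'])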