How to Fine-Tune BERT Transformer with spaCy 3 for NER
Feb 28, 2021
Since the seminal paper "Attention Is All You Need" by Vaswani et al., Transformer models have become by far the state of the art in natural language processing (NLP). With use cases ranging from named entity recognition (NER) and text classification to question answering and text generation, the applications of this technology are virtually limitless.
More specifically, BERT, which stands for Bidirectional Encoder Representations from Transformers, leverages the transformer architecture in a novel way. During pre-training, BERT looks at both sides of a sentence around a randomly masked word in order to predict it. In addition to predicting the masked token, BERT learns the relationship between sentences: a classification token [CLS] is added at the beginning of the first sentence, a separation token [SEP] is placed between the two sentences, and the model is trained to predict whether the second sentence actually follows the first.

BERT Architecture
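To make the masked-word objective concrete, here is a minimal sketch (not part of the original tutorial) that uses the Hugging Face transformers fill-mask pipeline with the pretrained bert-base-uncased checkpoint; the example sentence and checkpoint choice are just illustrative assumptions.

```python
# Minimal illustration of BERT's masked-language-model objective,
# assuming the Hugging Face transformers package is installed.
from transformers import pipeline

# bert-base-uncased is a standard pretrained checkpoint; the pipeline
# fills in the [MASK] token using context from both sides of the sentence.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The candidate must hold a [MASK] in computer science."):
    print(prediction["token_str"], round(prediction["score"], 3))
```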
In this tutorial, I will show you how to fine-tune a BERT model to predict entities such as skills, diploma, diploma major and experience in software job descriptions.
Since fine-tuning transformers requires a powerful GPU with parallel processing capability, we use Google Colab, which provides free access to servers with GPUs.
For this tutorial, we will use the newly released spaCy 3 library to fine-tune our transformer. Below is a step-by-step guide to fine-tuning the BERT model with spaCy 3.
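As a rough sketch of the Colab setup (the exact CUDA extra in the install command depends on your runtime, so treat it as an assumption to adjust), you can install spaCy 3 with transformer support and verify that the GPU is visible:

```python
# Run in a Colab cell; shell commands are shown as comments.
# Adjust the CUDA extra to match the runtime's CUDA version:
# !pip install -U "spacy[cuda111,transformers]"

import spacy

# prefer_gpu() returns True when spaCy is able to run pipelines on the GPU.
print("GPU available:", spacy.prefer_gpu())
```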
Data Labeling:
I have labeled only 120 job descriptions with entities such as skills, diploma, diploma major, and experience for the training dataset, and about 70 job descriptions for the dev dataset.
In this tutorial, I used the UBIAI text annotation tool because it comes with extensive features such as:
- ML auto-annotation
- Dictionary, regex, and rule-based auto-annotation
- Team collaboration to share annotation tasks
- Direct annotation export to IOB format (illustrated below)
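To give a sense of what the exported data looks like, here is a hypothetical IOB-tagged snippet; the label names are illustrative, not necessarily the exact ones used in my dataset:

```
Bachelor    B-DIPLOMA
degree      I-DIPLOMA
in          O
computer    B-DIPLOMA_MAJOR
science     I-DIPLOMA_MAJOR
required    O
```

spaCy 3 also ships a convert command (python -m spacy convert, with the iob converter) that can turn IOB files into its training format; check the spaCy documentation for the exact input layout it expects.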