ubiai deep learning

How to Fine-Tune BERT Transformer with spaCy 3 for NER

Feb 28, 2021

Since the seminal paper “ Attention is all you need ” of Vaswani et al, Transformer models have become by far the state of the art in natural language processing (NLP) technology. With applications ranging from NER, Text Classification, Question Answering or text generation, the applications of this amazing technology are limitless.

More specifically, BERT — which stands for Bidirectional Encoder Representations from Transformers— leverages the transformer architecture in a novel way. For example, BERT analyses both sides of the sentence with a randomly masked word to make a prediction. In addition to predicting the masked token, BERT predicts the sequence of the sentences by adding a classification token [CLS] at the beginning of the first sentence and tries to predict if the second sentence follows the first one by adding a separation token[SEP] between the two sentences.

Fine-Tune BERT Transformer with spaCy 3 for NER

BERT Architecture

In this tutorial, I will show you how to fine-tune a BERT model to predict entities such as skills, diploma, diploma major and experience in software job descriptions.

Fine tuning transformers requires a powerful GPU with parallel processing. For this we use Google Colab since it provides freely available servers with GPUs.


For this tutorial, we will use the newly released  spaCy 3 library  to fine tune our transformer. Below is a step-by-step guide on how to fine-tune the BERT model on spaCy 3.

Data Labeling:

To fine-tune BERT using spaCy 3, we need to provide training and dev data in the spaCy 3 JSON format ( see here ) which will be then converted to a .spacy binary file. We will provide the data in IOB format contained in a TSV file then convert to spaCy JSON format.

I have only labeled 120 job descriptions with entities such as  skills ,  diploma ,  diploma major,  and  experience  for the training dataset and about 70 job descriptions for the dev dataset.


In this tutorial, I used the  UBIAI  text annotation tool because it comes with extensive features such as:

  • ML auto-annotation
  • Dictionary, regex, and rule-based auto-annotation
  • Team collaboration to share annotation tasks
  • Direct annotation export to IOB format
Fine-Tune BERT Transformer with spaCy 3 for NER

UBIAI Annotation Interface

Using the regular expression feature in UBIAI, I start annotating all the experience mentions that follows the pattern “d.*+.*” such as “5 + years of experience in C++”. I then uploaded a csv dictionary containing all the software languages and assigned the entity skills. The pre-annotation saves a lot of time and will help you minimize manual annotation.


For more information about  UBIAI  annotation tool, please visit the  documentation  page and my previous post “ Introducing UBIAI: Easy-to-Use Text Annotation for NLP Applications ”.

The exported annotation will look like this:




in O
electrical B-DIPLOMA_MAJOR
engineering I-DIPLOMA_MAJOR
or O
engineering I-DIPLOMA_MAJOR
. O
experience I-EXPERIENCE
Familiar O
with O
storage B-SKILLS
server I-SKILLS
architectures I-SKILLS
with O

In order to convert from IOB to JSON (see documentation  here ), we use spaCy 3 command:

					!python -m spacy convert drive/MyDrive/train_set_bert.tsv ./ -t json -n 1 -c iob
!python -m spacy convert drive/MyDrive/dev_set_bert.tsv ./ -t json -n 1 -c iob

After conversion to spaCy 3 JSON, we need to convert both the training and dev JSON files to .spacy binary file using this command (update the file path with your own):

!python -m spacy convert drive/MyDrive/train_set_bert.json ./ -t spacy!python -m spacy convert drive/MyDrive/dev_set_bert.json ./ -t spacy

Model Training:

1. Open a new Google Colab project and make sure to select GPU as hardware accelerator in the notebook settings.


2. In order to accelerate the training process, we need to run parallel processing on our GPU. To this end we install the NVIDIA 9.2 cuda library:

					!wget https://developer.nvidia.com/compute/cuda/9.2/Prod/local_installers/cuda-repo-ubuntu1604-9-2-local_9.2.88-1_amd64 -O cuda-repo-ubuntu1604–9–2-local_9.2.88–1_amd64.deb!dpkg -i cuda-repo-ubuntu1604–9–2-local_9.2.88–1_amd64.deb!apt-key add /var/cuda-repo-9–2-local/7fa2af80.pub!apt-get update!apt-get install cuda-9.2


To check the correct cuda compiler is installed, run: !nvcc –version:


3. Install the spacy library and spacy transformer pipeline:


pip install -U spacy
!python -m spacy download en_core_web_trf


4. Next, we install the pytorch machine learning library that is configured for cuda 9.2:


pip install torch==1.7.1+cu92 torchvision==0.8.2+cu92 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html


5. After pytorch install, we need to install spacy transformers tuned for cuda 9.2 and change the CUDA_PATH and LD_LIBRARY_PATH as below.


 Finally, install the cupy library which is the equivalent of numpy library but for GPU:



					!pip install -U spacy[cuda92,transformers]
!export CUDA_PATH=”/usr/local/cuda-9.2"
!pip install cupy

SpaCy 3 uses a config file config.cfg that contains all the model training components to train the model. In  spaCy training page , you can select the language of the model (English in this tutorial), the component (NER) and hardware (GPU) to use and download the config file template.

Fine-Tune BERT Transformer with spaCy 3 for NER

Spacy 3 config file for training.  Source

The only thing we need to do is to fill out the path for the train and dev .spacy files. Once done with the annotated data, we upload the file to Google Colab.

6. Now we need to auto-fill the config file with the rest of the parameters that the BERT model will need; all you have to do is run this command:




!python -m spacy init fill-config drive/MyDrive/config.cfg drive/MyDrive/config_spacy.cfg

I suggest to debug your config file in case there is an error:

!python -m spacy debug data drive/MyDrive/config.cfg

7. We are finally ready to train the BERT model! Just run this command and the training should start:


!python -m spacy train -g 0 drive/MyDrive/config.cfg — output ./



P.S: if you get the error cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed, just uninstall cupy and install it again and it should fix the issue.

If everything went correctly, you should start seeing the model scores and losses being updated:

Fine-Tune BERT Transformer with spaCy 3 for NER

BERT training on google colab

At the end of the training, the model will be saved under folder model-best. The model scores are located in meta.json file inside the model-best folder:

The scores are certainly well below a production model level because of the limited training dataset, but it’s worth checking its performance on a sample job description.

Entity Extraction with Transformers

To test the model on a sample text, we need to load the model and run it on our text:

					nlp = spacy.load(“./model-best”)
text = ['''Qualifications- A thorough understanding of C# and .NET Core- Knowledge of good database design and usage- An understanding of NoSQL principles- Excellent problem solving and critical thinking skills- Curious about new technologies- Experience building cloud hosted, scalable web services- Azure experience is a plusRequirements- Bachelor's degree in Computer Science or related field(Equivalent experience can substitute for earned educational qualifications)- Minimum 4 years experience with C# and .NET- Minimum 4 years overall experience in developing commercial software''']for doc in nlp.pipe(text, disable=["tagger", "parser"]):    print([(ent.text, ent.label_) for ent in doc.ents])

Below are the entities extracted from our sample job description:

[("C", "SKILLS"),("#", "SKILLS"),(".NET Core", "SKILLS"),("database design", "SKILLS"),("usage", "SKILLS"),("NoSQL", "SKILLS"),("problem solving", "SKILLS"),("critical thinking", "SKILLS"),("Azure", "SKILLS"),("Bachelor", "DIPLOMA"),("'s", "DIPLOMA"),("Computer Science", "DIPLOMA_MAJOR"),("4 years experience with C# and .NET -", "EXPERIENCE"),("4 years overall experience in developing commercial software ", "EXPERIENCE")]

Pretty impressive for only using 120 training documents! We were able to extract most of the skills, diploma, diploma major, and experience correctly.


With more training data, the model would certainly improve further and yield higher scores.


With only a few lines of code, we have successfully trained a functional NER transformer model thanks to the amazing spaCy 3 library. Go ahead and try it out on your use case and please share your results. Note, you can use  UBIAI  annotation tool to label your data, we offer free 14 days trial.

As always, if you have any comment, please send an email at admin@!