Can Weak Labeling Replace Human-Labeled Data? A step-by-step comparison between weak and full supervision

Jun 7, 2022

In recent years, Natural Language Processing (NLP) has advanced significantly thanks to deep learning models. Real-world NLP applications, ranging from intelligent chatbots to automated data extraction from unstructured documents, are becoming more prevalent and bringing real business value to many companies. However, these models still require hand-labeled training data to fine-tune them to specific business use cases. It can take months to gather this data and even longer to label it, especially if a domain expert is needed and there are multiple classes to identify within the text. As you can imagine, this can become a real adoption barrier for many businesses, since subject matter experts are hard to find and expensive.


To address this problem, researchers have adopted weak forms of supervision, such as heuristically generated labeling functions and external knowledge bases, to programmatically label the data. While this approach holds a lot of promise, its impact on model performance in comparison with full supervision remains unclear.

In this tutorial, we will generate two training datasets from job descriptions: one generated with weak labeling and a second generated by hand labeling using UBIAI. We will then compare model performance on a NER task that aims to extract skills, experience, diploma, and diploma major from job descriptions. The data and the notebook are available in my GitHub repo.


With weak supervision, the user defines a set of functions and rules that assign a noisy label, that is, a label that may not be correct, to unlabeled data. The labeling functions may take the form of patterns such as regular expressions, dictionaries, ontologies, pre-trained machine learning models, or crowd annotations.

Weak supervision pipelines have three components: (1) user-defined labeling functions and heuristic functions, (2) a statistical model which takes as input the labels from the functions, and outputs probabilistic labels, and (3) a machine learning model that is trained on the probabilistic training labels from the statistical model.
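As a toy illustration of the first two stages, the sketch below uses two made-up labeling functions and a simple majority vote in place of the statistical model (skweak's actual aggregator, an HMM, appears later in this tutorial):

```python
from collections import Counter

# Two made-up token-level labeling functions (illustrative only)
def lf_years(token):
    # flags tokens like "5+" that typically precede "years"
    return "EXPERIENCE" if token.rstrip("+").isdigit() else None

def lf_skills(token, skills={"python", "sql"}):
    return "SKILLS" if token.lower() in skills else None

def aggregate(tokens, lfs):
    # majority vote over the labeling functions' outputs,
    # standing in for the statistical aggregation model
    labels = []
    for tok in tokens:
        votes = [lab for lf in lfs if (lab := lf(tok)) is not None]
        labels.append(Counter(votes).most_common(1)[0][0] if votes else "O")
    return labels

print(aggregate("5+ years of Python".split(), [lf_years, lf_skills]))
# ['EXPERIENCE', 'O', 'O', 'SKILLS']
```

The resulting (noisy) token labels are what the final machine learning model would be trained on.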

(Figure: "Is Weak Labeling Capable of Replacing Human-Labeled Data?" Image by Author)

Labeling Functions

To perform the weak labeling, we will write a set of functions that encode dictionaries, patterns, knowledge bases, and rules related to the corpus we would like to label. In this tutorial, we will add functions that auto-label the entities SKILLS, EXPERIENCE, DIPLOMA, and DIPLOMA_MAJOR from job descriptions. After applying those functions to the unlabeled data, the results will be aggregated into a single, probabilistic annotation layer using a statistical model provided by the skweak library.

First, we will create a dictionary of skills, Skills_Data.json, and use it in our function lf3 to annotate the SKILLS entity. The dictionary was obtained from a publicly available dataset.

import json, re
import spacy
from skweak import heuristics, gazetteers, aggregation, utils

nlp = spacy.load('en_core_web_md', disable=['ner'])
# Assumes Skills_Data.json holds a list of skill strings
with open('Skills_Data.json', encoding='UTF-8') as f:
    skills = json.load(f)
tries = {"SKILLS": gazetteers.Trie([skill.split() for skill in skills])}
lf3 = gazetteers.GazetteerAnnotator("skills", tries)
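To see what the gazetteer is doing conceptually, here is a stand-alone sketch that scans token windows against a small inline phrase list; the skill list and the sentence are made up, and skweak implements the same idea efficiently with a trie:

```python
# Toy gazetteer: match token windows against a phrase dictionary
skills = {("python",), ("machine", "learning"), ("sql",)}
max_len = max(len(s) for s in skills)

def match_skills(tokens):
    spans = []
    for i in range(len(tokens)):
        # try the longest phrases first
        for n in range(max_len, 0, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in skills:
                spans.append((i, i + n, "SKILLS"))
                break
    return spans

tokens = "Experience with Python and machine learning".split()
print(match_skills(tokens))  # [(2, 3, 'SKILLS'), (4, 6, 'SKILLS')]
```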
For the EXPERIENCE entity, we use a regex pattern to capture the number of years of experience:

# Function for experience detection (uses a regex)
def experience_detector(doc):
    expression = r'\d+\+ years'
    for match in re.finditer(expression, doc.text):
        start, end = match.span()
        span = doc.char_span(start, end)
        if span is not None:
            yield span.start, span.end, "EXPERIENCE"

lf1 = heuristics.FunctionAnnotator("experience", experience_detector)
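A quick sanity check of the experience pattern on a made-up sentence:

```python
import re

# Made-up job description sentence to exercise the EXPERIENCE regex
text = "We are looking for someone with 5+ years of NLP experience."
print(re.findall(r'\d+\+ years', text))  # ['5+ years']
```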
For the entities DIPLOMA and DIPLOMA_MAJOR, we use a publicly available dataset from Kaggle and regex:

import json

# Load the diploma and diploma-major dictionaries
with open('Diploma_Dic.json', 'r', encoding='UTF-8') as f:
    DIPLOMA = json.load(f)
with open('Diploma_Major_Dic.json', encoding='UTF-8') as f:
    DIPLOMA_MAJOR = json.load(f)

# Create the diploma function
def Diploma_fun(doc):
    for key in DIPLOMA:
        # re.escape so dictionary entries are matched literally
        for match in re.finditer(re.escape(key), doc.text, re.IGNORECASE):
            start, end = match.span()
            span = doc.char_span(start, end)
            if span is not None:
                yield span.start, span.end, "DIPLOMA"

lf4 = heuristics.FunctionAnnotator("Diploma", Diploma_fun)

# Create the diploma-major function
def Diploma_major_fun(doc):
    for key in DIPLOMA_MAJOR:
        for match in re.finditer(re.escape(key), doc.text, re.IGNORECASE):
            start, end = match.span()
            span = doc.char_span(start, end)
            if span is not None:
                yield span.start, span.end, "DIPLOMA_MAJOR"

lf2 = heuristics.FunctionAnnotator("Diploma_major", Diploma_major_fun)
# Function for diploma-major detection (uses a regex)
def diploma_major_detector(doc):
    expression = re.compile(r"(Ph\.D|MS|Master|BA|Bachelor|BS)\S* in (\S+)")
    for match in re.finditer(expression, doc.text):
        start, end = match.span(2)  # group 2 holds the major
        span = doc.char_span(start, end)
        if span is not None:
            yield span.start, span.end, "DIPLOMA_MAJOR"

# distinct annotator name so lf2's spans are not overwritten
lf5 = heuristics.FunctionAnnotator("Diploma_major_regex", diploma_major_detector)
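And a quick check of the diploma-major pattern, again on a made-up requirement line:

```python
import re

# Made-up requirement line to exercise the diploma-major regex
pattern = re.compile(r"(Ph\.D|MS|Master|BA|Bachelor|BS)\S* in (\S+)")
text = "Candidates should hold a Master in Statistics or similar."
match = pattern.search(text)
print(match.group(2))  # Statistics
```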

We apply all the functions to the corpus and use skweak's statistical model to resolve their (dis)agreements and auto-label the data.

docs = []
with open('Corpus.txt', 'r', encoding='UTF-8') as f:
    for text in f:
        if len(text) != 1:  # skip empty lines
            doc = nlp(text)
            # apply every labeling function to the document
            for lf in (lf1, lf2, lf3, lf4, lf5):
                doc = lf(doc)
            docs.append(doc)

from skweak import aggregation
model = aggregation.HMM("hmm", ["DIPLOMA", "DIPLOMA_MAJOR", "EXPERIENCE", "SKILLS"])
docs = model.fit_and_aggregate(docs)
We are finally ready to train the model! We chose to train a spaCy model since it integrates easily with the skweak library, but we could of course use any other model, such as transformers. The annotated datasets are available in the GitHub repo.

for doc in docs:
    doc.ents = doc.spans["hmm"]
utils.docbin_writer(docs, "train.spacy")

!python -m spacy train config.cfg --output ./output --paths.train train.spacy --paths.dev train.spacy


We are now ready to run the training on both datasets, fully hand-labeled and weakly labeled, each with an equal number of documents:

Hand-labeled dataset model performance:

================================== Results ==================================

TOK      100.00
NER P     74.27
NER R     80.10
NER F     77.08
SPEED      4506

=============================== NER (per type) ===============================

                    P       R       F
DIPLOMA         85.71   66.67   75.00
DIPLOMA_MAJOR   33.33   16.67   22.22
EXPERIENCE      81.82   81.82   81.82
SKILLS          74.05   83.03   78.29

Weakly-labeled dataset model performance:

================================== Results ==================================

TOK      100.00
NER P     31.78
NER R     17.80
NER F     22.82
SPEED      2711

=============================== NER (per type) ===============================

                    P       R       F
DIPLOMA         33.33   22.22   26.67
DIPLOMA_MAJOR   14.29   50.00   22.22
EXPERIENCE     100.00   27.27   42.86
SKILLS          33.77   15.76   21.49

Interestingly, the model trained on the hand-labeled dataset outperforms the weakly labeled one by a wide margin: an overall F-score of 0.77 for full supervision versus 0.22 for weak supervision. If we dig deeper, we find that the performance gap also holds at the entity level (except for the EXPERIENCE entity, where the weakly supervised model reaches perfect precision, albeit with much lower recall).
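As a sanity check on the reported numbers, each F-score above is the harmonic mean of the corresponding precision and recall:

```python
def f1(p, r):
    # F-score as the harmonic mean of precision and recall
    return 2 * p * r / (p + r)

print(round(f1(74.27, 80.10), 2))  # hand-labeled: ~77.08
print(round(f1(31.78, 17.80), 2))  # weakly labeled: ~22.82
```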

By adding more labeling functions, such as crowd annotations, model-based labeling, rules, and dictionaries, we would expect model performance to improve, but it is unclear whether it will ever match that of data labeled by subject matter experts. Moreover, figuring out the correct auto-labeling functions is an iterative and ad-hoc process. This issue is exacerbated with highly technical datasets, such as medical notes, legal documents, or scientific articles, where simple labeling functions can fail to capture the domain knowledge that users want to encode.

In this tutorial, we demonstrated a step-by-step comparison between models trained on weakly labeled data and hand-labeled data. We have shown that, in this specific use case, the performance of the model trained on the weakly labeled dataset is significantly lower than that of the fully supervised approach. This certainly does not mean weak labeling is not useful: we can use it to pre-annotate a dataset and bootstrap a labeling project, but we cannot rely on it for fully unsupervised labeling.