Interpretable and Explainable NER with LIME
Jan 14, 2022
While a lot of progress has been made in developing the latest and greatest state-of-the-art deep learning models with a gazillion parameters, very little effort has gone into explaining the output of these models.
During a workshop in December 2020, Abubakar Abid, CEO of Gradio, examined the way GPT-3 generates text about religions by using the prompt “Two _ walk into a.” Looking at the first 10 responses for various religions, he found that GPT-3 mentioned violence once each for Jews, Buddhists, and Sikhs, twice for Christians, but nine out of ten times for Muslims.
Later, Abid’s team showed that injecting positive text about Muslims into a large language model reduced the number of violence mentions about Muslims by nearly 40 percent. Even the creator of GPT-3, OpenAI, released a paper in May 2020 with tests that found GPT-3 has a generally low opinion of Black people and exhibits sexism and other forms of bias. Examples of this type of societal bias embedded in these large language models are numerous, ranging from racist statements to toxic content.
Deep learning models are like a black box: feed one an input and it gives you an output without explaining the reason for its decision, whether the task is text classification, text generation, or named entity recognition (NER). It is of the utmost importance to closely monitor the output of these models and, more importantly, to be able to explain their decision-making process. Explaining the reasoning behind the output would give us more confidence to trust or mistrust a model’s prediction.
Explaining NER Models with LIME
In this tutorial, we will focus on explaining the prediction of a named entity recognition model using LIME (Local Interpretable Model-Agnostic Explanations). You can learn more from the original paper.
LIME is model agnostic, meaning it can be applied to explain any type of model output without peeking into the model. It does this by perturbing the local features around a target prediction and measuring how the output changes. In our specific case, we will alter the tokens around a target entity, then measure the output of the model.
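To make the perturbation step concrete, here is a minimal sketch in plain Python. It is not the LIME library itself: `toy_location_prob` is a hypothetical stand-in for a real NER model, and `perturb_context`, the example sentence, and the probabilities are all made up for illustration. The sketch randomly drops context tokens, always keeps the target entity, and records the model output for each perturbed sentence.

```python
import random

def toy_location_prob(tokens, target="Paris"):
    """Hypothetical stand-in for an NER model: P(target is tagged as a location)."""
    if target not in tokens:
        return 0.0
    prob = 0.3
    if "in" in tokens:
        prob += 0.5  # assumed: the preposition is a strong location cue
    if "lives" in tokens:
        prob += 0.1  # assumed: the verb is a weaker cue
    return prob

def perturb_context(tokens, target_index, num_samples, seed=0):
    """Randomly drop context tokens; the target entity token is always kept."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        mask = [1 if i == target_index else rng.randint(0, 1)
                for i in range(len(tokens))]
        kept = [tok for tok, keep in zip(tokens, mask) if keep]
        samples.append((mask, kept))
    return samples

tokens = ["She", "lives", "in", "Paris", "today"]
target_index = tokens.index("Paris")

# Each row: which tokens were kept, the perturbed sentence, and the model output.
samples = perturb_context(tokens, target_index, num_samples=8)
for mask, kept in samples:
    print(mask, " ".join(kept), round(toy_location_prob(kept), 2))
```

Comparing rows where a given token is present against rows where it is absent already hints at which context words the model relies on. LIME turns that comparison into a weighted linear fit.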
Below is an illustration of how LIME works.
Here is an explanation from the LIME website: “The original model’s decision function is represented by the blue/pink background and is clearly nonlinear. The bright red cross is the instance being explained (let’s call it X). We sample perturbed instances around X, and weight them according to their proximity to X (weight here is represented by size). We get original model’s prediction on these perturbed instances, and then learn a linear model (dashed line) that approximates the model well in the vicinity of X. Note that the explanation in this case is not faithful globally, but it is faithful locally around X.”
LIME process Source
LIME outputs a list of tokens, each with a score for its contribution to the model’s prediction (see example below for text classification). This provides local interpretability, and it also lets us determine which feature changes will have the most impact on the prediction.
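As a rough illustration of where such per-token scores come from, the sketch below implements the core LIME recipe in plain Python: sample perturbed sentences, weight each one by its proximity to the original, and fit a weighted linear surrogate. Everything here is an assumption made for illustration: `toy_location_prob` is a hypothetical stand-in for an NER model, the distance is simply the fraction of context tokens removed, the kernel width is arbitrary, and plain weighted least squares is used in place of the regularized regression a real implementation would typically use.

```python
import math
import random

def toy_location_prob(tokens):
    """Hypothetical stand-in for an NER model: P('Paris' is tagged as a location)."""
    return 0.3 + 0.5 * ("in" in tokens) + 0.1 * ("lives" in tokens)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def explain(tokens, target_index, predict, num_samples=200,
            kernel_width=0.75, seed=0):
    """Return (token, score) pairs from a locally weighted linear surrogate."""
    rng = random.Random(seed)
    context_ids = [i for i in range(len(tokens)) if i != target_index]
    rows, preds, weights = [], [], []
    for _ in range(num_samples):
        mask = {i: rng.randint(0, 1) for i in context_ids}
        kept = [tok for i, tok in enumerate(tokens)
                if i == target_index or mask[i]]
        # Assumed distance: fraction of context tokens that were removed.
        distance = 1 - sum(mask.values()) / len(context_ids)
        rows.append([1.0] + [float(mask[i]) for i in context_ids])  # 1.0 = intercept
        preds.append(predict(kept))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    k = len(context_ids) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, preds))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    scores = [(tokens[i], beta[pos + 1]) for pos, i in enumerate(context_ids)]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)

tokens = ["She", "lives", "in", "Paris", "today"]
explanation = explain(tokens, tokens.index("Paris"), toy_location_prob)
for token, score in explanation:
    print(f"{token:>6}  {score:+.3f}")
```

Because the toy model happens to be linear in which tokens are kept, the surrogate recovers its structure: "in" gets a score of about 0.5, "lives" about 0.1, and the remaining tokens about zero. A real NER model is not linear, so the scores would only approximate its behavior near the original sentence.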