Build An NLP Project From Zero To Hero (6): Model Integration

Feb 13, 2022

Successfully training a machine learning model is just the beginning. Integrating it into a business application is a whole new challenge. In this article, we introduce the notion of ML model integration and give a simple demonstration of the concept: we build a web service with FastAPI, a high-performance, easy-to-learn Python web framework. The service exposes the trained spaCy NER transformer model through its API. We will also use the Twitter API to simulate getting live data.

Model integration refers to adding machine learning models as a feature of production software. This step is widely considered one of the most challenging in the machine learning project workflow. According to this survey from Statista, around 54% of organizations took more than a month to deploy a model in 2021.

First, we will briefly explain how to get started with the Twitter Developer API. Then we will elaborate on the web service structure and end with a small demo.

Starting with the Twitter Developer API

If you are new to the concept of APIs (Application Programming Interfaces), think of an API as a connection between computers or between computer programs. Here, we want to establish a connection between our own web service and the Twitter API, which will provide us with useful data.

First, follow the steps in this guide.

After successfully creating an application with the Twitter API, copy the credentials and save them in a safe location: your API Key, your API Key Secret, your Bearer Token, your Access Token, and your Access Token Secret. In case you missed them during the configuration of your application, you can find them in the ‘Keys and tokens’ tab on your project page in the Twitter Developer Portal.

Twitter Developer Portal

Then, we can make a test request with Postman to check that the project configuration is working. Postman is an API platform for building and using APIs. Create an account and download the desktop application.

Now, we need to create an environment within Postman that holds all our credentials, so that we do not need to add them to every request. Create one, make sure the variables have the same names as in the picture below, and copy each credential into both the initial value and current value fields:

Postman request

Now, create a new HTTP request with the URL below, and do not forget to select the corresponding environment. Here we get the latest ten tweets containing the keyword stocks and extract their public metrics: retweet_count, reply_count, like_count, and quote_count.

https://api.twitter.com/2/tweets/search/recent?query=stocks&tweet.fields

Testing Twitter API response
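
The same request can also be assembled outside Postman. The following is a minimal sketch, not the exact request from the screenshot: it assumes that tweet.fields is set to public_metrics and max_results to 10 (both taken from the description above and from the code later in this article), and build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_url(keyword, max_results=10):
    """Return the recent-search URL for a keyword (hypothetical helper)."""
    params = {
        "query": keyword,
        "max_results": max_results,        # assumed: ten tweets, as in the text
        "tweet.fields": "public_metrics",  # assumed: the metrics listed above
    }
    return SEARCH_URL + "?" + urlencode(params)


url = build_search_url("stocks")
print(url)

# Parse the URL back to check which parameters it carries
print(parse_qs(urlsplit(url).query))
```

Building the query string from a dictionary keeps the parameters readable and takes care of URL encoding when the keyword contains spaces or symbols such as $.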

Now, we can proceed to build our own API.

Stock Market Tweets Analyzer

Not really a fancy name for an application but it is honest work.

The project structure is as follows:

Project structure

main.py: the main file that contains our service and exposes it to the outside.
.env: contains the environment variables of the application, namely all its confidential data and credentials.
env: this folder is the result of creating a Python virtual environment; it contains all our dependencies and the path to the Python executable for our project. Do not forget to create this environment with:
python -m venv env
.gitignore: some files and directories are better left untracked by version control. For example, we ignore the __pycache__ and .vscode folders.
readme: where you explain the work done in the repository.
utils folder: notice the __init__.py file. It makes the folder a package that contains two modules, twitter_api and nlp.
trf_ner folder: the files of the NER model, which I downloaded from Google Drive.
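
Since the service depends on the credentials stored in .env, it can help to fail early when one of them is missing. Below is a minimal sketch under assumptions: the variable names are modeled on the ones read later in the twitter_api module and may differ in your own .env file, and missing_credentials is a hypothetical helper.

```python
import os

# Assumed variable names; adapt them to the keys in your own .env file
REQUIRED_VARS = ["api_key", "api_key_secret", "bearer_token",
                 "access_token", "access_token_secret"]


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with a plain dict standing in for the real environment
fake_env = {"api_key": "xxx", "bearer_token": "yyy"}
print(missing_credentials(fake_env))
```

Calling such a check once at startup gives a clear error message instead of a failed request later on.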

To install all the needed dependencies:

				
      pip install "fastapi[all]"  # also installs uvicorn, a lightweight server

      pip install -U "spacy[transformers]==3.2.1" python-dotenv requests

Now, the basic idea is to build upon the request we used in Postman. For now, our web service will accept a request containing a keyword to search for and the maximum number of tweets to return. Then, it will attach to each tweet the entities extracted by our NER model and send the response back to the requester.

To get started with FastAPI, I recommend reading their documentation, as it is very intuitive. This is the main.py file that holds our application logic. The code is straightforward: we declare a FastAPI application and use it to build our routes:

				
      from fastapi import FastAPI
      from pydantic import BaseModel

      from utils.nlp import extract_ents
      from utils.twitter_api import get_response

      app = FastAPI()


      class Query(BaseModel):
          # Schema of the request body; FastAPI validates incoming JSON against it
          keyword: str
          max_results: int


      @app.get("/")
      async def root():
          return {"message": "Hello to Stock Market NLP Analyzer"}


      # Echoes the parsed query back, handy for checking that validation works
      @app.get("/get_tweet_ents")
      async def root_post(query: Query):
          return {"query": query}


      # Fetches tweets for the keyword and attaches the entities found by the NER model
      @app.post("/get_tweet_ents")
      async def get_tweet_ents(query: Query):
          data = get_response(query.keyword, query.max_results)
          data = extract_ents(data)
          return data

You may not be familiar with pydantic’s BaseModel. Long story short, we use it to define the schema of our query, and it gives us a very simple means of validation.
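
To make the validation concrete, here is a small sketch of how such a schema behaves on its own, outside FastAPI. The 10 to 100 bounds on max_results are an assumption added for illustration (they mirror the limits of Twitter's recent search endpoint) and are not part of the Query class in main.py.

```python
from pydantic import BaseModel, Field, ValidationError


class Query(BaseModel):
    keyword: str
    # Assumed bounds, added for illustration only
    max_results: int = Field(..., ge=10, le=100)


# A well-formed payload is parsed into typed attributes
query = Query(keyword="stocks", max_results=10)
print(query.keyword, query.max_results)

# A payload with a misspelled field name is rejected,
# because the required max_results field is missing
try:
    Query(keyword="stocks", maximum_results=10)
except ValidationError as error:
    print("rejected:", len(error.errors()), "error(s)")
```

Inside FastAPI, the same failure is reported to the client as a 422 response, so malformed requests never reach our own code.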

The nlp module contains all the code necessary to load our NER model and implement its functionality; we just need to import the function extract_ents:

				
      import re

      import spacy

      # Path to the trained pipeline inside the trf_ner folder
      # (assumed layout: adjust it if your model files live elsewhere)
      MODEL_PATH = 'trf_ner/model-best'

      # Load the model once, when the module is imported
      ner = spacy.load(MODEL_PATH)

      def test_model():
          '''
          Check if the model is loaded properly
          '''
          samples = ["Facebook has a price target of $ 20 for this quarter",
                  "$ AAPL is gaining a new momentum"]

          for doc in ner.pipe(samples, disable=['tagger', 'parser']):
              for ent in doc.ents:
                  print(ent.label_, ent.text)
              print('-----')


      def clean_tweets(texts):
          '''
          Preprocessing necessary for tweets: removes URLs and ellipses
          '''
          filtered = []
          url_pattern = r"http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
          for text in texts:
              string = re.sub(url_pattern, '', text, flags=re.MULTILINE)
              string = re.sub(r'…', '', string)
              # The dots must be escaped: an unescaped '...' matches ANY three characters
              string = re.sub(r'\.\.\.', '', string)
              filtered.append(string)

          return filtered

      def extract_ents(data):
          '''
          Main function to implement NER functionality
          '''
          texts = [tweet['text'] for tweet in data]
          # ner.pipe yields the docs in the same order as the input texts
          for index, doc in enumerate(ner.pipe(clean_tweets(texts), disable=['tagger', 'parser'])):
              data[index]['entities'] = [{'text': ent.text, 'label': ent.label_} for ent in doc.ents]
          return data

A note about the clean_tweets function: at first, I forgot the preprocessing we had applied to our training dataset, and so the model failed completely. So, never forget this detail! This step was done in our Data Preprocessing episode :).
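
When porting this kind of preprocessing, one detail is easy to get wrong: in a regular expression, an unescaped dot matches any character, so a pattern written as three plain dots removes arbitrary text rather than a literal ellipsis. The sketch below contrasts the two patterns on a made-up tweet; strip_ellipsis is a hypothetical helper, not part of the nlp module.

```python
import re


def strip_ellipsis(text):
    """Remove literal '...' and the single-character ellipsis '…'."""
    return re.sub(r"\.\.\.|…", "", text)


tweet = "AAPL up..."

# Unescaped dots: every run of three characters is deleted
print(repr(re.sub(r"...", "", tweet)))   # → '.'

# Escaped dots: only the literal ellipsis is removed
print(repr(strip_ellipsis(tweet)))       # → 'AAPL up'
```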

In the twitter_api module, we have implemented everything needed to communicate with the Twitter API. The code was inspired by this amazing article, so check it out for further understanding! Be aware that the article covers the API for academic research, which is not included in the Essential access we enrolled in at the beginning of the project.

				
      import os

      import requests
      from dotenv import load_dotenv

      # Load your credentials from the .env file
      load_dotenv()

      def create_headers():
          # The recent search endpoint only needs the Bearer Token (app-only auth);
          # the other keys and secrets should not be sent as custom headers
          bearer_token = os.getenv('bearer_token')
          if not bearer_token:
              raise RuntimeError("bearer_token is missing from the environment (.env)")

          return {"Authorization": 'Bearer ' + bearer_token}

      def create_url(keyword, max_results=10):
          search_url = "https://api.twitter.com/2/tweets/search/recent"

          # This endpoint expects max_results to be between 10 and 100
          query_params = {'query': keyword,
                          'max_results': max_results,
                          'tweet.fields': 'public_metrics'}

          return (search_url, query_params)


      def connect_to_endpoint(url, headers, params):
          response = requests.get(url, headers=headers, params=params)
          print("Endpoint Response Code: " + str(response.status_code))
          if response.status_code != 200:
              raise Exception(response.status_code, response.text)
          return response.json()


      def get_response(keyword="stocks", max_results=10, verbose=False):
          headers = create_headers()
          # Forward max_results instead of hard-coding 10
          url, params = create_url(keyword, max_results=max_results)
          json_response = connect_to_endpoint(url, headers, params)

          if verbose:
              print(json_response)
              print(type(json_response))

          # 'data' is absent when no tweet matches the query
          return json_response.get('data', [])
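
One situation is worth handling explicitly: as far as we know, when a search matches no tweets, the v2 response contains a meta object but no data key, so indexing json_response['data'] directly would raise a KeyError. A minimal sketch with hand-written payloads (the payload shapes are simplified assumptions, and tweets_from is a hypothetical helper):

```python
def tweets_from(payload):
    """Return the list of tweets in a search response, or [] if there are none."""
    return payload.get("data", [])


# Simplified, hand-written payloads for illustration
with_results = {"data": [{"id": "1", "text": "$AAPL is up"}],
                "meta": {"result_count": 1}}
no_results = {"meta": {"result_count": 0}}

print(len(tweets_from(with_results)))
print(tweets_from(no_results))
```

Returning an empty list keeps the rest of the pipeline working, since the NER step simply has nothing to annotate.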

Run your application by using uvicorn:

uvicorn main:app --reload --port 5000

Now, let us test our API! Make sure to type the correct route, set the request type to ‘POST’, and write your request as raw JSON:

      {
          "keyword": "stocks",
          "max_results": 10
      }
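
If you prefer a script over Postman, the same request can be sent from Python. This is a sketch under assumptions: it uses only the standard library, assumes the service runs locally on port 5000 as started above, and assumes the body fields are named keyword and max_results as in the Query class; build_request is a hypothetical helper. The actual call is left commented out because it needs the server to be running.

```python
import json
from urllib.request import Request, urlopen

SERVICE_URL = "http://127.0.0.1:5000/get_tweet_ents"  # assumed local address


def build_request(keyword, max_results):
    """Build a POST request whose JSON body matches the Query schema."""
    body = json.dumps({"keyword": keyword, "max_results": max_results})
    return Request(
        SERVICE_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("stocks", 10)
print(request.get_method(), request.full_url)

# With the service running, send it and read the annotated tweets:
# with urlopen(request) as response:
#     tweets = json.loads(response.read())
```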

We noticed that the model was not that good with tweets that did not talk directly and mainly about the stock market (for example, it tagged TIGRAY as a company). We also noticed that it sometimes mistakes a famous person mentioned in a tweet for a company, because much of the training data contained organizations prefixed with the symbol ‘@’. And there are of course some mistakes here and there: in this example, the model tagged ‘amp’ as a ticker, which it is not. ‘amp’ can stand for Auction Market Preferred, although here it most likely comes from the HTML-escaped ampersand in ‘L&amp;T’. As you can see, there are many more complex examples to be learned.

				
          {
              "id": "1487848993598033922",
              "public_metrics": {
                  "retweet_count": 69,
                  "reply_count": 0,
                  "like_count": 0,
                  "quote_count": 0
              },
              "text": "RT @Nayakone: Mutual Fund Top Holding Stocks\n\n- Infosys\n- TCS\n- HDFC Bank\n- SBI\n- Airtel\n- L&amp;T\n- HDFC\n- Reliance Ind\n- Kotak Bank\n- ICICI…",
              "entities": [
                  {
                      "text": "@Nayakone",
                      "label": "COMPANY"
                  },
                  {
                      "text": "Mutual Fund",
                      "label": "COMPANY"
                  },
                  {
                      "text": "Infosys",
                      "label": "COMPANY"
                  },
                  {
                      "text": "TCS",
                      "label": "TICKER"
                  },
                  {
                      "text": "HDFC Bank",
                      "label": "COMPANY"
                  },
                  {
                      "text": "SBI",
                      "label": "TICKER"
                  },
                  {
                      "text": "Airtel",
                      "label": "COMPANY"
                  },
                  {
                      "text": "L&amp;T",
                      "label": "TICKER"
                  },
                  {
                      "text": "HDFC",
                      "label": "TICKER"
                  },
                  {
                      "text": "Reliance Ind",
                      "label": "COMPANY"
                  },
                  {
                      "text": "Kotak Bank",
                      "label": "COMPANY"
                  },
                  {
                      "text": "ICICI",
                      "label": "TICKER"
                  }
              ]
          },...
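
To get a quick overview of what the model predicts across a batch of tweets, the entity labels in the response can be tallied. Here is a small sketch on a hand-written excerpt shaped like the response above (count_labels is a hypothetical helper):

```python
from collections import Counter


def count_labels(tweets):
    """Count how often each entity label appears across all tweets."""
    return Counter(ent["label"]
                   for tweet in tweets
                   for ent in tweet.get("entities", []))


# Hand-written excerpt in the same shape as the service response
sample = [{
    "id": "1487848993598033922",
    "entities": [
        {"text": "Infosys", "label": "COMPANY"},
        {"text": "TCS", "label": "TICKER"},
        {"text": "HDFC Bank", "label": "COMPANY"},
    ],
}]

print(count_labels(sample))
```

A sudden shift in these counts between data sources is a cheap first signal that the model is being applied to tweets unlike the ones it was trained on.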

Remember that we limited the scope of our training data (only tweets that mention companies by name or ticker), its size (400 tweets in total), and its sources (a financial tweets dataset from Kaggle). This was of course to keep things simple in this series. In fact, when we tried the model on a new source (with a different style of tweets), it failed, as expected.

We mentioned this point during data labeling and data processing; it just shows how crucial it is to make your data representative. Given this, one should think about fine-tuning the model.

Conclusion

In this article, we learned how to quickly integrate a spaCy NLP model into a web application and use it to provide services to users, or to other services, over HTTP. We also learned to leverage existing APIs such as the Twitter Developer API. You can be proud now because you made something tangible: you have progressed beyond just ‘training’ models. Many people forget what matters about machine learning in general: it is less about having excellently performing models and more about having practical, working models.

If you have questions, do not hesitate to contact me through Linkedin or Twitter.

If you would like to request a demo, please email us at: admin@100.21.53.251 or Twitter.

Happy learning and see you in the next article!
