If you want to combine the capabilities of pretrained large language models with external data sources, this article is for you. We aim to provide a comprehensive exploration of Retrieval Augmented Generation (RAG) and its relationship with large language models.
RAG, or Retrieval Augmented Generation, is an approach that combines the strengths of pre-trained large language models (LLMs), such as GPT-3 or GPT-4, with external data sources. By integrating these components, RAG pairs the language understanding and generation capabilities of LLMs with the precision and depth of specialized data search techniques. This combination lets the system deliver nuanced, precise responses and adapt dynamically to a wide range of user queries and information needs. The result is a versatile and robust framework for generating contextually relevant and informative outputs across diverse domains and applications.
When using large language models (LLMs) on their own, there are several limitations to consider: their knowledge is frozen at training time, they have no access to your private or domain-specific data, and they can produce confident but incorrect answers about information they were never trained on.
RAG addresses these limitations by integrating the general knowledge base of LLMs with access to specific information, such as data from your product database and user manuals. This approach enables highly accurate and tailored responses that meet your organization’s needs.
Now that you understand what RAG is, let’s explore the steps involved in setting up this framework:
Begin by gathering all the necessary data for your application. For a customer support chatbot in an electronics company, this might include user manuals, a product database, and a list of FAQs.
Data chunking involves breaking down your data into smaller, more manageable pieces. For example, a lengthy 100-page user manual can be divided into different sections, each potentially addressing different customer queries. This approach focuses each chunk on a specific topic, making retrieved information more directly applicable to user queries and improving efficiency by quickly obtaining relevant information instead of processing entire documents.
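As a rough illustration, here is one way to chunk a long document with LangChain's RecursiveCharacterTextSplitter. The file name user_manual.txt and the chunk sizes are placeholder choices for this sketch, not values prescribed by the article.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Placeholder: a long user manual saved as plain text
with open("user_manual.txt") as f:
    manual_text = f.read()

# Split into overlapping chunks so each piece stays focused on one topic;
# chunk_size and chunk_overlap are illustrative values to tune for your data
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(manual_text)

print(f"Created {len(chunks)} chunks")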
After breaking down the source data, it needs to be converted into a vector representation using document embeddings. These numeric representations capture the semantic meaning behind the text, allowing the system to understand user queries and match them with relevant information in the source dataset based on meaning rather than a simple word-to-word comparison. This ensures that responses are relevant and aligned with the user’s query.
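Continuing the sketch above, the chunks can be embedded with LangChain's OpenAIEmbeddings wrapper; any embedding model would work, this choice simply matches the OpenAI stack used later in the article.
from langchain.embeddings import OpenAIEmbeddings

# Embed every chunk once; the same model will be reused for user queries
embedding_model = OpenAIEmbeddings()
chunk_embeddings = embedding_model.embed_documents(chunks)

# Each chunk is now represented by a fixed-length vector of floats
print(len(chunk_embeddings), len(chunk_embeddings[0]))
In practice these vectors are usually stored in a vector database so they do not have to be recomputed for every query.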
When a user query enters the system, it is also converted into an embedding or vector representation using the same model as for document embeddings to ensure consistency. The system then compares the query embedding with the document embeddings, identifying and retrieving chunks whose embeddings are most similar to the query embedding using measures like cosine similarity and Euclidean distance. These chunks are considered the most relevant to the user’s query.
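A minimal sketch of this retrieval step, building on the chunks and chunk_embeddings from the previous snippets and using plain NumPy for cosine similarity (a vector store would normally handle this for you); the example query is hypothetical.
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed the query with the same model used for the documents
query = "How do I reset the device to factory settings?"
query_embedding = embedding_model.embed_query(query)

# Score every chunk against the query and keep the three most similar
scores = [cosine_similarity(query_embedding, emb) for emb in chunk_embeddings]
top_indices = np.argsort(scores)[-3:][::-1]
top_chunks = [chunks[i] for i in top_indices]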
The retrieved text chunks and the initial user query are fed into a language model, which uses this information to generate a coherent response to the user's question through a chat interface.
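To complete the sketch, the retrieved chunks and the original question can be combined into a single prompt and passed to a chat model; the prompt wording below is just one reasonable choice, not a fixed template.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Assemble the retrieved chunks and the user's question into one prompt
context = "\n\n".join(top_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)

chat = ChatOpenAI(temperature=0)
response = chat([HumanMessage(content=prompt)])
print(response.content)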
To seamlessly execute these steps for generating responses with LLMs, you can use a data framework like LlamaIndex. This solution enables you to develop your own LLM applications by efficiently managing the flow of information from external data sources to language models like GPT-3. To learn more about this framework and how to build LLM-based applications, read our tutorial on LlamaIndex.
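For reference, the core LlamaIndex workflow is only a few lines; the exact import paths depend on the library version you have installed, and the "data" folder is a placeholder for wherever your documents live.
from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every document in the local "data" folder and build a vector index over it
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# The query engine handles retrieval and generation behind a single call
query_engine = index.as_query_engine()
print(query_engine.query("How do I reset the device to factory settings?"))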
Retrieval Augmented Generation (RAG) has applications across many domains; a common one is answering natural-language questions over structured or proprietary data, which is exactly what we will build next.
Let's set up a system where questions about job candidates are answered using LangChain for database interaction and an OpenAI language model for natural-language responses. A LangChain SQLDatabaseChain takes the user's question, translates it into an SQL query, runs that query against the database, and then uses the OpenAI model to phrase the results as a response.
First, let's install the needed libraries. The langchain_experimental package provides the SQLDatabaseChain used later in this tutorial.
!pip install -q langchain
!pip install -q langchain_experimental
!pip install -q openai
Then set your OpenAI API key. You can get a key from the official OpenAI website.
import os
os.environ['OPENAI_API_KEY'] = 'Enter your OpenAI key'
import numpy as np
import pandas as pd
import sqlite3
# If your data lives in an Excel file, load it into a DataFrame
df = pd.read_excel("Your Excel file")
Now let's check the data we're working with. In our example, the DataFrame contains the following columns:
df.columns
import sqlite3
# Connect to the SQLite database
conn = sqlite3.connect('People.sqlite')
c = conn.cursor()
# Create the People table if it doesn't exist
c.execute('''CREATE TABLE IF NOT EXISTS People (
_File TEXT,
CERTIFICATE TEXT,
CHARACTERISTIC TEXT,
COMPANY TEXT,
DATE TEXT,
EDUCATION TEXT,
EMAIL TEXT,
INDUSTRY TEXT,
JOBTITLE TEXT,
LANGUAGES TEXT,
LOCATION TEXT,
NAME TEXT,
NUMBER TEXT,
SKILL TEXT,
TIME TEXT,
URL TEXT,
Text TEXT
)''')
conn.commit()
# Insert data from the DataFrame into the People table
df.to_sql('People', conn, if_exists='replace', index=False)
# Retrieve and print all rows from the People table
c.execute('''SELECT * FROM People''')
for row in c.fetchall():
    print(row)
At this stage, the SQL database has been created from the Excel file.
Now let's define a function that takes an SQL query and a database file as input and returns the results of running that query against the database.
import sqlite3
def read_sql_query(sql, db):
    conn = sqlite3.connect(db)
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    conn.close()
    return rows

# Example usage
db_file = 'People.sqlite'
sql_query = 'SELECT * FROM People'
result_rows = read_sql_query(sql_query, db_file)
for row in result_rows:
    print(row)
Import the LangChain libraries needed for the next steps:
from langchain.llms import OpenAI
from langchain.sql_database import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
Now create an instance of SQLDatabase pointing at our SQLite file, along with an instance of the OpenAI LLM. In this example, we set the temperature of the OpenAI instance to 0 so that answers are as deterministic as possible.
input_db = SQLDatabase.from_uri('sqlite:///People.sqlite')
llm_1 = OpenAI(temperature=0)
Then set up the LangChain SQL chain (SQLDatabaseChain.from_llm is the recommended way to construct it):
db_agent = SQLDatabaseChain.from_llm(llm_1, input_db, verbose=True)
Question 1:
db_agent.run("Give me the top 5 skills")
Question 2:
db_agent.run("Give me the top 5 candidates that have experience in Python, Angular and AWS
")
In conclusion, Retrieval Augmented Generation (RAG) is a powerful technique for combining the language capabilities of Large Language Models (LLMs) with specialized databases. RAG systems address critical challenges in natural language processing by grounding a model's responses in specific, up-to-date data.
Despite these advances, RAG applications are not without limitations, particularly their dependence on the quality of the input data. To get the most out of a RAG system, human oversight is essential: careful curation of data sources, combined with expert knowledge, is crucial to guarantee the reliability of these solutions.