Fine-Tuning Language Models for AI Agents using UbiAI: A Comprehensive Guide and Walkthrough to FireAct and Beyond.

Feb 3rd, 2025

In this article, you will see how FireAct can enhance your LLM's performance, and you will also explore a practical use case: building an AI agent that works as our sales agent.

In recent years, the field of artificial intelligence (AI) has seen significant advancements in the development of language agents — AI systems that use language models (LMs) to interact with external tools, environments, and users. These agents are capable of reasoning, acting, and even self-reflecting to solve complex tasks. However, most existing language agents rely on few-shot prompting techniques with off-the-shelf LMs, which often leads to suboptimal performance, high costs, and limited robustness.

This article explores the fine-tuning of language models for agentic tasks, focusing on FireAct, a novel approach introduced in the research paper “FireAct: Toward Language Agent Fine-tuning”.

We will delve into the benefits of fine-tuning LMs for language agents, the methodology behind FireAct, and how you can leverage GPT-4, FireAct, and LLaMA 3.1 8B to create powerful AI agents. We will also show how to generate a trajectory dataset following the FireAct recipe, use it for training, and fine-tune models like LLaMA 3.1 8B for agent development.

Why Fine-Tuning Language Models for Agents?

Limitations of Few-Shot Prompting

Few-shot prompting, while convenient, has several limitations:

Limited Learning Support: Off-the-shelf LMs are not optimized for agentic tasks like generating actions or self-evaluations.
Poor Robustness: LMs often struggle with noisy or adversarial environments when used as agents.
High Costs: Advanced agents often require GPT-4, which is expensive and slow.
Lack of Controllability: Prompting offers limited control over the agent’s behavior, making it difficult to ensure consistent performance.

Benefits of Fine-Tuning

Fine-tuning LMs for agentic tasks offers several advantages:

Improved Performance: Fine-tuned models consistently outperform prompted models. For example, fine-tuning Llama2-7B with 500 agent trajectories generated by GPT-4 led to a 77% performance increase on the HotpotQA benchmark.
Cost Efficiency: Fine-tuned models reduce inference time and costs. For instance, fine-tuning GPT-3.5 reduced inference time by 70% compared to prompting.
Robustness: Fine-tuned agents are more resilient to noisy or distracting tool outputs.
Generalization: Fine-tuned models generalize better to new tasks, making them more versatile.

FireAct: A Novel Approach to Fine-Tuning Language Agents

What is FireAct?

FireAct is a fine-tuning approach that leverages diverse agent trajectories generated by strong LMs like GPT-4. These trajectories are converted into the ReAct format (Reasoning and Acting) and used to fine-tune smaller LMs. FireAct explicitly promotes data diversity by mixing trajectories from multiple tasks and prompting methods, such as Chain of Thought (CoT) and Reflexion.
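As a concrete illustration of the conversion step, the sketch below renders a Chain-of-Thought sample as a ReAct-style trajectory: the reasoning becomes a single Thought and the answer becomes a Finish action. The (question, reasoning, answer) representation and the function name cot_to_react are assumptions for illustration; the paper's exact conversion format may differ.

```python
# Hypothetical helper: assumes a CoT sample is stored as (question, reasoning, answer)
def cot_to_react(question: str, reasoning: str, answer: str) -> str:
    """Render one CoT sample as a ReAct-format trajectory string."""
    lines = [
        f"Question: {question}",
        f"Thought: {reasoning.strip()}",         # the whole CoT becomes one Thought
        f"Action: Finish[{answer.strip()}]",     # the answer becomes the Finish action
    ]
    return "\n".join(lines)


example = cot_to_react(
    "Were Scott Derrickson and Ed Wood of the same nationality?",
    "Scott Derrickson is American and Ed Wood was American, so they share a nationality.",
    "Yes",
)
print(example)
```

Because every sample ends up in the same Thought/Action layout, ReAct, CoT, and Reflexion data can be mixed in one training set.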

Key Features of FireAct

Multi-Task Fine-Tuning: FireAct uses trajectories from multiple tasks (e.g., HotpotQA, StrategyQA, MMLU) to create a more versatile agent.
Multi-Method Fine-Tuning: By combining ReAct, CoT, and Reflexion, FireAct agents can adapt to different task complexities and choose the most suitable method for each problem.
Efficiency: Fine-tuned agents eliminate the need for few-shot prompting, making inference faster and more cost-effective.

Experimental Results

The research paper demonstrates the effectiveness of FireAct across various benchmarks:

HotpotQA: Fine-tuning GPT-3.5 with 500 ReAct trajectories improved the Exact Match (EM) score from 31.4 to 39.2 (a 25% increase).
Bamboogle: Fine-tuned GPT-3.5 achieved an EM score of 44.0, outperforming prompted GPT-3.5 (40.8).
Robustness: Fine-tuned agents showed greater resilience to noisy tool outputs, with performance drops of only 14.2% compared to 33.8% for prompted agents.
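The percentages above are relative gains over the prompted baseline, not absolute EM points: 39.2 minus 31.4 is 7.8 points, which is about 25% of 31.4. The short helper below reproduces that arithmetic for the two EM figures quoted in this section.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# HotpotQA EM for GPT-3.5, prompted (31.4) vs. fine-tuned (39.2)
print(f"HotpotQA: {relative_change(31.4, 39.2):.1f}%")   # 24.8%
# Bamboogle EM for GPT-3.5, prompted (40.8) vs. fine-tuned (44.0)
print(f"Bamboogle: {relative_change(40.8, 44.0):.1f}%")  # 7.8%
```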

How to Create a Fine-Tuned Language Agent

Dataset Extraction

To fine-tune a language model, you need a high-quality dataset of agent trajectories. Here’s how you can extract and prepare such a dataset:

Use GPT-4: Generate agent trajectories by prompting GPT-4 with questions from various datasets (e.g., HotpotQA, StrategyQA).
Convert to ReAct Format: Ensure the trajectories follow the ReAct format, which includes thoughts, actions, and observations.
Diversify Data: Mix trajectories from different tasks and prompting methods to create a diverse training set.
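The diversification step can be sketched as merging trajectory lists from several (task, method) sources and shuffling them, so no single task or prompting method dominates a training batch. The record layout and the function name mix_trajectories are hypothetical; FireAct's own data pipeline may organize this differently.

```python
import random
from collections import Counter


def mix_trajectories(sources, seed=0):
    """Merge trajectories from several (task, method) sources and shuffle them.

    `sources` maps a (task, method) pair to a list of trajectory strings
    (hypothetical layout). Each record keeps its labels for later inspection.
    """
    mixed = [
        {"task": task, "method": method, "trajectory": traj}
        for (task, method), trajs in sources.items()
        for traj in trajs
    ]
    random.Random(seed).shuffle(mixed)  # seeded shuffle keeps runs reproducible
    return mixed


sources = {
    ("hotpotqa", "react"): ["traj-h1", "traj-h2"],
    ("strategyqa", "cot"): ["traj-s1"],
    ("hotpotqa", "reflexion"): ["traj-r1"],
}
dataset = mix_trajectories(sources, seed=42)
print(Counter((r["task"], r["method"]) for r in dataset))
```

Keeping the task and method labels makes it easy to check the final mix before training.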

Fine-Tuning with FireAct

Once you have your dataset, you can fine-tune a language model using FireAct:

Choose a Base Model: Select a base LM like GPT-3.5, Llama2-7B, or LLaMA 3.1 8B.
Fine-Tuning Method: Use Low-Rank Adaptation (LoRA) for efficient fine-tuning or full-model fine-tuning for better performance.
Training: Train the model on your dataset, ensuring a mix of tasks and methods for diversity.
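LoRA's efficiency comes from simple parameter counting. LoRA freezes a pretrained weight matrix W of shape d × k and trains only a low-rank update B·A, with B of shape d × r and A of shape r × k, so the trainable parameters per matrix drop from d·k to r·(d + k). The sketch below works this out for one square 4096 × 4096 matrix (4096 is the hidden size of Llama 3.1 8B) at an assumed rank r = 8. Real runs apply LoRA to several matrices per layer, so totals depend on your configuration.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when fully fine-tuning a d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters of a rank-r update B @ A, with B: d x r and A: r x k."""
    return r * (d + k)


d = k = 4096  # example: one square projection matrix
r = 8         # assumed LoRA rank
print(full_params(d, k))                                  # 16777216
print(lora_params(d, k, r))                               # 65536
print(f"{lora_params(d, k, r) / full_params(d, k):.2%}")  # 0.39%
```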

Training LLaMA 3.1 8B

LLaMA 3.1 8B is a powerful open-source model that can be fine-tuned for agentic tasks:

Dataset Preparation: Use FireAct trajectories to create a training set.
Fine-Tuning: Fine-tune LLaMA 3.1 8B using LoRA or full-model fine-tuning.
Evaluation: Test the fine-tuned model on benchmarks like HotpotQA and Bamboogle to measure performance improvements.
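HotpotQA and Bamboogle results are reported as Exact Match (EM). Below is a minimal EM sketch using the answer normalization popularized by the SQuAD evaluation script (lowercasing, stripping punctuation and the articles a/an/the, collapsing whitespace). The FireAct evaluation code may normalize slightly differently, so treat this as an approximation.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(prediction) == normalize_answer(gold))


def em_score(predictions, golds) -> float:
    """Average EM over a dataset, on the 0-100 scale used in the results above."""
    return 100 * sum(exact_match(p, g) for p, g in zip(predictions, golds)) / len(golds)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # 1
print(em_score(["Paris", "1,800 to 7,000 feet"], ["paris", "1,800 to 7,000 ft"]))  # 50.0
```

Note how strict EM is: "feet" versus "ft" counts as a miss, which is one reason consistent answer formatting in training trajectories matters.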

Fine-tune and evaluate your model with UbiAI

Prepare high-quality training data
Train best-in-class LLMs: Build domain-specific models that truly understand your context and fine-tune them effortlessly, with no coding required
Deploy in a few clicks: Go from a fine-tuned model to a live API endpoint with a single click
Optimize with confidence: Unlock instant, scalable ROI by monitoring and analyzing model performance to ensure peak accuracy and tailored outcomes.

Complete Walk-through on how to Fine-Tune and deploy an LLM Agent using UbiAI

Step 1: Set Up Your Environment

Install Required Libraries:

Install the OpenAI library to interact with the OpenAI API (used here to call GPT-4o).

```bash
pip install openai
```

Install the datasets library to load the HotpotQA dataset.

```bash
pip install datasets
```

Set Up the OpenAI API:

Sign up for the OpenAI API and obtain your API key.
Store the key in an environment variable; the OpenAI Python client reads OPENAI_API_KEY automatically:

```bash
export OPENAI_API_KEY="your_api_key_here"
```

Step 2: Load the HotpotQA Dataset

Load the Dataset:

Use the datasets library to load the HotpotQA_200 dataset.

```python
from datasets import load_dataset

# Dataset containing 200 examples from the original HotpotQA dataset
ds = load_dataset("Sing0402/hotpotqa_200")

# Column name as used in this walkthrough; check ds["train"].column_names if it differs
questions = ds["train"]["questions"]
```

Step 3: Define the Prompt for GPT-4

Create a Detailed Prompt:

Write a prompt that instructs GPT-4 to solve the question step-by-step using the ReAct format (Reasoning and Acting).

```python
def create_prompt(question):
    """Build a ReAct prompt with one worked example for the given question."""
    return f"""
You are an AI assistant that solves questions step-by-step using the ReAct format. For each question, follow these steps:

1. **Thought**: Think about what information is needed to solve the question and plan your next steps.
2. **Action**: Perform an action to gather information. Actions can be:
   - `Search[query]`: Search for information using a query.
   - `Finish[answer]`: Provide the final answer and end the task.
3. **Observation**: Record the result of the action.

Your response must strictly follow the ReAct format. Here is an example:

---
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought: I need to find the elevation range for the eastern sector of the Colorado orogeny.
Action: Search[What is the elevation range for the eastern sector of the Colorado orogeny?]
Observation: The elevation range is 1,800 to 7,000 feet.
Thought: The observation gives the elevation range, so I can answer.
Action: Finish[1,800 to 7,000 feet]
---

Now, solve the following question using the ReAct format. Provide clear thoughts, actions, and observations step-by-step:
Question: {question}
"""
```

Step 4: Generate Trajectories with GPT-4

Make API Requests:

For each question in the dataset, send it to GPT-4o using the prompt and record the trajectory.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
trajectories = []

for question in questions:
    prompt = create_prompt(question)

    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that solves questions step-by-step using the ReAct format."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7
    )

    # Keep only the generated text so every record stays JSON-serializable
    trajectories.append({
        "question": question,
        "trajectory": response.choices[0].message.content,
    })

# Plain list of trajectory strings, in the same order as `questions`
trajectory_content = [entry["trajectory"] for entry in trajectories]
```

Parse the Trajectories

To structure each trajectory (split it into thoughts, actions, and observations), use a parsing function. The CSV conversion in Step 6 relies on this structure:

```python
def parse_trajectory(trajectory_text):
    """Split a ReAct transcript into a list of {thought|action|observation: text} steps."""
    parsed_trajectory = []
    for raw_line in trajectory_text.split("\n"):
        # Strip whitespace and markdown bold markers such as "**Thought**:"
        line = raw_line.strip().replace("**", "")
        for label in ("Thought", "Action", "Observation"):
            # Match the label only at the start of the line, not inside the text
            if line.startswith(f"{label}:"):
                parsed_trajectory.append({label.lower(): line[len(label) + 1:].strip()})
                break
    return parsed_trajectory
```
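In this walkthrough GPT-4o writes the Observation lines itself, because no search tool is actually called. In the ReAct setup that FireAct builds on, each Action is executed by a real tool and its result is fed back as the Observation before the model continues. The sketch below shows that loop under stated assumptions: `llm` and `search` are hypothetical callables you supply (for example, a wrapper around the chat API and a web or Wikipedia search function), and the regular expression assumes the Search[...]/Finish[...] syntax from the Step 3 prompt.

```python
import re
from typing import Callable, Optional, Tuple

# Matches e.g. "Action: Search[some query]" or "Action: Finish[answer]" on one line
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def run_react(
    question: str,
    llm: Callable[[str], str],
    search: Callable[[str], str],
    max_steps: int = 5,
) -> Tuple[Optional[str], str]:
    """Run a minimal ReAct loop; return (answer or None, full transcript)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Ask the model for its next Thought/Action and drop any Observation it
        # invents, because the real observation must come from the tool
        step = llm(transcript).split("\nObservation:")[0].strip()
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            break  # no parsable action: stop instead of looping
        tool, argument = match.group(1), match.group(2)
        if tool == "Finish":
            return argument, transcript
        if tool == "Search":
            observation = search(argument)
        else:
            observation = f"Unknown action: {tool}"
        transcript += f"Observation: {observation}\n"
    return None, transcript


# Usage with a scripted stand-in for the model and a fake search tool
scripted_steps = iter([
    "Thought: I should look this up.\nAction: Search[capital of France]",
    "Thought: The observation answers the question.\nAction: Finish[Paris]",
])
answer, trajectory = run_react(
    "What is the capital of France?",
    llm=lambda transcript: next(scripted_steps),
    search=lambda query: "Paris is the capital of France.",
)
print(answer)  # Paris
print(trajectory)
```

The returned transcript has the same Thought/Action/Observation layout that parse_trajectory expects, so tool-grounded trajectories can flow through the rest of this pipeline unchanged.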

Step 5: Save the Distilled Dataset

Save to a JSON File:

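Save the trajectories in a structured format (e.g., JSON) for fine-tuning. Each record pairs a question with its parsed steps, which is the structure the CSV conversion in Step 6 reads.

```python
import json

# Pair each question with its parsed Thought/Action/Observation steps
records = [
    {"question": question, "trajectory": parse_trajectory(text)}
    for question, text in zip(questions, trajectory_content)
]

with open("hotpotqa_trajectories.json", "w") as f:
    json.dump(records, f, indent=4)
```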

Example Output:

The resulting JSON file (hotpotqa_trajectories.json) will look like this:

Step 6: Convert to CSV

Save as CSV using Pandas

UbiAI expects a CSV file for upload, so convert the JSON file with pandas:

```python
import json
import pandas as pd

# Load the JSON data
with open('hotpotqa_trajectories.json', 'r') as f:
    data = json.load(f)

# Extract relevant data and flatten the structure
rows = []
for item in data:
    question = item['question']
    trajectory = item['trajectory']

    # Extract thought, action, and observation from each step
    for step in trajectory:
        row = {'question': question}
        if 'thought' in step:
            row['thought'] = step['thought']
        if 'action' in step:
            row['action'] = step['action']
        if 'observation' in step:
            row['observation'] = step['observation']
        rows.append(row)

# Create a Pandas DataFrame
df = pd.DataFrame(rows)

# Save to CSV
df.to_csv('hotpotqa_trajectories.csv', index=False)
```

Step 7: Adjust the CSV File:

As the resulting CSV file shows, each trajectory is spread over many rows, with separate thought, action, and observation columns for every step the LLM took.

We are going to merge these cells into a single “trajectories” cell per question, so the final output looks something like this:

Finally, add a “System Prompt” column to the file containing the same prompt we used in the previous code.
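The merge in Step 7 can also be scripted instead of done by hand. The sketch below uses Python's standard csv module and assumes the layout produced in Step 6 (one row per step, with question, thought, action, and observation columns). The output column names system_prompt, question, and trajectories are assumptions; rename them to whatever you map in UbiAI.

```python
import csv


def merge_steps(rows, system_prompt):
    """Collapse one-row-per-step records into one row per question.

    Each row is a dict with a "question" key and optional "thought", "action"
    and "observation" values (empty strings for missing cells, as csv reads them).
    """
    grouped = {}  # question -> list of "Label: text" lines, in file order
    for row in rows:
        lines = grouped.setdefault(row["question"], [])
        for field in ("thought", "action", "observation"):
            value = (row.get(field) or "").strip()
            if value:
                lines.append(f"{field.capitalize()}: {value}")
    return [
        {"system_prompt": system_prompt, "question": question, "trajectories": "\n".join(lines)}
        for question, lines in grouped.items()
    ]


def merge_csv(src_path, dst_path, system_prompt):
    """Read the Step 6 CSV, merge its rows, and write an upload-ready CSV."""
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    merged = merge_steps(rows, system_prompt)
    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["system_prompt", "question", "trajectories"])
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

For example, `merge_csv("hotpotqa_trajectories.csv", "hotpotqa_ubiai.csv", system_prompt)` with `system_prompt` set to the instruction text from Step 3 (the output filename is only an example). Rebuilding the "Thought:", "Action:", "Observation:" labels keeps each merged trajectory in the ReAct format the model should learn.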

Step 8: Upload to UbiAI:

Head over to UbiAI and upload the dataset you created:

Select “Prompt and Response”:

Select “Upload dataset”:

Complete the Dataset details:

Map the dataset's columns to the fields UbiAI asks for.

Validate your entry:

Step 9: Train the model:

Click on “New model”:

Select “Text Generation”, the model type used to train LLMs:

Complete the model details:

Select “Assign Dataset to Model”, and choose the dataset we uploaded earlier:

Now that our model is created, let’s train it! Click on “More details”:

Complete the training configuration; here we go with “llama-3-1-8b-instruct” as the base model.
Then press “Start Model Training”:

Now that our agent is trained and ready to be used, let’s put it to work on a practical use case:

Sales agent:

1) Given a company list, search their websites and extract a summary
2) Extract people’s emails and contact information using Apollo.io
3) Create a personalized email based on the company’s summary
4) Send the email and add them to the CRM

Let’s Start!

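First, get your UbiAI API token:

Head over to the UbiAI model page and select your model:

Copy the API code, then switch to your coding environment and install requests and beautifulsoup4 (used by the scraper below). The json module ships with Python's standard library, so it does not need to be installed:

```bash
pip install requests beautifulsoup4
```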

Let’s say you want to send emails to the marketing managers of these tech companies:

```python
tech_company_data = [
    {"company_name": "Figma", "url": "https://www.figma.com"},
    {"company_name": "Notion", "url": "https://www.notion.so"},
    {"company_name": "Linear", "url": "https://www.linear.app"},
    {"company_name": "Vercel", "url": "https://www.vercel.com"},
    {"company_name": "Sentry", "url": "https://www.sentry.io"},
    {"company_name": "Datadog", "url": "https://www.datadog.com"},
    {"company_name": "Twilio", "url": "https://www.twilio.com"},
    {"company_name": "Plaid", "url": "https://www.plaid.com"},
    {"company_name": "Superhuman", "url": "https://www.superhuman.com"},
    {"company_name": "ButterCMS", "url": "https://www.buttercms.com"},
]
```

With the API token in hand, we next scrape the websites we are interested in. The function below fetches a company's landing page and extracts the text of its paragraphs:

```python
import requests
from bs4 import BeautifulSoup


def scrape_website(url):
    """Return the text of all <p> elements on a page, or None if the request fails."""
    try:
        response = requests.get(url, timeout=15)  # avoid hanging on slow sites
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL: {e}")
        return None

    soup = BeautifulSoup(response.content, "html.parser")

    # One paragraph per line
    return "\n".join(paragraph.get_text() for paragraph in soup.find_all("p"))
```
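Landing pages often repeat cookie banners or footer text across paragraphs, and the scraped text can exceed what the model accepts in one request. The hypothetical helper below removes blank and duplicate lines and truncates to a character budget. The 4,000-character default is an arbitrary assumption, not a UbiAI or Llama limit, so tune it to your model's context window.

```python
def prepare_for_prompt(text: str, max_chars: int = 4000) -> str:
    """Drop blank and duplicate lines, then cut the text to at most max_chars.

    The cut falls on a word boundary when possible, so the model never sees
    a half word.
    """
    seen = set()
    lines = []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse internal whitespace
        if line and line not in seen:
            seen.add(line)
            lines.append(line)
    cleaned = "\n".join(lines)
    if len(cleaned) <= max_chars:
        return cleaned
    cut = cleaned[:max_chars]
    boundary = max(cut.rfind(" "), cut.rfind("\n"))
    return cut[:boundary] if boundary > 0 else cut
```

For example, `text = prepare_for_prompt(scrape_website(url) or "")` before the text is placed in the summarization prompt.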

Next, pass the scraped content to our agent so it can condense the website information into a clear, concise summary.

This is the prompt we will be using:

```text
You are a highly capable summarization AI. Your primary function is to provide concise and objective summaries of given text.

Key Guidelines:

Conciseness: Keep summaries as brief as possible while maintaining essential information.
Objectivity: Avoid injecting personal opinions, biases, or interpretations. Present information factually.
Clarity: Ensure the summary is easy to understand and free from jargon or ambiguity.
Focus on Key Takeaways: Highlight the most important points and avoid unnecessary details.
Example:

Input: "This company offers a cloud-based platform for businesses to manage their customer relationships. Key features include contact management, sales force automation, marketing automation, and customer service tools. The platform is designed to help businesses increase sales, improve customer satisfaction, and streamline their operations."

Output: "This company provides a cloud-based CRM platform with features like contact management, sales automation, and marketing tools to help businesses improve sales and customer relationships."

Now, please provide a concise and objective summary of the following text:
```

The function scrape_and_summarize() calls scrape_website() for each company and feeds the text to our fine-tuned agent so it can summarize it the way we want:

```python
import json
import requests

# Placeholders: copy the API URL and token from your model page in UbiAI
UBIAI_API_URL = "your_ubiai_api_url_here"
UBIAI_API_TOKEN = "your_ubiai_api_token_here"


def scrape_and_summarize(companies, api_url, api_token):
    """Scrape each company's landing page and ask the fine-tuned model to summarize it."""
    summaries = []

    for company in companies:
        company_name = company["company_name"]
        url = company["url"]
        text = scrape_website(url)

        if text:
            data = {
                "input_text": "",
                "system_prompt": "You are a highly capable summarization AI. Your primary function is to provide concise and objective summaries of given text.",
                "user_prompt": f"""
                You are a highly capable summarization AI. Your primary function is to provide concise and objective summaries of given text.
                Now, please provide a concise and objective summary of the following text from {company_name}: {text}""",
                "temperature": 0.7
            }

            # Same request pattern as above: the token is appended to the API URL
            response = requests.post(api_url + api_token, json=data)
            response.raise_for_status()
            res = json.loads(response.content.decode("utf-8"))

            summaries.append({
                "company_name": company_name,
                "summary": res
            })

    return summaries


summarized_texts = scrape_and_summarize(tech_company_data, UBIAI_API_URL, UBIAI_API_TOKEN)
print(summarized_texts)
```

The output is a Python list containing company names along with their summaries.

The next step involves extracting the company’s marketing managers’ contact information.

To achieve this, we’ll use Apollo.io. Head over to Apollo and get an API Key to continue working.

These are the companies from which we want to retrieve marketing managers:

```python
tech_company_urls = [
    "figma.com",
    "notion.so",
    "linear.app",
    "vercel.com",
    "sentry.io",
    "datadog.com",
    "twilio.com",
    "plaid.com",
    "superhuman.com",
    "buttercms.com",
]
```

The loop below retrieves marketing managers for each company domain. To extract other employees, simply change the person_titles variable:

```python
import os
import requests

person_titles = "marketing manager"  # change this to target other roles
apollo_api_key = os.environ["APOLLO_API_KEY"]  # keep the key out of source code
scraped_content = []

for company_website in tech_company_urls:
    url = f"https://api.apollo.io/api/v1/mixed_people/search?person_titles[]={person_titles}&q_organization_domains={company_website}&page=1&per_page=10"

    headers = {
        "accept": "application/json",
        "Cache-Control": "no-cache",
        "Content-Type": "application/json",
        "x-api-key": apollo_api_key,
    }

    response = requests.post(url, headers=headers)
    response.raise_for_status()
    scraped_content.append(response.text)
```
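Building the query string by hand works for a single title, but `urllib.parse.urlencode` handles escaping and makes it easy to pass several titles. The sketch below reuses the parameter names from the request above; whether the Apollo endpoint accepts several person_titles[] values in one request is an assumption to verify against Apollo's API documentation, and build_people_search_url is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://api.apollo.io/api/v1/mixed_people/search"


def build_people_search_url(titles, domain, page=1, per_page=10):
    """Build the search URL with one URL-encoded person_titles[] entry per title."""
    params = [("person_titles[]", title) for title in titles]
    params += [
        ("q_organization_domains", domain),
        ("page", page),
        ("per_page", per_page),
    ]
    return f"{BASE_URL}?{urlencode(params)}"


url = build_people_search_url(["marketing manager", "head of marketing"], "figma.com")
print(url)
# Decoding the query string shows the titles survive the round trip
print(parse_qs(urlsplit(url).query)["person_titles[]"])
```

The resulting `url` can be passed to `requests.post(url, headers=headers)` in place of the f-string version.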

Each response body is a JSON string. The function below parses these responses and extracts the information we need:

```python
import json


def extract_contact_info(data_list):
    contacts = []

    for data in data_list:
        parsed_data = json.loads(data)
        people = parsed_data.get("people", [])

        for person in people:
            current_company = None
            for job in person.get("employment_history", []):
                if job.get("current"):
                    current_company = job.get("organization_name")
                    break

            contact_info = {
                "name": person.get("name"),
                "email": person.get("email"),
                "linkedin_url": person.get("linkedin_url"),
                "company_name": current_company
            }
            contacts.append(contact_info)

    return contacts


result = extract_contact_info(scraped_content)
print(result)
```

The output is now a clean list containing the name, email, LinkedIn profile URL, and company name of each marketing manager.
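Search results can include people without a returned email address, and the same person can appear more than once across pages or queries. A small filter before drafting emails keeps the list usable. The helper clean_contacts below is hypothetical and not part of Apollo's API; it compares addresses case-insensitively.

```python
def clean_contacts(contacts):
    """Keep contacts that have an email address, dropping duplicate addresses."""
    seen = set()
    cleaned = []
    for contact in contacts:
        email = (contact.get("email") or "").strip()
        key = email.lower()  # compare addresses case-insensitively
        if "@" not in email or key in seen:
            continue  # no usable address, or already collected
        seen.add(key)
        cleaned.append({**contact, "email": email})
    return cleaned
```

For example, `contacts = clean_contacts(result)` before generating emails.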

Now we will generate personalized emails for each marketing manager based on the company description.

```python
import requests
import json

def generate_personalized_email(contact_info, company_descriptions, api_url, api_token):
    """
    Generates a personalized email for each marketing manager using company descriptions and their details.

    Args:
        contact_info: List of dictionaries containing marketing manager details (name, email, linkedin, company_name).
        company_descriptions: List of dictionaries containing company names and their descriptions (company_name, summary).
        api_url: The URL of the LLM API.
        api_token: The API token for the LLM.

    Returns:
        A list of dictionaries containing the marketing manager's name, email, and the generated email content.
    """

    personalized_emails = []

    # Create a dictionary for quick access to company descriptions by company name
    company_description_map = {company['company_name']: company['summary'] for company in company_descriptions}

    for manager in contact_info:
        manager_name = manager['name']
        company_name = manager['company_name']
        manager_email = manager['email']
        linkedin_url = manager['linkedin_url']

        # Get the company description for the relevant company
        company_description = company_description_map.get(company_name, "No description available")

        # Prepare the input for the LLM
        data = {
            "input_text": "",
            "system_prompt": "You are a highly capable AI specializing in crafting personalized and professional emails.",
            "user_prompt": f"""
            Write a personalized email for {manager_name}, the Marketing Manager at {company_name}.
            Include relevant company information based on the following description:
            {company_description}
            Make the email respectful, engaging, and concise.
            Mention their role and how their company stands out. End with a call to action to start a conversation.
            The email should feel warm and professional, expressing genuine interest in their work and company.

            Here's the manager's contact information:
            - Name: {manager_name}
            - Company: {company_name}
            - Email: {manager_email}
            - LinkedIn: {linkedin_url}
            """,
            "temperature": 0.7
        }

        response = requests.post(api_url + api_token, json=data)
        res = json.loads(response.content.decode("utf-8"))

        # Store the result along with the contact details
        personalized_emails.append({
            "name": manager_name,
            "email": manager_email,
            "company_name": company_name,
            "linkedin_url": linkedin_url,
            "personalized_email": res
        })

    return personalized_emails
```
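generate_personalized_email looks up each description by exact company name, but the name taken from Apollo's employment history may not exactly match the name in tech_company_data (for instance "Figma, Inc." versus "Figma"), in which case the prompt falls back to "No description available". A hedged sketch of a normalized lookup follows; the suffix list is an assumption to extend as needed, and matching on the company domain would be more robust where that field is available.

```python
import re

# Legal suffixes ignored when comparing names (assumption: extend as needed)
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "gmbh"}


def normalize_company(name) -> str:
    """Lowercase, strip punctuation and legal suffixes: 'Figma, Inc.' -> 'figma'."""
    words = re.findall(r"[a-z0-9]+", (name or "").lower())
    return " ".join(w for w in words if w not in SUFFIXES)


def build_description_map(company_descriptions):
    """Index summaries by normalized company name."""
    return {normalize_company(c["company_name"]): c["summary"] for c in company_descriptions}


descriptions = build_description_map([{"company_name": "Figma", "summary": "Design tool."}])
print(descriptions.get(normalize_company("Figma, Inc."), "No description available"))
```

To use it, build `company_description_map` with build_description_map and look descriptions up with `normalize_company(company_name)`.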

We now have a fully functional AI agent capable of extracting company descriptions, gathering detailed employee information, and crafting personalized emails tailored to your specifications.
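Step 4 of the sales-agent plan, sending the email and adding the contact to a CRM, is not implemented in the code above. A minimal sketch of the sending half using Python's standard smtplib and email modules follows. The SMTP host, port 587 with STARTTLS, and credentials are placeholders for your own provider; the generated text must first be extracted from the UbiAI response, whose format is not shown here; and the CRM step depends on your CRM's own API.

```python
import smtplib
from email.message import EmailMessage


def build_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_emails(messages, host: str, port: int, username: str, password: str) -> None:
    """Send all messages over one SMTP connection upgraded to TLS with STARTTLS."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        for msg in messages:
            server.send_message(msg)


msg = build_email("you@example.com", "manager@example.com", "Quick question", "Hello!")
print(msg["To"])  # manager@example.com
# send_emails([msg], "smtp.example.com", 587, "you@example.com", "your_app_password")
```

Building the messages separately from sending them lets you review the generated drafts before anything leaves your outbox.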

Benefits of Fine-Tuning for AI Agents

1. Performance Gains
Fine-tuned models consistently outperform prompted models, especially on complex tasks requiring multi-step reasoning and tool use.

2. Cost and Efficiency
Fine-tuning reduces inference time and costs, making it more practical for large-scale applications.

3. Robustness and Generalization
Fine-tuned agents are more robust to noisy environments and generalize better to new tasks, making them more versatile and reliable.

4. Flexibility
By combining multiple prompting methods (e.g., ReAct, CoT, Reflexion), fine-tuned agents can adapt to different task complexities and choose the most suitable approach for each problem.

Challenges and Future Directions

1. Data Diversity
While FireAct promotes data diversity, more research is needed to determine the optimal mix of tasks and methods for fine-tuning.

2. Scalability
Fine-tuning large models like GPT-4o or LLaMA 3.1 8B requires significant computational resources. Future work could explore more efficient fine-tuning techniques.

3. Multi-Agent Systems
The research paper focuses on single-agent systems. Future work could explore fine-tuning for multi-agent systems, where multiple agents collaborate to solve complex tasks.

4. Real-World Applications
While FireAct has shown promise in QA tasks, its applicability to real-world scenarios (e.g., robotics, web navigation) remains to be explored.

Conclusion

Fine-tuning language models for agentic tasks offers significant benefits over traditional few-shot prompting, including improved performance, cost efficiency, and robustness. FireAct provides a novel approach to fine-tuning by leveraging diverse agent trajectories from multiple tasks and prompting methods. By following the steps outlined in this article, you can create powerful AI agents using tools like GPT-4, FireAct, and LLaMA 3.1 8B.

As the field of AI continues to evolve, fine-tuning language models for agents will play a crucial role in developing more capable, efficient, and versatile AI systems. Whether you’re building agents for question answering, web navigation, or robotics, fine-tuning offers a path to unlocking the full potential of language models in real-world applications.

References

Chen, B., Shu, C., Shareghi, E., Collier, N., Narasimhan, K., & Yao, S. (2023). FireAct: Toward Language Agent Fine-tuning. arXiv preprint arXiv:2310.05915.
OpenAI. (2023). GPT-4 Technical Report. arXiv preprint arXiv:2303.08774.
Touvron, H., et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv preprint arXiv:2307.09288.

By leveraging the insights and methodologies from this article, you can take the first steps toward building advanced AI agents that are not only powerful but also efficient and robust.

