Your e-commerce agent is live in production. It’s supposed to recommend products, answer questions, and drive sales. Instead, it’s recommending winter coats in July, suggesting $2000 laptops to budget shoppers, and writing generic product descriptions that sound like they came from a 2005 SEO spam site.
You built the agent with CrewAI—multiple specialized agents working together, each handling a different part of the customer journey. You added RAG so it could pull product information from your catalog. You wrote detailed prompts explaining exactly what makes a good recommendation. And it still produces recommendations that make you wonder if the AI has ever actually shopped online.
Here’s what’s happening: your agent’s response generator—the component that actually writes product recommendations and descriptions—is using a generic language model. That model has read millions of web pages about products, but it has never learned what makes your products special. It doesn’t know which features actually convert browsers into buyers. It can’t distinguish between a flagship product and clearance junk. It writes descriptions that could apply to literally any product in the category.
The fix isn’t more RAG documents or longer prompts. The fix is fine-tuning the generator component on your actual product data so it learns what good recommendations look like in your domain. Not the whole agent, just the generator. This post walks through exactly how to do it with CrewAI and UBIAI, using real e-commerce data. Your agent will go from generic spam to conversion-driving recommendations.
Why Your Multi-Agent E-commerce System Is Failing
Let’s start with what you probably built. You have a CrewAI system with multiple agents: a product search agent that finds relevant items, a recommendation agent that picks the best matches, and a content generation agent that writes descriptions. Each agent has a specific role. They pass information between each other. The whole system feels sophisticated.
But here’s the problem nobody talks about: the content generation agent—the one that actually writes what customers see—is using a base language model that has no idea what makes your products sell. It retrieves product specs from your database. Great. It knows the laptop has 16GB RAM and a 512GB SSD. But it doesn’t know that your customers care more about battery life than processor speed. It doesn’t know that mentioning “all-day productivity” converts 3x better than listing technical specs. It doesn’t know that your brand voice should be friendly and accessible, not corporate and stiff.
So what does it produce? Generic garbage. “This laptop features powerful performance with ample storage and memory for all your computing needs.” That sentence could describe literally any laptop made in the last five years. No personality. No understanding of what your customers actually want to hear. No conversion.
You try fixing it with better prompts. You write: “Focus on benefits, not features. Use a friendly tone. Mention battery life prominently.” Sometimes it works. Sometimes it ignores you completely and goes back to generic mode. Prompts can guide behavior, but they can’t teach domain knowledge that the model fundamentally doesn’t have.
The solution is component-level fine-tuning. You take that content generation agent’s underlying model—the generator component—and you fine-tune it on examples of your best product descriptions. The ones that actually convert. The ones written by your top copywriters who understand your customers. You teach the model what good looks like in your specific domain. Then you plug that fine-tuned generator back into your CrewAI system.
Everything else stays the same. But now the output is actually good.
We’re building a multi-agent e-commerce recommendation system using CrewAI. Three agents work together: the Search Agent finds products matching customer queries, the Analyst Agent evaluates which products best fit the customer’s needs, and the Content Agent writes compelling product recommendations. That Content Agent is where we’ll focus our fine-tuning effort.
The Content Agent’s job is to take product data and customer context, then generate recommendations that actually make people want to buy. Before fine-tuning, it produces generic descriptions that could apply to any product. After fine-tuning the generator component, it produces targeted recommendations that highlight the right features, use the right tone, and address specific customer needs.
We’ll use real product review data from Amazon to fine-tune the generator. In production, you’d use your own product catalog and your best-performing descriptions, but this demonstrates the technique with publicly available data. The process is identical whether you’re using Hugging Face datasets or your internal data.
# Install required packages
!pip install crewai crewai-tools datasets pandas requests python-dotenv
import os
import json
import requests
import pandas as pd
from datasets import load_dataset
from crewai import Agent, Task, Crew, Process
from crewai_tools import tool
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# Set API keys
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
UBIAI_API_KEY = os.getenv("UBIAI_API_KEY") # Get from UBIAI platform
UBIAI_API_URL = "https://ubiai.tools/Inference/" # Your fine-tuned generator endpoint
Load Real E-commerce Product Data
We’re using the Amazon product reviews dataset from Hugging Face. This contains real product information and customer reviews across multiple categories. The reviews tell us what features customers actually care about—information that’s gold for training a product recommendation generator.
⚠️ IMPORTANT: This uses Hugging Face data for demonstration purposes. For production systems, you should use YOUR company’s actual product catalog, your best-performing product descriptions, and your internal sales data. That’s what teaches the generator your specific brand voice and what actually converts in your market.
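To make the "use your own data" point concrete, here is a minimal sketch of turning internal catalog rows into training examples. The column names (`title`, `top_features`, `best_description`) and the sample product are invented for illustration; substitute whatever your actual schema uses.

```python
import json

# Hypothetical internal catalog rows -- field names are assumptions,
# map them to your real product database columns.
catalog_rows = [
    {
        "title": "AeroSound X2 Wireless Headphones",
        "top_features": "30-hour battery, clear call mic",
        "best_description": "All-day battery and a mic your coworkers can actually hear.",
    },
]

def row_to_example(row):
    """Pair product context (input) with your best-converting copy (output)."""
    return {
        "system_prompt": "You are an e-commerce product recommendation specialist.",
        "input": f"Product: {row['title']}\nKey Features: {row['top_features']}",
        "output": row["best_description"],
    }

examples = [row_to_example(r) for r in catalog_rows]
print(json.dumps(examples[0], indent=2))
```

The key design choice is the pairing: the input is what your agent will see at inference time (product context), and the output is copy you already know converts.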
# Load Amazon product reviews dataset (Electronics category)
# This gives us real product information and what customers care about
dataset = load_dataset("McAuley-Lab/Amazon-Reviews-2023", "raw_review_Electronics", split="full", trust_remote_code=True)
# Convert to pandas for easier manipulation
df = pd.DataFrame(dataset)
# Display sample
print(f"Loaded {len(df)} product reviews")
print("\nSample review:")
print(df[['title', 'text', 'rating']].head(1))
Simulating a Product Catalog with Retrieval
In a real system, you’d connect to your actual product database or vector store with embeddings. For this demo, we’ll create a simple in-memory catalog from our product review data. This simulates what your Search Agent would retrieve from your RAG system.
# Create simplified product catalog from review data
# In production, this would be your actual product database
products = []
for idx, row in df.head(100).iterrows():
    product = {
        'id': row.get('parent_asin', f'PROD-{idx}'),
        'title': row.get('title', 'Product'),
        'category': 'Electronics',
        'rating': row.get('rating', 0),
        'reviews_summary': row.get('text', '')[:200]  # First 200 chars of review
    }
    products.append(product)
print(f"Created catalog with {len(products)} products")
print("\nSample product:")
print(json.dumps(products[0], indent=2))
Get The Full Notebook From: https://discord.gg/UKDUXXRJtM
Create Tools for Product Search
CrewAI agents need tools to interact with your systems. We’ll create a simple search tool that finds products matching a query. In production, this would query your vector database or search engine.
@tool("Search Products")
def search_products(query: str) -> str:
    """
    Search product catalog for items matching the query.
    Returns product information including title, rating, and review highlights.
    """
    # Simple keyword search (in production, use vector similarity)
    query_lower = query.lower()
    matches = []
    for product in products:
        if query_lower in product['title'].lower() or query_lower in product['reviews_summary'].lower():
            matches.append(product)
            if len(matches) >= 5:  # Return top 5 matches
                break
    if not matches:
        return "No products found matching your query."
    # Format results
    results = []
    for p in matches:
        results.append(f"""Product: {p['title']}
Rating: {p['rating']}/5
Customer feedback: {p['reviews_summary']}
ID: {p['id']}
""")
    return "\n---\n".join(results)
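The substring match above is only a placeholder for vector retrieval. As a dependency-free sketch of the same ranked-retrieval idea, here is similarity search with bag-of-words cosine scores; in production you would replace `tf_vector` with real embeddings from your vector database. The helper names and the two demo products are illustrative.

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Sparse term-frequency vector from lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_products(query: str, catalog: list, k: int = 5) -> list:
    """Rank catalog entries by similarity of the query to title + review text."""
    qv = tf_vector(query)
    scored = [
        (cosine(qv, tf_vector(p["title"] + " " + p["reviews_summary"])), p)
        for p in catalog
    ]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]

demo = [
    {"title": "Wireless Headphones", "reviews_summary": "great battery life"},
    {"title": "USB Cable", "reviews_summary": "charges fast"},
]
top = rank_products("wireless headphones with long battery", demo, k=1)
print(top[0]["title"])  # → Wireless Headphones
```

Swapping this in for the keyword loop changes nothing else in the tool: it still returns the top-k matches for formatting.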
Prepare Training Data for Generator Fine-Tuning
Now comes the critical part: preparing training data that teaches the generator how to write good product recommendations. We’re creating examples that show the model what information to emphasize, what tone to use, and how to connect product features to customer benefits.
Each training example has three parts: a system prompt that defines the agent’s role, an input that provides product context and customer needs, and an output that shows the ideal recommendation. The model learns by studying hundreds of these examples and internalizing the patterns.
# Prepare training data for the content generator
# This teaches the model HOW to write compelling product recommendations
training_data = []
# We'll create training examples from high-quality reviews
# In production, use your best-performing product descriptions
high_quality_reviews = df[df['rating'] >= 4.0].head(200)
for idx, row in high_quality_reviews.iterrows():
    product_info = f"""
Product: {row.get('title', 'Product')}
Category: Electronics
Customer Rating: {row.get('rating', 0)}/5
Key Features: {row.get('text', '')[:150]}
"""
    # Create an ideal recommendation based on the review
    # This is what we want the generator to learn to produce
    ideal_recommendation = f"""
I'd recommend the {row.get('title', 'product')} based on what you're looking for. Customers particularly love this product, giving it {row.get('rating', 0)} out of 5 stars.
What makes it stand out: {row.get('text', '')[:200]}
This would be a great choice if you value quality and reliability. The customer feedback consistently highlights its strong performance and good value for money.
"""
    training_example = {
        "system_prompt": "You are an e-commerce product recommendation specialist. Write compelling, specific recommendations that highlight what customers actually care about. Focus on benefits, use a friendly consultative tone, and reference real customer feedback.",
        "input": product_info,
        "output": ideal_recommendation.strip()
    }
    training_data.append(training_example)
print(f"Prepared {len(training_data)} training examples")
print("\nSample training example:")
print(json.dumps(training_data[0], indent=2))
Fine-Tune the Generator Component with UBIAI
This is where the magic happens. We’re taking our training data and fine-tuning a generator component specifically for writing product recommendations. UBIAI handles the entire fine-tuning process: it uploads your data, trains the model weights on your examples, and gives you back an API endpoint for your fine-tuned generator.
The training teaches the model your domain patterns—what features matter in electronics, what tone converts browsers to buyers, how to write recommendations that feel personal rather than generic. After 30-60 minutes, you have a generator that understands your e-commerce domain.
# Save training data in UBIAI format
# UBIAI expects JSONL with system_prompt, input, and output fields
training_file = "ecommerce_generator_training.jsonl"
with open(training_file, 'w') as f:
    for example in training_data:
        f.write(json.dumps(example) + '\n')
print(f"Saved training data to {training_file}")
print(f"Total examples: {len(training_data)}")
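Before uploading, it's worth sanity-checking the file: every line must be valid JSON and carry the three fields the platform expects. A minimal self-contained check (the toy examples here are placeholders for your real training file):

```python
import json
import tempfile

REQUIRED_FIELDS = {"system_prompt", "input", "output"}

def validate_jsonl(path: str) -> int:
    """Return the number of valid examples; raise on a malformed line."""
    count = 0
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            example = json.loads(line)  # raises on invalid JSON
            missing = REQUIRED_FIELDS - example.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            if not example["output"].strip():
                raise ValueError(f"line {line_no}: empty output")
            count += 1
    return count

# Round-trip two toy examples to show the check in action
sample = [
    {"system_prompt": "s", "input": "i", "output": "o"},
    {"system_prompt": "s", "input": "i2", "output": "o2"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    for ex in sample:
        f.write(json.dumps(ex) + "\n")
    path = f.name

print(validate_jsonl(path))  # → 2
```

Run it against `ecommerce_generator_training.jsonl` before uploading; a silent formatting error here wastes a training run.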
NEXT STEPS IN UBIAI PLATFORM
1. Log into the UBIAI platform (https://ubiai.tools)
2. Navigate to Components → Create New Component
3. Select Generator as the component type
4. Upload your training file: ecommerce_generator_training.jsonl
5. Choose your approach:
   - Prompt Fine-Tuning (5–15 min): tests 100+ prompt variations across models; fixes tone, format, and instruction-following; great for behavioral issues
   - Weight Fine-Tuning (30–90 min): trains model weights on your data; teaches domain knowledge and patterns; required for knowledge gaps
   For this e-commerce use case, start with prompt fine-tuning. If results are good (>85% accuracy), deploy it; if you need deeper domain understanding, upgrade to weight fine-tuning.
6. After training completes, copy your API endpoint and key
7. Update UBIAI_API_URL and UBIAI_API_KEY
Build the CrewAI Multi-Agent System
Now we build the actual multi-agent system. Three agents work together: the Search Agent finds relevant products using our search tool, the Analyst Agent evaluates which products best match customer needs, and the Content Agent generates the final recommendations. That Content Agent is the one using our fine-tuned generator from UBIAI.
This is component-level fine-tuning in action. We’re not retraining the entire agent system. We’re not touching the search logic or analysis logic. We’re only upgrading the specific component responsible for generating customer-facing text. Everything else stays exactly the same.
# Define the Search Agent
# This agent finds products matching customer queries
search_agent = Agent(
    role='Product Search Specialist',
    goal='Find relevant products that match customer needs and preferences',
    backstory="""You are an expert at understanding customer intent and finding
    the right products from the catalog. You know how to interpret vague queries
    and surface products that truly match what customers are looking for.""",
    tools=[search_products],
    verbose=True,
    allow_delegation=False
)
# Define the Analyst Agent
# This agent evaluates products and determines best matches
analyst_agent = Agent(
    role='Product Analysis Expert',
    goal='Analyze products and determine which best fit customer requirements',
    backstory="""You are an expert at evaluating product features, customer reviews,
    and ratings to determine the best match for customer needs. You consider price,
    quality, features, and customer satisfaction in your analysis.""",
    verbose=True,
    allow_delegation=False
)
# Define the Content Agent
# This agent uses our FINE-TUNED GENERATOR to write recommendations
content_agent = Agent(
    role='Product Recommendation Writer',
    goal='Create compelling, specific product recommendations that drive conversions',
    backstory="""You are an expert e-commerce copywriter who writes product
    recommendations that actually make people want to buy. You focus on benefits
    over features, use a friendly consultative tone, and highlight what real
    customers love about products.""",
    verbose=True,
    allow_delegation=False
)
Create the Fine-Tuned Generator Integration
Here’s where we connect our fine-tuned UBIAI generator to the Content Agent. Instead of using a generic language model, the Content Agent now calls our specialized generator that’s been trained on e-commerce product recommendations.
def call_ubiai_generator(product_info: str, customer_context: str = "") -> str:
    """
    Call the fine-tuned UBIAI generator to create product recommendations.
    This is the KEY INTEGRATION: instead of a generic LLM, we're using a model
    that's been fine-tuned specifically on e-commerce product recommendations.
    """
    url = f"{UBIAI_API_URL}{UBIAI_API_KEY}"
    # Construct the input for the fine-tuned generator
    user_prompt = f"""
Customer Context: {customer_context if customer_context else 'Customer is looking for product recommendations'}
{product_info}
Write a compelling product recommendation that highlights what customers love and explains why this would be a great choice.
"""
    data = {
        "input_text": "",
        "system_prompt": "You are an e-commerce product recommendation specialist. Write compelling, specific recommendations that highlight what customers actually care about.",
        "user_prompt": user_prompt.strip(),
        "temperature": 0.7  # Slightly creative but focused
    }
    try:
        response = requests.post(url, json=data, timeout=30)
        response.raise_for_status()
        result = response.json()
        return result.get('response', '').strip()
    except Exception as e:
        print(f"Error calling UBIAI generator: {e}")
        return "Error generating recommendation. Please try again."
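In production, a single failed HTTP call shouldn't sink a recommendation. One common hardening step is retrying transient failures with exponential backoff. A generic sketch (the function names are illustrative, not a UBIAI API; `call` would be a lambda wrapping the `requests.post` above):

```python
import time

def with_retries(call, retries: int = 3, base_delay: float = 0.5):
    """Retry a flaky zero-argument callable with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts; re-raises
    the last exception if all attempts fail.
    """
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo with a stub that fails twice, then succeeds on the third attempt
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return "recommendation text"

result = with_retries(flaky, base_delay=0.01)
print(result)  # → recommendation text
```

You'd wrap the generator call as `with_retries(lambda: requests.post(url, json=data, timeout=30))` and keep the existing fallback message for the case where all retries fail.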
# Test the generator with a sample product
sample_product = products[0]
sample_info = f"""
Product: {sample_product['title']}
Rating: {sample_product['rating']}/5
Customer feedback: {sample_product['reviews_summary']}
"""
print("Testing fine-tuned generator...\n")
recommendation = call_ubiai_generator(sample_info, "Customer needs reliable electronics for daily use")
print("Generated Recommendation:")
print(recommendation)
Define Tasks for the Multi-Agent Workflow
Now we define what each agent actually does. Tasks specify the inputs, the expected outputs, and how agents work together in sequence. The Search Agent finds products, the Analyst Agent picks the best ones, and the Content Agent writes the recommendations using our fine-tuned generator.
def create_recommendation_tasks(customer_query: str, customer_context: str = ""):
    """
    Create the task sequence for product recommendation.
    """
    # Task 1: Search for relevant products
    search_task = Task(
        description=f"""
        Search the product catalog for items matching this customer query: "{customer_query}"
        Customer context: {customer_context if customer_context else 'General browsing'}
        Find the most relevant products and provide their details including:
        - Product title and ID
        - Customer ratings
        - Key features from reviews
        """,
        expected_output="A list of 3-5 relevant products with their details and ratings",
        agent=search_agent
    )
    # Task 2: Analyze products and pick best matches
    analysis_task = Task(
        description="""
        Analyze the products found in the search results. Consider:
        - How well they match the customer's needs
        - Customer ratings and feedback
        - Value for money
        - Key features that stand out
        Select the top 2-3 products that best match what the customer is looking for.
        Provide reasoning for why each product is a good fit.
        """,
        expected_output="Top 2-3 product recommendations with analysis of why they're good matches",
        agent=analyst_agent,
        context=[search_task]
    )
    # Task 3: Generate compelling recommendations using the fine-tuned generator
    content_task = Task(
        description="""
        Using the analyzed products, create compelling product recommendations.
        For each recommended product:
        - Write in a friendly, consultative tone
        - Highlight specific benefits that match customer needs
        - Reference real customer feedback and ratings
        - Explain why this product is a great choice
        - Keep it conversational, not salesy
        Use the fine-tuned generator to ensure high-quality, conversion-focused copy.
        """,
        expected_output="2-3 well-written product recommendations that feel personal and compelling",
        agent=content_agent,
        context=[analysis_task]
    )
    return [search_task, analysis_task, content_task]
Run the Complete Multi-Agent System
Time to see it all work together. We create a Crew with our three agents, give it a customer query, and watch the agents collaborate to produce recommendations. The Search Agent finds products, the Analyst Agent evaluates them, and the Content Agent writes compelling descriptions using our fine-tuned generator.
# Example customer query
customer_query = "wireless headphones for working from home"
customer_context = "Customer works from home, needs good audio quality for video calls and music, budget-conscious"
# Create tasks
tasks = create_recommendation_tasks(customer_query, customer_context)
# Create the crew
recommendation_crew = Crew(
    agents=[search_agent, analyst_agent, content_agent],
    tasks=tasks,
    process=Process.sequential,  # Tasks run in sequence
    verbose=True
)
# Run the crew
print("\n🚀 Starting multi-agent recommendation process...\n")
result = recommendation_crew.kickoff()
print("\n" + "="*80)
print("FINAL PRODUCT RECOMMENDATIONS")
print("="*80)
print(result)
Monitoring and Iteration in Production
Deploying is not the end. Your product catalog changes. Customer preferences shift. New edge cases emerge. You need to monitor what’s working and what’s failing so you can iterate.
Track these metrics in your production system:
- Click-through rate: Are customers clicking recommended products?
- Conversion rate: Are recommendations actually driving purchases?
- Recommendation relevance: Manual spot-checks of whether recommendations make sense
- Edge case failures: Queries where the agent produces poor recommendations
- Customer feedback: Direct signals about recommendation quality
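The first two metrics are simple to compute if you log one event per recommendation shown. A minimal sketch (the event schema and sample data are invented for illustration; in practice these would come from your analytics pipeline):

```python
# One logged event per recommendation impression
events = [
    {"product_id": "P1", "clicked": True,  "purchased": True},
    {"product_id": "P2", "clicked": True,  "purchased": False},
    {"product_id": "P3", "clicked": False, "purchased": False},
    {"product_id": "P4", "clicked": False, "purchased": False},
]

def recommendation_metrics(events):
    """Click-through and conversion rates over logged recommendation events."""
    shown = len(events)
    clicks = sum(e["clicked"] for e in events)
    purchases = sum(e["purchased"] for e in events)
    return {
        "ctr": clicks / shown if shown else 0.0,
        "conversion_rate": purchases / shown if shown else 0.0,
    }

metrics = recommendation_metrics(events)
print(metrics)  # → {'ctr': 0.5, 'conversion_rate': 0.25}
```

Compute these per week, segmented by query category, and a drop in one segment tells you exactly where to go collect failure cases.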
When you spot failures, collect those cases. If they’re formatting issues—inconsistent tone, wrong structure, not following instructions—that’s behavioral. Do prompt fine-tuning to fix it in 10 minutes. If they’re knowledge issues—doesn’t understand new product categories, misses key features, fails on niche use cases—that’s a knowledge gap. Do weight fine-tuning with targeted examples.
Most teams do monthly retraining for weight fine-tuning, incorporating new products and failure cases. They do weekly prompt optimization to fix behavioral drift. This keeps the agent reliable as your business evolves.
What We Built and Why It Matters
We built a multi-agent e-commerce recommendation system with CrewAI where three specialized agents collaborate: one finds products, one analyzes matches, and one writes recommendations. The critical upgrade was fine-tuning the Content Agent’s generator component on domain-specific product recommendation data.
This is component-level fine-tuning in action. We didn’t retrain the entire agent system. We didn’t touch the search logic or product analysis. We upgraded exactly one component—the generator responsible for writing customer-facing text—and plugged it back into the existing system. Everything else stayed the same. But the output quality jumped from generic spam to conversion-driving recommendations.
The approach works because we matched the technique to the problem. Product search and analysis were working fine with base models and RAG. The failure was in content generation—the model didn’t understand what makes good e-commerce copy. That’s a knowledge problem, not a RAG problem or a prompting problem. Fine-tuning the generator taught it domain patterns: what features matter in electronics, what tone converts browsers to buyers, how to write recommendations that feel personal.
In production, you’d start with prompt fine-tuning first. It takes 10 minutes and fixes 70-85% of issues—tone, formatting, instruction-following. If that gets you to acceptable quality, deploy it. If you need higher accuracy and better domain understanding, upgrade to weight fine-tuning. Costs more time and money, but gets you from 85% to 95%+ accuracy. Most successful systems use both: prompt tuning for quick behavioral fixes, weight tuning for knowledge gaps.
Your e-commerce agent doesn’t have to produce generic garbage. Fine-tune the generator component. Teach it your domain. Watch conversion rates actually improve.