Building Self-Improving AI Agents With UBIAI Reinforcement Learning

November 17, 2025

 

A Comprehensive Hands-On Guide

In this blog post, we’ll explore how to build AI agents that learn and improve from their own experiences using the UBIAI framework. Unlike traditional AI agents that remain static after deployment, UBIAI enables continuous learning through reinforcement learning techniques.

What You’ll Learn

  • Understanding the core concepts
  • Setting up a complete training environment
  • Building a production-ready SQL-generation agent
  • Implementing reinforcement learning workflows
  • Debugging and optimizing agent performance
 

Introduction to UBIAI RLFT

The Challenge with Traditional AI Agents

Modern AI agents powered by large language models can perform impressive tasks – from answering questions to executing complex workflows. However, they suffer from a critical limitation: they don’t learn from their mistakes.

Traditional frameworks like LangChain, AutoGen, and CrewAI provide excellent tools for building agents, but once deployed, these agents operate with fixed behaviors. When they encounter errors or suboptimal outcomes, they can’t automatically adjust their strategy.

UBIAI’s Solution

 

UBIAI introduces a learning layer on top of existing agent frameworks. The key innovation is agent disaggregation – separating the execution logic from the learning mechanism (sketched in code after the list below). This allows:

  1. API integration with existing agents (LangChain, LangGraph, AutoGen, CrewAI, custom Python)
  2. Automatic learning from success and failure signals
  3. Progressive improvement through reinforcement learning cycles
  4. Flexible reward systems for intermediate and final outcomes
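
To make the idea concrete, here is a minimal, purely conceptual sketch of agent disaggregation. The helper names (with_learning_layer, evaluate, record_signal) are hypothetical and not part of UBIAI’s actual API; the real UBIAI client is built in Part 2.

from typing import Any, Callable

# Conceptual sketch only: wrap any existing agent callable (LangChain, CrewAI,
# custom Python) so each run also emits a success/failure signal to a separate
# learning layer, without changing the agent's own execution logic.
def with_learning_layer(
    agent_fn: Callable[[str], Any],
    evaluate: Callable[[Any], bool],
    record_signal: Callable[[str, Any, bool], None],
) -> Callable[[str], Any]:
    def wrapped(task: str) -> Any:
        output = agent_fn(task)               # execution logic stays untouched
        success = evaluate(output)            # success/failure signal
        record_signal(task, output, success)  # handed off to the learning layer
        return output
    return wrapped

# Example usage with stand-in callables:
signals = []
agent = with_learning_layer(
    agent_fn=lambda task: "SELECT name FROM students;",
    evaluate=lambda output: output.strip().upper().startswith("SELECT"),
    record_signal=lambda task, output, ok: signals.append((task, output, ok)),
)
agent("What are the names of all students?")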
 

Understanding UBIAI’s Approach to AI Training

What Makes UBIAI Different?

Traditional AI platforms require you to manage everything: data preprocessing, model selection, training infrastructure, and deployment. UBIAI simplifies this entire pipeline into a unified platform specifically designed for natural language processing and annotation tasks.

How Reinforcement Learning Works in UBIAI

UBIAI uses KTO (Kahneman-Tversky Optimization), a reinforcement learning algorithm designed for learning from human preferences. Here’s how it works:

1. Your model generates an SQL query
2. The query is executed and validated
3. Feedback is collected: ✓ (correct) or ✗ (incorrect)
4. This feedback is stored as training data
5. Periodically, the model retrains on accumulated feedback
6. The model learns to generate better queries over time

This creates a virtuous loop: your model becomes more accurate with every deployment and retraining cycle.
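
To illustrate steps 1–6, here is a minimal sketch of the loop in plain Python. The function names (generate_sql, run_query, retrain) are placeholders for the real components built later in this tutorial, not UBIAI API calls.

def feedback_cycle(question, expected_rows, generate_sql, run_query, retrain,
                   feedback_log, threshold=50):
    sql = generate_sql(question)                           # 1. model generates a SQL query
    rows, error = run_query(sql)                           # 2. query is executed and validated
    is_correct = error is None and rows == expected_rows   # 3. feedback: correct or incorrect
    feedback_log.append({                                  # 4. feedback stored as training data
        "question": question,
        "query": sql,
        "label": "positive" if is_correct else "negative",
    })
    if len(feedback_log) >= threshold:                     # 5. periodically retrain on the
        retrain(feedback_log)                              #    accumulated feedback
        feedback_log.clear()                               # 6. the model improves over time
    return is_correct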

The UBIAI Data Format

UBIAI uses a simple CSV format for training data with three columns:

  • System Prompt: Instructions that set the model’s behavior
  • User Prompt: The specific input (e.g., “What are the names of all students?”)
  • Response: The expected output (the SQL query)

For reinforcement learning, you also label each example as positive (correct) or negative (incorrect). The KTO algorithm uses these labels to understand what good outputs look like.
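
As an illustration, two rows of such a dataset might look like the snippet below. The prompts, queries, and labels here are made-up placeholders; in this tutorial the FeedbackCollector in Part 4 builds the CSV automatically, and the positive/negative labels are assigned later in the UBIAI interface.

import pandas as pd

# Two hypothetical rows in UBIAI's training format, plus the KTO label.
rows = [
    {
        "System Prompt": "You are an expert SQL developer. Return ONLY the SQL query.",
        "User Prompt": "What are the names of all students?",
        "Response": "SELECT name FROM students;",
        "label": "positive",   # the query answers the question correctly
    },
    {
        "System Prompt": "You are an expert SQL developer. Return ONLY the SQL query.",
        "User Prompt": "Count how many students are there",
        "Response": "SELECT name FROM students;",  # wrong query for this question
        "label": "negative",
    },
]

pd.DataFrame(rows).to_csv("ubiai_training_data.csv", index=False)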

Now let’s dive into building our system!

 

Part 1: Environment Setup

First, we’ll set up our Python environment with all the necessary libraries. We’re keeping dependencies minimal—just what we need for API communication, data handling, and visualization.

In [ ]:
# Install required dependencies
!pip install -q requests pandas matplotlib seaborn sqlalchemy
In [ ]:
# Import essential libraries
import requests
import json
import pandas as pd
import sqlite3
import os
import time
from typing import Dict, List, Any, Tuple, Optional
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Configure visualization settings
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ All dependencies loaded successfully!")
 

Configuring Your UBIAI Connection

Next, we need to connect to your UBIAI account. You’ll need three pieces of information:

  1. UBIAI Token: Your API authentication token (format: /your-token-id)
  2. Project ID: The ID of your UBIAI project containing the training data
  3. Base Model: The foundation model you’re fine-tuning (e.g., llama-3-1-8b-instruct)

Where to find these:

  • Token: UBIAI Dashboard → Settings → API Keys
  • Project ID: UBIAI Dashboard → Your Project → Project Settings
  • Base Model: The model you selected when creating your project

Replace the placeholder values below with your actual credentials:

In [ ]:
# UBIAI Configuration
# ⚠️ IMPORTANT: Replace these with your actual credentials
UBIAI_TOKEN = ""  # Your UBIAI API token
UBIAI_PROJECT_ID = ""  # Your project ID

# API Endpoints (no need to change these)
UBIAI_BASE_URL = "https://api.ubiai.tools:8443"
UBIAI_ANNOTATE_URL = f"{UBIAI_BASE_URL}/api_v1/annotate"  # For inference
UBIAI_TRAIN_URL = f"{UBIAI_BASE_URL}/api_v1/train_model"  # For training

# Model Settings
BASE_MODEL = "llama-3-1-8b-instruct"  # Your fine-tuned model base
TEMPERATURE = 0.7  # Controls randomness (0.0 = deterministic, 1.0 = creative)

print("UBIAI Configuration Summary:")
print("="*50)
print(f"  API Base URL: {UBIAI_BASE_URL}")
print(f"  Project ID: {UBIAI_PROJECT_ID}")
print(f"  Base Model: {BASE_MODEL}")
print(f"  Temperature: {TEMPERATURE}")
print("="*50)
print("\n✓ Configuration complete!")
 

Part 2: Building the UBIAI API Client

Now we’ll create a Python client to communicate with UBIAI’s APIs. This client will handle two main operations:

  1. SQL Generation (Inference): Send a natural language question and receive a SQL query (this uses a model we previously fine-tuned on UBIAI that already focuses on generating SQL queries)
  2. Model Training: Initiate reinforcement learning training with new feedback data

The client abstracts away the HTTP request details, giving you simple Python methods to work with.

In [ ]:
class UBIAISQLClient:
    """
    Client for interacting with UBIAI's SQL generation and training APIs.

    This class provides two main capabilities:
    1. Generate SQL queries from natural language (inference)
    2. Initiate model training with reinforcement learning
    """

    def __init__(self, token: str, project_id: str, base_model: str = BASE_MODEL):
        """
        Initialize the UBIAI client.

        Args:
            token: Your UBIAI API authentication token
            project_id: Your UBIAI project ID
            base_model: The base model name (e.g., llama-3-1-8b-instruct)
        """
        self.token = token
        self.project_id = project_id
        self.base_model = base_model
        self.annotate_url = UBIAI_ANNOTATE_URL
        self.train_url = UBIAI_TRAIN_URL

    def generate_sql(
        self,
        question: str,
        schema: Optional[str] = None,
        temperature: float = TEMPERATURE,
        context: Optional[str] = None
    ) -> Dict[str, Any]:
        """
        Generate SQL query from a natural language question.

        This method calls UBIAI's inference API to generate a SQL query.
        You can provide additional context like database schema to improve accuracy.

        Args:
            question: Natural language question (e.g., "What are all student names?")
            schema: Optional database schema to provide context
            temperature: Sampling temperature (0.0-1.0, higher = more creative)
            context: Any additional context to help the model

        Returns:
            Dictionary containing:
            - success: Whether generation succeeded
            - query: The generated SQL query (if successful)
            - error: Error message (if failed)
        """

        # Build the system prompt - this sets the model's behavior
        system_prompt = (
            "You are an expert SQL developer. Generate precise, executable SQL queries "
            "from natural language questions. Return ONLY the SQL query without explanations "
            "or markdown formatting."
        )

        # Build the user prompt with all provided context
        user_prompt = f"Question: {question}"

        if schema:
            user_prompt += f"\n\nDatabase Schema:\n{schema}"

        if context:
            user_prompt += f"\n\nAdditional Context:\n{context}"

        user_prompt += "\n\nGenerate the SQL query:"

        # Prepare the API request payload
        data = {
            "input_text": question,
            "system_prompt": system_prompt,
            "user_prompt": user_prompt,
            "temperature": temperature,
            "monitor_model": True,  # Enable usage monitoring
            "knowledge_base_ids": [],
            "images_urls": []
        }

        try:
            # Make the API request
            response = requests.post(
                self.annotate_url + self.token,
                json=data,
                timeout=30  # 30 second timeout
            )
            response.raise_for_status()
            result = response.json()

            # Extract and clean the generated SQL
            if "output" in result:
                sql = result["output"].strip()
                # Remove markdown code blocks if present
                sql = sql.replace("```sql", "").replace("```", "").strip()

                return {
                    "success": True,
                    "query": sql,
                    "raw_response": result
                }
            else:
                return {
                    "success": False,
                    "error": "No output in response",
                    "raw_response": result
                }

        except requests.exceptions.RequestException as e:
            return {
                "success": False,
                "error": str(e),
                "query": None
            }

    def initiate_training(
        self,
        training_type: str = "reinforcement-kto",
        num_iterations: int = 20,
        batch_size: int = 4,
        dropout: float = 0.35,
        validation_split: int = 20
    ) -> Dict[str, Any]:
        """
        Initiate model training using UBIAI's training API.

        This method starts a training job on UBIAI's servers. The training
        will use the data already uploaded to your UBIAI project.

        Args:
            training_type: 'normale' (standard) or 'reinforcement-kto' (RL with KTO)
            num_iterations: Number of training iterations/epochs
            batch_size: Number of examples per training batch
            dropout: Dropout rate for regularization (0.0-1.0)
            validation_split: Percentage of data to use for validation

        Returns:
            Dictionary containing:
            - success: Whether training was initiated
            - response: Training job details from UBIAI
            - error: Error message (if failed)
        """

        # Prepare training configuration
        data = {
            "drop": dropout,
            "max_batch": batch_size,
            "nb_iter": str(num_iterations),
            "project": self.project_id,
            "selected_model": self.base_model,
            "selected_validation": str(validation_split),
            "model_type": "LLM",
            "with_annotate": False,
            "training_type": training_type,
        }

        try:
            # Initiate training via API
            response = requests.post(
                self.train_url + self.token,
                data=data,
                timeout=60
            )
            response.raise_for_status()
            result = response.json()

            return {
                "success": True,
                "response": result,
                "training_config": data
            }

        except requests.exceptions.RequestException as e:
            return {
                "success": False,
                "error": str(e)
            }

# Initialize the UBIAI client
ubiai_client = UBIAISQLClient(
    token=UBIAI_TOKEN,
    project_id=UBIAI_PROJECT_ID,
    base_model=BASE_MODEL
)

print("✓ UBIAI Client initialized successfully!")
print("\nYou can now:")
print("  1. Generate SQL queries with: ubiai_client.generate_sql()")
print("  2. Initiate training with: ubiai_client.initiate_training()")
 

Testing Your Model: First SQL Generation

Before building the full system, let’s verify that your UBIAI model is working correctly. We’ll send a simple question and see what SQL query it generates.

What to expect:

  • If successful, you’ll see a clean SQL query
  • If there’s an error, check your API token and project ID
  • The quality of the output depends on your model’s training

Let’s try it:

In [ ]:
# Test with a simple question
test_question = "What are the names of all students?"

# Provide database schema for context
test_schema = "CREATE TABLE students (id INT, name VARCHAR(100), age INT, gpa FLOAT);"

print(f"Question: {test_question}")
print(f"Schema: {test_schema}")
print("\nGenerating SQL query...\n")

result = ubiai_client.generate_sql(
    question=test_question,
    schema=test_schema,
    temperature=0.7
)

print("="*70)
if result["success"]:
    print("✓ Query generated successfully!")
    print(f"\nGenerated SQL:\n{result['query']}")
else:
    print("✗ Generation failed")
    print(f"Error: {result['error']}")
    print("\nTroubleshooting:")
    print("  - Verify your UBIAI token is correct")
    print("  - Check that your project ID is valid")
    print("  - Ensure your model is deployed in UBIAI")
print("="*70)
 

Part 3: Building a Production-Ready SQL Agent

Now we’re getting to the exciting part: building a production-ready agent that can safely generate and execute SQL queries. In production environments, safety is paramount. You don’t want a bug or malicious input causing your agent to drop tables or delete critical data.

Safety Features We’re Implementing:

  1. Query Validation: Inspect every generated query before execution
  2. Forbidden Keywords: Block destructive operations (DROP, DELETE, UPDATE, etc.)
  3. Read-Only Mode: Ensure queries can only read data, not modify it
  4. SQL Injection Prevention: Detect and block suspicious patterns
  5. Multiple Statement Protection: Prevent query chaining attacks
  6. Comprehensive Logging: Track every query for auditing and debugging

How It Works:

User Question → Generate SQL → Validate Query → Execute (optional) → Log Results
                                    ↓
                            Block if dangerous

Let’s build it:

In [ ]:
class ProductionSQLAgent:
    """
    Production-ready SQL agent with comprehensive safety features.

    This agent acts as a protective layer between user questions and your database.
    It generates SQL queries using UBIAI, validates them for safety, and optionally
    executes them with proper error handling.
    """

    # List of SQL keywords that should be blocked in read-only mode
    FORBIDDEN_KEYWORDS = [
        "DROP",      # Drop tables/databases
        "DELETE",    # Delete data
        "UPDATE",    # Modify data
        "INSERT",    # Insert new data
        "ALTER",     # Alter table structure
        "CREATE",    # Create new tables
        "TRUNCATE",  # Remove all rows
        "GRANT",     # Grant permissions
        "REVOKE",    # Revoke permissions
        "REPLACE"    # Replace data
    ]

    def __init__(
        self,
        ubiai_client: UBIAISQLClient,
        read_only: bool = True,
        max_retries: int = 2,
        enable_logging: bool = True
    ):
        """
        Initialize the production SQL agent.

        Args:
            ubiai_client: Initialized UBIAI client for SQL generation
            read_only: If True, block all data modification operations
            max_retries: Number of retry attempts if generation fails
            enable_logging: Whether to log all queries and results
        """
        self.ubiai_client = ubiai_client
        self.read_only = read_only
        self.max_retries = max_retries
        self.enable_logging = enable_logging
        self.query_log = []  # Store all query executions

        # Initialize statistics tracking
        self.stats = {
            "total_queries": 0,
            "successful_queries": 0,
            "failed_queries": 0,
            "blocked_queries": 0
        }

    def validate_query(self, query: str) -> Tuple[bool, Optional[str]]:
        """
        Validate SQL query for safety before execution.

        This method performs multiple security checks on the generated query
        to prevent malicious or destructive operations.

        Args:
            query: The SQL query to validate

        Returns:
            Tuple of (is_valid: bool, error_message: Optional[str])
        """
        # Check if query is valid string
        if not query or not isinstance(query, str):
            return False, "Invalid query format"

        query_upper = query.upper()

        # Check for forbidden operations (only in read-only mode)
        if self.read_only:
            for keyword in self.FORBIDDEN_KEYWORDS:
                if keyword in query_upper:
                    return False, f"Forbidden operation detected: {keyword} (read-only mode)"

        # Check for multiple statements (SQL injection risk)
        semicolon_count = query.count(";")
        if semicolon_count > 1:
            return False, "Multiple statements not allowed (security risk)"

        # Check for SQL injection patterns
        suspicious_patterns = [
            "--",   # SQL comments
            "/*",   # Block comments
            "*/",   # Block comments
            "xp_",  # SQL Server extended procedures
            "sp_"   # SQL Server stored procedures
        ]

        for pattern in suspicious_patterns:
            if pattern in query:
                return False, f"Suspicious pattern detected: {pattern} (potential SQL injection)"

        # All checks passed
        return True, None

    def execute_query(
        self,
        db_path: str,
        query: str,
        timeout: int = 10
    ) -> Tuple[Any, Optional[str]]:
        """
        Execute SQL query against a SQLite database with timeout protection.

        Args:
            db_path: Path to the SQLite database file
            query: SQL query to execute
            timeout: Maximum execution time in seconds

        Returns:
            Tuple of (results: Any, error_message: Optional[str])
        """
        try:
            # Connect to database with timeout
            conn = sqlite3.connect(db_path, timeout=timeout)
            cursor = conn.cursor()

            # Execute query
            cursor.execute(query)
            results = cursor.fetchall()

            # Clean up
            conn.close()

            return results, None

        except sqlite3.OperationalError as e:
            return None, f"Database operation failed: {str(e)}"
        except Exception as e:
            return None, f"Execution error: {str(e)}"

    def generate_and_validate(
        self,
        question: str,
        db_path: Optional[str] = None,
        schema: Optional[str] = None,
        execute: bool = False,
        verbose: bool = True
    ) -> Dict[str, Any]:
        """
        Complete pipeline: Generate SQL, validate, and optionally execute.

        This is the main method you'll use in production. It orchestrates
        the entire workflow from question to result.

        Args:
            question: Natural language question
            db_path: Database path (required if execute=True)
            schema: Optional database schema for context
            execute: Whether to execute the query (default: False for safety)
            verbose: Print detailed progress information

        Returns:
            Dictionary containing:
            - timestamp: When the query was processed
            - question: Original question
            - query: Generated SQL
            - validation_passed: Whether validation succeeded
            - execution_successful: Whether execution succeeded (if executed)
            - results: Query results (if executed)
            - error: Error message (if any)
            - duration_ms: Processing time in milliseconds
        """
        start_time = datetime.now()

        if verbose:
            print(f"\n{'='*70}")
            print(f"Question: {question}")
            print("-" * 70)

        # Initialize result dictionary
        result = {
            "timestamp": start_time.isoformat(),
            "question": question,
            "query": None,
            "validation_passed": False,
            "execution_successful": False,
            "results": None,
            "error": None
        }

        self.stats["total_queries"] += 1  # count every request, including failed or blocked ones

        # Step 1: Generate SQL query
        if verbose:
            print("[1/3] Generating SQL query...")

        gen_result = self.ubiai_client.generate_sql(
            question=question,
            schema=schema
        )

        if not gen_result["success"]:
            result["error"] = f"Generation failed: {gen_result['error']}"
            self.stats["failed_queries"] += 1

            if verbose:
                print(f"  ✗ {result['error']}")

            self._log_query(result)
            return result

        query = gen_result["query"]
        result["query"] = query

        if verbose:
            print(f"  ✓ Generated: {query}")

        # Step 2: Validate query for safety
        if verbose:
            print("[2/3] Validating query...")

        is_valid, validation_error = self.validate_query(query)

        if not is_valid:
            result["error"] = f"Validation failed: {validation_error}"
            self.stats["blocked_queries"] += 1

            if verbose:
                print(f"  ✗ {validation_error}")

            self._log_query(result)
            return result

        result["validation_passed"] = True

        if verbose:
            print("  ✓ Validation passed")

        # Step 3: Execute query (if requested)
        if execute:
            if verbose:
                print("[3/3] Executing query...")

            if not db_path:
                result["error"] = "Database path required for execution"
                self.stats["failed_queries"] += 1

                if verbose:
                    print(f"  ✗ {result['error']}")
            else:
                exec_results, exec_error = self.execute_query(db_path, query)

                if exec_error:
                    result["error"] = f"Execution failed: {exec_error}"
                    self.stats["failed_queries"] += 1

                    if verbose:
                        print(f"  ✗ {exec_error}")
                else:
                    result["execution_successful"] = True
                    result["results"] = exec_results
                    self.stats["successful_queries"] += 1

                    if verbose:
                        result_preview = exec_results[:3] if len(exec_results) > 3 else exec_results
                        print(f"  ✓ Execution successful")
                        print(f"  Results ({len(exec_results)} rows): {result_preview}")
                        if len(exec_results) > 3:
                            print(f"  ... and {len(exec_results) - 3} more rows")
        else:
            self.stats["successful_queries"] += 1
            if verbose:
                print("[3/3] Skipping execution (execute=False)")

        # Calculate processing duration
        end_time = datetime.now()
        result["duration_ms"] = (end_time - start_time).total_seconds() * 1000

        if verbose:
            print(f"\nCompleted in {result['duration_ms']:.0f}ms")
            print("="*70)

        self._log_query(result)

        return result

    def _log_query(self, result: Dict[str, Any]):
        """Internal method to log query execution."""
        if self.enable_logging:
            self.query_log.append(result)

    def get_statistics(self) -> Dict[str, Any]:
        """Get agent performance statistics."""
        total = self.stats["total_queries"]
        success_rate = self.stats["successful_queries"] / total if total > 0 else 0

        return {
            **self.stats,
            "success_rate": success_rate
        }

    def export_logs(self, filepath: str):
        """Export query logs to CSV file for analysis."""
        if self.query_log:
            df = pd.DataFrame(self.query_log)
            df.to_csv(filepath, index=False)
            print(f"✓ Logs exported to: {filepath}")
            print(f"  Total entries: {len(df)}")
        else:
            print("No logs to export")

# Initialize the production agent
agent = ProductionSQLAgent(
    ubiai_client=ubiai_client,
    read_only=True,       # Enable read-only mode for safety
    max_retries=2,        # Retry failed requests up to 2 times
    enable_logging=True   # Log all queries
)

print("✓ Production SQL Agent initialized!")
print("\nSafety Features Enabled:")
print(f"  • Read-only mode: {agent.read_only}")
print(f"  • Query validation: Enabled")
print(f"  • Logging: {agent.enable_logging}")
print(f"  • Blocked keywords: {len(agent.FORBIDDEN_KEYWORDS)}")
 

Testing the Production Agent

Let’s put our production agent through its paces with several test questions. This will demonstrate:

  • How the agent handles valid queries
  • How safety features block dangerous operations
  • The logging and statistics tracking in action

We’ll test with increasingly complex questions to see how the model performs:

In [ ]:
# Define a comprehensive set of test questions
test_questions = [
    "Show all student names and their GPAs",
    "What is the average age of students?",
    "List students with GPA greater than 3.5",
    "Count how many students are there",
    "Find the oldest student",
    "Show the top 5 students by GPA",
    "DELETE FROM students WHERE gpa < 2.0"  # This should be BLOCKED
]

# Database schema for context
example_schema = """
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    age INTEGER,
    gpa FLOAT CHECK (gpa >= 0.0 AND gpa <= 4.0)
);
"""

print("Testing Production Agent")
print(f"Total test questions: {len(test_questions)}")
print(f"Schema provided: Yes")
print(f"Execution mode: Validation only (no database execution)\n")

# Run all test questions
for i, question in enumerate(test_questions, 1):
    print(f"\n{'='*70}")
    print(f"Test {i}/{len(test_questions)}")
    print(f"{'='*70}")

    result = agent.generate_and_validate(
        question=question,
        schema=example_schema,
        execute=False,  # Just validate, don't execute
        verbose=True
    )

    # Small delay to respect rate limits
    time.sleep(0.5)

# Display final statistics
print("\n\n" + "="*70)
print("AGENT PERFORMANCE STATISTICS")
print("="*70)
stats = agent.get_statistics()
print(f"Total Queries: {stats['total_queries']}")
print(f"Successful: {stats['successful_queries']}")
print(f"Failed: {stats['failed_queries']}")
print(f"Blocked (Security): {stats['blocked_queries']}")
print(f"Success Rate: {stats['success_rate']:.2%}")
print("="*70)
 

Part 4: Feedback Collection and Self-Improving System

 

This is where the magic happens. To enable continuous improvement through reinforcement learning, we need to collect feedback on every query the agent generates. This feedback becomes the training data for the next iteration of your model.

Understanding the Feedback Loop:

In production, here’s how feedback collection works:

  1. User asks a question → Agent generates SQL
  2. Query executes → Results are returned
  3. User provides feedback → “This is correct” or “This is wrong”
  4. System stores feedback → Question + Query + Label (positive/negative)
  5. Threshold reached → Export feedback and retrain model

UBIAI’s Required Format:

UBIAI expects training data as a CSV with these columns:

  • System Prompt: Instructions for the model
  • User Prompt: The input (question + schema)
  • Response: The expected output (SQL query)

For reinforcement learning, each row also gets a label: positive (correct) or negative (incorrect).

Our FeedbackCollector class automates this entire process:

In [ ]:
class FeedbackCollector:
    """
    Collect and manage user feedback for continuous model improvement.

    This class handles the entire feedback lifecycle:
    1. Collect feedback from production usage
    2. Format feedback in UBIAI's required CSV format
    3. Monitor feedback accumulation
    4. Trigger retraining when threshold is reached
    5. Export data ready for UBIAI upload
    """

    def __init__(self, retrain_threshold: int = 50):
        """
        Initialize feedback collector.

        Args:
            retrain_threshold: Number of feedback examples before suggesting retrain
        """
        self.feedback_data = []
        self.retrain_threshold = retrain_threshold
        self.last_export_time = datetime.now()

    def add_feedback(
        self,
        question: str,
        generated_query: str,
        is_correct: bool,
        schema: Optional[str] = None,
        user_comment: Optional[str] = None,
        correct_query: Optional[str] = None
    ):
        """
        Add a feedback entry from production usage.

        Args:
            question: The user's natural language question
            generated_query: SQL query generated by the model
            is_correct: Whether the query was correct (True/False)
            schema: Database schema that was provided
            user_comment: Optional comment from user
            correct_query: Optional correct query (if user provided one)
        """

        # Build system prompt (consistent with training)
        system_prompt = (
            "You are an expert SQL developer. Generate precise, executable SQL queries "
            "from natural language questions. Return ONLY the SQL query without explanations "
            "or markdown formatting."
        )

        # Build user prompt
        user_prompt = f"Question: {question}"
        if schema:
            user_prompt += f"\n\nDatabase Schema:\n{schema}"
        user_prompt += "\n\nGenerate the SQL query:"

        # Determine which query to use as the response
        response_query = correct_query if correct_query else generated_query

        # Create feedback entry in UBIAI format
        feedback = {
            "timestamp": datetime.now().isoformat(),
            "System Prompt": system_prompt,       # UBIAI column 1
            "User Prompt": user_prompt,            # UBIAI column 2
            "Response": response_query,            # UBIAI column 3
            "is_correct": is_correct,
            "label": "positive" if is_correct else "negative",
            "user_comment": user_comment,
            "original_question": question,
            "generated_query": generated_query
        }

        self.feedback_data.append(feedback)

        # Check if we've reached the retraining threshold
        if len(self.feedback_data) >= self.retrain_threshold:
            print(f"\n⚡ RETRAIN THRESHOLD REACHED!")
            print(f"   Collected: {len(self.feedback_data)} feedback examples")
            print(f"   Threshold: {self.retrain_threshold}")
            print(f"\n   Recommendation: Export feedback and initiate retraining")

    def get_statistics(self) -> Dict[str, Any]:
        """Get current feedback statistics."""
        if not self.feedback_data:
            return {"message": "No feedback collected yet"}

        df = pd.DataFrame(self.feedback_data)

        # Cast to plain Python ints so the stats remain JSON-serializable
        positive_count = int((df["is_correct"] == True).sum())
        negative_count = int((df["is_correct"] == False).sum())
        accuracy = positive_count / len(df) if len(df) > 0 else 0
        until_retrain = max(0, self.retrain_threshold - len(self.feedback_data))

        return {
            "total_feedback": len(self.feedback_data),
            "positive_feedback": positive_count,
            "negative_feedback": negative_count,
            "accuracy": accuracy,
            "until_retrain": until_retrain,
            "ready_for_retrain": until_retrain == 0
        }

    def export_for_ubiai(self, output_dir: str = "data") -> str:
        """
        Export feedback in UBIAI's required CSV format.

        The exported CSV will have three columns:
        - System Prompt
        - User Prompt
        - Response

        Args:
            output_dir: Directory to save the CSV file

        Returns:
            Path to the exported CSV file
        """
        if not self.feedback_data:
            print("⚠️  No feedback to export")
            return None

        # Create output directory
        Path(output_dir).mkdir(exist_ok=True)

        # Generate timestamped filename
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filepath = f"{output_dir}/ubiai_training_data_{timestamp}.csv"

        # Create DataFrame with UBIAI's required columns
        df = pd.DataFrame(self.feedback_data)

        # Export only the columns UBIAI needs
        ubiai_columns = ["System Prompt", "User Prompt", "Response"]
        df_export = df[ubiai_columns]

        # Save to CSV
        df_export.to_csv(filepath, index=False)

        # Also save full data with labels for reference
        full_filepath = f"{output_dir}/feedback_full_{timestamp}.csv"
        df.to_csv(full_filepath, index=False)

        # Print summary
        print(f"\n{'='*70}")
        print("✓ FEEDBACK EXPORTED SUCCESSFULLY")
        print(f"{'='*70}")
        print(f"\nUBIAI Training File: {filepath}")
        print(f"Full Data (with labels): {full_filepath}")
        print(f"\nExport Summary:")
        print(f"  Total entries: {len(df)}")
        print(f"  Positive (correct): {(df['is_correct'] == True).sum()}")
        print(f"  Negative (incorrect): {(df['is_correct'] == False).sum()}")
        print(f"  Accuracy: {df['is_correct'].mean():.2%}")

        print(f"\n📋 NEXT STEPS:")
        print(f"  1. Open UBIAI platform: https://ubiai.tools")
        print(f"  2. Navigate to Project ID: {UBIAI_PROJECT_ID}")
        print(f"  3. Upload file: {filepath}")
        print(f"  4. Label examples as 'positive' or 'negative' in UBIAI")
        print(f"  5. Run the training cell in Part 5 below")
        print(f"{'='*70}\n")

        self.last_export_time = datetime.now()
        return filepath

    def visualize_feedback(self):
        """Create visualizations of feedback trends."""
        if not self.feedback_data:
            print("No feedback to visualize")
            return

        df = pd.DataFrame(self.feedback_data)

        fig, axes = plt.subplots(1, 2, figsize=(14, 5))

        # Chart 1: Feedback Distribution
        label_counts = df["label"].value_counts()
        colors = ['#2ecc71' if label == 'positive' else '#e74c3c' for label in label_counts.index]
        axes[0].bar(label_counts.index, label_counts.values, color=colors)
        axes[0].set_title('Feedback Distribution', fontsize=14, fontweight='bold')
        axes[0].set_ylabel('Count')
        axes[0].set_xlabel('Label')

        # Add value labels on bars
        for i, v in enumerate(label_counts.values):
            axes[0].text(i, v + 0.5, str(v), ha='center', fontweight='bold')

        # Chart 2: Cumulative Accuracy Over Time
        df['timestamp'] = pd.to_datetime(df['timestamp'])
        df = df.sort_values('timestamp')
        df['cumulative_accuracy'] = df['is_correct'].expanding().mean()

        axes[1].plot(range(len(df)), df['cumulative_accuracy'], linewidth=2, color='#3498db')
        axes[1].axhline(y=0.8, color='#2ecc71', linestyle='--', alpha=0.5, label='Target (80%)')
        axes[1].set_title('Cumulative Accuracy Over Time', fontsize=14, fontweight='bold')
        axes[1].set_xlabel('Number of Queries')
        axes[1].set_ylabel('Accuracy')
        axes[1].set_ylim([0, 1])
        axes[1].grid(True, alpha=0.3)
        axes[1].legend()

        plt.tight_layout()

        # Save visualization
        viz_path = 'data/feedback_analysis.png'
        Path('data').mkdir(exist_ok=True)
        plt.savefig(viz_path, dpi=300, bbox_inches='tight')
        plt.show()

        print(f"\n✓ Visualization saved to: {viz_path}")

# Initialize feedback collector
feedback_collector = FeedbackCollector(retrain_threshold=50)

print("✓ Feedback Collector initialized!")
print(f"\nConfiguration:")
print(f"  Retrain threshold: {feedback_collector.retrain_threshold} examples")
print(f"  Export format: UBIAI CSV (System Prompt, User Prompt, Response)")
print(f"  Automatic threshold monitoring: Enabled")
 

Simulating Production Usage with Feedback Collection

Now let’s simulate real usage where users submit questions and provide feedback on the results. In production, this feedback would come from actual users evaluating whether the generated SQL queries are correct.

In a real production system, feedback collection might look like:

# User submits question
result = agent.generate_and_validate(user_question, schema=db_schema)

# Query executes and user reviews results
user_feedback = get_user_feedback()  # "correct" or "incorrect"

# Collect feedback
feedback_collector.add_feedback(
    question=user_question,
    generated_query=result['query'],
    is_correct=(user_feedback == "correct"),
    schema=db_schema
)

For this tutorial, we’ll simulate this with several test queries:

In [ ]:
# Simulate production usage with feedback collection
import random

production_questions = [
    "What are the names of all students?",
    "Find students with GPA above 3.5",
    "Calculate the average student age",
    "How many students are enrolled?",
    "Show the top 5 students by GPA",
    "List students younger than 20",
    "What's the highest GPA in the database?",
    "Find all students named John",
    "Show students ordered by age descending",
    "Count students in each age group",
    "List students with GPA between 3.0 and 3.5",
    "Find the youngest student",
    "Show all students with their ages",
    "Calculate the median GPA",
    "List students whose names start with 'A'"
]

print("Simulating Production Usage")
print("="*70)
print(f"Total queries: {len(production_questions)}")
print(f"Feedback: Automatic (simulated user responses)")
print(f"Schema: Provided for all queries\n")

for i, question in enumerate(production_questions, 1):
    print(f"\n[{i}/{len(production_questions)}] {question[:60]}...")

    # Generate query
    result = agent.generate_and_validate(
        question=question,
        schema=example_schema,
        execute=False,
        verbose=False
    )

    # Simulate user feedback
    # In production, this would come from actual user evaluation
    # For simulation: 85% success rate (model performs well but not perfect)
    is_correct = result["validation_passed"] and random.random() > 0.15

    # Collect feedback if query was generated
    if result["query"]:
        feedback_collector.add_feedback(
            question=question,
            generated_query=result["query"],
            is_correct=is_correct,
            schema=example_schema,
            user_comment="Simulated production feedback"
        )

        status = "✓" if is_correct else "✗"
        print(f"  Generated: {result['query'][:60]}...")
        print(f"  Feedback: {status} {'Correct' if is_correct else 'Incorrect'}")
    else:
        print(f"  ✗ Generation failed: {result['error']}")

    # Rate limiting
    time.sleep(0.3)

print("\n" + "="*70)
print("FEEDBACK COLLECTION SUMMARY")
print("="*70)

stats = feedback_collector.get_statistics()
if "message" not in stats:
    print(f"\nTotal Feedback Collected: {stats['total_feedback']}")
    print(f"  Positive (Correct): {stats['positive_feedback']}")
    print(f"  Negative (Incorrect): {stats['negative_feedback']}")
    print(f"  Current Accuracy: {stats['accuracy']:.2%}")
    print(f"\nRetraining Status:")
    if stats['ready_for_retrain']:
        print(f"  ✓ Ready for retraining!")
    else:
        print(f"  ⏳ Need {stats['until_retrain']} more examples")
        print(f"  Progress: {stats['total_feedback']}/{feedback_collector.retrain_threshold}")
else:
    print(stats['message'])
 

Let’s create visualizations to understand how our model is performing over time:

In [ ]:
# Visualize feedback data
feedback_collector.visualize_feedback()
 

Part 5: Export Feedback for UBIAI Training

Now comes the crucial step: exporting your collected feedback in the exact format UBIAI expects. This exported CSV file will be uploaded to your UBIAI project and used for reinforcement learning training.

What happens when you run this:

  1. Creates a CSV with columns: System Prompt, User Prompt, Response
  2. Also creates a full export with all metadata (labels, timestamps, etc.)
  3. Provides step-by-step instructions for uploading to UBIAI

Important: The exported CSV is ready to upload, but you still need to label each example as “positive” or “negative” in the UBIAI interface before training.

In [ ]:
# Export feedback for UBIAI training
feedback_file = feedback_collector.export_for_ubiai(output_dir="data")

if feedback_file:
    print("\nPreview of exported data (first 3 rows):")
    print("="*70)
    df_preview = pd.read_csv(feedback_file)
    for idx, row in df_preview.head(3).iterrows():
        print(f"\nRow {idx + 1}:")
        print(f"  System Prompt: {row['System Prompt'][:80]}...")
        print(f"  User Prompt: {row['User Prompt'][:80]}...")
        print(f"  Response: {row['Response']}")
        print("-" * 70)
 

Part 6: Initiate Reinforcement Learning Training

This is where your model starts getting smarter! Once you’ve uploaded the feedback CSV to your UBIAI project and labeled the examples, you can initiate reinforcement learning training.

Understanding KTO Training Parameters:

  • training_type: "reinforcement-kto" uses KTO algorithm for learning from positive/negative labels
  • num_iterations: More iterations generally mean better learning, but longer training time and higher cost
  • batch_size: Number of examples processed together (4 is typical for 8B models)
  • dropout: Regularization to prevent overfitting (0.35 is a good default)
  • validation_split: Percentage held out for validation (20% is standard)

Before Running Training:

  1. Upload the exported CSV to your UBIAI project
  2. Label the examples as “positive” or “negative” in the UBIAI interface
  3. Verify your UBIAI account has sufficient credits
  4. Ensure the project has your base model selected

Set initiate_training = True below when ready:

In [ ]:
# Training configuration
training_config = {
    "training_type": "reinforcement-kto",  # KTO reinforcement learning
    "num_iterations": 20,                  # 20 epochs
    "batch_size": 4,                       # 4 examples per batch
    "dropout": 0.35,                       # 35% dropout for regularization
    "validation_split": 20                 # 20% validation set
}

print("Training Configuration:")
print("="*70)
for key, value in training_config.items():
    print(f"  {key}: {value}")
print("="*70)

print("\n⚠️  PRE-FLIGHT CHECKLIST:")
print("  ☐ Feedback CSV uploaded to UBIAI project")
print("  ☐ Examples labeled as positive/negative in UBIAI")
print("  ☐ UBIAI account has sufficient credits")
print("  ☐ Project ID and token are correct")
print("\n💡 Ready to train? Set 'initiate_training = True' in the next cell.")
In [ ]:
# Set to True when ready to start training
initiate_training = False  # Change to True to start training

if initiate_training:
    print("\nInitiating reinforcement learning training...")
    print("This may take several minutes to hours depending on dataset size.\n")

    training_result = ubiai_client.initiate_training(**training_config)

    print("="*70)
    if training_result["success"]:
        print("✓ TRAINING INITIATED SUCCESSFULLY!\n")
        print("Training Response:")
        print(json.dumps(training_result["response"], indent=2))
        print("\n📊 Monitor training progress:")
        print(f"  1. Visit: https://ubiai.tools")
        print(f"  2. Navigate to Project ID: {UBIAI_PROJECT_ID}")
        print(f"  3. Go to Models/Training tab")
        print(f"  4. Watch for training completion notification")
        print("\n⏰ Training typically takes 1-3 hours for small datasets")
        print("   You'll receive an email when training completes.")
    else:
        print("✗ TRAINING INITIATION FAILED\n")
        print(f"Error: {training_result['error']}")
        print("\n🔍 Troubleshooting:")
        print("  - Verify training data is uploaded to UBIAI project")
        print("  - Check that examples are properly labeled")
        print("  - Confirm your UBIAI token is valid")
        print("  - Ensure project has sufficient credits")
    print("="*70)
else:
    print("\n⏸️  Training not initiated.")
    print("   Set 'initiate_training = True' when ready.")
    print("\n   Remember: Upload and label your feedback in UBIAI first!")
 

Part 7: Monitoring Dashboard

While training runs on UBIAI’s servers, let’s review our current agent’s performance. This dashboard gives you a comprehensive view of how your system is performing and helps you track improvements over time.

In [ ]:
def create_monitoring_dashboard():
    """
    Create a comprehensive monitoring dashboard for the SQL agent.
    """
    print("\n" + "="*70)
    print("SQL AGENT MONITORING DASHBOARD")
    print("="*70)

    # Agent Statistics Section
    print("\n📊 AGENT PERFORMANCE")
    print("-" * 70)
    agent_stats = agent.get_statistics()
    print(f"  Total Queries Processed: {agent_stats['total_queries']}")
    print(f"  Successful: {agent_stats['successful_queries']}")
    print(f"  Failed: {agent_stats['failed_queries']}")
    print(f"  Blocked (Security): {agent_stats['blocked_queries']}")
    print(f"  Success Rate: {agent_stats['success_rate']:.2%}")

    # Feedback Statistics Section
    print("\n📝 FEEDBACK COLLECTION")
    print("-" * 70)
    feedback_stats = feedback_collector.get_statistics()
    if "message" in feedback_stats:
        print(f"  {feedback_stats['message']}")
    else:
        print(f"  Total Feedback: {feedback_stats['total_feedback']}")
        print(f"  Positive: {feedback_stats['positive_feedback']}")
        print(f"  Negative: {feedback_stats['negative_feedback']}")
        print(f"  Model Accuracy: {feedback_stats['accuracy']:.2%}")

    # Recent Activity Section
    print("\n📋 RECENT QUERIES (Last 5)")
    print("-" * 70)
    if agent.query_log:
        recent = agent.query_log[-5:]
        for i, log in enumerate(recent, 1):
            status = "✓" if log["validation_passed"] else "✗"
            question_preview = log['question'][:50] + "..." if len(log['question']) > 50 else log['question']
            print(f"  {i}. {status} {question_preview}")
            if log.get("error"):
                print(f"     Error: {log['error'][:60]}...")
    else:
        print("  No queries logged yet")

    # Training Readiness Section
    print("\n🎯 TRAINING READINESS")
    print("-" * 70)
    if "message" not in feedback_stats:
        if feedback_stats['ready_for_retrain']:
            print("  ✅ READY FOR RETRAINING!")
            print(f"  Collected: {feedback_stats['total_feedback']} examples")
            print(f"  Threshold: {feedback_collector.retrain_threshold} examples")
            print("\n  Next Steps:")
            print("    1. Export feedback (Part 5)")
            print("    2. Upload to UBIAI project")
            print("    3. Label examples in UBIAI")
            print("    4. Initiate training (Part 6)")
        else:
            remaining = feedback_stats['until_retrain']
            current = feedback_stats['total_feedback']
            threshold = feedback_collector.retrain_threshold
            progress = (current / threshold) * 100 if threshold > 0 else 0

            print(f"  ⏳ Collection in progress...")
            print(f"  Need {remaining} more examples")
            print(f"  Progress: {current}/{threshold} ({progress:.1f}%)")

            # Progress bar
            bar_width = 40
            filled = int(bar_width * progress / 100)
            bar = "█" * filled + "░" * (bar_width - filled)
            print(f"  [{bar}] {progress:.1f}%")
    else:
        print("  No feedback collected yet")

    # System Health Section
    print("\n🏥 SYSTEM HEALTH")
    print("-" * 70)
    print(f"  Agent Mode: {'Read-Only' if agent.read_only else 'Full Access'}")
    print(f"  Logging: {'Enabled' if agent.enable_logging else 'Disabled'}")
    print(f"  UBIAI Connection: Configured")
    print(f"  Project ID: {UBIAI_PROJECT_ID}")
    print(f"  Base Model: {BASE_MODEL}")

    print("\n" + "="*70)

# Display the dashboard
create_monitoring_dashboard()
 

Part 8: Export Logs and Reports

For compliance, debugging, and analysis purposes, it’s important to export all logs and create summary reports. This data can be invaluable for understanding model behavior and tracking improvements over time.

In [ ]:
# Create reports directory
reports_dir = Path("data/reports")
reports_dir.mkdir(exist_ok=True, parents=True)

timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

print("Exporting logs and generating reports...\n")

# 1. Export agent query logs
agent_logs_path = f"data/reports/agent_logs_{timestamp}.csv"
agent.export_logs(agent_logs_path)

# 2. Export feedback data (if exists)
if feedback_collector.feedback_data:
    feedback_path = f"data/reports/feedback_{timestamp}.csv"
    df_feedback = pd.DataFrame(feedback_collector.feedback_data)
    df_feedback.to_csv(feedback_path, index=False)
    print(f"✓ Feedback exported to: {feedback_path}")
    print(f"  Total feedback entries: {len(df_feedback)}")

# 3. Create JSON summary report
summary_path = f"data/reports/summary_{timestamp}.json"
summary = {
    "report_metadata": {
        "generated_at": datetime.now().isoformat(),
        "report_version": "1.0"
    },
    "agent_statistics": agent.get_statistics(),
    "feedback_statistics": feedback_collector.get_statistics(),
    "configuration": {
        "ubiai_project_id": UBIAI_PROJECT_ID,
        "base_model": BASE_MODEL,
        "temperature": TEMPERATURE,
        "read_only_mode": agent.read_only,
        "retrain_threshold": feedback_collector.retrain_threshold
    },
    "training_readiness": {
        "ready": feedback_collector.get_statistics().get("ready_for_retrain", False),
        "feedback_collected": len(feedback_collector.feedback_data),
        "threshold": feedback_collector.retrain_threshold
    }
}

with open(summary_path, 'w') as f:
    json.dump(summary, f, indent=2)

print(f"\n✓ Summary report exported to: {summary_path}")

print(f"\n{'='*70}")
print("EXPORT COMPLETE")
print(f"{'='*70}")
print(f"\nAll reports saved to: {reports_dir}/")
print(f"\nFiles created:")
print(f"  1. Agent query logs (CSV)")
print(f"  2. Feedback data (CSV)")
print(f"  3. Summary report (JSON)")
print(f"\nUse these files for:")
print(f"  • Performance analysis")
print(f"  • Compliance auditing")
print(f"  • Debugging issues")
print(f"  • Tracking improvements over time")
 

Conclusion: Your Journey to Production AI

Congratulations! You’ve just built a complete, production-ready SQL generation system with continuous improvement capabilities. Let’s recap what you’ve accomplished:

  • Connected to UBIAI’s inference and training APIs with a reusable Python client
  • Built a production SQL agent with query validation, read-only safeguards, and logging
  • Collected user feedback and exported it in UBIAI’s three-column CSV format
  • Initiated KTO reinforcement learning training on that feedback
  • Set up a monitoring dashboard and exportable reports for auditing and debugging


Remember:

The secret to a successful AI agent isn’t just building it—it’s continuously improving it based on real-world usage. Your model will get smarter with every retraining cycle, adapting to your specific use cases and user needs.

Start collecting feedback today, and watch your model evolve!
