Google's Gemma is a family of open language models designed for versatile applications. This tutorial walks through the fine-tuning process for Gemma, enriching its capabilities with data sourced from UBIAI. By combining Gemma and UBIAI, it demonstrates a practical approach to improving the model's performance while keeping responsible AI development in view.
Released in two sizes, Gemma 2B and Gemma 7B, the model family is Google's bid to redefine the role of AI in various domains, from lightweight tasks such as chatbots to more complex applications such as data analysis. Unlike its heavyweight counterparts, Gemma is designed to run efficiently on a laptop, a workstation, or within the Google Cloud ecosystem.
Gemma’s standout performance in its size category comes down to several key design choices. The model has a vocabulary of 256,000 tokens, far larger than competitors such as Llama 2 with its 32,000-token vocabulary. Furthermore, the 7B model was trained on roughly 6 trillion tokens of text, setting it apart from similarly sized counterparts.
Architecturally, Gemma resembles the Gemini models and is optimized to run on Nvidia GPUs. The collaboration with Nvidia targets strong performance from data centers to the cloud, in line with Google’s stated commitment to accessibility in AI.
Google’s commitment to ethical AI is exemplified through Gemma’s open release. Developers and researchers can access Gemma’s architecture details, training methodology, and model weights under permissive terms. This transparency not only fosters collaboration but also allows external scrutiny, reinforcing accountability in AI development.
Fine-tuning Gemma stands as a crucial gateway to unleashing the true power and adaptability of this revolutionary AI model. While Gemma arrives pre-trained with impressive capabilities, fine-tuning allows developers to tailor the model to specific applications and nuances, optimizing its performance for diverse tasks. The ability to fine-tune Gemma opens avenues for customization, enabling developers to address domain-specific challenges and improve model outcomes. Whether it’s enhancing conversational nuances in chatbots or refining data analysis for intricate business needs, the fine-tuning process empowers developers to mold Gemma into a versatile tool that aligns precisely with their objectives.
To fine-tune the LLM via the Python API, we first need to install the required packages:
!pip install pandas autotrain-advanced -q
In the process of fine-tuning Gemma for enhanced performance, meticulous structuring of the training data becomes pivotal. Our approach involves integrating UBIAI data, specifically focusing on tables and their summarization capabilities. The dataset’s structure plays a crucial role in shaping Gemma’s adaptability to this unique domain.
Consequently, the training data is curated as conversational turns in a well-defined format, with markers that clearly delineate user inputs from model responses. Here, we draw on UBIAI’s rich dataset, ensuring that the integration aligns with Gemma’s objective of table summarization. Each exchange is framed by `<start_of_turn>` and `<end_of_turn>` markers, demarcating user and model turns.
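As an illustration, here are two exemplar training rows in that format. The turn markers follow Gemma's published chat template; the table contents and summaries are placeholder values, not real UBIAI rows:

```python
# Placeholder tables and summaries; real rows would come from the UBIAI export.
examples = [
    (
        "<start_of_turn>user\n"
        "Summarize this table:\n"
        "Products $108,803 | Services $60,345<end_of_turn>\n"
        "<start_of_turn>model\n"
        "Products accounted for the larger share of gross margin.<end_of_turn>"
    ),
    (
        "<start_of_turn>user\n"
        "Summarize this table:\n"
        "Effective Tax Rate 14.7% | Statutory Rate 21%<end_of_turn>\n"
        "<start_of_turn>model\n"
        "The effective tax rate sits well below the statutory rate.<end_of_turn>"
    ),
]

for text in examples:
    print(text)
    print("---")
```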
This structured data format ensures a rich and varied training set, enabling Gemma to comprehend and respond effectively across a diverse array of topics and queries.
Next, we must format our data for fine-tuning the Gemma model.
Fine-tuning with Hugging Face AutoTrain requires a CSV file containing a single text column. Note, however, that the base and instruction-tuned models use different text formats during fine-tuning.
First, let’s look at how the dataset for our sample is prepared (the rows and file path below are illustrative placeholders; real rows come from the UBIAI export).

import pandas as pd

# Illustrative rows; replace with your UBIAI-exported training examples.
data = ["<start_of_turn>user\nSummarize this table: ...<end_of_turn>\n"
        "<start_of_turn>model\n...<end_of_turn>"]
file_path = "/content/data/train.csv"  # illustrative path

# Create a DataFrame with the single 'text' column AutoTrain expects
df = pd.DataFrame(data, columns=['text'])
df.head(5)

# Save the DataFrame to a CSV file
df.to_csv(file_path, index=False)
print(f"DataFrame saved to: {file_path}")
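As a quick sanity check, you can read the file back and confirm that the single text column survived intact. A minimal sketch using only the standard library (the path and row below are illustrative):

```python
import csv

file_path = "train.csv"  # illustrative; point this at the CSV you saved above

# Write one tiny example row so the check is self-contained.
with open(file_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    writer.writerow({"text": "<start_of_turn>user\nSummarize this table.<end_of_turn>"})

# Read it back: embedded newlines inside a quoted field should survive.
with open(file_path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(len(loaded), "row(s); newline intact:", "\n" in loaded[0]["text"])
```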
With the preparation complete, we can now launch AutoTrain to fine-tune our Gemma model.
Logging in to Hugging Face
To make sure the model can be uploaded and later used for inference, we need to log in to the Hugging Face Hub.
First, get a Hugging Face token with write access from https://huggingface.co/settings/tokens, then log in from the notebook:
from huggingface_hub import notebook_login
notebook_login()
Training and Fine-tuning
Let’s set up the Hugging Face AutoTrain environment to fine-tune the Gemma model, starting with the following setup command.
!autotrain setup --update-torch
> INFO Installing latest xformers
> INFO Successfully installed latest xformers
> INFO Installing latest PyTorch
> INFO Successfully installed latest PyTorch
Fine-Tuning Gemma with AutoTrain: Key Parameters
Fine-tuning Gemma using AutoTrain involves a meticulous configuration of parameters to ensure optimal model adaptation. The following bullet points break down the crucial parameters specified in the command:
--train: Initiates the fine-tuning process for Gemma.
--model: Specifies the base model to fine-tune, allowing customization based on project requirements.
--project-name: Assigns a distinct project name, which also becomes the output directory, aiding organization in the training pipeline.
--data-path: Points to the location of the training data.
--lr: Sets the learning rate, which controls how quickly the model adapts to new data.
--batch-size: Defines the number of training samples processed in each iteration.
--epochs: Specifies the number of training epochs, i.e. how many times the entire training dataset is processed.
--block-size: Sets the maximum token sequence length used when chunking the data for training.
--warmup-ratio: Sets the fraction of training during which the learning rate ramps up gradually, improving stability at the start of training.
--lora-r: Sets the LoRA rank, the dimension of the low-rank update matrices injected into the model.
--lora-alpha: Sets the LoRA scaling factor, which controls how strongly the low-rank updates influence the base weights.
--lora-dropout: Specifies the dropout rate applied to the LoRA layers, helping resist overfitting.
--weight-decay: Applies weight decay, controlling the contribution of regularization to the overall loss.
--gradient-accumulation: Sets the number of gradient accumulation steps, increasing the effective batch size without extra memory.
--quantization: Enables quantization, reducing memory usage; int4 here loads the base model in 4-bit precision.
--target-modules: Names the modules that receive LoRA adapters, in this instance the attention projections q_proj and v_proj.
--mixed-precision: Activates mixed-precision training, improving computational efficiency.
--peft: Conditionally enables PEFT (parameter-efficient fine-tuning), so that only the LoRA adapter weights are trained.
--push-to-hub: Conditionally pushes the fine-tuned Gemma model to the Hugging Face Hub, contingent on user preference and authentication.
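To see how a few of these flags interact numerically, here is a back-of-the-envelope calculation using the hyperparameter values configured below (batch size 4, gradient accumulation 4, 5 epochs, warm-up ratio 0.1) and the 200-example training set shown in the logs. Note that AutoTrain packs text into fixed-size blocks, so the actual step counts it reports can differ from this simple estimate:

```python
import math

# Values from the AutoTrain configuration used in this tutorial.
batch_size = 4
gradient_accumulation = 4
num_epochs = 5
warmup_ratio = 0.1
num_examples = 200  # matches the training-set size shown in the logs

# Each optimizer step consumes batch_size * gradient_accumulation examples.
effective_batch_size = batch_size * gradient_accumulation
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

# The warm-up ratio is applied to the total number of optimizer steps.
warmup_steps = int(warmup_ratio * total_steps)

print(effective_batch_size)  # 16
print(total_steps)           # 65
print(warmup_steps)          # 6
```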
import os
project_name = 'UBIAI_finetuned_gemma'
model_name = 'google/gemma-2b'
#Push to Hub?
#Use these only if you want to push your trained model to a private repo in your Hugging Face Account
#If you don't use these, the model will be saved in Google Colab and you will need to download it manually.
#Please enter your Hugging Face write token. The trained model will be saved to your Hugging Face account.
#You can find your token here: https://huggingface.co/settings/tokens
push_to_hub = False
hf_token = "HUGGINGFACE_TOKEN"
repo_id = "username/repo_name"
#Hyperparameters
learning_rate = 2e-4
num_epochs = 5
batch_size = 4
block_size = 256
trainer = "sft"
warmup_ratio = 0.1
weight_decay = 0.01
gradient_accumulation = 4
mixed_precision = "fp16"
peft = True
quantization = "int4"
lora_r = 16
lora_alpha = 32
lora_dropout = 0.05
os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name
os.environ["PUSH_TO_HUB"] = str(push_to_hub)
os.environ["HF_TOKEN"] = hf_token
os.environ["REPO_ID"] = repo_id
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_EPOCHS"] = str(num_epochs)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["BLOCK_SIZE"] = str(block_size)
os.environ["WARMUP_RATIO"] = str(warmup_ratio)
os.environ["WEIGHT_DECAY"] = str(weight_decay)
os.environ["GRADIENT_ACCUMULATION"] = str(gradient_accumulation)
os.environ["MIXED_PRECISION"] = str(mixed_precision)
os.environ["PEFT"] = str(peft)
os.environ["QUANTIZATION"] = str(quantization)
os.environ["LORA_R"] = str(lora_r)
os.environ["LORA_ALPHA"] = str(lora_alpha)
os.environ["LORA_DROPOUT"] = str(lora_dropout)
!autotrain llm \
--train \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--data-path /content/data \
--lr ${LEARNING_RATE} \
--batch-size ${BATCH_SIZE} \
--epochs ${NUM_EPOCHS} \
--block-size ${BLOCK_SIZE} \
--warmup-ratio ${WARMUP_RATIO} \
--lora-r ${LORA_R} \
--lora-alpha ${LORA_ALPHA} \
--lora-dropout ${LORA_DROPOUT} \
--weight-decay ${WEIGHT_DECAY} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--quantization ${QUANTIZATION} \
--target-modules q_proj,v_proj \
--mixed-precision ${MIXED_PRECISION} \
$( [[ "$PEFT" == "True" ]] && echo "--peft" ) \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token ${HF_TOKEN} --repo-id ${REPO_ID}" )
> INFO Running LLM
> INFO Starting local training...
> INFO
🚀 INFO | | __main__:process_input_data:109 - Train data: Dataset({
features: ['text'],
num_rows: 200
})
🚀 INFO | | __main__:process_input_data:110 - Valid data: None
`low_cpu_mem_usage` was None, now set to True since model is quantized.
Loading checkpoint shards: 100% 2/2 [00:05<00:00, 2.95s/it]
🚀 INFO | | __main__:train:321 - Using block size 64
Running tokenizer on train dataset: 100% 30/30 [00:00<00:00, 3416.02
🚀 INFO | | __main__:train:383 - creating trainer
100% 40/40 [00:39<00:00, 1.00it/s]
🚀 INFO | 2024-02-28 12:59:33 | __main__:train:521 - Finished training, saving model...
If the fine-tuning process succeeds, we will have a new directory containing our fine-tuned model. We will use this directory to test the newly fine-tuned model.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_path = "/content/UBIAI_finetuned_gemma"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
With the model and tokenizer ready, we can try the model on an input example.
input_text = """
2023 2022 2021
Gross Margin
Products $108,803 $114,728 $105,126
Services $60,345 $56,054 $47,710
Total Gross Margin $169,148 $170,782 $152,836
Gross Margin Percentage
Products 36.5% 36.3% 35.3%
Services 70.8% 71.7% 69.7%
Total Gross Margin Percentage 44.1% 43.3% 41.8%
Provision for Income Taxes $16,741 $19,300 $14,527
Effective Tax Rate 14.7% 16.2% 13.3%
Statutory Federal Income Tax Rate 21% 21% 21%"""
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_new_tokens = 200)
predicted_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(predicted_text)
Summarization:
The financial performance data for the years 2021, 2022, and 2023 reveals a consistent increase in gross margin, with Products and Services contributing significantly. In 2023, the total gross margin reached $169.15 million. While the gross margin percentage for Products slightly rose, Services maintained a high percentage, resulting in an overall increase in the total gross margin percentage to 44.1%.
Despite a rise in income taxes provision to $16.74 million in 2023, the effective tax rate decreased to 14.7%. This contrasts with the statutory federal income tax rate, which remained constant at 21% throughout the analyzed period. Overall, these figures suggest a resilient financial performance with effective tax management in the given years.
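One practical note on the generation code above: `model.generate` returns the prompt tokens followed by the newly generated tokens, so the decoded string echoes the input table before the summary. To print only the continuation, slice off the prompt length first. A minimal sketch with made-up token ids standing in for real tokenizer output:

```python
# Made-up token ids standing in for a real prompt and generation.
prompt_ids = [101, 7592, 2088]        # pretend these encode the input table
generated = prompt_ids + [2003, 102]  # generate() returns prompt + new tokens

# Keep only the tokens produced after the prompt.
new_tokens = generated[len(prompt_ids):]

print(new_tokens)  # [2003, 102]
```

With the tensors used earlier, the equivalent is `tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)`.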
This walkthrough with Gemma, from structuring the data to running the AutoTrain command, shows how integrating UBIAI data elevates Gemma’s ability to summarize complex tables. The fine-tuning not only optimizes performance but also aligns with ethical AI development principles, emphasizing transparency and accountability.
Gemma, in its 2B and 7B configurations, stands as a versatile toolkit, promising innovation in chatbots, data analysis, and content creation. This journey marks a milestone in unlocking Gemma’s potential for impactful and ethical AI solutions, particularly in the domain of table summarization.