
The Surprising Efficiency of Fine-tuned LLMs on Patent Claim Generation

Dec 25th, 2024


As LLMs grow larger and more complex, aligning them with specific tasks presents notable challenges. A pertinent example is the drafting of patent claims, which must succinctly and precisely outline the scope of an invention while adhering to legal and technical standards.

 

Patent claims play a crucial role in the patent filing process by defining the boundaries of intellectual property protection. They enable examiners, attorneys, and potential competitors to understand the scope of an innovation and determine its uniqueness. Seasoned patent attorneys develop this skill over the years, balancing intricate details with concise language.


Attempts to automate this process using standard GPT models have encountered difficulties. For instance, research indicates that while GPT-4 can generate patent claims, the quality of those claims diminishes significantly for the subsequent dependent claims.

These findings underscore the limitations of relying solely on prompt engineering with vanilla GPT models for such nuanced tasks. The generated claims frequently omit critical details, include ambiguities, or fail to meet the stringent requirements of patent law, demonstrating the need for more specialized approaches to replicate the expertise of human patent drafters effectively.

In this tutorial, we will show how a smaller LLM fine-tuned for patent claim generation can outperform larger LLMs like GPT-4o.

The Solution: Fine-Tuning Small LLMs

Fine-tuning smaller LLMs presents many advantages:

  • Lower cost per request
  • Lower latency
  • Greater accuracy than larger models when fine-tuned
  • In the case of open-source LLMs, private on-premises deployment on our own servers

The last point about privacy is particularly important in the legal field: intellectual property is one of the most critical assets for companies, and if leaked, it could cause major setbacks or even bankruptcy. Therefore, keeping the data private is of the utmost importance. Open-source LLMs can indeed solve this issue.

Fine-tuning offers a targeted method to align LLMs with specific tasks like drafting patent claims.

The Process of Fine-tuning LLMs on Patent Claim Generation

To fine-tune a model, we need a few hundred to a thousand examples, each with an input and a desired output.

For this tutorial, we are going to use the full description of the patent as the input and the actual claims as the output.

1. Data Collection:

We will scrape a dataset of roughly a thousand patent descriptions along with their corresponding claims. The SERP API offers a free public endpoint that can be used to extract patent information from Google Patents.
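For illustration, here is a minimal scraping sketch. The engine name, request parameters, and response fields are assumptions to be checked against the SERP API documentation; the API key and patent identifier are placeholders.

import json
import requests

# Minimal sketch: fetch one patent's details through the SERP API's Google
# Patents integration. Engine name, parameters, and response fields are
# assumptions to verify against the API docs.
API_KEY = "YOUR_SERPAPI_KEY"            # placeholder
PATENT_ID = "patent/US1234567B1/en"     # placeholder Google Patents identifier

params = {
    "engine": "google_patents_details",  # assumed engine name
    "patent_id": PATENT_ID,
    "api_key": API_KEY,
}

response = requests.get("https://serpapi.com/search", params=params, timeout=30)
response.raise_for_status()

# Append the raw JSON record; the preparation step below extracts the
# description and claims from this file.
with open("patents_raw.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(response.json()) + "\n")

Looping this call over a list of patent identifiers yields the raw JSON file used in the next step.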

Scraped Patent

2. Dataset Preparation:

Next, we write a small script that extracts the description and the claims from the JSON file and organizes the data in a CSV file, with the description in the input column and the corresponding claims in the response column. In addition, we add two columns for the system prompt and the user prompt with fixed values, which will be fed to the LLM during fine-tuning.
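As a rough sketch, the preparation script could look like the following. The JSON field names (description, claims) and the CSV column names are assumptions and need to be adapted to the actual scraper output and to the column layout the fine-tuning platform expects.

import csv
import json

SYSTEM_PROMPT = "You are a patent expert in claim generation."
USER_PROMPT = "Given the patent description below generate the claims"

rows = []
with open("patents_raw.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        # Field names are assumptions; adjust to the actual JSON structure.
        description = record.get("description", "")
        claims = "\n".join(record.get("claims", []))
        if description and claims:
            rows.append({
                "system_prompt": SYSTEM_PROMPT,
                "user_prompt": USER_PROMPT,
                "input": description,
                "response": claims,
            })

with open("patent_dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f, fieldnames=["system_prompt", "user_prompt", "input", "response"]
    )
    writer.writeheader()
    writer.writerows(rows)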

Processed CSV file

3. Upload the dataset to the UbiAI platform:

For this tutorial, we are going to use the UbiAI platform to fine-tune our LLMs. We will fine-tune both GPT-4o-mini and the open-source Llama 3.1 and compare their performance to generic larger models such as GPT-4o.

UbiAI Text Completion Interface

4. Fine-tuning:

The final step after uploading and reviewing the data is to launch the fine-tuning process. Using UbiAI’s no-code platform, training can be started seamlessly with just a few clicks and usually takes between one and four hours, depending on the size of the dataset:

UbiAI Model Training Interface

In addition to commercial models like GPT-4o, UbiAI supports fine-tuning open-source LLMs such as Llama 3.1 and Mistral, allowing users to benchmark multiple fine-tuned models against each other and select the best one for the task.

Comparing the Results

Human Evaluation:

After fine-tuning the model, we are now ready to test it on new unseen patents.

 

UbiAI has a playground feature, where you can test all your fine-tuned models:

UbiAI Model Testing Playground

To test the fine-tuned model, we will provide the description of a new, unseen patent found on Google Patents and ask the model to generate the claims. Here are the system and user prompts used:

				
					System prompt: “You are a patent expert in claim generation.”

User prompt: “Given the patent description below generate x claims”
				
			

Note that x here refers to the number of claims in the reference patent itself. We fixed the number of claims to enable an apples-to-apples comparison between the generated output and the actual claims; it also allows the model to focus solely on generating semantically sound claims.
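For clarity, here is a small sketch of how the per-patent prompts can be assembled, with the claim count substituted for x. The function name and signature are purely illustrative.

def build_prompts(description, num_claims):
    """Return the fixed system prompt and a per-patent user prompt.

    num_claims is read from the reference patent so the generated output can
    be compared claim-for-claim with the original.
    """
    system_prompt = "You are a patent expert in claim generation."
    user_prompt = (
        f"Given the patent description below generate {num_claims} claims\n\n"
        f"{description}"
    )
    return system_prompt, user_prompt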

Below is an example of claims generated by the fine-tuned GPT-4o-mini model from a description found in this patent:

				
					Here is the text formatted correctly:

1. A spintronic device that includes a graphene sheet having first and second surfaces and at least two electrode terminals, wherein the at least two electrode terminals are spaced apart and are in electrical contact with the graphene sheet. The graphene sheet is configured to conduct an electric signal between the at least two electrode terminals and to affect an electrical signal based on a magnetic field effect induced in the graphene sheet, with an electrical property of the graphene sheet related to the magnetic field effect induced in the graphene sheet.  

2. The spintronic device of claim 1, wherein the graphene sheet has a thickness of less than about 10 nm.

3. The spintronic device of claim 1, wherein the graphene sheet has a thickness of a single monolayer of carbon.

4. The spintronic device of claim 1, wherein the graphene sheet has a thickness of multiple monolayers of graphene.

5. The spintronic device of claim 1, wherein the graphene sheet possesses a non-zero magnetization.

6. The spintronic device of claim 1, further comprising at least one magnetic material in contact with at least a portion of the first or second surface of the graphene sheet.

7. The spintronic device of claim 6, wherein the at least one magnetic material has a non-zero magnetization which induces the magnetic field effect in the graphene sheet.

8. The spintronic device of claim 6, wherein the at least one magnetic material is disposed on the graphene sheet as a continuous layer.

9. The spintronic device of claim 6, wherein the at least one magnetic material is disposed on the graphene sheet as two or more non-continuous discrete sections.

10. The spintronic device of claim 6, wherein the magnetic material comprises either a ferromagnetic material or an antiferromagnetic material, and further wherein the ferromagnetic material is in contact with the graphene sheet, or the antiferromagnetic material is in contact with the graphene sheet and the ferromagnetic material is in contact with the antiferromagnetic material.

11. The spintronic device of claim 6, wherein magnetic moments of the magnetic materials are correlated.

12. The spintronic device of claim 6, wherein the alignment of the magnetic moments of the magnetic materials is correlated.

13. The spintronic device of claim 6, wherein the electrical property of the graphene sheet is further based on a magnetic field effect induced in the graphene sheet by the at least one magnetic material.

14. The spintronic device of claim 1, wherein the at least two electrode terminals include a third electrode terminal, and further wherein an electrical property of the graphene sheet is based on an electrical input signal supplied to the at least three electrode terminals.

15. The spintronic device of claim 14, wherein the at least three electrode terminals include a gate electrode terminal and further wherein a transistor effect is induced in the graphene sheet based on the magnetic field effect induced in the graphene sheet and an electrical input signal supplied to the gate electrode terminal.

16. The spintronic device of claim 1, wherein an excitonic insulator is formed in part of the graphene sheet based on the magnetic field effect induced in the graphene sheet, and further wherein an electrical property of the graphene sheet is based on a tunneling current through the excitonic insulator.

17. The spintronic device of claim 16, further including at least two more magnetic materials disposed on either surface of the graphene sheet to form a magnetic trilayer.

18. The spintronic device of claim 17, further including another layer of graphene supported by the trilayer magnetic material.

19. The spintronic device of claim 16, further including a first magnetic material in contact with the first surface of the graphene sheet, and a second magnetic material in contact with the second surface of the graphene sheet, where the first and second magnetic materials have different magnetizations to, respectively, exhibit a first magnetic field effect and a second magnetic field effect in the graphene sheet.

20. A method of using a spintronic device that includes a graphene sheet having first and second surfaces and two electrode terminals, spaced apart and in electrical contact with the graphene sheet. The graphene sheet is configured to conduct an electric signal between the two electrode terminals and to affect an electrical signal based on a magnetic field effect induced in the graphene sheet, with an electrical property of the graphene sheet related to the magnetic field effect induced in the graphene sheet. The method comprising:
   - Supplying an electric input signal to one of the two electrode terminals. 
   - Converting the electric input signal into a spin-polarized electric current using the magnetic field effect induced in the graphene sheet, wherein the converting generates a spin-flipped electric output current, and the spin-flipped electric output current is supplied to the other of the two electrode terminals.

21. The method of claim 20, further including magnetizing in a magnetic field, and gradually reducing the magnetic field to zero or near zero.

22. The method of claim 21, wherein the magnetic field is applied along the graphene sheet.

23. The method of claim 20, further including inducing a magnetization in part of the graphene sheet, where the magnetization is less than an ordering temperature of a magnetic material in contact with the part of the graphene sheet.

24. The method of claim 20, further including applying an electric field to the graphene sheet, thereby inducing a magnetization in the graphene sheet.

25. The method of claim 20, further including supplying an electric input signal to all three terminals of the device, thereby inducing a current dependent on the spin of the charge carriers responsible for the current and the gate potential.

26. A spintronic device that includes: a graphene-(antiferro)-ferromagnet multilayer (GMM) having a graphene sheet with first and second surfaces and at least two electrode terminals, wherein the at least two electrode terminals are spaced apart and are in electrical contact with the graphene sheet. The graphene sheet is configured to:
   - Conduct an electric signal between the at least two electrode terminals; and
   - Affect an electrical signal based on a magnetic field effect induced in the graphene sheet, with an electrical property of the graphene sheet related to the magnetic field effect induced in the graphene sheet. A magnetic material is in contact with at least one of the surfaces of the graphene sheet and has a magnetization which induces the magnetic field effect in the graphene sheet.

27. The spintronic device of claim 26, wherein the magnetic material is disposed on the graphene sheet as two or more non-continuous discrete sections, where each section has substantially identical magnetizations to achieve a desired electrical characteristic, or wherein one or more sections have different magnetizations to achieve a desired electrical characteristic in a spintronic device.
				
			

Here are the actual claims from the patent:

				
					1. A spintronic device comprising: a graphene sheet having first and second surfaces, an electrical property of said graphene sheet being based on a magnetic field effect induced in said graphene sheet; and at least two electrode terminals electrically coupled to said graphene sheet, said at least two electrode terminals being spaced apart; wherein said graphene sheet is configured to conduct an electric signal between said at least two electrode terminals, said graphene sheet configured to affect said electrical signal based on said magnetic field effect induced in said graphene sheet.
2. The spintronic device of Claim 1, further comprising: a first magnetic material in contact with at least a portion of said first or second surface, said magnetic material inducing said magnetic field effect in said graphene sheet.
3. The spintronic device of Claim 2, wherein said first magnetic material has a non-zero magnetization.
4. The spintronic device of Claim 2, wherein said first magnetic material is disposed on said graphene sheet as a plurality of non-continuous discrete sections.
5. The spintronic device of claim 2, wherein said first magnetic material comprises a ferromagnetic material.
6. The spintronic device of claim 2, wherein said first magnetic material comprises an antiferromagnetic material.
7. The spintronic device of claim 2, further comprising a second magnetic material in contact with said first magnetic material.
8. The spintronic device of Claim 7, wherein said second magnetic material does not contact said graphene sheet.
9. The spintronic device of Claim 8, wherein the first magnetic material comprises an antiferromagnetic material and the second magnetic material comprises a ferromagnetic material.
10. The spintronic device of Claim 7, wherein a magnetic moment of said first magnetic material and a magnetic moment of said second magnetic material are correlated.
11. The spintronic device of Claim 10, wherein the alignment of said magnetic moment of said first magnetic material and the alignment of said magnetic moment of said second magnetic material are correlated.
12. The spintronic device of claim 1, wherein said graphene sheet has a thickness less than about 10 nanometers.
13. The spintronic device of claim 1, wherein said graphene sheet has a thickness of a single monolayer of carbon.
14. The spintronic device of claim 1, wherein said graphene sheet possesses a non-zero magnetization.
15. An apparatus comprising: a spintronic device having at least two electrode terminals and a graphene sheet, said spintronic device configured to conduct an electric signal between at least two electrode terminals and through the graphene sheet in a plane of the sheet, said spintronic device affecting said electrical signal based on a magnetic field effect induced in said graphene sheet, the electrical properties of said graphene sheet being related to said magnetic field effect induced in said graphene sheet.
16. The apparatus of Claim 15, wherein said spintronic device further comprises a first magnetic material in contact with at least a portion of said graphene sheet, said first magnetic material having a non-zero magnetization and operable to induce said magnetic field effect in said graphene sheet.
17. The apparatus of Claim 16, wherein said first magnetic material is disposed on a surface of said graphene sheet as a plurality of non-continuous discrete sections.
18. The apparatus of claim 16, wherein said first magnetic material comprises at least one of a ferromagnetic material and an antiferromagnetic material.
19. The apparatus of claim 16, further comprising a second magnetic material in contact with said first magnetic material and not contacting said graphene sheet, said first magnetic material comprising an antiferromagnetic material and the second magnetic material comprising a ferromagnetic material.
20. A method of using a spintronic device having a graphene magnet multilayer (GMM) device comprising: supplying an electrical input signal to a first electrode terminal that is in electrical contact with a graphene sheet, said graphene sheet affecting said electrical input signal based on a magnetic field effect induced in said graphene sheet; and obtaining an electrical output signal at one or more of said at least two electrode terminals in response to said electrical input signal, said electrical output signal being based on said electrical input signal and said magnetic field effect induced in said graphene.
21. The method of Claim 20, further comprising varying a magnitude of the magnetic field effect to vary an electrical property of said graphene sheet.
22. The method of Claim 20, wherein said electrical input signal is an electric current and the method further comprises polarizing said electric current based on said magnetic field effect induced in said graphene sheet.
23. The method of Claim 20, further comprising inducing a magnetization in a magnetic material of the GMM, said magnetic field effect induced in said graphene sheet being related to said magnetization in said magnetic material.
24. The method of Claim 20, further comprising inducing different magnetic field effects in different regions of said graphene.
25. A method of forming a spintronic device comprising: disposing a graphene sheet on a substrate; contacting said graphene sheet with a first magnetic material; disposing at least two electrode terminals on said graphene sheet to provide regions for electrically coupling to said graphene; and inducing a magnetization in said first magnetic material.
26. The method of Claim 25, wherein said magnetic material is an antiferromagnetic material and the method further comprises forming a second magnetic material comprising a ferromagnetic material on said first magnetic material.
27. The method of Claim 25, wherein inducing a magnetization in said first magnetic material further comprises: inducing a first magnetization in a first region; and inducing a second magnetization in a second region.
				
			

Visually speaking, the generated claims look similar to the reference patent.

But how does the fine-tuned model compare to the best state-of-the-art GPT-4o model?

Here is the output of GPT-4o:

				
					Independent Claims
A spintronic device, comprising:
a graphene sheet having first and second surfaces and at least two electrode terminals electrically coupled to the graphene sheet, wherein the graphene sheet is configured to conduct an electric signal and affect the electric signal based on a magnetic field effect induced in the graphene sheet.

A spintronic device, comprising:
a graphene sheet having a thickness of less than about 10 nanometers, at least one magnetic material in contact with the graphene sheet, and at least two electrode terminals electrically coupled to the graphene sheet, wherein the graphene sheet exhibits a nonzero magnetization and the electrical properties of the graphene sheet are based on the magnetic field effect induced by the magnetic material.

A method of using a spintronic device, comprising:
providing a graphene sheet having a magnetic field effect induced by contact with at least one magnetic material; applying an electrical input signal to the graphene sheet through a first electrode terminal; and obtaining an electrical output signal from a second electrode terminal, wherein the output signal is affected by the magnetic field effect.

A method for fabricating a spintronic device, comprising:
disposing a graphene sheet on a substrate; forming at least two electrodes on the graphene sheet; and inducing a magnetization in at least one magnetic material in contact with the graphene sheet to generate a magnetic field effect in the graphene sheet.

A graphene-magnet multilayer (GMM) device, comprising:
a graphene sheet disposed between at least one layer of ferromagnetic material and at least one layer of antiferromagnetic material, wherein the layers are configured to induce a magnetic field effect in the graphene sheet to produce spin-polarized electric current.

A spintronic transistor, comprising:
a graphene sheet in contact with a magnetic material and coupled to at least three electrode terminals, wherein the graphene sheet generates a spin-polarized current that is tunable by an applied gate voltage.

Dependent Claims
The device of claim 1, wherein the graphene sheet comprises a monolayer or multiple monolayers of graphene.

The device of claim 2, wherein the magnetic material is selected from ferromagnetic, antiferromagnetic, or ferrimagnetic materials.

The method of claim 3, further comprising varying the magnetic field effect by changing the magnetization of the magnetic material.

The method of claim 4, wherein the substrate comprises a non-magnetic or magnetic material.

The device of claim 5, wherein the ferromagnetic material is in direct contact with the graphene sheet and the antiferromagnetic material is disposed on the ferromagnetic material.

The device of claim 6, wherein the magnetic material is configured to induce Zeeman splitting in the graphene sheet.

The device of claim 1, wherein the graphene sheet is capped by an additional layer of magnetic material.

The device of claim 2, wherein the magnetic material is patterned as discrete sections to create spatial variations in the magnetic field effect.

The method of claim 3, wherein the magnetic field effect in the graphene sheet facilitates spin polarization of the electric input signal.

The method of claim 4, wherein the magnetization is induced by exposure to light or temperature variations.

The device of claim 5, wherein the graphene sheet is epitaxially grown on the substrate.

The device of claim 6, wherein the gate voltage selectively enhances or suppresses spin-polarized electron or hole currents.

The device of claim 1, wherein the graphene sheet exhibits ballistic conduction with minimal electron scattering.

The device of claim 2, wherein the magnetic material is configured to induce a Weiss exchange field in the graphene sheet.

The method of claim 3, further comprising inducing a spin-dependent splitting of electronic bands in the graphene sheet.

The method of claim 4, wherein the graphene sheet is formed by chemical vapor deposition or mechanical exfoliation.

The device of claim 5, wherein the antiferromagnetic material comprises NiO or CoO.

The device of claim 6, wherein the spin-polarized current is utilized for performing logic operations in reconfigurable circuits.

The device of claim 1, wherein the magnetic material is discontinuously deposited to form a patterned array.

The device of claim 2, wherein the magnetic field effect creates an excitonic insulator phase in the graphene sheet.

The method of claim 4, wherein the induced magnetization is optimized for room-temperature operation of the spintronic device.
				
			

The output of GPT-4o looks visually and semantically different from the actual claims.

We need to perform a semantic similarity check to get our answer.

Semantic Similarity

We will embed the generated claims and measure their cosine similarity to the embedded target claims (the actual patent claims), i.e. the normalized dot product between the two embedding vectors, to capture the semantic similarity between the reference text and the outputs.

We use the SentenceTransformers library for this task and embed the texts with the pre-trained PatentSBERTa model, since it was fine-tuned on patent vocabulary. We are going to benchmark four models, namely:

  • Untrained GPT-4o
  • Untrained GPT-4o-mini
  • Fine-tuned GPT-4o-mini
  • Fine-tuned Llama 3.1 8B
				
					from sentence_transformers import SentenceTransformer, util

# Initialize the PatentSBERTa model
model = SentenceTransformer('AI-Growth-Lab/PatentSBERTa')

target_text = '[actual claims]'
generated_text = '[generated claims]'

# Generate embeddings for each text
target_embedding = model.encode(target_text, convert_to_tensor=True)
generated_embedding = model.encode(generated_text, convert_to_tensor=True)

# Calculate the cosine similarity between the two embeddings
similarity = util.cos_sim(target_embedding, generated_embedding).item()

# Print the similarity score
print(f"Similarity between target text and model's output: {similarity:.4f}")
				
			

Here are the results:

				
					Similarity between target text and OpenAI-gpt-4o's output: 0.8739
Similarity between target text and OpenAI-gpt-4o-mini's output: 0.8147
Similarity between target text and ubiai-model-llama-ft's output: 0.8496
Similarity between target text and ubiai-model-4o-mini-ft's output: 0.9611
				
			

With a similarity score of roughly 96%, the smaller fine-tuned GPT-4o-mini is the winner; the larger GPT-4o reached only about 87% similarity. The fine-tuned open-source Llama 3.1, meanwhile, scored around 85%, lower than the fine-tuned GPT-4o-mini but higher than the untrained GPT-4o-mini (which is presumably larger than the 8B-parameter Llama).
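To reproduce scores like these for several models at once, the earlier snippet can be wrapped in a simple loop. The outputs dictionary below is hypothetical and would hold the claims text generated by each model under test.

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('AI-Growth-Lab/PatentSBERTa')

# Hypothetical placeholders for each model's generated claims.
outputs = {
    "OpenAI-gpt-4o": "[generated claims]",
    "OpenAI-gpt-4o-mini": "[generated claims]",
    "ubiai-model-llama-ft": "[generated claims]",
    "ubiai-model-4o-mini-ft": "[generated claims]",
}

target_embedding = model.encode("[actual claims]", convert_to_tensor=True)

for name, text in outputs.items():
    embedding = model.encode(text, convert_to_tensor=True)
    score = util.cos_sim(target_embedding, embedding).item()
    print(f"Similarity between target text and {name}'s output: {score:.4f}")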

Let’s test on a different patent (for amusement, I encourage you to read this fascinating patent):

				
					Similarity between target text and OpenAI-gpt-4o's output: 0.8684
Similarity between target text and OpenAI-gpt-4o-mini's output: 0.9111
Similarity between target text and ubiai-model-llama-ft's output: 0.8445
Similarity between target text and ubiai-model-4o-mini-ft's output: 0.9082
				
			

In this instance, the untrained GPT-4o-mini scored almost the same as the fine-tuned one (in fact, slightly higher), and both scored higher than the larger GPT-4o model.

Across the multiple patents we tested, the fine-tuned GPT-4o-mini scored higher overall than the other models, while the fine-tuned Llama 3.1 8B scored lower but, in many instances, still higher than the untrained models.

These results indicate that the Llama 3.1 8B might require more training examples to outperform commercial LLMs.

Conclusion

Drafting patent claims is a nuanced, high-stakes task that requires precision and domain-specific expertise. As demonstrated in our tutorial and evaluation, smaller fine-tuned LLMs (such as GPT-4o-mini) can generate remarkably accurate claims compared to larger, generic models like GPT-4o.

Beyond performance, fine-tuned smaller LLMs come with critical advantages:

  • Improved Accuracy for Specific Tasks: Tailoring the model to patent drafting brings its output closer to actual claims, capturing detailed legal and technical requirements.
  • Cost and Latency Benefits: Smaller models are faster and more cost-effective to run, which is especially beneficial at scale.
  • On-Prem Privacy: With open-source solutions, data can remain within secure environments, a fundamental requirement in legal and intellectual property work.

While our comparisons show that a fine-tuned GPT-4o-mini can outperform larger models, open-source models like Llama 3.1 8B remain competitive, especially with additional training data. Ultimately, this workflow underscores that domain-specific fine-tuning, coupled with the flexibility of open-source platforms, is key to delivering precise, cost-effective, and secure patent claim generation.
