Categorizing Invoices: Multimodal Transformers for Structured and Unstructured Data
Aug 9, 2022
In this article, we will fine-tune a pre-trained BERT model on “multimodal” data, meaning a mix of structured and unstructured fields, to classify invoices by category (a multiclass classification task).
- Business understanding
- Work environment preparation
- Data understanding
- What are Multimodal Transformers?
- Data preparation
- Modeling
- Evaluation results
Business understanding
In most organizations, each invoice is assigned to a specific category. For example, if an employee incurs expenses to repair or maintain facilities or equipment in their office, those invoices are classified under the category “Repair_and_Maintenance”.
The category is therefore an important piece of information: it is used to create expense reports, to reimburse employees for eligible business expenses, and to track expenses either for the organization as a whole or for a specific product, client, or project.
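To make the tracking use case concrete, here is a minimal sketch of how categorized invoices could be rolled up by category or by project. The records, field names (`category`, `project`, `amount`), the “Travel” category, and the `total_by` helper are all hypothetical and only serve as an illustration; they are not taken from the dataset used later in this article.

```python
from collections import defaultdict

# Hypothetical invoice records; field names and values are illustrative only.
invoices = [
    {"id": 1, "category": "Repair_and_Maintenance", "project": "office-a", "amount": 120.0},
    {"id": 2, "category": "Travel", "project": "client-x", "amount": 300.0},
    {"id": 3, "category": "Repair_and_Maintenance", "project": "client-x", "amount": 80.0},
]


def total_by(records, key):
    """Sum the 'amount' field of each record, grouped by the given key."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)


# Expenses for the overall organization, broken down by category.
by_category = total_by(invoices, "category")

# Expenses associated with a specific project or client.
by_project = total_by(invoices, "project")
```

Once every invoice carries a reliable category, a report like this is a simple aggregation. The difficult part, which the rest of the article addresses, is assigning the category automatically.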
Although categorizing expenses is an important task, doing it manually is tedious and wastes time and resources. The same problem arises when we need to extract data from invoices, such as the date, the total amount including tax (TTC), the taxes, and the seller. However, with recent advances in deep learning models such as Transformers, it is becoming easier than ever to fine-tune large language models to serve a specific business need. All you need is high-quality labeled data to train the model.
For invoice extraction, we need an annotation tool that offers OCR annotation, to parse the text and bounding boxes from the invoices, and that allows native labeling. Fortunately, I have found a tool named UBIAI that lets you label your invoices directly and also train deep learning models such as LayoutLM to automatically extract information from the invoice image (as shown in the illustration below).