How to Automate Data Extraction from Bank Statements Using a Custom-Trained AI Model
MAY 28th, 2023
In accounting, extracting data from bank statements is an important task that supports efficient and accurate financial record keeping. It matters all the more now that data volumes are growing at an unprecedented rate and manual data entry is becoming increasingly inefficient.
In this tutorial, we are going to learn how to automate data extraction from bank statements using a custom-trained AI model and automated table extraction.
Table Extraction

Bank statement example
An NLP model can be trained to automatically recognize and extract specific types of information from unstructured documents, such as amounts, dates, and the statement period. However, training it to extract organized tabular data is not the most efficient use of time. For that purpose, it is more efficient to use pre-trained table extraction APIs, such as those from Microsoft Azure or AWS, since they have been trained on millions of examples.
Below is an example of automated table extraction using UBIAI, based on the Microsoft Azure API:

UBIAI’s table extraction
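Table extraction services generally return each table as a flat list of cells, where every cell carries its row index, column index, and text. The sketch below shows one way to turn such a list into rows and then into one record per transaction. The `Cell` class and its field names are assumptions made for illustration, not the actual response format of UBIAI, Azure, or AWS.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical stand-in for one cell returned by a table extraction service.
    row_index: int
    column_index: int
    content: str


def cells_to_rows(cells, row_count, column_count):
    """Arrange a flat list of cells into a row-major grid of strings."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content.strip()
    return grid


def rows_to_records(grid):
    """Treat the first row as the header and return one dict per data row."""
    if not grid:
        return []
    header, *body = grid
    return [dict(zip(header, row)) for row in body]


# Example: a two-row table with a header and one transaction.
cells = [
    Cell(0, 0, "Date"), Cell(0, 1, "Description"), Cell(0, 2, "Amount"),
    Cell(1, 0, "05/01"), Cell(1, 1, "Deposit"), Cell(1, 2, " 1,200.00 "),
]
records = rows_to_records(cells_to_rows(cells, row_count=2, column_count=3))
```

Keeping the amounts as strings at this stage is deliberate: number formats vary between banks, so parsing them is better left to a later step that knows the statement's locale.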
AI Model Training
Now that we can reliably extract the tables, we can train our AI model to extract the relevant information located at the top of the statement. With the UBIAI Annotation Tool, this can be done quite easily: labeling just 5 documents is enough to train the AI model.

Model training dashboard in UBIAI
Custom Workflow Creation
Once the model is trained, we are ready to combine table extraction and our custom-trained model into one workflow that automatically extracts the relevant information from our bank statements.
To do so, we will use the AI Builder to deploy our model and create custom workflows with just a few clicks. Users can combine different modules, such as image processing, OCR, custom NLP models, table extraction, LLMs, and more, to create a solution tailored to their specific use case. For more in-depth information, please read this introductory article.
For this tutorial we are going to use the following workflow to achieve our goal:

Workflow building interface
- The first part of the workflow is document import. To do so, we simply drag and drop the PDF and Photo modules onto the builder canvas.
- Once the data is imported, we add the OCR module and connect the output of the data importers to the input of the OCR module in order to parse the data from the PDFs and images.
- Next, we add two modules: the Form Recognizer module to import our custom-trained AI model and the Extract Tables module to read the tables.
- And that’s it, we can finally send the data to the export module.
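The AI Builder is a visual tool, so no code is needed for the steps above. Still, the data flow they describe can be sketched in a few lines, which may help when reasoning about what each module receives and produces. Everything in this sketch is hypothetical: `ProcessedDocument`, `run_workflow`, and the stub functions are illustrative names, not part of any UBIAI API.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedDocument:
    # Hypothetical container for what the export step would receive.
    source: str
    text: str = ""
    fields: dict = field(default_factory=dict)
    tables: list = field(default_factory=list)


def run_workflow(sources, ocr, extract_fields, extract_tables):
    """Import -> OCR -> (custom model, table extraction) -> export."""
    processed = []
    for source in sources:
        text = ocr(source)  # OCR module parses the PDF or image
        processed.append(
            ProcessedDocument(
                source=source,
                text=text,
                fields=extract_fields(text),  # custom-trained model
                tables=extract_tables(text),  # table extraction module
            )
        )
    return processed


# Stand-in modules so the sketch runs end to end; real modules would
# call OCR, model, and table extraction services instead.
def stub_ocr(source):
    return f"text of {source}"


def stub_fields(text):
    return {"statement_period": "unknown"}


def stub_tables(text):
    return [[["Date", "Amount"]]]


results = run_workflow(
    ["statement.pdf", "statement.jpg"], stub_ocr, stub_fields, stub_tables
)
```

Passing the modules in as functions mirrors the modular design of the builder: any step can be swapped without touching the others.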
Combining our custom-trained AI model with other data processing modules is extremely easy with AI Builder’s modular custom workflows.
Now that the workflow has been created, let’s run it on a new bank statement.
Bank Statement Processing
After the documents have been processed, we can review and correct the output before exporting the data. Below is AI Builder’s review dashboard, where each module’s output can be visualized and reviewed.
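A review step is easier when the documents most likely to need correction are surfaced first. As a minimal sketch, assuming the model output is available as a dictionary of field names to values, the function below lists the expected fields that came back missing or empty. The field names are hypothetical examples, not the labels used in this tutorial's model.

```python
# Hypothetical field names; replace with the labels used during annotation.
REQUIRED_FIELDS = ("account_number", "statement_period", "closing_balance")


def fields_needing_review(extracted, required=REQUIRED_FIELDS):
    """Return the required fields that are missing or blank in the output."""
    flagged = []
    for name in required:
        value = extracted.get(name)
        # A value of 0 is valid (e.g. a zero balance), so only None and
        # blank strings count as missing.
        if value is None or str(value).strip() == "":
            flagged.append(name)
    return flagged


# Example: one field is blank and one is absent.
to_review = fields_needing_review(
    {"account_number": "12345678", "statement_period": "  "}
)
```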