Building An NLP Project From Zero To Hero (1): Project Overview
Dec 7, 2021
Whether it's receipts, contracts, financial documents, or invoices, automating information retrieval can increase your business efficiency and productivity at a fraction of the cost. However, this is not possible without text annotation, which supplies the labeled examples a model learns from. While natural language processing (NLP) tasks such as named entity recognition (NER) or relation extraction have been widely used for information retrieval in unstructured text, analyzing structured documents such as invoices, receipts, and contracts is a more complicated endeavor.
First, there is not much semantic context around the entities we want to extract (e.g. price, seller, tax) that can be used to train an NLP model. Second, the document layout changes frequently from one invoice to another, which causes traditional NLP tasks such as NER to perform poorly on structured documents. That being said, structured text, such as an invoice, contains rich spatial information about the entities. This spatial information can be used to create a 2-D position embedding that denotes the relative position of a token within a document. More recently, Microsoft released a new model, LayoutLM, to jointly model interactions between text and layout information across scanned document images. It achieved new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24), and document image classification (from 93.07 to 94.42).
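To make the idea of a 2-D position concrete, here is a minimal sketch of how a token's bounding box might be prepared before it is fed to a layout-aware model. It assumes pixel boxes in (x0, y0, x1, y1) form and rescales them to a 0-1000 grid, the convention commonly used with LayoutLM, so that pages of different sizes become comparable. The function name `normalize_box` and the sample values are illustrative, not part of any library.

```python
def normalize_box(box, page_width, page_height):
    """Rescale a pixel bounding box (x0, y0, x1, y1) to a 0-1000 grid.

    Assumption: the box uses the same pixel units as the page size,
    with the origin at the top-left corner of the page.
    """
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]


# Hypothetical OCR output for one token on a 500 x 1000 pixel invoice scan.
token = {"text": "Total", "box": (50, 100, 150, 200)}
normalized = normalize_box(token["box"], page_width=500, page_height=1000)
print(token["text"], normalized)
```

Each of the four normalized coordinates can then index an embedding table, which is how a token's position on the page ends up alongside its text in the model input.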

