At a recent workshop, I presented a series of use cases for automating financial processes. The first focused on extracting data from documents that start out on paper. What was difficult and costly to implement just a few years ago is now within reach of most companies using Microsoft’s Power Platform and AI Builder, the Power Platform capability that brings AI models into applications and business processes without writing code.
Business Case: Why extract data from invoices or documents?
The main economic benefit of extracting data from documents is the ability to structure unstructured data. In many companies, entire teams spend countless hours manually reading documents and entering data into various systems:
- Accounting teams must read invoices, receipts, and other documents to classify and post entries.
- HR departments analyze CVs and cover letters to select job candidates.
- Insurance companies process claim documents related to accidents or damages.
When data is structured (organized into tables, for simplicity), much of this work can be automated. It becomes possible to classify accounting records, detect fraud or posting errors, assess job applications or insurance claims, and so on.
And that’s our business case — the economic advantage of automation.
Architecture of a Document Data Extraction Model
The solution architecture we presented is summarized below:
- Document scanning via mobile phone or scanner
- Upload of the scanned file to a SharePoint document library
- Alternatively, the user can upload files directly through a custom Canvas App
- A Power Automate flow is triggered when a new document is added, extracting the data using the trained recognition model
- The same flow stores the extracted data in Dataverse tables (or optionally in SharePoint) and sends a notification email to the user with the processed file attached
In practical terms: you scan an invoice and receive a notification email along with a structured data table extracted from the document.
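To make the sequence of steps concrete, here is a minimal Python sketch of the same flow. It is only an illustration: the real solution is built with SharePoint, Power Automate, and AI Builder, not Python. The names `ExtractionPipeline`, `on_document_added`, and `fake_extract` are hypothetical, the in-memory lists stand in for the Dataverse table and the email outbox, and the extracted values are made-up sample data.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProcessedDocument:
    file_name: str
    fields: dict[str, str]


@dataclass
class ExtractionPipeline:
    # Stand-in for the trained model: takes file bytes, returns field values.
    extract: Callable[[bytes], dict[str, str]]
    # Stand-in for the Dataverse (or SharePoint) table.
    table: list[ProcessedDocument] = field(default_factory=list)
    # Stand-in for the notification emails that would be sent.
    outbox: list[str] = field(default_factory=list)

    def on_document_added(self, file_name: str, content: bytes, user_email: str) -> ProcessedDocument:
        """Mimic the flow trigger: extract, store, then notify the user."""
        fields = self.extract(content)
        record = ProcessedDocument(file_name, fields)
        self.table.append(record)
        self.outbox.append(
            f"To {user_email}: processed {file_name} ({len(fields)} fields)"
        )
        return record


def fake_extract(content: bytes) -> dict[str, str]:
    # Hypothetical fixed output; a real model would read the document.
    return {"invoiceDate": "2024-01-15", "total": "12.34"}


pipeline = ExtractionPipeline(extract=fake_extract)
record = pipeline.on_document_added("invoice-001.pdf", b"...", "user@example.com")
print(record.fields)
print(pipeline.outbox[0])
```

Keeping the extraction step behind a single function mirrors the architecture: the trigger, storage, and notification steps do not need to know which model does the work.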
The AI Builder Data Extraction Model
At the heart of the solution is the extraction model. To enable native integration with Power Apps, SharePoint, Outlook, and other Microsoft tools, we used AI Builder.
AI Builder simplifies the entire training and testing process. It’s a no-code tool: business users with no coding experience can train models themselves, which makes adoption much easier.
AI Builder includes model types built for document data extraction, a good fit for this use case. Our process involved three key steps:
- Train the model by uploading sample documents and visually tagging the fields we want to extract
- Test the model with unseen documents and measure its accuracy — how many fields are identified correctly?
- Refine the model to improve its performance, if necessary
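The "how many fields are identified correctly?" question in the test step can be sketched in a few lines of Python. This is an assumption about one reasonable way to measure it (exact match per field), not a description of how AI Builder computes its own accuracy score. The function name `field_accuracy` and the sample values are hypothetical.

```python
def field_accuracy(predicted, expected):
    """Return (per-field accuracy, overall accuracy) using exact matches."""
    hits = {}
    totals = {}
    for pred, exp in zip(predicted, expected):
        for name, true_value in exp.items():
            totals[name] = totals.get(name, 0) + 1
            if pred.get(name) == true_value:
                hits[name] = hits.get(name, 0) + 1
    per_field = {name: hits.get(name, 0) / count for name, count in totals.items()}
    overall = sum(hits.values()) / sum(totals.values())
    return per_field, overall


# Hand-labeled values for two unseen test invoices (made-up data).
expected = [
    {"invoiceDate": "2024-01-15", "vendorName": "Pingo Doce", "total": "12.34"},
    {"invoiceDate": "2024-01-16", "vendorName": "Pingo Doce", "total": "8.90"},
]
# What the model returned for the same invoices (made-up data).
predicted = [
    {"invoiceDate": "2024-01-15", "vendorName": "Pingo Doce", "total": "12.34"},
    {"invoiceDate": "2024-01-16", "vendorName": "Pingo Doc", "total": "8.90"},
]

per_field, overall = field_accuracy(predicted, expected)
print(per_field)
print(round(overall, 3))
```

Breaking the score down per field shows where refinement is worth the effort: here only `vendorName` would need more training examples.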
Live Demonstration
We demonstrated the process using invoices from a local supermarket (Pingo Doce), originally in paper form. Here’s an example of one invoice:

As you can see, the invoice is hard to read. It looks like it was kept in a pocket or wallet for a few days and is visibly damaged. Even for a human, extracting all the information would be challenging.
Using this and similar invoices, we trained the AI Builder model. AI Builder offers prebuilt models, which work without training, as well as custom models. For a custom model like ours, you upload multiple example documents and define which fields should be recognized.
Even when documents follow the same format, they vary: some invoices contain more products, different VAT rates, or different sections. This variability makes a purely rules-based approach brittle and hard to maintain, which is why we used a machine learning model.
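A toy example shows how quickly a fixed rule breaks when the number of products changes. The rule below is deliberately naive and hypothetical (nobody in the project wrote it), and the receipt lines are made-up data. It only illustrates the weakness of a purely positional rule.

```python
def total_from_fixed_line(lines: list[str]) -> str:
    """Brittle rule: assume the total is always on the fourth line."""
    return lines[3].split()[-1]


one_product = ["PINGO DOCE", "2024-01-15", "BREAD 1.20", "TOTAL 1.20"]
two_products = ["PINGO DOCE", "2024-01-15", "BREAD 1.20", "MILK 0.80", "TOTAL 2.00"]

print(total_from_fixed_line(one_product))   # correct: the total line is fourth
print(total_from_fixed_line(two_products))  # wrong: picks up the MILK line instead
```

A smarter rule could search for the word TOTAL, but on crumpled, partly illegible scans that keyword may not be read reliably either, so each new rule tends to need another exception.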
Here’s the model we trained specifically for this format:

We used 16 invoices for training and got the following results:

The model achieved an accuracy score of 85%, which is quite acceptable. Some human correction is still needed (roughly 15% of extracted fields), but overall the results are solid. In the right-hand panel, we see that some fields performed better than others.
For example, the invoiceDate field reached 99% accuracy, which is almost perfect. Others — like vendorName and the product table — were harder to extract accurately.
The product list is particularly challenging since it’s a dynamic table: some invoices have one product, others have several. Similarly, the VAT table varies. Still, with only three possible VAT rates, the model achieved 91% accuracy for that section.
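One way to picture why these sections are harder is to look at the shape of the data: the header fields appear once, while the product list has a variable number of rows. The Python sketch below is an assumed data structure for illustration, not the schema AI Builder produces. The class names, products, and amounts are hypothetical, and the three rates are just sample values.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    amount: Decimal    # gross amount for this product
    vat_rate: Decimal  # e.g. Decimal("0.06")


@dataclass
class Invoice:
    invoice_date: str
    vendor_name: str
    items: list[LineItem]  # variable length: one product or many
    total: Decimal

    def vat_summary(self) -> dict[Decimal, Decimal]:
        """Group gross amounts by VAT rate, like the VAT table on the receipt."""
        summary: dict[Decimal, Decimal] = {}
        for item in self.items:
            summary[item.vat_rate] = summary.get(item.vat_rate, Decimal("0")) + item.amount
        return summary

    def is_consistent(self) -> bool:
        """Check that the extracted line amounts add up to the extracted total."""
        return sum((item.amount for item in self.items), Decimal("0")) == self.total


invoice = Invoice(
    invoice_date="2024-01-15",
    vendor_name="Pingo Doce",
    items=[
        LineItem("Bread", Decimal("1.20"), Decimal("0.06")),
        LineItem("Wine", Decimal("4.50"), Decimal("0.13")),
        LineItem("Detergent", Decimal("3.30"), Decimal("0.23")),
        LineItem("Milk", Decimal("0.80"), Decimal("0.06")),
    ],
    total=Decimal("9.80"),
)
print(invoice.vat_summary())
print(invoice.is_consistent())
```

A consistency check like `is_consistent` is a cheap way to catch extraction errors in the product table, since a misread amount usually breaks the sum.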
Operationalizing the Model
To use this model in a real-world scenario, we can deploy it via a Power Apps frontend and a Power Automate flow in the backend. In our demo, that’s exactly what we did: Power Apps for the UI and Power Automate for the processing and email notifications.
The app interface looks like this:

Here, the user uploads a scanned invoice. That upload triggers the automation flow, which extracts the data and returns it to the app.
The user can then review the data and validate it. If necessary, they can manually adjust the values using the app interface. In the final screen, all previously processed documents are listed along with their extracted data.
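The review step can be sketched as follows, assuming each extracted field arrives with a confidence score between 0 and 1. The threshold of 0.80, the function names, and the sample values are all hypothetical. In the demo this logic lives in the Power Apps screens rather than in Python.

```python
REVIEW_THRESHOLD = 0.80  # hypothetical cut-off, to be tuned per use case


def fields_needing_review(extracted, threshold=REVIEW_THRESHOLD):
    """Return the names of fields whose confidence is below the threshold."""
    return sorted(
        name for name, (value, confidence) in extracted.items()
        if confidence < threshold
    )


def apply_corrections(extracted, corrections):
    """Return plain field values, with user corrections overriding extracted ones."""
    values = {name: value for name, (value, _) in extracted.items()}
    values.update(corrections)
    return values


# field name -> (extracted value, confidence); made-up sample data
extracted = {
    "invoiceDate": ("2024-01-15", 0.99),
    "vendorName": ("Pingo Doc", 0.62),
    "total": ("12.34", 0.91),
}

flagged = fields_needing_review(extracted)
final_values = apply_corrections(extracted, {"vendorName": "Pingo Doce"})
print(flagged)
print(final_values)
```

Highlighting only the low-confidence fields keeps the human review short, while still letting the user adjust any value.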
At a high level, the Power Automate flow behind the app looks like this:

The image shows only the first three steps. The full flow is longer, and I will explain it in more detail in a future post if readers are interested.
Finally, the extracted data is stored in tables — either in Dataverse or even Excel — and results can be sent via email to a user for validation. The email might look like this:

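As a rough illustration of what "stored in tables" means for an invoice, the Python sketch below flattens one extracted invoice into one row per product and writes the rows as CSV, which Excel can open. The column names, the helper names `to_rows` and `to_csv`, and the sample data are hypothetical. The demo itself writes to Dataverse through the Power Automate flow.

```python
import csv
import io


def to_rows(document_id, header, items):
    """Repeat the header fields on every product row (one row per product)."""
    return [{"document_id": document_id, **header, **item} for item in items]


def to_csv(rows):
    """Serialize the rows as CSV text, using the first row's keys as columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


header = {"invoiceDate": "2024-01-15", "vendorName": "Pingo Doce"}
items = [
    {"description": "Bread", "amount": "1.20"},
    {"description": "Milk", "amount": "0.80"},
]

rows = to_rows("invoice-001", header, items)
csv_text = to_csv(rows)
print(csv_text)
```

Repeating the header fields on each row is the simplest layout for a spreadsheet. In Dataverse, a header table and a related line-item table would be the more natural design.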




