The Best AI Models for Invoice Processing: Benchmark Comparisons

I tested the most popular AI models to see how well they process invoices out of the box, without any fine-tuning.

Read on to learn:

  • Which model outperforms all the others by at least 20%
  • Why Google Document AI failed to return structured data
  • How the models handle low-resolution scans

Models tested

For this test, I selected AI models using these criteria:

  • Popularity: popular models have better support and documentation.

  • Invoice processing capability: the model must be able to process invoices out of the box, without fine-tuning or additional training.

  • Integration: since the results of this test are meant to be used in practice, each model must offer an API for easy integration.

I landed on the 7 AI models shown below and gave each one a short name for convenience:

  • Amazon Expense Analysis API, or “AWS”
  • Azure AI Document Intelligence – prebuilt invoice model, or “Azure”
  • Google Document AI – Invoice Parser, or “Google”
  • GPT-4o API – text input with third-party OCR, or “GPTT”
  • GPT-4o API – image input, or “GPTI”
  • Gemini 2.0 Pro Experimental, or “Gemini”
  • DeepSeek V3 – text input, or “DeepSeek-T”

The invoice dataset

The models were tested on a dataset of 20 invoices with various designs and issue years (from 2006 to 2020).

| Issue year | Number of invoices |
|---|---|
| 2006 – 2010 | 6 |
| 2011 – 2015 | 4 |
| 2016 – 2020 | 10 |

Methodology

To analyze each invoice, I defined a list of 16 main fields that are common to all invoices and contain the most important data:

Invoice Id, Invoice Date, Net Amount, Tax Amount, Total Amount, Due Date, Purchase Order, Payment Terms, Customer Address, Customer Name, Vendor Address, Vendor Name, Item: Description, Item: Quantity, Item: Unit Price, Item: Amount.

The fields extracted by the models were mapped to a common naming convention to ensure consistency. The LLMs (GPT, DeepSeek, and Gemini) were explicitly prompted to return their results using these common fields.
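As an illustration, here is a minimal sketch of how an LLM can be prompted to return the common fields as JSON. The exact prompt used in the test is not published, so the wording, the JSON-mode request, and the null-for-missing convention below are assumptions:

```python
# Minimal sketch: prompting GPT-4o to return the 16 common fields as JSON.
# The prompt wording is illustrative, not the one used in this test.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FIELDS = [
    "Invoice Id", "Invoice Date", "Net Amount", "Tax Amount", "Total Amount",
    "Due Date", "Purchase Order", "Payment Terms", "Customer Address",
    "Customer Name", "Vendor Address", "Vendor Name", "Item: Description",
    "Item: Quantity", "Item: Unit Price", "Item: Amount",
]

def extract_fields(invoice_text: str) -> str:
    """Return the model's JSON string with one key per common field."""
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},  # force valid JSON output
        messages=[
            {"role": "system",
             "content": ("Extract invoice data. Return a JSON object with "
                         "exactly these keys: " + ", ".join(FIELDS) +
                         ". Use null for any field that is absent.")},
            {"role": "user", "content": invoice_text},
        ],
    )
    return response.choices[0].message.content
```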

Line item detection

For each invoice, I assessed how well the models extracted the main line item fields:

Description, Quantity, Unit Price, Amount

Efficiency metrics

I used an overall efficiency metric (EFF, %) to assess extraction accuracy. This metric combines:

  • Strict basic fields: exact matches required, e.g., the invoice ID, dates, etc.

  • Non-strict fields: partial matches allowed if the similarity score (RLD, %) exceeds a threshold.

  • Line items: an item counts as correct only if all of its attributes are extracted accurately.

Formulas

Overall efficiency (EFF, %):

EFF % = (COUNTIF(strict fields, positive) + COUNTIF(non-strict fields, RLD > threshold) + COUNTIF(line items, positive)) / COUNT(all fields) × 100

Line item efficiency (EFF-I, %):

EFF-I % = IF(ALL(Quantity, Unit Price, Amount are positive) AND RLD(Description) > RLD threshold, 1, 0) × 100
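A minimal sketch of how these metrics can be computed, assuming RLD is a relative string-similarity score (approximated here with Python's standard difflib) and an illustrative threshold; the article does not publish its exact implementation:

```python
# Sketch of the efficiency metrics. The similarity measure and threshold
# are assumptions; the article does not specify its exact implementation.
from difflib import SequenceMatcher

RLD_THRESHOLD = 80.0  # percent; illustrative value

def rld(a: str, b: str) -> float:
    """Similarity of two strings as a percentage (stand-in for RLD)."""
    return SequenceMatcher(None, a, b).ratio() * 100

def item_eff(item: dict, truth: dict) -> int:
    """EFF-I: 100 only if all numeric attributes match exactly and the
    description clears the similarity threshold."""
    numeric_ok = all(item.get(k) == truth.get(k)
                     for k in ("Quantity", "Unit Price", "Amount"))
    desc_ok = rld(item.get("Description", ""),
                  truth.get("Description", "")) > RLD_THRESHOLD
    return 100 if numeric_ok and desc_ok else 0

def overall_eff(strict_hits: int, nonstrict_hits: int,
                item_hits: int, total_fields: int) -> float:
    """EFF: share of correctly extracted fields across all three groups."""
    return (strict_hits + nonstrict_hits + item_hits) / total_fields * 100
```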

Line item detection results

Note: Google's results are excluded from this comparison because Google failed to properly extract line items.

Key insights

Azure falls short on line item descriptions.

One of the invoices in the dataset has employee names as line items. On this invoice, Azure failed to detect the full item names, recognizing only first names, while the other models successfully identified the full names in all 12 line items.

This problem significantly hurt Azure's efficiency on this invoice, which came in far lower (33.3%) than the other models'.

💡 Azure's inability to parse multi-word descriptions into structured fields highlights a decisive limitation compared to its competitors.

Low invoice resolution barely affects detection quality in practice.

Low resolution (as the human eye perceives it) generally did not degrade detection quality. It mainly leads to minor recognition errors; for example, on one invoice, DeepSeek misread a decimal point, which produced an incorrect numeric value.

💡 Modern OCR and AI models are robust to resolution problems, although rare recognition errors may still occur.

Google failed to detect line items.

Google concatenates all line item fields into a single string, making it impossible to compare its results with the other models. Google Document AI results:

Actual invoice:

All the other services achieved 100% correct detection, with line items broken down into attributes.

💡 Google's AI cannot extract structured line item data without additional configuration.
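To make the difference concrete, here is a hypothetical line item (the values are invented for illustration) in the two output shapes: the attribute breakdown returned by the other services versus the single string Google returns:

```python
# Hypothetical line item (invented values) in both output shapes.

# Structured output, as returned by AWS, Azure, and the LLMs:
structured_item = {
    "Description": "Consulting services",
    "Quantity": 2,
    "Unit Price": 150.00,
    "Amount": 300.00,
}

# Google's output: all attributes fused into one string, which cannot be
# compared field-by-field without extra parsing.
google_item = "Consulting services 2 150.00 300.00"
```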

Multi-line descriptions did not affect detection quality.

💡 With the exception of the Google AI case above, multi-line descriptions did not negatively affect detection quality in any model.

Gemini has the best “attention to detail”.

LLMs such as GPT, Gemini, and DeepSeek can extract more data than the prebuilt invoice models. Of all the LLMs, Gemini has the best accuracy when it comes to extracting additional data from invoice line items. GPT often extracts the right fields but with incorrect field values, and DeepSeek performs worst of the three, with the poorest field value extraction accuracy.

An example of an invoice:

Gemini results:

Accurate

GPT results:

The same attributes, but inaccurate values

DeepSeek results:

Most values are incorrect or missing, with garbled text in the text attributes

💡 Gemini has the highest line item extraction accuracy of the LLMs: it extracts all fields, not only the standard ones, and is the most accurate at preserving text and numeric values.

Cost comparison

I calculated the cost of processing 1,000 invoices with each model, as well as the average cost of processing a single invoice:

| Service | Pricing | Cost per page (average) |
|---|---|---|
| AWS | $10 / 1,000 pages (1) | $0.01 |
| Azure AI Document Intelligence | $10 / 1,000 pages | $0.01 |
| Google Document AI | $10 / 1,000 pages | $0.01 |
| “GPTT”: GPT-4o API, text input with third-party OCR | $2.50 / 1M input tokens, $10.00 / 1M output tokens (2) | $0.021 |
| “GPTI”: GPT-4o API, image input only | $2.50 / 1M input tokens, $10.00 / 1M output tokens | $0.0087 |
| Gemini 2.0 Pro | $1.25 / 1M input tokens (prompts ≤ 128K tokens); $2.50 / 1M input tokens (prompts > 128K tokens); $5.00 / 1M output tokens (prompts ≤ 128K tokens); $10.00 / 1M output tokens (prompts > 128K tokens) | $0.0045 |
| DeepSeek V3 API | $10 / 1,000 pages (third-party OCR) + $0.27 / 1M input tokens, $1.10 / 1M output tokens | $0.011 |

Notes:

(1) – $8 / 1,000 pages after 1 million pages per month

(2) – An additional $10 per 1,000 pages for the third-party OCR model
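As a sketch, the per-invoice cost of a token-priced model can be estimated as below. The token counts in the example are illustrative assumptions, not measurements from the test; with these numbers the “GPTT” setup lands near the $0.021 average from the table:

```python
# Estimate the per-invoice cost of a token-priced model. The token counts
# in the example call are illustrative assumptions, not measured values.
def invoice_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float,
                 ocr_per_page: float = 0.0, pages: int = 1) -> float:
    token_cost = (input_tokens * in_price_per_m +
                  output_tokens * out_price_per_m) / 1_000_000
    return token_cost + ocr_per_page * pages

# "GPTT": GPT-4o text input plus third-party OCR at $10 / 1,000 pages.
print(invoice_cost(3_000, 300, 2.50, 10.00, ocr_per_page=0.01))  # ~0.0205
```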

Key results

🚀 Most efficient: Gemini and GPT-4o lead in extraction efficiency and consistency across all invoices.

⚠ Worst performer: Google Document AI is the worst of all the tested models at extracting line items, which drags down its overall efficiency score. Google concatenates all line item fields into a single string, making it the worst option for out-of-the-box use.

🎲 Least reliable: DeepSeek showed frequent errors in text and numeric values.

Which model is best, and why?

✅ Gemini, AWS, or Azure for high-accuracy data extraction.

✅ GPT-4o (text input with third-party OCR) for a great cost-efficiency balance.

❌ Avoid Google Document AI if you need to extract line items with high accuracy.
