Why do modern AI agents choose Markdown over HTML?
AI agents are taking over the world, marking the next big step in the evolution of AI 🦖. So, what do they all have in common? They use Markdown instead of raw HTML when processing web page content ⛓. Curious to find out why?
This blog post will show you how this simple trick can save you up to 99% in tokens and money!
AI agents and data processing: an introduction
AI agents are software systems that harness the power of artificial intelligence to accomplish tasks and pursue goals on behalf of users. Equipped with reasoning, planning, and memory, these agents can make decisions, learn, and adapt, all on their own. 🤯
In recent months, AI agents have taken off, especially in the world of browser automation. These AI agent browsers let you use LLMs to control web browsers and automate tasks such as adding products to your Amazon cart 🛒.
Have you ever taken a look at AI scraping libraries and frameworks such as Crawl4AI, ScrapeGraphAI, and LangChain?
When processing data from web pages, these solutions often convert HTML to Markdown, or offer ways to do so, before sending the data to LLMs. But why do AI agents prefer Markdown over HTML? 🧐
The short answer is: to save tokens and speed up processing! ⏩
Time to dig deeper! But first, let's take a look at another approach AI agents use to reduce the amount of data they have to handle. 👀
From data overload to clarity: the first step for AI agents
Imagine you want your AI agent to:
- Connect to an e-commerce site (e.g., Amazon)
- Search for a product (e.g., PlayStation 5)
- Extract data from that specific product page
This is a common scenario for an AI agent, since scraping e-commerce sites is a wild ride. After all, product pages are chaotic tangles of ever-changing layouts, which makes programmatic data parsing a nightmare. That's where AI agents flex their superpowers, leveraging LLMs to extract data smoothly, no matter how messy the page structure gets!
Now, let's say you're on a mission to grab all the latest details from the PlayStation 5 product page on Amazon 🎮:
Here's how you might prompt your AI agent browser to pull that off:
Navigate to Amazon's homepage. Search for 'PlayStation 5' and select the top result.
Extract the product title, price, availability, and customer ratings.
Return the data in a structured JSON format.
Here's what the AI agent should do (hopefully 🤞):
- Open Amazon in the browser 🌍
- Search for "PlayStation 5" 🔍
- Identify the correct product 🎯
- Extract the product details from the page and return them as JSON 📄
But here's the real challenge: step 4. The Amazon PlayStation 5 page is a monster! Its HTML is stuffed with tons of information you don't even need.
Want proof? Copy the page's full HTML from the browser's DOM and drop it into an LLM token calculator tool:
🚨 Brace yourself…
896,871 tokens?! 😱 Yes, you read that right: eight hundred ninety-six thousand, eight hundred seventy-one tokens!
That's a huge load of data, a.k.a. tons of money! 💸 (over $2 per request on GPT-4o! 😬)
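To see where that figure comes from, here's a quick back-of-the-envelope check. The $2.50-per-million-input-tokens rate is an assumption based on GPT-4o's published input pricing at the time of writing:

```javascript
// Rough cost of sending the entire page's HTML to GPT-4o.
// Assumed rate: $2.50 per 1M input tokens (GPT-4o input pricing).
const tokens = 896_871;
const pricePerMillion = 2.5; // USD, assumed

const cost = (tokens / 1_000_000) * pricePerMillion;
console.log(`$${cost.toFixed(4)}`); // $2.2422
```

And that is for a single request, before the model has produced a single output token.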
As you can imagine, passing all that data to the AI agent comes with serious drawbacks:
- It may require premium/paid plans to support such high token usage 💰
- It costs a fortune, especially if you run frequent queries 🤑
- It slows down responses, since the AI has to process a ridiculous amount of information ⏳
The fix: trimming the fat
Most AI agents let you specify a CSS selector to extract only the relevant sections of a web page. Others use heuristic algorithms to filter content automatically, for example by stripping headers and footers (which usually add no value). ✂️
For example, if you inspect the Amazon PlayStation 5 product page, you'll notice that most of the useful content lives inside the HTML element selected by the #ppd CSS selector:
Now, what if you told the AI agent to focus only on the #ppd element instead of the entire page? Would that make a difference? 🤔
Let's put it to the test in the head-to-head comparison below! 🔥
Markdown vs HTML in AI data processing: a head-to-head comparison
Time to compare token usage when processing a portion of the web page as raw HTML versus converting it to Markdown first.
HTML
In your browser, copy the HTML from the #ppd element and drop it into a token calculator:
From 896,871 tokens down to 309,951: nearly a 65% saving!
That's a significant drop, for sure, but let's be real: that's still a whole lot of tokens! 😵💸
Markdown
Now, let's replicate the trick used by AI agents by leveraging an online HTML-to-Markdown conversion tool. But first, remember that AI agents perform some preprocessing to remove non-content tags such as `<script>` and `<style>`.
You can strip them from the target element's HTML using this simple script in the browser console:
function removeScriptsAndStyles(element) {
  let htmlString = element.innerHTML;
  // Regex to match all <script> and <style> tags, including their content
  const scriptRegex = /<script[\s\S]*?<\/script>/gi;
  const styleRegex = /<style[\s\S]*?<\/style>/gi;
  return htmlString.replace(scriptRegex, "").replace(styleRegex, "");
}
Next, copy the cleaned HTML and convert it to Markdown using an online HTML-to-Markdown converter:
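To give you an idea of what such a converter does under the hood, here's a deliberately tiny, regex-based sketch. It only handles a handful of tags and strips everything else; real-world agents rely on battle-tested libraries (e.g., Turndown) instead:

```javascript
// Toy HTML-to-Markdown converter: handles a few common tags and
// drops the rest. Illustration only, not production code.
function htmlToMarkdown(html) {
  return html
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, "# $1\n")
    .replace(/<h2[^>]*>([\s\S]*?)<\/h2>/gi, "## $1\n")
    .replace(/<(b|strong)[^>]*>([\s\S]*?)<\/\1>/gi, "**$2**")
    .replace(/<li[^>]*>([\s\S]*?)<\/li>/gi, "- $1\n")
    .replace(/<a[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    .replace(/<[^>]+>/g, "") // strip all remaining tags
    .replace(/\n{3,}/g, "\n\n") // collapse extra blank lines
    .trim();
}

const markdown = htmlToMarkdown(
  "<h1>PlayStation 5</h1><p>Price: <strong>$499.00</strong></p>"
);
console.log(markdown); // "# PlayStation 5\nPrice: **$499.00**"
```

Note how the Markdown keeps the text and its structure while shedding all the markup that only inflates the token count.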
The resulting Markdown is much smaller, yet it still contains all the important text data!
Now, paste this Markdown into the LLM token calculator tool:
Boom! 💣 From 896,871 tokens down to just 7,943 tokens. That's a ~99% saving!
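You can double-check both savings figures (the ~65% from the HTML step and the ~99% from the Markdown step) with a quick calculation:

```javascript
// Token counts measured earlier in the article
const fullPageTokens = 896_871; // entire page HTML
const ppdHtmlTokens = 309_951;  // #ppd element HTML
const markdownTokens = 7_943;   // #ppd element as Markdown

// Percentage of tokens saved compared to the full page
const saving = (after) => ((1 - after / fullPageTokens) * 100).toFixed(1);

console.log(`${saving(ppdHtmlTokens)}%`);  // 65.4%
console.log(`${saving(markdownTokens)}%`); // 99.1%
```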
By keeping only the core content and converting HTML to Markdown, you get a far smaller payload, lower costs, and faster processing. Huge win! 💰
Markdown vs HTML: the battle for tokens and cost savings
The last step is to verify that the Markdown text still contains all the key data. To do so, pass it to an LLM along with the last part of the original prompt. Here's the JSON result you'll get:
{
"product_title": "PlayStation®5 console (slim)",
"price": "$499.00",
"availability": "In stock",
"customer_ratings": {
"rating": 4.6,
"total_ratings": 5814
}
}
That's exactly what your AI agent will return. Spot on!
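If you want to go a step further, you can sanity-check the agent's output programmatically: parse the JSON and verify that every field requested in the prompt is present. A minimal sketch, using the sample output above:

```javascript
// Parse the agent's JSON output and check for the requested fields
const output = `{
  "product_title": "PlayStation®5 console (slim)",
  "price": "$499.00",
  "availability": "In stock",
  "customer_ratings": { "rating": 4.6, "total_ratings": 5814 }
}`;

const data = JSON.parse(output);
const requiredFields = ["product_title", "price", "availability", "customer_ratings"];
for (const field of requiredFields) {
  if (!(field in data)) throw new Error(`Missing field: ${field}`);
}
console.log(data.price); // $499.00
```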
For a quick overview, check out the final summary below:
| Approach | Tokens | o1-mini cost | GPT-4o mini cost | GPT-4o cost |
| --- | --- | --- | --- | --- |
| Entire page HTML | 896,871 | $13.4531 | $0.1345 | $2.2422 |
| #ppd HTML | 309,951 | $4.6493 | $0.0465 | $0.7749 |
| #ppd Markdown | 7,943 | $0.0596 | $0.0012 | $0.0199 |
Where AI agents fall short
All these token tricks are useless if the AI agent gets blocked by the target site 😅 (ever seen an AI agent get stuck on a CAPTCHA? 🤣).
Why does that happen? Simple! Most sites use anti-bot measures that can easily block automated browsers. Want the full breakdown? Check out the webinar below:
If you've followed our advanced web scraping guide, you know the problem isn't with the browser automation tools (the libraries that power AI agents). No, the real culprit is the browser itself. 🤖
To avoid getting blocked, you need a browser purpose-built for cloud automation. Enter the Scraping Browser:
- It runs in headed mode, just like a regular browser, making it harder for anti-bot systems to detect. 🔍
- It's hosted in the cloud, saving you time and money on infrastructure. 💰
- It automatically solves CAPTCHAs, handles browser fingerprinting, customizes cookies/headers, and retries requests for you. ⚡
- It rotates IPs from one of the largest and most reliable proxy networks out there. 🌍
- It integrates seamlessly with popular automation libraries such as Playwright, Selenium, and Puppeteer. 🔧
Learn more about Bright Data's Scraping Browser, the perfect tool to integrate into AI agents:
Final thoughts
Now you're in the loop on why AI agents use Markdown for data processing. It's a simple trick to save tokens (and money) while speeding up LLM processing.
Want to build AI that works without hitting blocks? Take a look at Bright Data's suite of tools for AI! Join us in making the web accessible to everyone, even through automated AI agent browsers. 🌐
Until next time, keep surfing the web with freedom! 🏄