Working with LLMs for a full year: lessons I picked up along the way

How not to mess up building AI features (and save yourself time while you're at it)
We are in the era of artificial intelligence. Every company wants to add some kind of AI feature to its products. LLMs (large language models) are showing up everywhere; some companies are even building their own. But while everyone jumps on the hype, I just want to share a few things I learned after working with LLMs across various projects over the past year.
1. Choose the right model for the right task
A big mistake I have seen (and made) is assuming that all LLMs are equal. They are not.
Some models have more knowledge of certain topics or domains (Gemini is great in some areas where OpenAI models do not perform well). Some are good at reasoning but slow. Others are fast but weak at critical thinking.
Each model has its own strengths. Use OpenAI's GPT-4 for deep reasoning tasks. Use Claude or Gemini for other areas, depending on how they were trained. Models such as Gemini Flash are optimized for speed, but they tend to skip deep reasoning.
The bottom line: do not use a single model for everything. Be intentional. Experiment, test, and choose the best model for your use case.
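To make that concrete, here is a minimal sketch of what model routing can look like. The task labels and model names are my own illustrative assumptions, not fixed recommendations:

```python
# Illustrative only: map each task type to the model that fits it best.
MODEL_FOR_TASK = {
    "deep_reasoning": "gpt-4",      # slower, stronger reasoning
    "quick_reply": "gemini-flash",  # fast, lighter reasoning
}

def pick_model(task_type: str) -> str:
    """Return the model for a task, with a sensible default."""
    return MODEL_FOR_TASK.get(task_type, "gpt-4")

print(pick_model("quick_reply"))  # -> gemini-flash
```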
2. Do not expect LLMS to do all the thinking
I used to believe you could just throw a prompt at an LLM and it would do all the heavy lifting.
For example, I was working on a project where users picked their favorite teams, and the application had to create an itinerary based on their matches. Initially, I thought I could just send the entire list of matches to the LLM and expect it to pick the best ones and build the itinerary. It did not work.
It was slow, messy, and unreliable.
So I changed the approach: the application selects the right matches first, then passes only the relevant ones to the LLM to generate the itinerary. This worked much better.
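As a rough sketch of the revised flow (the match data and the relevance rule below are simplified stand-ins for the real project logic):

```python
from datetime import date

# Simplified stand-in for the real match data.
matches = [
    {"team": "Arsenal", "date": date(2025, 3, 1), "city": "London"},
    {"team": "Barcelona", "date": date(2025, 3, 8), "city": "Barcelona"},
    {"team": "Arsenal", "date": date(2025, 3, 15), "city": "Manchester"},
]

def relevant_matches(matches, favorite_teams, start, end):
    """App-side logic: filter matches before the LLM ever sees them."""
    return [
        m for m in matches
        if m["team"] in favorite_teams and start <= m["date"] <= end
    ]

selected = relevant_matches(matches, {"Arsenal"}, date(2025, 3, 1), date(2025, 3, 31))

# Only the filtered matches go into the prompt; the LLM writes the
# itinerary text, it no longer picks the matches.
prompt = f"Create a travel itinerary for these matches: {selected}"
```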
The lesson? Let your application handle the logic. Use LLMs to generate things, not to make decisions on their own. They are wonderful with language, but not always great at logic, at least for now.
3. Give each agent one responsibility
Trying to make one LLM do multiple jobs is a recipe for confusion.
In one of my projects, I had a supervisor agent that routed messages to various specialized agents based on the user's input. Initially, I packed a lot of logic into it: handling context, detecting follow-ups, deciding on thread continuity, and so on. The more it had to juggle, the more confused it got.
So I split it up. I moved some of the logic (such as thread continuity) outside and kept the supervisor focused only on routing. After that, things became much more stable.
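Here is a rough sketch of what the split looks like. The agent names are hypothetical, and in the real project the routing decision itself came from an LLM; a keyword check stands in for it below:

```python
# Hypothetical specialized agents, each with exactly one job.
def booking_agent(message: str) -> str:
    return f"[booking] handling: {message}"

def support_agent(message: str) -> str:
    return f"[support] handling: {message}"

AGENTS = {"booking": booking_agent, "support": support_agent}

def supervisor(message: str) -> str:
    """Routing only; a keyword check stands in for the LLM's decision."""
    topic = "booking" if "book" in message.lower() else "support"
    return AGENTS[topic](message)

def handle_turn(thread: list, message: str) -> str:
    """Thread continuity lives out here, not inside the supervisor."""
    thread.append(message)
    return supervisor(message)

thread = []
print(handle_turn(thread, "I want to book a flight"))
```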
Lesson: do not overload your agents. Keep each agent to a single responsibility. This helps reduce hallucinations and improves reliability.
4. Latency is inevitable – use streaming
LLMs that are good at reasoning are usually slow. That is the reality right now. Some models, such as GPT-4 or Claude 2, take their time, especially with complex prompts. You cannot eliminate the delay entirely, but you can make it feel better for the user.
One way to do that? Stream the output as it is generated. Most LLM APIs support streaming, which lets you start sending partial responses, even sentence by sentence, while the rest is still being processed.
In my applications, I stream whatever is ready to the client as soon as it is available. It gives users a sense of progress, even if the full result takes a little longer.
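Here is a minimal streaming sketch using the OpenAI Python SDK (the model name and prompt are placeholders; other providers expose similar streaming options):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize today's matches."}],
    stream=True,  # ask for incremental chunks instead of one final reply
)

# Forward each partial chunk to the user as soon as it arrives.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```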
Lesson: you cannot avoid latency, but you can hide it. Streaming early, partial output often makes a big difference in perceived speed.
5. Fine-tuning can save you time (and tokens)
People often avoid fine-tuning because it looks complicated or costly. But in some cases, it actually saves a lot.
If your prompts need to include the same structure or context every time and caching does not help, you are wasting tokens and time. Instead, fine-tune the model on that structure. Then you do not need to pass the same example every time; the model already knows what to do.
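As an example, OpenAI's chat fine-tuning format bakes the stable instructions and output format into training examples once, so production prompts no longer have to repeat them (the itinerary content below is made up):

```python
import json

# One training example in OpenAI's chat fine-tuning format.
example = {
    "messages": [
        {"role": "system", "content": "You write match-day itineraries as a numbered list."},
        {"role": "user", "content": "Arsenal vs Chelsea, 2025-03-01, London"},
        {"role": "assistant", "content": "1. Arrive in London by noon.\n2. Reach the stadium by 14:00."},
    ]
}

# Fine-tuning files are JSONL: one example per line.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```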
But be careful: do not fine-tune on data that changes frequently, such as flights or prices. You will end up teaching the model stale information. Fine-tuning works best when the logic and formatting are stable.
Lesson: fine-tune when things are consistent, not when they change constantly. It saves effort in the long run and leads to faster, cheaper prompts.
Final thoughts
Working with LLMs is not just about prompts and API calls. It is about architecture, performance, clarity, and most importantly, knowing what you can (and cannot) expect from these models.
I hope this helps someone building their next AI feature.