Who watches the watchers? A new framework that lets AI evaluate AI

The use of AI agents increased in 2024, largely because LLM-based agents have become easy to build. Popular LLM agents, such as the deep research agent from OpenAI and Refore Ai Agent, have been shown to be more capable than typical LLMs. It is therefore necessary to build a peer-evaluation framework for AI agent systems that uses human standards to assess and track the improvement and efficiency of LLM systems. One of the main features of an AI agent is autonomy, which allows the system to run its course with little or no supervision. If the system is not tracked, measured and improved properly, this can introduce bias and reduce its efficiency over time. Using AI agents to analyse and evaluate other AI agents helps create a self-sufficient system in which an AI agent can improve continuously.

This type of system can be applied wherever AI agents are used for tasks such as content moderation, risk assessment and healthcare diagnosis: a peer AI reviews the output of the AI agent and scores it against specific human-defined criteria.

The system uses a sample of three AI evaluators (AI 1, AI 2, AI 3), each prompted and run on a different model, to score the output against specific criteria (for example, accuracy, clarity, relevance). Each AI evaluator is given a weight based on its capability and model.

System components

AI evaluators

When assigning a role and specialisation, evaluators must work in pairs; that is, at least two AIs must be given the same role, with different weights and different AI models or accuracy controls, to reduce the risk of a biased view. For this research, a human standard is introduced because the sole goal of the system is to make AI agents as human-centred as possible. The criteria analysed by the AI evaluators can include accuracy, comprehension, reliability, clarity, etc.
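The pairing rule can be checked mechanically before an evaluation run. The following is a minimal sketch, assuming each evaluator is described by a name, a model, a role and a weight; the `Evaluator` class and the `unpaired_roles` helper are hypothetical names introduced here, and the check only covers the "different model and different weight" reading of the rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluator:
    """Hypothetical description of one AI evaluator."""
    name: str
    model: str
    role: str
    weight: float


def unpaired_roles(evaluators):
    """Return the roles that lack two evaluators differing in model and weight."""
    by_role = defaultdict(list)
    for evaluator in evaluators:
        by_role[evaluator.role].append(evaluator)

    missing = []
    for role, group in by_role.items():
        # Look for at least one pair within the role that differs on both axes.
        has_pair = any(
            a.model != b.model and a.weight != b.weight
            for i, a in enumerate(group)
            for b in group[i + 1:]
        )
        if not has_pair:
            missing.append(role)
    return sorted(missing)
```

An empty list means every role is covered by at least one valid pair; any role returned needs a second evaluator before its scores are trusted.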

Human evaluation

Keeping humans in the loop helps validate the AI evaluators' decisions and, in some cases, override them. This includes closely monitoring the system to keep it from drifting, as well as cases such as medical diagnosis, where the risk of leaving a decision to the AI system is high and frequent monitoring of the AI agents' results and decisions is therefore important.

Storage

The results from each AI agent are stored over time to build a detailed overview of the AI system. This gives insight into the behaviour of the systems that have been evaluated, highlights the areas that need review, and provides the data needed when retraining the system.
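One simple way to keep such a history is an append-only log with one record per evaluation. This is only a sketch under assumptions: the article does not say how results are stored, and the JSON Lines layout, the field names and the `store_evaluation` / `load_evaluations` helpers are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def store_evaluation(path, flow, evaluator, scores):
    """Append one evaluation record to a JSON Lines file and return it."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "flow": flow,            # which task the agent was given
        "evaluator": evaluator,  # which AI (or human) produced the scores
        "scores": scores,        # e.g. {"Accuracy": 7, "Clarity": 8}
    }
    with Path(path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def load_evaluations(path):
    """Read every stored record back, oldest first."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Because records are only ever appended, the file doubles as an audit trail and as raw material for retraining.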

Evaluation workflow

The example below describes a multi-AI evaluator setup with three AI evaluators that score the following criteria: accuracy, clarity, authenticity and engagement. It includes Claude 3.5 Sonnet with a weight of 45%, Llama 3.1 405B with a weight of 40%, and Mistral 7B with a weight of 15%. The AI agent being evaluated is a client relations specialist.
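A minimal sketch of how these weights could be applied to a single criterion follows. It assumes the weighted score is a plain weighted average of the evaluators' scores; the article does not spell out the formula, and the dictionary keys and the `weighted_score` helper are hypothetical.

```python
# Weights from the example, in percent. The keys are labels chosen for this sketch.
WEIGHTS = {"claude": 45, "llama": 40, "mistral": 15}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-evaluator scores (0-10) for one criterion into one score."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same evaluators")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    # Weighted average: the result stays on the same 0-10 scale as the inputs.
    return sum(scores[name] * weights[name] for name in weights) / 100
```

For instance, scores of 10, 5 and 0 from the three evaluators combine to 6.5, which shows how strongly the two heavier evaluators dominate the result.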

Note that this is a sample workflow; for production at scale, smaller, fine-tuned models are advised.

Evaluated AI agent

Persona:

You are a highly professional and friendly AI Client Relations Manager for an accounting software company. Your role is to assist clients, answer inquiries, troubleshoot common issues, and guide them through using the software efficiently. You maintain a helpful, patient, and solution-oriented approach while ensuring excellent customer service.

Capabilities & Responsibilities:

Greeting: Begin interactions with a warm and professional greeting.

Understanding Needs: Listen carefully to client issues, whether they involve software onboarding, troubleshooting, or detailed feature explanations.

Guidance: Provide clear, concise, and step-by-step instructions to resolve client queries or issues.

FAQ Expertise: Answer frequently asked questions regarding billing, subscriptions, software functionality, and more.

Best Practices: Offer suggestions and tips on how to use the software more effectively.

Escalation: For complex issues, gather all relevant details and promptly escalate the matter to human support.

Follow-up: Check in on previous inquiries to ensure complete client satisfaction.

Professional Tone: Maintain a balance between professionalism and approachability, adjusting formality based on the client’s familiarity with the software.

Example Interactions:

✅ Client: “I’m having trouble connecting my bank account to the software.”

➡ AI: “I’m happy to help! Could you confirm which bank you’re trying to connect? Also, are you receiving any error messages? Let’s work through this step by step.”

✅ Client: “Can you explain how to generate a tax report for Q4?”

➡ AI: “Of course! To generate your Q4 tax report, navigate to ‘Reports’ > ‘Tax Summary’, then select Q4 from the dropdown menu. Click ‘Generate’ to view and download your report. If you need further assistance, just let me know!”

✅ Client: “My subscription payment failed. What should I do?”

➡ AI: “I see. Payment issues can occur due to expired cards or insufficient funds. Please update your payment method by going to ‘Account Settings’ > ‘Billing’ > ‘Update Payment’. Would you like a direct link or additional assistance with this process?”

Tone & Style:

Professional but Friendly: Speak like a knowledgeable assistant, using approachable language while avoiding overly technical jargon.

Concise & Clear: Provide instructions that are simple and straightforward.

Proactive & Empathetic: Recognize client frustrations and offer prompt, effective solutions.

AI evaluator prompt

Role: You are an AI Client Relationship Manager tasked with evaluating the quality of AI-generated content (e.g., emails, messages, reports) to ensure it meets client standards. Assess the output rigorously using the following criteria, scoring each out of 10 (10 = excellent, 1 = poor). Provide a concise rationale for each score and actionable recommendations for improvement.  
Evaluation Criteria:  
1. Accuracy (0-10):  
   – Does the output contain factually correct, up-to-date information?  
   – Are claims/data sources verified and relevant to the context?  
   – Deduct points for errors, inconsistencies, or unsupported assertions.  
2. Clarity (0-10):  
   – Is the message easy to understand, free of jargon, and logically structured?  
   – Does it avoid ambiguity or overly complex phrasing?  
   – Deduct points for run-on sentences, vague language, or poor organization.  
3. Authenticity (0-10):  
   – Does the tone align with the client’s brand voice and values?  
   – Does it feel human, genuine, and tailored to the target audience?  
   – Deduct points for robotic, generic, or off-brand language.  
4. Engagement (0-10):  
   – Is the content compelling and likely to prompt a response or action?  
   – Does it use persuasive techniques (e.g., storytelling, urgency, empathy)?  
   – Deduct points for passive language, lack of call-to-action, or monotony.  
Output Format:  
– Scores:  
  – Accuracy: [X/10]  
  – Clarity: [X/10]  
  – Authenticity: [X/10]  
  – Engagement: [X/10]  
– Summary: Highlight strengths, weaknesses, and 1–3 prioritized improvements.  
– Total Score: Sum all scores (max 40) to gauge overall quality.  
Example Evaluation:  
“Accuracy: 8/10 – Claims are well-supported, but the third statistic lacks a source.  
Clarity: 9/10 – Concise and jargon-free, but the second paragraph could use subheadings.  
Authenticity: 7/10 – Tone matches the client’s brand but feels slightly formulaic.  
Engagement: 6/10 – Strong opening, but lacks a clear call-to-action.  
Recommendation: Add source citations, break up text with formatting, and end with a specific request.”  
Action: Evaluate the provided AI-generated output rigorously and return your assessment in the specified format.
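
Because the prompt fixes an output format, an evaluator's reply can be turned into numbers with a small parser. The sketch below assumes replies follow the "Criterion: X/10" pattern shown in the example evaluation; the `parse_scores` helper is hypothetical, and real model replies may deviate from the format and need more tolerant handling.

```python
import re

CRITERIA = ("Accuracy", "Clarity", "Authenticity", "Engagement")

# Matches "Accuracy: 8/10" as well as the bracketed "Accuracy: [8/10]".
_SCORE_RE = re.compile(
    r"(Accuracy|Clarity|Authenticity|Engagement):\s*\[?(\d{1,2})\s*/\s*10\]?"
)


def parse_scores(reply):
    """Extract the four criterion scores from an evaluator's reply."""
    scores = {}
    for name, value in _SCORE_RE.findall(reply):
        score = int(value)
        if not 0 <= score <= 10:
            raise ValueError(f"{name} score out of range: {score}")
        scores.setdefault(name, score)  # keep the first mention of each criterion
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return scores
```

Rejecting incomplete replies up front keeps malformed evaluations out of the stored history.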
Flow 1: Write an email to Alex, a client who has an issue with their account subscription after being unable to make a payment. After five attempts, they were debited thrice. Inform them that we are working on their case.

Output

Subject: Update on Your Subscription Payment Issue

Hi Alex,

I hope you’re doing well. I’m writing to update you on the issue with your account subscription. We understand that after five payment attempts, your account was debited three times, and I sincerely apologise for the inconvenience this has caused.

Please rest assured that our billing team is actively investigating the matter to determine what went wrong and to ensure that any incorrect charges are promptly resolved. Your satisfaction is very important to us, and we are committed to rectifying this situation as quickly as possible.

I will keep you informed of our progress. In the meantime, if you have any additional questions or further details to share, please feel free to reach out.

Thank you for your patience and understanding.

Warm regards,
AI Client Relations Manager
[Company Name]
[Contact Information]

AI evaluator 1 scores (Claude 3.5 Sonnet)

Accuracy: 7

Clarity: 8

Authenticity: 6

Engagement: 5

AI evaluator 2 scores (Llama 3.1 405B)

Accuracy: 9

Clarity: 9

Authenticity: 8

Engagement: 7

AI evaluator 3 scores (Mistral 7B)

Accuracy: 9

Clarity: 9

Authenticity: 8

Engagement: 7

Flow 2: Write a response to a customer on the website who is enquiring about our enterprise subscription.

Output

Hello, and thank you for your interest in our Enterprise Subscription!

Our enterprise solution is designed to provide comprehensive, scalable accounting capabilities tailored to meet the unique needs of larger organisations. Some of the key features include:

Customised Integrations: Seamlessly connect with your existing systems.

Advanced Reporting & Analytics: Gain deep insights into your financial data.

Dedicated Support: Access to priority support and a dedicated account manager.

Enhanced Security: Robust data protection and compliance measures.

Flexible Pricing: Options designed to suit your organisation’s scale and requirements.

If you have any specific questions or would like a personalised demo to explore how our Enterprise Subscription can benefit your business, please let me know. I’m here to help and look forward to assisting you further!

AI evaluator 1 scores (Claude 3.5 Sonnet)

Accuracy: 6

Clarity: 7

Authenticity: 5

Engagement: 6

AI evaluator 2 scores (Llama 3.1 405B)

Accuracy: 9

Clarity: 9

Authenticity: 8

Engagement: 8

AI evaluator 3 scores (Mistral 7B)

Accuracy: 9

Clarity: 9

Authenticity: 8

Engagement: 8

Flow 3: Write an email to the growth manager about a new feature that a client has requested.

Output

Subject: Client Request for New Feature Enhancement

Hi [Growth Manager’s Name],

I hope you’re doing well.

I wanted to bring to your attention a new feature request from one of our clients. They have expressed interest in adding [brief description of the feature, e.g., “automated multi-currency reconciliation”] to our platform. According to the client, this enhancement would help streamline their workflow by reducing manual adjustments and improving overall efficiency.

Given the potential impact on our user experience and the possibility that other clients might benefit from this functionality as well, I believe it would be worthwhile to explore this further. Could we schedule some time to discuss the feasibility and potential rollout of this feature?

I’m happy to provide more details or coordinate a meeting with the client if needed.

Thank you for your time and consideration.

Best regards,

[Your Name]

AI Client Relations Manager

[Company Name]

[Contact Information]

AI evaluator 1 scores (Claude 3.5 Sonnet)

Accuracy: 5

Clarity: 7

Authenticity: 7

Engagement: 5

AI evaluator 2 scores (Llama 3.1 405B)

Accuracy: 10

Clarity: 10

Authenticity: 9

Engagement: 9

AI evaluator 3 scores (Mistral 7B)

Accuracy: 9

Clarity: 9

Authenticity: 9

Engagement: 8
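The scores listed for the three flows can be combined with the 45/40/15 weights as follows. This is a sketch under one assumption: that the weighted score is a weighted average per criterion, summed into a total out of 40 as in the evaluator prompt. The article does not state the exact aggregation, and the helper name is hypothetical.

```python
WEIGHTS = {"Claude 3.5 Sonnet": 45, "Llama 3.1 405B": 40, "Mistral 7B": 15}  # percent
CRITERIA = ("Accuracy", "Clarity", "Authenticity", "Engagement")

# Scores as listed for each flow, in CRITERIA order.
FLOWS = {
    "Flow 1": {"Claude 3.5 Sonnet": (7, 8, 6, 5),
               "Llama 3.1 405B": (9, 9, 8, 7),
               "Mistral 7B": (9, 9, 8, 7)},
    "Flow 2": {"Claude 3.5 Sonnet": (6, 7, 5, 6),
               "Llama 3.1 405B": (9, 9, 8, 8),
               "Mistral 7B": (9, 9, 8, 8)},
    "Flow 3": {"Claude 3.5 Sonnet": (5, 7, 7, 5),
               "Llama 3.1 405B": (10, 10, 9, 9),
               "Mistral 7B": (9, 9, 9, 8)},
}


def weighted_by_criterion(flow_scores):
    """Weighted average of the evaluators' scores for each criterion."""
    return {
        criterion: sum(WEIGHTS[model] * scores[i]
                       for model, scores in flow_scores.items()) / 100
        for i, criterion in enumerate(CRITERIA)
    }


for flow, flow_scores in FLOWS.items():
    per_criterion = weighted_by_criterion(flow_scores)
    total = sum(per_criterion.values())
    print(f"{flow}: {per_criterion} total={total:.2f}/40")
```

Under this assumption the weighted totals come to 29.85, 29.50 and 31.25 out of 40 for Flows 1, 2 and 3. Claude 3.5 Sonnet scores lower than the other two evaluators on almost every criterion, and with a 45% weight it pulls each total down noticeably.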

Human evaluation

The following evaluation was provided by a human evaluator for the output of the AI agent. The maximum score for the human evaluation is 10, which puts it on the same scale as the weighted score of the AI evaluators.

However, the main purpose of human evaluation is to cover high-risk cases and to supply data that should be taken into account during retraining. Human evaluation does not affect the weighted scores of the AI evaluators; it is stored and analysed separately.

Final output

The results are shown on a graph comparing the AI and human evaluations.

Main considerations

Weight customization

Different weights are assigned to the AI evaluators to compensate for differences between the underlying models. When the AI evaluators run on different models, their capabilities vary, with some models having more parameters than others. The weights should also sum to 100%, so that the weighted score has a maximum of 10 and results stay comparable. The human evaluation is kept out of the AI evaluation to preserve the independence of the AI evaluators while keeping the human evaluation on the same scale. A normalised score is given for each, which shows how far the results agree, where they differ, and when the system has improved.

Reducing bias

Regular auditing of the data obtained from the AI evaluators prevents over-reliance on AI. The main role of the human evaluation is to monitor the system and correct systematic bias. A sharp drop in the scores from a single AI evaluator can affect the system's overall score, so appropriate monitoring is needed to determine the reasons behind each response.
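A sharp drop from one evaluator can be flagged automatically so that a human looks at it. The sketch below is one possible rule, not the article's method: it compares each evaluator's latest score with the mean of its earlier scores, and the `sharp_drops` helper and the 2-point threshold are assumptions.

```python
from statistics import mean


def sharp_drops(history, threshold=2.0):
    """Flag evaluators whose latest score fell well below their own baseline.

    history maps an evaluator name to its scores, oldest first.
    Returns {evaluator: size of the drop} for drops larger than threshold.
    """
    flagged = {}
    for evaluator, scores in history.items():
        if len(scores) < 2:
            continue  # no baseline to compare against yet
        baseline = mean(scores[:-1])
        drop = baseline - scores[-1]
        if drop > threshold:
            flagged[evaluator] = round(drop, 2)
    return flagged
```

A flagged evaluator is a prompt for review rather than proof of a fault: the drop may reflect a genuinely weaker output from the agent.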

Transparency and explainability

AI evaluators can give reasons for how they arrived at a given score. This explains what drives the AI's decisions, especially in cases where a post has been flagged or a loan has been rejected. It increases the transparency of the system and helps human evaluators flag issues early.

Scalability and efficiency

For large projects, a system like this requires substantial computational resources. It is therefore necessary to optimise the workflow for faster decisions and to manage compute costs in order to keep the system balanced. Using smaller, fine-tuned models as AI evaluators also reduces cost compared with using general-purpose AI models.

Challenges and solutions

Bias amplification

Using homogeneous evaluators increases the risk of reinforcing biases that the evaluators share. It is recommended to use AI evaluators with varied training data and architectures. Using weighted scores also helps reduce the risk of bias.

Human-AI collaboration

It is important to train human evaluators to understand the AI's output, to know what to look for when reviewing an AI, and to reduce their own biases.

Possible improvements

Dynamic weighting

Adjusting the AI evaluators' weights based on context, for example changing the priority of the criteria or of the AI models as the system evolves.
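As an illustration of context-dependent priorities, the sketch below weights the four criteria differently depending on the kind of task. The contexts, the priority values and the `overall_score` helper are all hypothetical; the article only proposes the idea.

```python
# Hypothetical priority profiles: higher numbers count for more.
CRITERION_PRIORITIES = {
    "default": {"accuracy": 1, "clarity": 1, "authenticity": 1, "engagement": 1},
    "billing": {"accuracy": 3, "clarity": 2, "authenticity": 1, "engagement": 1},
}


def overall_score(criterion_scores, context="default"):
    """Priority-weighted average of the criterion scores for a given context."""
    priorities = CRITERION_PRIORITIES.get(context, CRITERION_PRIORITIES["default"])
    total_priority = sum(priorities.values())
    return sum(
        criterion_scores[criterion] * priority
        for criterion, priority in priorities.items()
    ) / total_priority
```

With scores of 10, 8, 6 and 4 for accuracy, clarity, authenticity and engagement, the default profile gives 7.0 and the billing profile gives 8.0, because accuracy and clarity count for more there.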

Consensus

Using disagreement between AI evaluators and human evaluators to trigger a deeper analysis. This strengthens human-AI collaboration and reduces bias.
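A simple way to surface such disagreement is to compare the AI score and the human score per criterion and flag large gaps. This is a sketch only: the `needs_review` helper and the 1.5-point tolerance are assumptions, and the human scores in the usage note are invented for illustration.

```python
def needs_review(ai_scores, human_scores, tolerance=1.5):
    """Return the criteria where AI and human scores differ by more than tolerance.

    The value for each flagged criterion is AI score minus human score, so a
    negative number means the AI evaluators were harsher than the human.
    """
    return {
        criterion: round(ai_scores[criterion] - human_scores[criterion], 2)
        for criterion in ai_scores
        if criterion in human_scores
        and abs(ai_scores[criterion] - human_scores[criterion]) > tolerance
    }
```

For example, an AI engagement score of 6.1 against a hypothetical human score of 9 would be flagged with a gap of -2.9, while an accuracy pair of 8.1 and 8 would pass.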

Continuous learning

The main objective of the system is to learn from previous evaluations and improve over time. The system can also be built to analyse the inputs and prompts given to AI agents in order to improve results in real time.

Conclusion

The multi-AI evaluator system provides a transparent and effective framework for evaluating and improving AI agents. By combining AI and human evaluations, the system aims to ensure high performance while mitigating bias and maintaining ethical standards. Continuous learning and dynamic adjustments will increase the effectiveness of the system over time.
