
Can LLMs Improve Crowdsourced Evaluation in Dialogue Systems?

Authors:

(1) Clemencia Siro, University of Amsterdam, Amsterdam, Netherlands;

(2) Mohammad Aliannejadi, University of Amsterdam, Amsterdam, Netherlands;

(3) Maarten de Rijke, University of Amsterdam, Amsterdam, Netherlands.

Abstract and 1 Introduction

2 Methodology and 2.1 Experimental data and tasks

2.2 Automatic generation of varied dialogue contexts

2.3 Crowdsourcing experiments

2.4 Experimental conditions

2.5 Participants

3 Results and Analysis and 3.1 Data statistics

3.2 RQ1: Impact of varying amounts of dialogue context

3.3 RQ2: Effect of automatically generated dialogue context

4 Discussion and Implications

5 Related Work

6 Conclusion, Limitations, and Ethical Considerations

7 Acknowledgements and References

Appendix

2 Methodology

We study how contextual information about a dialogue affects the consistency of crowdsourced evaluation labels for the relevance and usefulness of a dialogue response. Here, contextual information refers to the utterances or conversation that precede a given response. We run our experiments in two stages. Stage 1 varies the amount of dialogue context shown to annotators, addressing RQ1. In Stage 2, we vary the type of prior contextual information available to annotators, addressing RQ2.

2.1 Experimental data and tasks

We use the ReDial dialogue dataset (Li et al., 2018). The dataset was collected with a human-to-human setup: one person acts as a movie seeker while the other acts as the recommender, suggesting suitable movies to the seeker, which makes the dataset goal-directed. We randomly sample the system responses of 40 dialogues for relevance and usefulness labeling. These dialogues consist of 10 to 11 turns each, with an average utterance length of 14 words. We evaluate the same system responses across all experimental conditions.

The annotation task covers two dimensions: (1) relevance: is the system response relevant to the user's request, given the dialogue context? and (2) usefulness: how useful is the system response in light of the user's information need? For relevance, we ask annotators to judge how well the system's recommendation matches the user's request (Alonso et al., 2008). First, the annotator judges whether the system response contains a movie recommendation; if it does, the annotator assesses whether the movie matches the user's preferences; if it does not, we ask them to flag the utterance as not recommending a movie. Judgments for the latter question are on a binary scale, where the movie is either relevant (1) or not (0). In every experimental condition (see below), annotators assess the system response with access only to the preceding context. Note that we withhold the user's follow-up utterance about the evaluated response in order to focus on the topical relevance of the recommended movie, i.e., whether the movie satisfies the user's request and preferences in terms of genre, actor, director, etc. Usefulness is judged on a three-point scale (i.e., useful, somewhat useful, not useful). Unlike for relevance, annotators can access the user's follow-up utterance for the usefulness task; usefulness is specific to the user: even if a recommended movie is of the requested genre, the user may still dislike it (for example, because of the lead actor), which makes the system response not useful to that user.
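The two label dimensions and their dependency (usefulness is always judged, relevance only when a movie is actually recommended) can be sketched as a small data structure. This is our own illustrative encoding, not code from the paper; the label names and the `Annotation` container are assumptions, while the binary and three-point scales come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative label sets; only the scale sizes are given in the paper.
RELEVANCE_LABELS = {0: "not relevant", 1: "relevant"}
USEFULNESS_LABELS = {0: "not useful", 1: "somewhat useful", 2: "useful"}

@dataclass
class Annotation:
    dialogue_id: str
    turn_id: int
    has_recommendation: bool        # does the response recommend a movie?
    relevance: Optional[int]        # binary; None when no movie is recommended
    usefulness: int                 # three-point scale

    def validate(self) -> None:
        if self.has_recommendation:
            assert self.relevance in RELEVANCE_LABELS
        else:
            # annotators flag the turn as containing no recommendation
            assert self.relevance is None
        assert self.usefulness in USEFULNESS_LABELS

# A response that recommends a relevant movie the user finds useful:
Annotation("dlg_01", 4, True, 1, 2).validate()
# A chit-chat turn with no recommendation:
Annotation("dlg_01", 5, False, None, 0).validate()
```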

2.2 Automatic generation of varied dialogue contexts

User information need. User information needs play an important role when evaluating or improving the quality of data collected for IR systems (Mao et al., 2016). An information need refers to the user's specific requirements or queries, which guide the system in understanding their preferences and retrieving relevant information to satisfy that need. For TDSs, understanding the user's intent is crucial for annotators in evaluation tasks, because they are not the actual end users. This understanding improves the alignment of evaluation labels with the actual user's requirements. We define the user information need as the user's preferences for a movie recommendation. Given the consistency of user preferences in the ReDial dataset, where users tend to maintain a single preference throughout a conversation, providing the user's initial information need helps in assessing the relevance or usefulness of the current turn.

We adopt two approaches to generating a user information need. The first extracts the first user utterance that requests a movie recommendation or expresses a movie preference, based on phrases such as "looking for", "recommend", and "prefer". These phrases are drawn from the first three user utterances in a dialogue, keeping the 10 most common phrases. The second approach relies on LLMs to generate the user information need. We assume that LLMs can identify the relevant user utterances in a dialogue and generate the corresponding information need. We use GPT-4 (OpenAI, 2023) in a zero-shot setting; with the dialogue context up to the current turn as input, we prompt the model to generate the user's information need.
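The first, phrase-based approach can be sketched as a simple scan over the opening user utterances. This is a minimal illustration under assumptions: the paper derives its top-10 phrase list from the data, while the three phrases and the function name below are ours.

```python
from typing import List, Optional

# Illustrative cue phrases; the paper uses the top-10 most common
# phrases mined from the first three user utterances across dialogues.
PREFERENCE_PHRASES = ["looking for", "recommend", "prefer"]

def extract_information_need(user_utterances: List[str]) -> Optional[str]:
    """Return the first of the first three user utterances that
    requests a recommendation or expresses a movie preference."""
    for utterance in user_utterances[:3]:
        lowered = utterance.lower()
        if any(phrase in lowered for phrase in PREFERENCE_PHRASES):
            return utterance
    return None

need = extract_information_need([
    "Hi there!",
    "I am looking for a sci-fi movie like Blade Runner.",
])
# need == "I am looking for a sci-fi movie like Blade Runner."
```

When no utterance matches, the function returns `None`, which is where the second, LLM-based approach would take over.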

Dialogue summary generation. Dialogue summaries are useful for giving quick context to new participants in a conversation and help people grasp the main ideas or locate key content after a conversation, which can increase efficiency and productivity (Feng et al., 2022). We use dialogue summaries to provide annotators with quick context about the preceding dialogue. We use GPT-4 (OpenAI, 2023) in a zero-shot setting, as for the user information need, but change the prompt. We instruct GPT-4 to generate an informative, content-rich summary that is less than half the length of the input dialogue. Both the user information need and the dialogue summary are used in Stage 2 of the crowdsourcing experiments.
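The two zero-shot prompts might look as follows. The wording is our own paraphrase of the instructions described above; the paper does not publish its exact prompt text, and the function names are illustrative.

```python
# Hypothetical zero-shot prompt builders for the two GPT-4 tasks
# described above. Each returned string would be sent as the user
# message of a single chat request, with no few-shot examples.

def build_need_prompt(dialogue_context: str) -> str:
    """Prompt GPT-4 to infer the seeker's information need."""
    return (
        "Given the following dialogue between a movie seeker and a "
        "recommender, state the seeker's information need in one "
        "sentence.\n\n"
        f"{dialogue_context}"
    )

def build_summary_prompt(dialogue_context: str) -> str:
    """Prompt GPT-4 for a summary under half the dialogue's length."""
    n_words = len(dialogue_context.split())
    return (
        "Summarize the following dialogue. The summary must be "
        f"informative and shorter than {n_words // 2} words.\n\n"
        f"{dialogue_context}"
    )
```

Enforcing the length constraint inside the prompt, as sketched here, is a soft constraint; generated summaries still need a post-hoc length check.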

Because LLMs are prone to hallucination (Bouyamourn, 2023; Chang et al.), we verify the quality of the generated information needs and summaries; we describe the steps we took in Section A.2.
