
How Prompt Complexity Affects the Accuracy of GPT-3.5 Mutation Generation

Authors:

(1) Bo Wang, Beijing Jiaotong University, Beijing, China ([email protected]);

(2) Mingda Chen, Beijing Jiaotong University, Beijing, China ([email protected]);

(3) YouFang Lin, Beijing Jiaotong University, Beijing, China ([email protected]);

(4) Mike Papadakis, University of Luxembourg, Luxembourg ([email protected]);

(5) Jie M. Zhang, King’s College London, London, UK ([email protected]).

Abstract and 1 Introduction

2 Background and Related Work

3 Study Design

3.1 Overview and Research Questions

3.2 Datasets

3.3 Mutation Generation via LLMs

3.4 Evaluation Metrics

3.5 Experiment Settings

4 Evaluation Results

4.1 RQ1: Performance on Cost and Usability

4.2 RQ2: Behavior Similarity

4.3 RQ3: Impacts of Different Prompts

4.4 RQ4: Impacts of Different LLMs

4.5 RQ5: Root Causes and Error Types of Non-Compilable Mutations

5 Discussion

5.1 Sensitivity to Chosen Experiment Settings

5.2 Implications

5.3 Threats to Validity

6 Conclusion and References

4.2 RQ2: Behavior Similarity

The bottom three rows of Table 4 compare the behavior metrics of the mutation generation approaches.

4.2.1 Real Bug Detection. GPT-3.5 detects 382 of the 395 Defects4J bugs (96.7%) and 39 of the 45 ConDefects bugs (86.7%). CodeLlama-13b detects 358 Defects4J bugs (90.6%) and 30 ConDefects bugs (66.7%). Major achieves the second-best performance, detecting 362 Defects4J bugs (91.6%) and 31 ConDefects bugs (68.9%).
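The percentages in this paragraph follow directly from the reported counts. The short sketch below recomputes them from the totals of 395 Defects4J bugs and 45 ConDefects bugs; the dictionary layout and the function name are illustrative choices, not part of the study's tooling.

```python
# Counts taken from the paragraph above: detected bugs per approach and dataset.
TOTALS = {"Defects4J": 395, "ConDefects": 45}
DETECTED = {
    "GPT-3.5": {"Defects4J": 382, "ConDefects": 39},
    "CodeLlama-13b": {"Defects4J": 358, "ConDefects": 30},
    "Major": {"Defects4J": 362, "ConDefects": 31},
}


def detection_rate(detected: int, total: int) -> float:
    """Percentage of real bugs detected, rounded to one decimal place."""
    return round(100.0 * detected / total, 1)


# Detection rate for every approach on every dataset.
RATES = {
    approach: {ds: detection_rate(n, TOTALS[ds]) for ds, n in per_dataset.items()}
    for approach, per_dataset in DETECTED.items()
}

if __name__ == "__main__":
    for approach, per_dataset in RATES.items():
        print(approach, per_dataset)
```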

4.2.2 Coupling Rate. The coupling rate measures the degree of coupling between the generated mutations and the corresponding real bugs. GPT-3.5 achieves coupling rates of 0.416 on Defects4J and 0.625 on ConDefects, the best performance on both datasets, while CodeLlama-13b achieves coupling rates of 0.398 and 0.612, respectively.
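The precise coupling criterion is defined in Section 3.4, which is not reproduced here. As a rough sketch only, the code below assumes a mutation counts as coupled with a real bug when at least one test that kills the mutation also triggers the bug, and takes the coupling rate as the fraction of mutations that are coupled. Both the criterion and the function names are assumptions made for illustration.

```python
def is_coupled(killing_tests: set, bug_triggering_tests: set) -> bool:
    """Assumed criterion: some test kills the mutation and also triggers the bug."""
    return bool(killing_tests & bug_triggering_tests)


def coupling_rate(killing_sets: list, bug_triggering_tests: set) -> float:
    """Fraction of mutations coupled with the bug (0.0 when there are no mutations)."""
    if not killing_sets:
        return 0.0
    coupled = sum(is_coupled(k, bug_triggering_tests) for k in killing_sets)
    return coupled / len(killing_sets)


if __name__ == "__main__":
    # Four hypothetical mutations, each described by the tests that kill it.
    mutations = [{"t1"}, {"t2"}, set(), {"t1", "t3"}]
    print(coupling_rate(mutations, bug_triggering_tests={"t1"}))
```

Under this assumed criterion, a mutation that no test kills can never be coupled, so unkilled mutations lower the rate.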

4.2.3 Ochiai Coefficient. The Ochiai coefficient measures the semantic similarity between mutations and bugs. GPT-3.5 achieves 0.638 on Defects4J and 0.689 on ConDefects, surpassing CodeLlama-13b, which scores 0.39 and 0.378 on the respective datasets. Despite the noticeable performance gap between the two, each model's results are consistent across the datasets. The second-ranked approach achieves 0.519 on Defects4J and 0.6 on ConDefects.
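For reference, the Ochiai coefficient between two sets is the size of their intersection divided by the geometric mean of their sizes. The sketch below applies it to the set of tests that fail on a mutation and the set of tests that fail on the real bug. Representing each side as a set of failing test names is an assumption here; the study's exact formulation is given in Section 3.4.

```python
import math


def ochiai(mutation_failing: set, bug_failing: set) -> float:
    """Ochiai similarity: |A ∩ B| / sqrt(|A| * |B|), or 0.0 if either set is empty."""
    if not mutation_failing or not bug_failing:
        return 0.0
    shared = len(mutation_failing & bug_failing)
    return shared / math.sqrt(len(mutation_failing) * len(bug_failing))


if __name__ == "__main__":
    # Hypothetical test names: one shared failing test out of two on each side.
    print(ochiai({"t1", "t2"}, {"t2", "t3"}))
```

A value of 1 means the mutation fails exactly the same tests as the bug, and a value of 0 means the two share no failing test.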

4.3 RQ3: Impacts of Different Prompts

The left half of Table 7 presents the comparative results of GPT-3.5 across the prompts listed in Section 3.5.3. Prompts P1 through P3 are progressively simplified, each containing less information than its predecessor, while P4 is the most complex, augmenting P1 with the test suite code.

In general, P1, the default prompt, excels in compilability rate and in all behavior metrics. P2, created by removing the few-shot examples from P1, leads in average generation time, useless mutation rate, and equivalent mutation rate, indicating improved quality among the compilable mutations. P3, which is provided only with the code element to be mutated, achieves the lowest cost by using the fewest tokens. In contrast, P4, which extends P1 with the test suite code, shows the lowest performance on all metrics, indicating that GPT-3.5 cannot effectively use the test suite data to improve mutation quality.
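The relationship between the four prompts can be pictured as switching ingredients on and off. The sketch below is purely illustrative: the actual prompt templates are given in Section 3.5.3 and are not reproduced here, so the instruction wording, the `PromptConfig` fields, and the idea that P1 and P2 carry surrounding context are all assumptions. Only the ordering comes from the text: P2 drops the few-shot examples from P1, P3 keeps just the code element, and P4 adds the test suite to P1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptConfig:
    include_context: bool   # assumed: surrounding code of the element
    include_examples: bool  # few-shot examples
    include_tests: bool     # test suite code


# Hypothetical mapping of the four prompt variants to their ingredients.
PROMPTS = {
    "P1": PromptConfig(include_context=True, include_examples=True, include_tests=False),
    "P2": PromptConfig(include_context=True, include_examples=False, include_tests=False),
    "P3": PromptConfig(include_context=False, include_examples=False, include_tests=False),
    "P4": PromptConfig(include_context=True, include_examples=True, include_tests=True),
}


def build_prompt(name: str, element: str, context: str = "",
                 examples: tuple = (), tests: str = "") -> str:
    """Assemble a prompt from the ingredients enabled for the given variant."""
    cfg = PROMPTS[name]
    parts = ["Generate mutations for this code element:\n" + element]
    if cfg.include_context and context:
        parts.append("Surrounding code:\n" + context)
    if cfg.include_examples and examples:
        parts.append("Examples:\n" + "\n".join(examples))
    if cfg.include_tests and tests:
        parts.append("Test suite:\n" + tests)
    return "\n\n".join(parts)


if __name__ == "__main__":
    args = dict(element="a + b", context="int add(int a, int b) { return a + b; }",
                examples=("a + b -> a - b",), tests="assertEquals(3, add(1, 2));")
    for name in ("P3", "P2", "P1", "P4"):
        print(name, len(build_prompt(name, **args)))
```

With non-empty inputs, the prompt length grows from P3 to P2 to P1 to P4, which mirrors why P3 is the cheapest in tokens and P4 the most expensive.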

This paper is available on arXiv under a CC BY 4.0 (Attribution 4.0 International) license.
