How Researchers Used Grounded Theory to Categorize GitHub Copilot Problems
Table of Links
Abstract and 1. Introduction
2. Methodology and 2.1. Research questions
2.2. Data collection
2.3. Data labelling
2.4. Data extraction
2.5. Data analysis
3. Results and interpretation and 3.1. Types of problems (RQ1)
3.2. Types of causes (RQ2)
3.3. Types of solutions (RQ3)
4. Implications
4.1. Implications for Copilot users
4.2. Implications for the Copilot team
4.3. Implications for researchers
5. Threats to validity
6. Related work
6.1. Evaluating the quality of code generated by Copilot
6.2. Copilot's impact on practical development and 6.3. Concluding summary
7. Conclusions, Data Availability, Acknowledgments, CRediT Authorship Contribution Statement, and References
To answer the three RQs formulated in Section 2.1, we defined a set of data elements for data extraction, as shown in Table 1. Data elements D1-D3 are intended to extract, from the filtered data, information about the problems, their underlying causes, and potential solutions, answering RQ1-RQ3, respectively. These three data elements can be extracted from any part of a GitHub issue, GitHub discussion, or SO post, such as the title, problem description, comments, and answers.
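As an illustration only (not the authors' actual tooling), the extracted data elements D1-D3 can be modeled as one record per data item; the class and field names below are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractedItem:
    """One filtered data item: a GitHub issue, GitHub discussion, or SO post."""
    source: str                      # where the item came from
    problem: str                     # D1: the Copilot-related problem reported
    cause: Optional[str] = None      # D2: underlying cause, if identified
    solution: Optional[str] = None   # D3: solution, if confirmed

# Example drawn from the paper's own data: a problem whose cause was
# identified but for which no detailed solution was given.
item = ExtractedItem(
    source="GitHub discussion",
    problem="Copilot cannot work properly on the remote VSCode server",
    cause="bad network",
)
```

Making D2 and D3 optional reflects the observation in Section 2.4.2 that many data items report a problem without a confirmed cause or solution.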
2.4.1. Pilot data extraction
The first and third authors conducted a pilot data extraction on 20 randomly selected GitHub issues, 20 GitHub discussions, and 20 SO posts, and in case of any discrepancies, the second author was involved to reach a consensus. The results indicated that the three data elements could be extracted from our dataset. Based on this observation, we established the following criteria for the formal data extraction: (1) If the same problem was reported by multiple users, we recorded it only once. (2) If multiple problems were reported within the same GitHub issue, GitHub discussion, or SO post, we recorded each of them separately. (3) For a problem with multiple mentioned causes, we recorded only the cause confirmed by the problem reporter or the Copilot team as the root cause. (4) For a problem with multiple solutions, we recorded only the solutions confirmed by the problem reporter or the Copilot team to have actually solved the problem.
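The four criteria above can be sketched as a small filtering routine. This is a minimal illustration under assumed data shapes (one input record per reported problem, with each cause/solution tagged by whether the reporter or the Copilot team confirmed it); the function and key names are hypothetical:

```python
def apply_criteria(raw_records):
    """raw_records: list of dicts with keys 'problem', 'causes', 'solutions';
    each cause/solution is a (text, confirmed) pair, where `confirmed` means
    the problem reporter or the Copilot team confirmed it."""
    seen = set()
    extracted = []
    for rec in raw_records:
        key = rec["problem"].strip().lower()
        if key in seen:
            continue               # criterion 1: duplicated problems recorded once
        seen.add(key)
        # criteria 3 and 4: keep only confirmed causes/solutions
        causes = [c for c, confirmed in rec["causes"] if confirmed]
        solutions = [s for s, confirmed in rec["solutions"] if confirmed]
        extracted.append({
            "problem": rec["problem"],
            "cause": causes[0] if causes else None,
            "solution": solutions[0] if solutions else None,
        })
    return extracted
```

Criterion 2 is reflected in the input shape: a GitHub issue, discussion, or SO post that reports several problems contributes several records to `raw_records`, one per problem.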
2.4.2. Formal data extraction
The first and third authors conducted the formal data extraction from the filtered dataset according to the defined data elements. They then discussed and reached a consensus with the second author on any inconsistencies, to ensure that the data extraction process adhered to the predefined criteria. Each extracted data item was reviewed several times by the three authors to ensure accuracy. The data extraction and recording results were compiled in MS Excel (Zhou et al., 2024).
It is worth noting that not all of the collected data include both a cause and a solution for the problem. Although we selected closed GitHub issues, answered GitHub discussions, and answered SO posts during the data collection phase, the level of detail in each data item varies considerably. In some cases, respondents may provide a solution to a Copilot-related problem without a detailed analysis of the problem, which prevents us from extracting its root cause. In other cases, although the cause of a problem is identified, the user did not describe the exact resolution process. For example, a user found that Copilot "cannot work properly on the remote VSCode server" and realized that this was due to a "bad network", but offered no detailed solution (Discussion #14907). Moreover, even when some answers provide causes and solutions, they may not be accepted or confirmed as effective by the problem reporter or the Copilot team. For example, a user asked for "a way to set up GitHub Copilot in Google Colab", but the user neither accepted nor responded to the three proposed answers (SO #72431032). Therefore, we cannot regard any of the three answers as an effective solution to this problem.
2.5. Data analysis
To answer the three RQs formulated in Section 2.1, we conducted data analysis using open coding and constant comparison, two techniques from grounded theory that are widely used in qualitative data analysis (Stol et al., 2016). Open coding is not bound to pre-existing theoretical frameworks; instead, researchers are encouraged to generate codes based on the actual content of the data. These codes form a descriptive summary of the data, aiming to capture its underlying themes. In constant comparison, researchers continuously compare the coded data, refining and adjusting the groupings based on similarities and differences.
The detailed data analysis process consists of four steps: 1) The first author reviewed the collected data carefully and then assigned descriptive codes to summarize them concisely. For example, the problem reported in Discussion #10598 was coded as "stopped giving inline suggestions", since the user observed that Copilot, which had previously worked, suddenly stopped offering code suggestions in VSCode. 2) The first author compared the different codes to identify common patterns and themes and to distinguish between them. Through this iterative comparison process, similar codes were merged into higher-level types and categories. For example, the code of Discussion #10598, together with other codes, was grouped into the type Function Failure, which belongs to the category Operation Issue. 3) Whenever uncertainty arose, the first author engaged in discussions with the second and third authors to reach a consensus. Note that, due to the iterative nature of constant comparison, both the types and the categories underwent several rounds of refinement before reaching their final form. 4) The initial version of the analysis results was verified by the second and third authors, and negotiated agreement (Campbell et al., 2013) was employed to resolve disagreements. The final results are presented in Section 3.
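The coding hierarchy produced by steps 1 and 2 can be sketched as two lookup tables mapping descriptive codes to types and types to categories. This is purely illustrative: apart from the Discussion #10598 code and its Function Failure / Operation Issue grouping mentioned above, the entries are hypothetical:

```python
# Step 1 output: descriptive codes assigned to individual data items.
# Step 2 output: codes merged into types, and types into categories.
code_to_type = {
    "stopped giving inline suggestions": "Function Failure",   # Discussion #10598
    "no suggestions on remote VSCode server": "Function Failure",  # hypothetical
}
type_to_category = {
    "Function Failure": "Operation Issue",
}

def classify(code):
    """Return the (type, category) pair for a descriptive code, or (None, None)."""
    t = code_to_type.get(code)
    return (t, type_to_category.get(t)) if t else (None, None)
```

In the actual process these groupings were refined over several rounds of constant comparison rather than fixed up front, so the tables above correspond to a snapshot of the final hierarchy.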
Authors:
(1) Xiyu Zhou, School of Computer Science, Wuhan University, Wuhan, China ([email protected]);
(2) Peng Liang (corresponding author), School of Computer Science, Wuhan University, Wuhan, China ([email protected]);
(3) Beiqi Zhang, School of Computer Science, Wuhan University, Wuhan, China ([email protected]);
(4) Zengyang Li, School of Computer Science, Central China Normal University, Wuhan, China ([email protected]);
(5) Aakash Ahmad, School of Computing and Communications, Lancaster University Leipzig, Leipzig, Germany ([email protected]);
(6) Mojtaba Shahin, School of Computing Technologies, RMIT University, Melbourne, Australia ([email protected]);
(7) Muhammad Waseem, Faculty of Information Technology, University of Jyväskylä, Jyväskylä, Finland ([email protected]).