Journal of Computer Applications
边赟¹, 王海全², 陈义², 崔喆²
Abstract: Software defect prediction results often lack explanatory information such as defect localization, defect explanations, and repair suggestions, which makes them difficult to apply in real-world development. To address this problem, an explainable software defect prediction dataset construction method based on context engineering and large language models (LLMs) was proposed, and HandPick, the first accompanying multi-programming-language software defect prediction dataset, was released. First, the TriCogVuln-LLM method was designed based on software engineering principles and prior defect knowledge to guide LLMs through three sequential steps: function description generation, Common Weakness Enumeration (CWE) defect prediction, and repair suggestion generation. Second, a consensus voting mechanism was designed to build an optimal pool of generative models for defect prediction, further improving the quality and diversity of the generated data. Finally, the HandPick dataset, which covers code in four mainstream programming languages, was constructed through consensus-driven automated data generation. Downstream task validation shows that, compared with the baseline models, the Qwen2.5-14B-HandPick model fine-tuned on HandPick performs significantly better on an independent public test set, improving precision, recall, F1 score, and accuracy by 19.29, 21.26, 24.11, and 18.30 percentage points, respectively. These results indicate that HandPick substantially improves a model's defect identification and analysis capabilities and is expected to help developers repair software defects more effectively.
Key words: explainability, software defect prediction, context engineering, large language models, common weakness enumeration
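The abstract does not specify how the consensus vote is computed. As an illustration only, the following Python sketch assumes each model in a candidate pool emits one CWE ID per code sample (or a hypothetical `SAFE` label for non-defective code), accepts a label only when a strict majority agrees, and scores each model by how often it matches that consensus, which is one plausible way to rank models when assembling a pool. The names `consensus_label`, `agreement_rates`, `NO_DEFECT`, the `min_agreement` threshold, and the model names are all hypothetical and are not part of TriCogVuln-LLM or HandPick.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

# Hypothetical label for "no defect found"; the abstract does not define one.
NO_DEFECT = "SAFE"


def consensus_label(votes: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the label whose vote share exceeds min_agreement, else None.

    A tie for first place is treated as "no consensus" so that a low
    threshold never picks a winner arbitrarily.
    """
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    label, count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == count:
        return None  # tie: no single label dominates
    if count / len(votes) > min_agreement:
        return label
    return None


def agreement_rates(votes_by_model: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of consensus-decided samples on which each model matched the consensus.

    Hypothetical criterion for ranking candidate models when building a model pool.
    """
    models = list(votes_by_model)
    if not models:
        return {}
    n_samples = len(votes_by_model[models[0]])
    hits = {m: 0 for m in models}
    decided = 0
    for i in range(n_samples):
        label = consensus_label([votes_by_model[m][i] for m in models])
        if label is None:
            continue  # no consensus: sample would be discarded, not scored
        decided += 1
        for m in models:
            if votes_by_model[m][i] == label:
                hits[m] += 1
    return {m: hits[m] / decided if decided else 0.0 for m in models}


if __name__ == "__main__":
    votes = {
        "model_a": ["CWE-787", "CWE-79", NO_DEFECT],
        "model_b": ["CWE-787", "CWE-89", NO_DEFECT],
        "model_c": ["CWE-125", "CWE-79", "CWE-476"],
    }
    print(consensus_label(["CWE-787", "CWE-787", "CWE-125"]))  # CWE-787
    print(agreement_rates(votes))
```

Under these assumptions, samples without a consensus label would simply be dropped from the generated data; raising `min_agreement` trades dataset size for label reliability.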
CLC Number: TP391.1
边赟, 王海全, 陈义, 崔喆. Method for constructing explainable software defect prediction datasets[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2025080987.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025080987