Journal of Computer Applications

Unit test generation method via path constraint sharding driven LLM

XU Xiaolong1, WANG Junfeng1,2, WU Peng3, CAO Xiansheng4   

  1. College of Computer Science, Sichuan University; 2. National Key Laboratory of Fundamental Science on Synthetic Vision (Sichuan University); 3. School of Artificial Intelligence, Sichuan Tourism University; 4. School of Cyber Science and Engineering, Sichuan University
  • Received: 2025-09-18 Revised: 2025-09-30 Online: 2025-10-30 Published: 2025-10-30
  • About the authors: XU Xiaolong, born in 2000, M.S. candidate. His research interests include large language models and automatic test case generation. WANG Junfeng, born in 1976, Ph.D., research fellow. His research interests include network and information security, new technology of industrial software, and space information networks. WU Peng, born in 1982, Ph.D., lecturer. His research interests include software supply chain security and software testing. CAO Xiansheng, born in 1991, Ph.D. candidate. His research interests include source code vulnerability analysis and deep learning.
  • Supported by:
    National Natural Science Foundation of China (U24B20147, U2133208); Major Science and Technology Special Project of Sichuan Province (2024ZHCG0195, 2024ZDZX0044, 2024ZYD0269)

Unit test generation method based on a large model driven by path constraint sharding

XU Xiaolong1, WANG Junfeng1,2, WU Peng3, CAO Xiansheng1

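  1. College of Computer Science, Sichuan University; 2. National Key Laboratory of Fundamental Science on Synthetic Vision (Sichuan University); 3. School of Artificial Intelligence, Sichuan Tourism University; 4. School of Cyber Science and Engineering, Sichuan University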
  • Corresponding author: WANG Junfeng
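  • About the authors: XU Xiaolong (born 2000), male, from Urumqi, Xinjiang, M.S. candidate; main research interests: large language models and automatic test case generation. WANG Junfeng (born 1976), male, from Wuhu, Anhui, research fellow, Ph.D.; main research interests: network and information security and new technology of industrial software. WU Peng (born 1982), male, from Chengdu, Sichuan, lecturer, Ph.D.; main research interests: software supply chain security and software testing. CAO Xiansheng (born 1991), male, from Heze, Shandong, Ph.D. candidate; main research interests: source code vulnerability analysis and deep learning.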
  • Supported by:
    National Natural Science Foundation of China (U24B20147, U2133208); Key Research and Development Program of Sichuan Province (2024ZHCG0195, 2024ZDZX0044, 2024ZYD0269)

Abstract: Automated unit test generation is key to improving development efficiency and ensuring software quality in modern software development. Large Language Models (LLMs) have been applied to automatic test case generation because of their strong code understanding ability; however, when dealing with complex functions, they struggle to cover deep branch paths. Therefore, the PyULLM method is proposed, which combines path constraint sharding with the generation ability of LLMs to address this problem. Specifically, all path constraints are collected systematically by a preorder traversal of the Abstract Syntax Tree (AST). On this basis, a fine-grained mapping between code lines and path constraints is established, and the path constraint set is intelligently partitioned into shards according to this mapping. This sharding mechanism enables the LLM to focus on specific path constraints, which significantly improves the coverage of the unit test cases generated in complex scenarios. Experimental results show that, compared with the state-of-the-art tool Pynguin, PyULLM improves line coverage by 24.16 percentage points and branch coverage by 26.61 percentage points. Compared with the current state-of-the-art CODAMOSA method, PyULLM improves line coverage by 19.06 percentage points and branch coverage by 21.72 percentage points. The results show that PyULLM can effectively generate unit test cases for complex functions.
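
The pipeline summarized in the abstract (preorder AST traversal, a line-to-constraint mapping, then sharding) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `collect_constraints`, `shard_paths`, `render_shard`, and `max_paths`, the sample function `classify`, and the fixed-size grouping rule are all assumptions made for illustration. The sketch handles only `if`/`elif`/`else` branches and ignores loops, exceptions, and the implicit constraints created by early returns.

```python
import ast
from collections import defaultdict

# Hypothetical function under test, used only for illustration.
SOURCE = '''
def classify(x, y):
    if x > 0:
        if y > 0:
            return "both"
        return "x only"
    elif x == 0:
        return "zero"
    return "negative"
'''


def collect_constraints(func):
    """Preorder walk over statements: map each line to the conditions guarding it."""
    line_to_path = {}

    def visit(stmts, path):
        for stmt in stmts:
            # Record the statement before descending into it (preorder).
            line_to_path[stmt.lineno] = path
            if isinstance(stmt, ast.If):
                cond = ast.unparse(stmt.test)  # requires Python 3.9+
                visit(stmt.body, path + (cond,))
                visit(stmt.orelse, path + (f"not ({cond})",))

    visit(func.body, ())
    return line_to_path


def shard_paths(line_to_path, max_paths=2):
    """Assumed sharding rule: group distinct non-empty paths, max_paths per shard."""
    lines_by_path = defaultdict(list)
    for line, path in sorted(line_to_path.items()):
        if path:  # unguarded lines need no constraint
            lines_by_path[path].append(line)
    paths = list(lines_by_path)  # insertion order follows first line number
    return [
        {p: lines_by_path[p] for p in paths[i:i + max_paths]}
        for i in range(0, len(paths), max_paths)
    ]


def render_shard(shard):
    """Format one shard as constraint text that a prompt could embed."""
    return "\n".join(
        f"lines {lines}: " + " and ".join(path) for path, lines in shard.items()
    )


func = ast.parse(SOURCE).body[0]
mapping = collect_constraints(func)
for shard in shard_paths(mapping, max_paths=2):
    print(render_shard(shard))
    print("---")
```

Each shard would then be sent to the LLM in its own prompt, so the model only has to satisfy a small set of constraints at a time. How PyULLM actually chooses shard boundaries is not stated in the abstract.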

Key words: unit test generation, large language model, path constraint fragmentation, testing and analysis, test coverage

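Abstract: Automated unit test generation is key to improving development efficiency and ensuring software quality in modern software development. Large language models (LLMs) have been applied to automatic test case generation because of their good code understanding ability, but when handling complex functions they face problems such as deep branch paths being hard to cover. This paper proposes the PyULLM method, which combines path constraint sharding with the generation ability of LLMs to solve these problems. Specifically, all path constraints are collected systematically through a preorder traversal of the abstract syntax tree; on this basis, a fine-grained mapping between code lines and path constraints is established; according to this mapping, the path constraint set is intelligently partitioned into shards. This sharding mechanism lets the LLM focus on specific path constraints and significantly improves the coverage of generated unit test cases in complex scenarios. Experimental results show that, compared with the state-of-the-art tool Pynguin, PyULLM improves line coverage by 24.16 percentage points and branch coverage by 26.61 percentage points; compared with the current advanced CODAMOSA method, it improves line coverage by 19.06 percentage points and branch coverage by 21.72 percentage points. PyULLM can therefore effectively generate unit test cases for complex functions.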

Key words: unit test generation, large language model, path constraint, testing and analysis, test coverage
