Selective generation method of test cases for Chinese text error correction software

doi:10.11772/j.issn.1001-9081.2023010080

Journal of Computer Applications, 2024, Vol. 44, Issue (1): 101-112. DOI: 10.11772/j.issn.1001-9081.2023010080

Special Issue: Artificial Intelligence

Chenghao FENG¹, Zhenping XIE¹,², Bowen DING¹

1. College of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi Jiangsu 214000, China
2. Jiangsu Key Laboratory of Media Design and Software Technology (Jiangnan University), Wuxi Jiangsu 214000, China

Received: 2023-02-06  Revised: 2023-03-28  Accepted: 2023-03-29  Online: 2023-06-06  Published: 2024-01-10
Contact: Zhenping XIE
About the authors: FENG Chenghao, born in 1997, M. S. candidate. His research interests include intelligent system software.
DING Bowen, born in 1996, M. S. candidate. His research interests include evolutionary algorithms.
Supported by: National Natural Science Foundation of China (61872166); Jiangsu Provincial "Six Talent Peaks" Project (XYDXX-161)

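Chinese title: 中文文本纠错软件测试用例的选择生成方法

Authors (Chinese names): 冯程皓¹, 谢振平¹,², 丁博文¹

Corresponding author: XIE Zhenping (谢振平)
Author biographies: FENG Chenghao (born 1997), male, from Jiaozuo, Henan, M. S. candidate; main research interest: intelligent system software.
DING Bowen (born 1996), male, from Shangqiu, Henan, M. S. candidate; main research interest: evolutionary algorithms.
XIE Zhenping (born 1979), male, from Changzhou, Jiangsu, professor, Ph. D., CCF member; main research interests: knowledge computing and cognitive learning.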

Abstract:

No effective method currently exists for generating test cases for Chinese text error correction software. To fill this gap, and to support measuring and optimizing the correction performance of such software, a multi-user, engineering-oriented method was designed, called Selective Generation Method of Test cases for Chinese text error Correction Software (SGMT-CCS). Two criteria that users can choose between were defined for evaluating test cases: error quantity density and error type density. SGMT-CCS consists of three modules: an automatic test case generation module, a test case selection module, and a test case priority sorting module. Users can: 1) customize the minimum error interval and the size of the test case set during automatic generation; 2) customize the minimum error interval and the expected value during selection; 3) choose among the evaluation criteria during selection and priority sorting to meet the requirements of different datasets. Experimental results show that SGMT-CCS can generate effective test cases in a short period of time, that the selection module meets the user-defined goals under all simulated requirements, and that the priority sorting module improves test efficiency over the unsorted order in different time periods under each evaluation criterion.

Key words: test case generation, Chinese text error correction, selective generation, regression test, Natural Language Processing (NLP)
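
The abstract names the two evaluation criteria and the minimum error interval but does not give their formulas. The following Python sketch shows one plausible reading, under loudly stated assumptions: error quantity density is taken as errors per word segment, error type density as distinct error types per word segment, and the error interval as the gap in word segments between neighbouring errors. The class `CorrectionCase` and all function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CorrectionCase:
    """Hypothetical test case: a word-segmented text with injected errors."""
    tokens: List[str]               # word segments of the erroneous text
    errors: List[Tuple[int, str]]   # (token index, error type) for each error


def error_quantity_density(case: CorrectionCase) -> float:
    # ASSUMPTION: number of errors divided by number of word segments.
    return len(case.errors) / len(case.tokens) if case.tokens else 0.0


def error_type_density(case: CorrectionCase) -> float:
    # ASSUMPTION: number of distinct error types divided by number of word segments.
    kinds = {kind for _, kind in case.errors}
    return len(kinds) / len(case.tokens) if case.tokens else 0.0


def min_error_interval(case: CorrectionCase) -> int:
    # ASSUMPTION: smallest gap (in word segments) between two neighbouring errors.
    positions = sorted(pos for pos, _ in case.errors)
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return min(gaps) if gaps else len(case.tokens)


def prioritize(cases: List[CorrectionCase],
               criterion: Callable[[CorrectionCase], float]) -> List[CorrectionCase]:
    # Run the densest cases first under the chosen criterion.
    return sorted(cases, key=criterion, reverse=True)


sparse = CorrectionCase(tokens=list("abcdefghij"),
                        errors=[(1, "typo"), (5, "typo")])
dense = CorrectionCase(tokens=list("abcdefghij"),
                       errors=[(0, "typo"), (2, "missing"), (7, "word_order")])
ordered = prioritize([sparse, dense], error_quantity_density)
```

Passing `error_type_density` instead of `error_quantity_density` to `prioritize` switches the criterion without touching the rest of the pipeline, which mirrors the user-selectable criteria described in the abstract.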

CLC Number: TP391

Chenghao FENG, Zhenping XIE, Bowen DING. Selective generation method of test cases for Chinese text error correction software[J]. Journal of Computer Applications, 2024, 44(1): 101-112.

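Function	Parameters	Purpose
Init	set of texts; set of per-text word segment counts; set of error counts; set of error type frequencies	Initialization
Generate	original text; test set size	Initialize harmony
Generate_alter	original text; test set size	Iterate to a new harmony
CalculateFitness_0	NULL	Compute fitness
CalculateFitness_1	NULL	Compute fitness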
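
The function table above refers to initializing and iterating harmonies, which suggests a harmony search style optimizer. The sketch below is a generic, simplified harmony search (without pitch adjustment), not the authors' implementation: it picks a fixed-size subset of candidate test cases whose mean density is as close as possible to a user-supplied expected value. The fitness definition, the parameter names (`memory_size`, `hmcr`, `iterations`, `seed`) and the function names are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def fitness(densities: Sequence[float], harmony: Sequence[int], expected: float) -> float:
    # ASSUMPTION: lower is better; distance of the subset's mean density from the target.
    mean = sum(densities[i] for i in harmony) / len(harmony)
    return abs(mean - expected)


def harmony_search(densities: Sequence[float], subset_size: int, expected: float,
                   memory_size: int = 10, iterations: int = 200,
                   hmcr: float = 0.9, seed: int = 0) -> List[int]:
    """Return indices of a subset whose mean density approximates `expected`."""
    rng = random.Random(seed)
    n = len(densities)

    # Initial harmony memory: random subsets of distinct indices.
    memory = [rng.sample(range(n), subset_size) for _ in range(memory_size)]

    for _ in range(iterations):
        # Improvise a new harmony slot by slot.
        new: List[int] = []
        for slot in range(subset_size):
            if rng.random() < hmcr:
                candidate = rng.choice(memory)[slot]   # reuse a remembered value
            else:
                candidate = rng.randrange(n)           # random exploration
            while candidate in new:                    # keep indices distinct
                candidate = rng.randrange(n)
            new.append(candidate)

        # Replace the worst remembered harmony if the new one is better.
        worst = max(memory, key=lambda h: fitness(densities, h, expected))
        if fitness(densities, new, expected) < fitness(densities, worst, expected):
            memory[memory.index(worst)] = new

    return min(memory, key=lambda h: fitness(densities, h, expected))


candidate_densities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.1, 0.2, 0.3]
chosen = harmony_search(candidate_densities, subset_size=3, expected=0.2)
```

Because the random generator is seeded, repeated calls with the same arguments return the same subset, which makes a selection run reproducible.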

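Experiment No.	Error quantity density: minimum error interval	Error quantity density: expected value	Error type density: minimum error interval	Error type density: expected value
1	3	0.20	2	0.20
2	4	0.20	3	0.20
3	5	0.20	4	0.20
4	2	0.20	5	0.20
5	6	0.20	6	0.20
6	2	0.20	3	0.20
7	2	0.30	3	0.15
8	2	0.25	3	0.10
9	2	0.35	3	0.25
10	2	0.40	3	0.30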

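Generation method	Applicable test case sets	Considers test case set optimization	Reusable
SGMT-CCS	Test case sets of any size	Yes	Yes
Manual generation	Small test case sets	No	No
Semi-automatic generation	Small test case sets (larger sets are possible in theory)	No	No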

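Test case set size/10³	Chinese text generation method	Time for requirement analysis + character/word list generation/s	Time to generate test cases from the character/word list/s
10¹	SGMT-CCS	0	≈15
10¹	Manual generation	≥600	≥10×10³
10¹	Semi-automatic generation	≥600	≈15
10²	SGMT-CCS	0	≈100
10²	Manual generation	≥600	≥100×10³
10²	Semi-automatic generation	≥600	≈100
10³	SGMT-CCS	0	≈1000
10³	Manual generation	≥600	≥1000×10³
10³	Semi-automatic generation	≥600	≈1000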

Team	IP	IF
YNU-HPCC	0.4086	0.4167
NTOUA	0.3889	0.4398
CVTE	0.6060	0.2978
BNU	0.5527	0.2118
AL_I_NLP	0.4791	0.5164

Experiment group No.	Test case set size/10³	Number of test case sets
1	10	10
2	100	10
3	1000	10

Error correction software	Test case set size/10³	Number of test case sets
iFLYTEK	10	10
iFLYTEK	100	10
iFLYTEK and Baidu	10	10
