Software defect detection algorithm based on dictionary learning

Journal of Computer Applications, 2016, 36(9): 2486-2491. DOI: 10.11772/j.issn.1001-9081.2016.09.2486

ZHANG Lei, ZHU Yixin, XU Chun, YU Kai

College of Computer Science and Engineering, Xinjiang University of Finance and Economics, Urumqi Xinjiang 830000, China

Received: 2016-02-02 Revised: 2016-03-25 Online: 2016-09-08 Published: 2016-09-10
Corresponding author: ZHANG Lei
About the authors: ZHANG Lei (1974-), female, born in Urumqi, Xinjiang, lecturer, M.S.; research interests: computer networks, information security, data mining. ZHU Yixin (1974-), male, born in Xiangxiang, Hunan, lecturer, Ph.D.; research interests: computer network security, propagation on complex networks. XU Chun (1977-), female, born in Urumqi, Xinjiang, associate professor, Ph.D.; research interests: computer networks, natural language processing. YU Kai (1974-), male, born in Urumqi, Xinjiang, associate professor, Ph.D.; research interests: complex networks, information propagation.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (71561025), the Xinjiang Social Science Foundation (13CTJ023) and the Scientific Research Program of Higher Education Institutions of Xinjiang Autonomous Region (XJEDU2013I27).

Abstract

Existing dictionary learning methods cannot effectively construct dictionaries with discriminative ability. To address this, a dictionary learning algorithm with both discriminative and representative ability was proposed and applied to software defect detection. Firstly, the sparse representation model was reformulated: a dictionary discriminant term was designed into the objective function so that each class-specific sub-dictionary of the learned structured dictionary represents samples of its own class well but represents samples of other classes poorly. Secondly, a Fisher criterion discriminant term on the coefficients was added so that the representation coefficients of different classes are well separated. Finally, the designed dictionary learning model was optimized to obtain a structured dictionary with strong discriminative and sparse representation ability. The preprocessed NASA software defect datasets were used as experimental data, and the proposed method was compared with Principal Component Analysis (PCA), Logistic Regression (LR), decision tree, Support Vector Machine (SVM) and a representative dictionary learning method. The experimental results show that the proposed method achieves higher accuracy and F-measure, improving detection precision while also improving classifier performance.

Key words: dictionary learning, sparse representation, Fisher criterion, software defect detection, machine learning
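The abstract describes the model only in words. For orientation, one well-known objective that contains both ingredients (a dictionary discriminant term and a Fisher criterion on the coefficients) is that of Fisher discrimination dictionary learning (FDDL, Yang et al., ICCV 2011), sketched below; the paper's own terms, weights and notation may differ and are not given in this abstract. Here $A_i$ holds the training samples of class $i$ as columns, $D=[D_1,\dots,D_c]$ is the structured dictionary with one sub-dictionary per class, $X_i$ is the coding of $A_i$ over $D$, and $X_i^j$ is the block of $X_i$ associated with $D_j$.

```latex
\begin{aligned}
(D^{*}, X^{*}) &= \arg\min_{D,\,X}\; \sum_{i=1}^{c} r(A_i, D, X_i) + \lambda_1 \|X\|_1 + \lambda_2 f(X),\\
r(A_i, D, X_i) &= \|A_i - D X_i\|_F^2 + \|A_i - D_i X_i^{i}\|_F^2 + \sum_{j \neq i} \|D_j X_i^{j}\|_F^2,\\
f(X) &= \operatorname{tr}\big(S_W(X)\big) - \operatorname{tr}\big(S_B(X)\big) + \eta \|X\|_F^2,\\
S_W(X) &= \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - m_i)(x_k - m_i)^{\top}, \quad
S_B(X) = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^{\top}.
\end{aligned}
```

In this form the second and third terms of $r$ act as the dictionary discriminant term (each $D_i$ reconstructs class $i$ well while the other sub-dictionaries contribute little to it), and $f(X)$ is the Fisher criterion on the coefficients: $m_i$ and $m$ are the mean coefficient vectors of class $i$ and of all samples, $n_i$ is the number of class-$i$ samples, and $\eta\|X\|_F^2$ makes $f$ convex for sufficiently large $\eta$. Models of this kind are usually optimized by alternating between updating $X$ with $D$ fixed and updating $D$ with $X$ fixed.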
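The classification and evaluation side of such a method can be illustrated with a small NumPy sketch. It is a simplified stand-in, not the paper's algorithm: each class gets its own sub-dictionary learned with plain sparse coding (ISTA) and a least-squares (MOD-style) dictionary update, without any dictionary discriminant or Fisher term; a sample is assigned to the class whose sub-dictionary reconstructs it with the smallest residual (a common, simpler rule); and accuracy and F-measure are computed with the defective class as positive. All function names, parameters (`lam`, `n_atoms`, iteration counts) and the synthetic two-class data are hypothetical; the data are not the NASA datasets.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(D, y, lam=0.1, n_iter=200):
    """ISTA for min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1."""
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)  # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x


def learn_class_dictionary(Y, n_atoms, lam=0.1, n_outer=10, seed=0):
    """Plain (non-discriminative) dictionary learning for one class.

    Columns of Y are samples. Alternates ISTA sparse coding with a
    least-squares (MOD-style) dictionary update, then renormalizes atoms.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[1]
    D = Y[:, rng.choice(n, n_atoms, replace=n < n_atoms)].astype(float)  # init from samples
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    for _ in range(n_outer):
        X = np.column_stack([sparse_code(D, Y[:, j], lam) for j in range(n)])
        D = Y @ X.T @ np.linalg.pinv(X @ X.T)  # minimizes ||Y - D X||_F for fixed X
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # keep atoms unit-norm
    return D


def classify(dicts, y, lam=0.1):
    """Return the index of the class whose sub-dictionary gives the smallest residual."""
    residuals = [np.linalg.norm(y - D @ sparse_code(D, y, lam)) for D in dicts]
    return int(np.argmin(residuals))


def accuracy_and_f_measure(y_true, y_pred, positive=1):
    """Accuracy, and F-measure (harmonic mean of precision and recall) for the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(np.mean(y_true == y_pred)), float(f)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    d = 20
    # Hypothetical data: class 0 ("clean") and class 1 ("defective") lie near
    # different random 3-D subspaces of R^20; columns are samples.
    bases = [rng.standard_normal((d, 3)) for _ in range(2)]

    def sample(c, n):
        return bases[c] @ rng.standard_normal((3, n)) + 0.05 * rng.standard_normal((d, n))

    dicts = [learn_class_dictionary(sample(c, 40), n_atoms=5) for c in range(2)]
    X_test = np.hstack([sample(0, 20), sample(1, 20)])
    y_true = [0] * 20 + [1] * 20
    y_pred = [classify(dicts, X_test[:, j]) for j in range(X_test.shape[1])]
    print("accuracy=%.2f F-measure=%.2f" % accuracy_and_f_measure(y_true, y_pred))
```

With real defect data, each column would be one module's vector of preprocessed code metrics. Because NASA defect data are class-imbalanced (defective modules are a minority), the F-measure on the defective class is usually more informative than accuracy alone.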

CLC number: TP399

Cite this article: ZHANG Lei, ZHU Yixin, XU Chun, YU Kai. Software defect detection algorithm based on dictionary learning[J]. Journal of Computer Applications, 2016, 36(9): 2486-2491.

