Abstract: Because existing dictionary learning methods cannot effectively construct a discriminative structured dictionary, a dictionary learning method with both discriminative and representative ability is proposed and applied to software defect detection. First, the sparse representation model is redesigned to train a structured dictionary by adding a discriminant constraint term to the objective function, so that each class-specific sub-dictionary represents samples of its own class well but represents samples of other classes poorly. Second, a Fisher criterion discriminant term is added so that the representation coefficients are discriminative across classes. Finally, the resulting dictionary learning model is optimized to obtain a strongly structured, sparsely representative dictionary. Experiments on the NASA defect datasets compare the proposed method with Principal Component Analysis (PCA), Logistic Regression (LR), decision tree, Support Vector Machine (SVM) and a typical dictionary learning method; the proposed method achieves higher accuracy and F-measure than these baselines. The experimental results indicate that the proposed method improves defect detection accuracy and classifier performance.
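The ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's model. It assumes fixed toy sub-dictionaries rather than learned ones, and it codes samples with an l2 (ridge) penalty instead of a sparse l1 penalty so that the solution has a closed form. All function names (`code_ridge`, `classify_by_residual`, `fisher_term`, `f_measure`) and the toy data are hypothetical. The sketch shows three pieces: assigning a sample to the class whose sub-dictionary reconstructs it best, a Fisher-style scatter term on the coefficients (within-class scatter minus between-class scatter), and the F-measure used for evaluation.

```python
import numpy as np

def code_ridge(D, x, lam=0.1):
    """Coefficients a minimizing ||x - D a||^2 + lam * ||a||^2 (closed form).
    Assumption: an l2 penalty stands in for the sparse penalty of the paper."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)

def classify_by_residual(sub_dicts, x, lam=0.1):
    """Return the label whose sub-dictionary gives the smallest reconstruction residual."""
    residuals = {}
    for label, D in sub_dicts.items():
        a = code_ridge(D, x, lam)
        residuals[label] = np.linalg.norm(x - D @ a)
    return min(residuals, key=residuals.get)

def fisher_term(coeffs, labels):
    """tr(S_W) - tr(S_B) for coefficient vectors stored as columns of coeffs.
    Smaller values mean tighter classes that lie farther apart."""
    coeffs = np.asarray(coeffs, dtype=float)
    labels = np.asarray(labels)
    overall_mean = coeffs.mean(axis=1, keepdims=True)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        A_c = coeffs[:, labels == c]
        m_c = A_c.mean(axis=1, keepdims=True)
        within += np.sum((A_c - m_c) ** 2)
        between += A_c.shape[1] * np.sum((m_c - overall_mean) ** 2)
    return within - between

def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (defective) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy sub-dictionaries: class 0 spans the first two axes, class 1 the last two.
sub_dicts = {
    0: np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]),
    1: np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
}
x = np.array([1.0, 2.0, 0.1, 0.0])  # lies almost entirely in the class-0 subspace
predicted = classify_by_residual(sub_dicts, x)
print("predicted class:", predicted)
```

In this toy setup the sub-dictionaries are orthogonal, which is the extreme case of the property the abstract describes: each sub-dictionary represents its own class well and other classes poorly, so the residual alone separates the classes.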