Journal of Computer Applications, 2019, Vol. 39, Issue (11): 3221-3226. DOI: 10.11772/j.issn.1001-9081.2019051108

• The 2019 CCF Conference on Artificial Intelligence (CCFAI2019)

Improved attribute reduction algorithm and its application to prediction of microvascular invasion in hepatocellular carcinoma

TAN Yongqi1, FAN Jiancong1,2, REN Yande3, ZHOU Xiaoming3   

  1. College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao Shandong 266590, China;
    2. Provincial Key Laboratory for Information Technology of Wisdom Mining of Shandong Province, Qingdao Shandong 266590, China;
    3. The Affiliated Hospital of Qingdao University, Qingdao Shandong 266555, China
  • Received: 2019-05-24 Revised: 2019-07-18 Online: 2019-11-10 Published: 2019-09-11
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFC0804406), the Shandong Natural Science Foundation (ZR2018MF009) and the "Taishan Scholar" Climbing Plan of Shandong Province.

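  • Corresponding author: FAN Jiancong
  • About the authors: TAN Yongqi (1994-), male, born in Jinan, Shandong, M. S. candidate, CCF member. His research interests include data mining and machine learning. FAN Jiancong (1977-), male, born in Qingdao, Shandong, Professor, Ph. D., CCF member. His research interests include data mining and machine learning. REN Yande (1973-), male, born in Tai'an, Shandong, associate chief physician, Ph. D. His research interests include neuroimaging. ZHOU Xiaoming (1977-), male, born in Qingdao, Shandong, associate chief physician, M. S. His research interests include abdominal imaging diagnosis.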

Abstract: The attribute reduction algorithm based on neighborhood rough set considers only the influence of each single attribute on the decision attribute and ignores the correlation among attributes. To address this issue, a Neighborhood Rough Set attribute reduction algorithm based on Chi-square test (ChiS-NRS) was proposed. Firstly, the Chi-square test was used to calculate the correlation among attributes, and the influence between correlated attributes was taken into account when selecting important attributes, which reduced the time complexity and improved the classification accuracy. Then, the improved algorithm was combined with the Gradient Boosting Decision Tree (GBDT) algorithm to build a classification model, which was verified on UCI datasets. Finally, the proposed model was applied to predicting the occurrence of microvascular invasion in hepatocellular carcinoma. The experimental results show that, compared with no reduction, neighborhood rough set reduction and several other reduction algorithms, the proposed algorithm achieves the highest classification accuracy on some UCI datasets. In the prediction of microvascular invasion in hepatocellular carcinoma, compared with Convolutional Neural Network (CNN), Support Vector Machine (SVM) and Random Forest (RF) prediction models, the proposed model achieves a prediction accuracy of 88.13% on the test set, and its sensitivity, specificity and Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, 88.89%, 87.5% and 0.90 respectively, are the best among the compared models. Therefore, the proposed model can better predict the occurrence of microvascular invasion in hepatocellular carcinoma and assist doctors in making more accurate diagnoses.
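The abstract says ChiS-NRS uses the Chi-square test to measure the correlation among attributes but does not give the formula. A minimal sketch, assuming Pearson's chi-square statistic over the contingency table of two discrete (or already discretized) attributes, is shown below. The helper name `chi_square` is hypothetical and this is not the authors' implementation. If a p-value is needed, `scipy.stats.chi2_contingency` computes the same statistic, although it applies Yates' continuity correction to 2x2 tables by default.

```python
from collections import Counter


def chi_square(x, y):
    """Pearson chi-square statistic and degrees of freedom for two discrete attributes.

    x, y: equal-length sequences of attribute values, one entry per sample.
    Returns (statistic, dof); a larger statistic means stronger evidence of dependence.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    observed = Counter(zip(x, y))  # O_ij: joint counts of (x value, y value)
    row = Counter(x)               # marginal counts of x values
    col = Counter(y)               # marginal counts of y values
    stat = 0.0
    for a, ra in row.items():
        for b, cb in col.items():
            expected = ra * cb / n  # E_ij under the independence hypothesis
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    print(chi_square(a, [0, 0, 1, 1]))  # perfectly associated: (4.0, 1)
    print(chi_square(a, [0, 1, 0, 1]))  # independent: (0.0, 1)
```

Because the raw statistic grows with the number of samples, it is only directly comparable between attribute pairs on the same dataset; a normalized measure such as Cramér's V would be needed otherwise, and the abstract does not say which variant the paper uses.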

Key words: attribute reduction, Chi-square test, gradient boosting tree, microvascular invasion, neighborhood rough set
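A common formulation of neighborhood rough set reduction, which is the kind of baseline ChiS-NRS improves on, scores attribute subsets by the dependency degree γ_B(D) = |POS_B(D)| / |U|: a sample lies in the positive region POS_B(D) if every sample within distance δ of it, measured on the attribute subset B, shares its decision label. The sketch below implements that dependency and a forward greedy search under stated assumptions: Euclidean distance (`math.dist`, Python 3.8+), features already scaled to [0, 1], and a user-chosen radius `delta`. The function names are hypothetical, and the Chi-square correlation step that distinguishes ChiS-NRS is deliberately left out because the abstract does not specify it.

```python
import math


def dependency(X, labels, attrs, delta):
    """Neighborhood dependency gamma_B(D) = |POS_B(D)| / |U|.

    X: list of samples (lists of floats, assumed scaled to [0, 1]);
    labels: decision label of each sample; attrs: indices of the subset B;
    delta: neighborhood radius (hypothetical parameter name).
    """
    n = len(X)
    if n == 0 or not attrs:
        return 0.0  # convention here: an empty subset has no discriminating power
    in_pos = 0
    for i in range(n):
        # x_i is in the positive region if no neighbor within delta has another label.
        if all(
            labels[j] == labels[i]
            or math.dist([X[i][k] for k in attrs], [X[j][k] for k in attrs]) > delta
            for j in range(n)
        ):
            in_pos += 1
    return in_pos / n


def greedy_reduct(X, labels, delta, eps=1e-12):
    """Forward greedy search: repeatedly add the attribute that most increases gamma."""
    remaining = list(range(len(X[0]))) if X else []
    reduct, gamma = [], 0.0
    while remaining:
        scores = {a: dependency(X, labels, reduct + [a], delta) for a in remaining}
        best = max(remaining, key=scores.get)  # lowest-index attribute wins ties
        if scores[best] - gamma <= eps:  # no attribute adds significance: stop
            break
        reduct.append(best)
        remaining.remove(best)
        gamma = scores[best]
    return reduct, gamma


if __name__ == "__main__":
    # Toy data: attribute 0 separates the two classes, attribute 1 is noise.
    X = [[0.0, 0.0], [0.05, 1.0], [0.95, 0.0], [1.0, 1.0]]
    y = [0, 0, 1, 1]
    print(greedy_reduct(X, y, delta=0.1))  # ([0], 1.0)
```

Each candidate evaluation here is quadratic in the number of samples, which is why reducing the number of candidate attributes, as the abstract claims ChiS-NRS does, matters for running time.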

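Abstract (Chinese edition): The attribute reduction algorithm based on neighborhood rough set considers only the influence of a single attribute on the decision attribute during reduction and fails to consider the correlation among attributes. To address this problem, a Neighborhood Rough Set attribute reduction algorithm based on Chi-square test (ChiS-NRS) was proposed. Firstly, the Chi-square test was used to calculate the correlation, and the influence between correlated attributes was considered when selecting important attributes, which reduced the time complexity while improving the classification accuracy. Then, the improved algorithm was combined with the Gradient Boosting Decision Tree (GBDT) algorithm to build a classification model, which was verified on UCI datasets. Finally, the model was applied to predicting the occurrence of microvascular invasion in hepatocellular carcinoma. Experimental results show that, compared with several reduction approaches such as no reduction and neighborhood rough set reduction, the improved algorithm achieves the highest classification accuracy on some UCI datasets. In the prediction of microvascular invasion in hepatocellular carcinoma, compared with prediction models such as Convolutional Neural Network (CNN), Support Vector Machine (SVM) and Random Forest (RF), the proposed model achieves a prediction accuracy of 88.13% on the test set, with sensitivity, specificity and Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of 87.10%, 89.29% and 0.90 respectively, all of which are the best. Therefore, the proposed model can better predict the occurrence of microvascular invasion in hepatocellular carcinoma and can assist doctors in making more accurate diagnoses.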
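The prediction results are reported as accuracy, sensitivity, specificity and ROC AUC. As a reminder of how these are defined for a binary task (positive class = microvascular invasion present), the sketch below computes them from true labels, hard predictions and predicted scores. AUC uses the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. The function names and toy inputs are illustrative only and are not data from the paper.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 1]  # scores thresholded at 0.5
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
    print(confusion_metrics(y_true, y_pred))  # accuracy, sensitivity, specificity all 2/3
    print(roc_auc(y_true, scores))  # 8/9, about 0.889
```

Sensitivity and specificity depend on the decision threshold used to produce `y_pred`, whereas AUC summarizes ranking quality over all thresholds, which is why the two kinds of metric are usually reported together.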

