Journal of Computer Applications, 2018, Vol. 38, Issue (11): 3144-3149. DOI: 10.11772/j.issn.1001-9081.2018041308

• 7th China Conference on Data Mining (CCDM 2018)

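  • Corresponding author: ZHANG Hui
  • About the authors: ZHANG Jia (1992-), male, born in Mianyang, Sichuan, M. S. candidate. His research interests include data mining. ZHANG Hui (1972-), male, born in Susong, Anhui, Ph. D., professor. His research interests include text mining and knowledge engineering. ZHAO Xujian (1984-), male, born in Xichang, Sichuan, Ph. D., associate professor. His research interests include Chinese information processing and Web information retrieval. YANG Chunming (1980-), male, born in Huaping, Yunnan, M. S., associate professor. His research interests include text mining and knowledge engineering. LI Bo (1977-), male, born in Jiangyou, Sichuan, lecturer, Ph. D. candidate. His research interests include information filtering and information security.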

Probabilistic soft logic reasoning model with semi-automatic rule learning

ZHANG Jia1, ZHANG Hui2, ZHAO Xujian1, YANG Chunming1, LI Bo1,3   

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang Sichuan 621010, China;
    2. School of Science, Southwest University of Science and Technology, Mianyang Sichuan 621010, China;
    3. School of Computer Science and Technology, University of Science and Technology of China, Hefei Anhui 230027, China
  • Received:2018-04-25 Revised:2018-06-15 Online:2018-11-10 Published:2018-11-10
  • Supported by:
    This work is partially supported by the CERNET Next Generation Internet Technology Innovation Project (NGII20170901), the Humanities and Social Sciences Foundation of the Ministry of Education (17YJCZH260), the Open Fund of the Sichuan Institute of Military-Civilian Integration (18sxb017, 18sxb028), and the Fund Project of the Sichuan Information Management and Service Research Center (SCTQ2016YB13).

Abstract: Probabilistic Soft Logic (PSL), a declarative rule-based probabilistic model, is highly extensible and adapts well to many domains. So far, however, it requires users to supply a large amount of common-sense and domain knowledge as a prerequisite for defining rules. Acquiring this knowledge is often very expensive, and any incorrect information it contains may reduce the correctness of reasoning. To alleviate this problem, the C5.0 algorithm was combined with PSL so that data and knowledge jointly drive the reasoning model, and a semi-automatic rule learning method was proposed. In this method, rules extracted by the C5.0 algorithm, supplemented with manually defined rules and with optimized and adjusted rules, serve as the input of an improved PSL model. Experimental results on student performance prediction show that the proposed method achieves higher accuracy than both the C5.0 algorithm and PSL without rule learning. Compared with the previous approach of purely hand-defined rules, the proposed method significantly reduces manual cost. Compared with Bayesian Network (BN), Support Vector Machine (SVM) and other algorithms, the proposed method also performs well.

Key words: Probabilistic Soft Logic (PSL), automatic rule extraction, machine learning, C5.0 algorithm, semi-automatic learning
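The abstract does not give implementation details, so what follows is only a rough sketch of the general idea, not the authors' method. It assumes a hypothetical hand-built binary decision tree standing in for a C5.0 model. Each root-to-leaf path becomes one rule, which is written in PSL-style `weight: body -> head ^2` syntax. The rule is then scored with the Lukasiewicz relaxation that PSL uses: the distance to satisfaction is max(0, I(body) - I(head)), and a conjunction of n soft truth values is max(0, sum - (n - 1)).

The predicates `FailedBefore`, `HighAbsence`, `Pass` and `Fail`, the weight 1.0 and the truth values are invented for illustration. C5.0 itself (boosting, pruning, rule-set simplification) and PSL weight learning are not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Literal = Tuple[str, bool]               # (predicate name, True = positive, False = negated)
Rule = Tuple[Tuple[Literal, ...], str]   # (antecedent literals, consequent predicate)


@dataclass
class Node:
    """Hypothetical binary decision-tree node, a stand-in for a C5.0 tree.

    Internal nodes test a boolean attribute; leaves carry a class label.
    """
    attr: Optional[str] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    label: Optional[str] = None


def tree_to_rules(node: Node, path: Tuple[Literal, ...] = ()) -> List[Rule]:
    """Turn every root-to-leaf path into one (antecedent -> class) rule."""
    if node.label is not None:
        return [(path, node.label)]
    return (tree_to_rules(node.if_true, path + ((node.attr, True),))
            + tree_to_rules(node.if_false, path + ((node.attr, False),)))


def to_psl(rule: Rule, weight: float) -> str:
    """Render a rule as a PSL-style weighted, squared-hinge rule over a student S."""
    body, head = rule
    lits = [f"{a}(S)" if positive else f"!{a}(S)" for a, positive in body]
    return f"{weight}: " + " & ".join(lits) + f" -> {head}(S) ^2"


def luk_and(values: List[float]) -> float:
    """Lukasiewicz t-norm over n soft truth values: max(0, sum - (n - 1))."""
    return max(0.0, sum(values) - (len(values) - 1))


def rule_penalty(rule: Rule, weight: float, truth: Dict[str, float]) -> float:
    """Weighted squared distance to satisfaction of body -> head."""
    body, head = rule
    # A negated literal !A takes the soft truth value 1 - I(A).
    body_vals = [truth[a] if positive else 1.0 - truth[a] for a, positive in body]
    distance = max(0.0, luk_and(body_vals) - truth[head])
    return weight * distance ** 2


# Hypothetical tree: students who failed before and are often absent -> Fail.
tree = Node(attr="FailedBefore",
            if_true=Node(attr="HighAbsence",
                         if_true=Node(label="Fail"),
                         if_false=Node(label="Pass")),
            if_false=Node(label="Pass"))

rules = tree_to_rules(tree)
for r in rules:
    print(to_psl(r, 1.0))

# Invented soft truth values for one student.
truth = {"FailedBefore": 0.9, "HighAbsence": 0.8, "Fail": 0.3, "Pass": 0.6}
print(round(rule_penalty(rules[0], 1.0, truth), 4))  # 0.16
```

In this toy setting, the first rule's body has truth value 0.9 + 0.8 - 1 = 0.7. Because the head `Fail` is only 0.3, the rule is violated by 0.4, which gives a squared penalty of 0.16. PSL inference would adjust the open truth values to minimize the sum of such weighted penalties over all rules, including any manually written ones.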
