Journal of Computer Applications, 2013, Vol. 33, Issue (05): 1446-1449. DOI: 10.3724/SP.J.1087.2013.01446

• Computer Software Technology •

Automatic debugging of the Common Information Model based on natural language processing

项炜1,2   

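  1. School of Computer Science, Leshan Normal University, Leshan Sichuan 614000, China
    2. Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan Sichuan 614000, China
  • Received: 2012-10-24 Revised: 2012-12-14 Published: 2013-05-01 Online: 2013-05-08
  • Corresponding author: XIANG Wei
  • About the author: XIANG Wei (1977-), male, born in Qingshen, Sichuan; lecturer; M.S.; CCF member; research interest: natural language processing.
  • Funding:

    Youth Fund Project of the Sichuan Provincial Department of Education (11ZB134)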

Automated debug for common information model defect using natural language processing algorithm

XIANG Wei1,2   

  1. Laboratory of Intelligent Information Processing and Application, Leshan Normal University, Leshan Sichuan 614000, China
    2. School of Computer Science, Leshan Normal University, Leshan Sichuan 614000, China
  • Received: 2012-10-24 Revised: 2012-12-14 Online: 2013-05-08 Published: 2013-05-01
  • Contact: XIANG Wei
  • Supported by:

    Youth Fund Project of the Sichuan Provincial Department of Education (11ZB134)

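Abstract: The Common Information Model (CIM) is an open industry standard that has been implemented in many products, and a large number of bugs in these implementations have been found and fixed. To reduce the time and effort needed to locate the root cause of an error manually, a natural-language-processing-based method for automatically debugging CIM bugs is proposed. First, a maximum entropy model segments the text descriptions of resolved bugs into words; next, simHash, driven by a purpose-built dictionary, identifies previously fixed bugs that closely duplicate the new one; finally, text processing is applied to the trace supplied by the customer to locate the problem and its solution. Experiments achieved an accuracy of 87.5%, demonstrating the effectiveness of the method.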
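The abstract names maximum-entropy word segmentation as the first step but gives no feature set, training algorithm, or corpus. The following is a minimal sketch under stated assumptions, not the author's implementation: segmentation is cast as per-character B/M/E/S tagging, the features are a hypothetical ±1 character window plus bigrams, and the maximum entropy model (equivalently, multinomial logistic regression) is trained by plain stochastic gradient ascent on the conditional log-likelihood. Decoding is greedy per character rather than Viterbi. The names `MaxEntTagger`, `char_tags`, and `features` and the three-sentence training corpus are all hypothetical.

```python
import math
from collections import defaultdict

TAGS = ("B", "M", "E", "S")  # begin / middle / end of a word, single-character word


def char_tags(words):
    """Convert a segmented sentence (list of words) into per-character BMES tags."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def features(sent, i):
    """Hypothetical feature templates: a +/-1 character window plus two bigrams."""
    c = lambda k: sent[k] if 0 <= k < len(sent) else "<pad>"
    return [
        "C0=" + c(i),
        "C-1=" + c(i - 1),
        "C1=" + c(i + 1),
        "C-1C0=" + c(i - 1) + c(i),
        "C0C1=" + c(i) + c(i + 1),
    ]


class MaxEntTagger:
    def __init__(self):
        self.w = defaultdict(float)  # one weight per (feature, tag) pair

    def probs(self, feats):
        """p(tag | context) proportional to exp(sum of active feature weights)."""
        scores = [sum(self.w[(f, t)] for f in feats) for t in TAGS]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, corpus, epochs=50, lr=0.5, l2=1e-3):
        """Stochastic gradient ascent on the L2-regularised conditional log-likelihood."""
        for _ in range(epochs):
            for words in corpus:
                sent = "".join(words)
                for i, gold in enumerate(char_tags(words)):
                    feats = features(sent, i)
                    p = self.probs(feats)
                    for k, t in enumerate(TAGS):
                        # Gradient = observed indicator minus model expectation.
                        grad = (1.0 if t == gold else 0.0) - p[k]
                        for f in feats:
                            self.w[(f, t)] += lr * (grad - l2 * self.w[(f, t)])

    def segment(self, sent):
        """Greedy decoding: pick the most probable tag per character, then cut words."""
        words, cur = [], ""
        for i, ch in enumerate(sent):
            p = self.probs(features(sent, i))
            tag = TAGS[max(range(len(TAGS)), key=p.__getitem__)]
            if tag in ("B", "S") and cur:  # a new word starts: flush the open one
                words.append(cur)
                cur = ""
            cur += ch
            if tag in ("E", "S"):  # the current word ends here
                words.append(cur)
                cur = ""
        if cur:  # tolerate an unterminated word at the end of the string
            words.append(cur)
        return words


if __name__ == "__main__":
    # Hypothetical toy bug descriptions, already segmented by hand.
    corpus = [["服务器", "无法", "启动"], ["提供者", "无法", "注册"], ["服务器", "日志", "报错"]]
    tagger = MaxEntTagger()
    tagger.train(corpus)
    print(tagger.segment("服务器无法注册"))  # prints ['服务器', '无法', '注册']
```

Greedy decoding keeps the sketch short; a production segmenter would normally add Viterbi search over tag transitions so that invalid sequences such as `S` followed by `M` cannot occur. The `segment` method above still partitions the input string even when the tags are inconsistent.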

Keywords: Common Information Model (CIM), natural language processing, maximum entropy model, debugging, document processing

Abstract: The Common Information Model (CIM) is an open industry standard that has been implemented in the products of many companies, and many bugs in these implementations have been reported and fixed. To reduce the time and effort needed to find the root cause of a bug, this paper proposes an automatic debugging method based on natural language processing. The method first segments the descriptions of resolved bugs into words using a maximum entropy model, then uses simHash with a specially constructed dictionary to find the most similar fixed bug, and finally applies text mining to the trace provided by the customer to identify the root cause and the solution. Experiments show an accuracy of 87.5%, which demonstrates the effectiveness of the method.
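The second step, finding the most similar fixed bug with simHash, can be sketched as follows. This is a minimal illustration, not the paper's code: the abstract does not say how the dictionary is used, so here it is assumed to supply per-term weights (terms missing from it get weight 1.0). The 64-bit token hash taken from the first 8 bytes of MD5 and the Hamming-distance threshold of 3 are common choices assumed for the sketch. The names `simhash`, `most_similar`, and the `BUG-n` identifiers are hypothetical.

```python
import hashlib


def token_hash(token, bits=64):
    """Assumed token hash: the first bits/8 bytes of MD5 as an unsigned integer."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(tokens, weights=None, bits=64):
    """Charikar-style simHash: each token votes +w or -w on every bit position."""
    weights = weights or {}  # hypothetical stand-in for the paper's dictionary
    acc = [0.0] * bits
    for tok in tokens:
        h = token_hash(tok, bits)
        w = weights.get(tok, 1.0)
        for b in range(bits):
            acc[b] += w if (h >> b) & 1 else -w
    fingerprint = 0
    for b in range(bits):
        if acc[b] > 0:  # a bit is set when the weighted vote is positive
            fingerprint |= 1 << b
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def most_similar(query_tokens, fixed_bugs, weights=None, max_distance=3):
    """Return (bug_id, distance) of the closest fixed bug within max_distance, else None."""
    q = simhash(query_tokens, weights)
    best = None
    for bug_id, tokens in fixed_bugs.items():
        d = hamming(q, simhash(tokens, weights))
        if d <= max_distance and (best is None or d < best[1]):
            best = (bug_id, d)
    return best


if __name__ == "__main__":
    # Hypothetical segmented descriptions of already-fixed bugs.
    fixed = {
        "BUG-1": ["provider", "register", "fail"],
        "BUG-2": ["server", "log", "error"],
    }
    # Same token multiset in a different order, so the fingerprint is identical.
    print(most_similar(["fail", "register", "provider"], fixed))  # prints ('BUG-1', 0)
```

Because the fingerprint depends only on the weighted token multiset, word order does not matter, and near-duplicate descriptions tend to differ in only a few bits. For very short texts such as these toy examples, changing even one token usually moves the fingerprint far beyond a threshold of 3, so the threshold would need tuning on real bug reports.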

Key words: Common Information Model (CIM), natural language processing, maximum entropy model, debugging, text processing
