Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (4): 1021-1025.

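ZHAO Jiapeng (赵佳鹏), LIN Min (林民)

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022
  • Received: 2014-10-25 Revised: 2014-12-23 Online: 2015-04-08 Published: 2015-04-10
  • Corresponding author: LIN Min
  • About the authors: ZHAO Jiapeng (born 1990), male, from Baotou, Inner Mongolia, master's student; main research interest: natural language processing. LIN Min (born 1969), male, from Hohhot, Inner Mongolia, professor, Ph.D., CCF member; main research interests: natural language processing and artificial intelligence.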
Information extraction of history evolution based on Wikipedia

ZHAO Jiapeng, LIN Min   

  1. College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, Nei Mongol 010022, China
  • Received:2014-10-25 Revised:2014-12-23 Online:2015-04-08 Published:2015-04-10





Domain concepts in software engineering are complex and varied, and their development is hard to trace, which makes them difficult for students to understand and remember. An effective method for extracting historical-evolution information about software engineering is proposed. First, candidate sets of entities and entity relationships were extracted from Wikipedia using Natural Language Processing (NLP) and information extraction techniques. Second, the entity relationships closest to historical evolution were selected from the candidate sets using TextRank. Finally, a knowledge base was constructed from quintuples composed of neighboring time entities and concept entities linked by the key entity relationship. During information extraction, the TextRank algorithm was improved using semantic features of the text to increase accuracy. The results verify the effectiveness of the proposed algorithm, and the knowledge base organizes the concepts of the software engineering field according to their time sequence.
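As a rough illustration of the ranking step, the sketch below implements plain TextRank over an unweighted co-occurrence graph. It is not the improved, semantics-aware variant the abstract describes; the function name `textrank`, the `window`, `damping`, and `iterations` parameters, and the sample token list are all assumptions made for this example.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with plain TextRank over an undirected co-occurrence graph."""
    # Link every pair of distinct tokens that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: each word shares its score equally among neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate words, assumed to be already tokenized and filtered.
candidates = [
    "language", "developed", "from", "language", "influenced",
    "language", "replaced", "language", "developed", "language",
]
ranking = textrank(candidates)
for word, score in ranking:
    print(f"{word}\t{score:.3f}")
```

Words with higher scores are more central in the co-occurrence graph. In a pipeline like the one outlined above, the top-ranked relation words would presumably be the ones kept as key entity relationships.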

Key words: software engineering, history evolution, information extraction, keyword extraction, TextRank
