Journal of Computer Applications, 2016, Vol. 36, Issue (4): 992-996. DOI: 10.11772/j.issn.1001-9081.2016.04.0992

• Big Data •

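Automatic construction of a Chinese knowledge graph system

E Shijia, LIN Peiyu, XIANG Yang

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2015-09-06 Revised: 2015-11-12 Online: 2016-04-10 Published: 2016-04-08
  • Corresponding author: E Shijia
  • About the authors: E Shijia (born 1991), male, from Dalian, Liaoning, Ph.D. candidate, CCF member; main research interests: cloud computing, knowledge graphs, big data systems. LIN Peiyu (born 1993), male, from Yancheng, Jiangsu, master's student; main research interests: knowledge graphs, big data systems. XIANG Yang (born 1962), male, from Chongqing, professor, Ph.D., CCF member; main research interests: management information systems, cloud computing, semantic computing, big data mining.
  • Funding:
    National Basic Research Program (973 Program) of China (2014CB340404); Shanghai Municipal Science and Technology Research Project (14511108002).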

Automatic construction of a Chinese knowledge graph system

E Shijia, LIN Peiyu, XIANG Yang   

  1. College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Received: 2015-09-06 Revised: 2015-11-12 Online: 2016-04-10 Published: 2016-04-08
  • Supported by:
    This work is partially supported by the National Basic Research Program (973 Program) of China (2014CB340404) and the Shanghai Municipal Science and Technology Research Project (14511108002).

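Abstract: To address the low accuracy, long construction time, and heavy manual involvement of current approaches to building Chinese knowledge graphs, a complete end-to-end solution for automatically constructing a Chinese knowledge graph from Chinese encyclopedia data is proposed, and a user-facing Chinese knowledge graph system is developed on top of it. In this solution, a custom Web crawler continuously fetches the entry attributes and related text of the original encyclopedia data into the local system and stores them as triples with extended attributes. The back-end system then automatically imports the triple files through the graph database Cayley and the MongoDB database system and converts them into a large knowledge graph system, which provides users with rich knowledge-graph-based application services at the front end. Compared with other knowledge graph systems, the solution markedly reduces construction time, and the total number of entities and relations in the knowledge graph is at least 50% higher than that of Chinese knowledge graph systems such as YAGO, HowNet, and the Chinese Concept Dictionary.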

Key words: knowledge graph, Web crawler, triple file, knowledge base, graph database

Abstract: To address the problems that current methods for constructing Chinese knowledge graph systems are time-consuming, have low accuracy, and require a lot of manual intervention, an integrated end-to-end solution for automatic construction based on rich data from Chinese encyclopedias was proposed, and a user-oriented Chinese knowledge graph system was implemented. In this solution, the entry properties and related text of the original encyclopedia data were continuously scraped to the local system by a custom Web crawler and saved as triples with extended attributes. The back-end system automatically imported the archived triple files through the graph-oriented database Cayley and the document-oriented database MongoDB, and converted them into a large knowledge graph system that provides various knowledge-graph-based services in the front-end system. Compared with other knowledge graph systems, the proposed system significantly reduces construction time; moreover, its total number of entities and relations is at least 50% higher than that of other knowledge graph systems such as YAGO, HowNet, and the Chinese Concept Dictionary.

Key words: knowledge graph, Web crawler, triple file, knowledge base, graph-oriented database
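
As a rough illustration of the "triple with extended attributes" described in the abstract, the sketch below shows one way such a record might be modeled and rendered both as an N-Triples-style statement for a graph store and as a JSON-style document for a document store. The class name `ExtendedTriple`, its fields, and the sample values are hypothetical; the abstract does not specify the system's actual schema or file format, and no Cayley or MongoDB API is called here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ExtendedTriple:
    """A (subject, predicate, object) triple plus free-form extra attributes.

    Hypothetical schema for illustration only; the real system's layout
    is not given in the abstract.
    """
    subject: str
    predicate: str
    obj: str
    extra: dict = field(default_factory=dict)

    def to_statement(self) -> str:
        """Render the core triple as one N-Triples-style line.

        Subject and predicate are written as bracketed identifiers and the
        object as a quoted literal with backslashes and quotes escaped.
        """
        literal = self.obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{self.subject}> <{self.predicate}> "{literal}" .'

    def to_document(self) -> dict:
        """Render the full record, including extra attributes, as a dict."""
        return asdict(self)


# Assumed sample record: an encyclopedia entry attribute with one extra field.
example = ExtendedTriple(
    subject="Tongji_University",
    predicate="located_in",
    obj="Shanghai",
    extra={"source": "encyclopedia"},
)

print(example.to_statement())
print(json.dumps(example.to_document(), ensure_ascii=False))
```

Keeping the core triple and the extended attributes in one record means the same crawled item can feed a graph store (which needs only the statement) and a document store (which can hold the whole record), which is one plausible reading of the two-database design the abstract mentions.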
