Automatical construction of Chinese knowledge graph system

doi:10.11772/j.issn.1001-9081.2016.04.0992

Journal of Computer Applications, 2016, Vol. 36, Issue (4): 992-996. DOI: 10.11772/j.issn.1001-9081.2016.04.0992

E Shijia, LIN Peiyu, XIANG Yang

College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China

Received: 2015-09-06 Revised: 2015-11-12 Online: 2016-04-08 Published: 2016-04-10
Supported by:
This work is partially supported by the National Basic Research Program (973 Program) of China (2014CB340404) and the Shanghai Municipal Science and Technology Research Project (14511108002).

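Corresponding author: E Shijia
About the authors: E Shijia (born 1991), male, from Dalian, Liaoning, Ph.D. candidate, CCF member; research interests: cloud computing, knowledge graphs, big data systems. LIN Peiyu (born 1993), male, from Yancheng, Jiangsu, M.S. candidate; research interests: knowledge graphs, big data systems. XIANG Yang (born 1962), male, from Chongqing, professor, Ph.D., CCF member; research interests: management information systems, cloud computing, semantic computing, big data mining.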

Abstract

Abstract: Current methods for constructing Chinese knowledge graph systems are time-consuming, have low accuracy, and require extensive manual intervention. To address these problems, an integrated end-to-end solution for automatic construction based on rich Chinese encyclopedia data was proposed, and a user-oriented Chinese knowledge graph system was implemented on top of it. In this solution, a custom Web crawler continuously scrapes entry properties and related text from the original encyclopedia data to the local system and saves them as triples with extended attributes. The back-end system then automatically imports the archived triple files through the graph-oriented database Cayley and the document-oriented database MongoDB and converts them into a large knowledge graph, on which the front-end system provides users with a variety of knowledge-graph-based services. Compared with other knowledge graph systems, the proposed solution significantly reduces construction time; moreover, the total number of entities and relations is at least 50% higher than that of other knowledge graph systems such as YAGO, HowNet and the Chinese Concept Dictionary.

Key words: knowledge graph, Web crawler, triple file, knowledge base, graph-oriented database
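
The "triple with extended attributes" storage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual file format or schema, so everything below is an assumption for illustration: the `ExtendedTriple` class, its field names, the `extras` dictionary, and the choice of an N-Quads-style line for the graph store alongside a JSON-like document for the document store are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedTriple:
    """A (subject, predicate, object) triple plus free-form extra attributes.

    Hypothetical structure; the paper's real schema is not given in the abstract.
    """
    subject: str
    predicate: str
    obj: str
    # Assumed extended attributes, e.g. source page or crawl time.
    extras: dict = field(default_factory=dict)


def quote_literal(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_nquad_line(triple: ExtendedTriple) -> str:
    """Render the core triple as one N-Quads-style line for a graph store."""
    return (
        f"<{triple.subject}> <{triple.predicate}> "
        f"{quote_literal(triple.obj)} ."
    )


def to_document(triple: ExtendedTriple) -> dict:
    """Render the triple and its extras as a dict for a document store."""
    return {
        "subject": triple.subject,
        "predicate": triple.predicate,
        "object": triple.obj,
        "extras": dict(triple.extras),
    }


if __name__ == "__main__":
    example = ExtendedTriple(
        subject="同济大学",
        predicate="located_in",
        obj="上海",
        extras={"source": "encyclopedia-entry"},  # illustrative value only
    )
    print(to_nquad_line(example))
    print(to_document(example))
```

Splitting each record this way keeps the graph store limited to the relation itself, while attributes that do not fit a plain triple stay in the document store; whether the authors' system divides the data along these lines is not stated in the abstract.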

CLC Number: TP311.5

E Shijia, LIN Peiyu, XIANG Yang. Automatical construction of Chinese knowledge graph system[J]. Journal of Computer Applications, 2016, 36(4): 992-996.
