Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (4): 1051-1055. DOI: 10.11772/j.issn.1001-9081.2017.04.1051

Bilingual collaborative Chinese relation extraction based on parallel corpus

GUO Bo1, FENG Xupeng2, LIU Lijun1, HUANG Qingsong1,3   

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
  2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
  3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming Yunnan 650500, China
  • Received: 2016-09-26 Revised: 2016-12-21 Online: 2017-04-10 Published: 2017-04-19
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (81360230, 81560296).

基于平行语料库的双语协同中文关系抽取

郭勃1, 冯旭鹏2, 刘利军1, 黄青松1,3   

  1. 昆明理工大学 信息工程与自动化学院, 昆明 650500;
  2. 昆明理工大学 教育技术与网络中心, 昆明 650500;
  3. 云南省计算机技术应用重点实验室(昆明理工大学), 昆明 650500
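  • Corresponding author: HUANG Qingsong (黄青松)
  • About the authors: GUO Bo (郭勃), born 1992, male, from Jincheng, Shanxi, M.S. candidate; research interests: machine learning, natural language processing. FENG Xupeng (冯旭鹏), born 1986, male, from Zhengzhou, Henan, experimentalist, M.S.; research interest: information retrieval. LIU Lijun (刘利军), born 1978, male, from Xinxiang, Henan, lecturer, M.S.; research interest: medical information services. HUANG Qingsong (黄青松), born 1962, male, from Changsha, Hunan, professor; research interests: intelligent information systems, information retrieval.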
  • 基金资助:
    国家自然科学基金资助项目(81360230,81560296)。

Abstract: In relation extraction from Chinese text, long Chinese sentences have complex structure, so syntactic features are difficult to extract and are extracted with low accuracy. To address these problems, a bilingual collaborative relation extraction method based on a parallel corpus was proposed. On the English side of a Chinese-English parallel corpus, an English relation extraction classifier was trained on dependency syntactic features obtained with mature English syntactic analysis tools; on the Chinese side, a Chinese relation extraction classifier was trained on n-gram features, which are suited to Chinese. Together the two classifiers constituted a bilingual view. Finally, using the annotated and mapped parallel corpus, the samples that each classifier labeled with high reliability were added to the other classifier's training corpus for bilingual collaborative training, yielding a Chinese relation extraction classification model with better performance. Experimental results on a Chinese test corpus show that the proposed method improves the performance of weakly supervised Chinese relation extraction, increasing the F value by 3.9 percentage points.

Key words: weakly-supervised learning, relation extraction, n-gram, parallel corpus, bilingual collaborative training
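
The collaborative training loop summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the classifier, the confidence threshold, or the number of rounds, so the small Naive Bayes model, `threshold`, `rounds`, and all function names below are assumptions chosen for illustration. English dependency features are assumed to be already extracted as strings, and character n-grams stand in for the Chinese n-gram features.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams, standing in for the Chinese n-gram features."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Tiny multinomial Naive Bayes over bags of string features (assumed model)."""

    def fit(self, samples):
        # samples: list of (features, label) pairs
        self.label_counts = Counter(label for _, label in samples)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, label in samples:
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        """Return (best_label, posterior probability of best_label)."""
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, count in self.label_counts.items():
            # Add-one smoothing, with one extra slot for unseen features.
            denom = sum(self.feature_counts[label].values()) + len(self.vocab) + 1
            score = math.log(count / total)
            for f in features:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        norm = sum(math.exp(s - peak) for s in log_scores.values())
        best = max(log_scores, key=log_scores.get)
        return best, 1.0 / norm


def co_train(seed, unlabeled, rounds=3, threshold=0.75):
    """Bilingual co-training sketch.

    seed:      list of (en_features, zh_features, label) from labeled parallel pairs
    unlabeled: list of (en_features, zh_features) from unlabeled parallel pairs
    Returns the final Chinese-side classifier.
    """
    en_train = [(en, label) for en, _, label in seed]
    zh_train = [(zh, label) for _, zh, label in seed]
    pool = list(unlabeled)
    for _ in range(rounds):
        en_clf = NaiveBayes().fit(en_train)
        zh_clf = NaiveBayes().fit(zh_train)
        remaining = []
        for en, zh in pool:
            en_label, en_conf = en_clf.predict(en)
            zh_label, zh_conf = zh_clf.predict(zh)
            used = False
            if en_conf >= threshold:
                # Confident English prediction labels the Chinese side of the pair.
                zh_train.append((zh, en_label))
                used = True
            if zh_conf >= threshold:
                # Confident Chinese prediction labels the English side of the pair.
                en_train.append((en, zh_label))
                used = True
            if not used:
                remaining.append((en, zh))
        pool = remaining
        if not pool:
            break
    return NaiveBayes().fit(zh_train)


# Toy usage with made-up feature strings and relation labels.
seed = [
    (["nsubj:born", "prep:in"], char_ngrams("出生于"), "born_in"),
    (["nsubj:works", "prep:for"], char_ngrams("就职于"), "works_for"),
]
unlabeled = [(["nsubj:born", "prep:in"], char_ngrams("生于北京"))]
zh_model = co_train(seed, unlabeled)
print(zh_model.predict(char_ngrams("北京")))
```

In this toy run the English classifier is confident about the unlabeled pair while the Chinese classifier is not, so the English label is transferred to the Chinese side through the parallel alignment. That transfer is the mechanism the abstract relies on to compensate for weak Chinese syntactic features.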

摘要: 针对在中文资源的关系抽取中,由于中文长句句式复杂,句法特征提取难度大、准确度低等问题,提出了一种基于平行语料库的双语协同中文关系抽取方法。首先在中英双语平行语料库中的英文语料上利用英文成熟的句法分析工具,将得到依存句法特征用于英文关系抽取分类器的训练,然后与利用适合中文的n-gram特征在中文语料上训练的中文关系抽取分类器构成双语视图,最后再依靠标注映射后的平行语料库,将彼此高可靠性的语料加入对方训练语料进行双语协同训练,最终得到一个性能更好的中文关系抽取分类模型。通过对中文测试语料进行实验,结果表明该方法提高了基于弱监督方法的中文关系抽取性能,其F值提高了3.9个百分点。

关键词: 弱监督学习, 关系抽取, n-gram, 平行语料库, 双语协同训练

CLC Number: