Bilingual collaborative Chinese relation extraction based on parallel corpus

doi:10.11772/j.issn.1001-9081.2017.04.1051

Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (4): 1051-1055. DOI: 10.11772/j.issn.1001-9081.2017.04.1051

GUO Bo¹, FENG Xupeng², LIU Lijun¹, HUANG Qingsong¹,³

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
2. Educational Technology and Network Center, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
3. Yunnan Provincial Key Laboratory of Computer Technology Applications (Kunming University of Science and Technology), Kunming Yunnan 650500, China

Received: 2016-09-26 Revised: 2016-12-21 Online: 2017-04-10 Published: 2017-04-19
Corresponding author: HUANG Qingsong
About the authors: GUO Bo (born 1992), male, from Jincheng, Shanxi, M.S. candidate; main research interests: machine learning, natural language processing. FENG Xupeng (born 1986), male, from Zhengzhou, Henan, experimentalist, M.S.; main research interest: information retrieval. LIU Lijun (born 1978), male, from Xinxiang, Henan, lecturer, M.S.; main research interest: medical information services. HUANG Qingsong (born 1962), male, from Changsha, Hunan, professor; main research interests: intelligent information systems, information retrieval.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (81360230, 81560296).

Abstract

Abstract: Relation extraction from Chinese text suffers because long Chinese sentences have complex structures, which makes syntactic features difficult to extract and lowers their accuracy. To address these problems, a bilingual collaborative Chinese relation extraction method based on a parallel corpus was proposed. First, on the English side of a Chinese-English parallel corpus, mature English syntactic analysis tools were used to obtain dependency syntactic features, which were used to train an English relation extraction classifier. This classifier and a Chinese relation extraction classifier, trained on the Chinese side with n-gram features suited to Chinese, constituted the two views of a bilingual setup. Finally, relying on the parallel corpus after annotation mapping, the instances that each classifier labeled with high reliability were added to the other classifier's training corpus for bilingual collaborative training, yielding a Chinese relation extraction classification model with better performance. Experimental results on a Chinese test corpus show that the proposed method improves the performance of weakly supervised Chinese relation extraction, increasing the F value by 3.9 percentage points.

Key words: weakly-supervised learning, relation extraction, n-gram, parallel corpus, bilingual collaborative training
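
The bilingual collaborative training loop outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the naive Bayes classifier, the confidence threshold of 0.6, the toy feature strings, and all names (`char_ngrams`, `NaiveBayes`, `co_train`) are hypothetical choices made for brevity. The abstract does not specify the actual classifiers or the reliability criterion.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Character n-grams, standing in for the Chinese n-gram feature view."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, samples, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for feats, label in zip(samples, labels):
            self.feat_counts[label].update(feats)
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def predict_proba(self, feats):
        total = sum(self.label_counts.values())
        log_scores = {}
        for label, n in self.label_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for f in feats:
                if f in self.vocab:  # features never seen in training are ignored
                    score += math.log((counts[f] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

    def predict(self, feats):
        """Return the (label, probability) pair with the highest probability."""
        probs = self.predict_proba(feats)
        return max(probs.items(), key=lambda kv: kv[1])


def co_train(labeled_en, labeled_zh, parallel, rounds=3, threshold=0.6):
    """Co-train two views over aligned (en_feats, zh_feats) pairs.

    Each classifier's confident labels are projected through the sentence
    alignment into the other language's training set.
    """
    labeled_en, labeled_zh, pool = list(labeled_en), list(labeled_zh), list(parallel)
    for _ in range(rounds):
        if not pool:
            break
        en_clf = NaiveBayes().fit(*zip(*labeled_en))
        zh_clf = NaiveBayes().fit(*zip(*labeled_zh))
        remaining = []
        for en_feats, zh_feats in pool:
            en_label, en_p = en_clf.predict(en_feats)
            zh_label, zh_p = zh_clf.predict(zh_feats)
            if en_p >= threshold:  # confident English label goes to the Chinese side
                labeled_zh.append((zh_feats, en_label))
            if zh_p >= threshold:  # confident Chinese label goes to the English side
                labeled_en.append((en_feats, zh_label))
            if en_p < threshold and zh_p < threshold:
                remaining.append((en_feats, zh_feats))
        pool = remaining
    return NaiveBayes().fit(*zip(*labeled_zh))


# Toy data (hypothetical): English features mimic dependency paths,
# Chinese features are character bigrams.
labeled_en = [(["nsubj:founded", "dobj:founded"], "founder"),
              (["prep:born_in"], "birthplace")]
labeled_zh = [(char_ngrams("创立了"), "founder"),
              (char_ngrams("出生于"), "birthplace")]
parallel = [(["nsubj:founded"], char_ngrams("创办了")),
            (["prep:born_in"], char_ngrams("生于"))]

zh_model = co_train(labeled_en, labeled_zh, parallel)
print(zh_model.predict(char_ngrams("创办")))
```

In this toy run the Chinese classifier initially knows nothing about the bigram "创办"; it only acquires that feature because the English classifier labels the aligned English sentence confidently and the label is carried across the alignment. That is the intuition behind pairing a syntax-rich English view with an n-gram Chinese view.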

CLC number:

TP391.1

GUO Bo, FENG Xupeng, LIU Lijun, HUANG Qingsong. Bilingual collaborative Chinese relation extraction based on parallel corpus[J]. Journal of Computer Applications, 2017, 37(4): 1051-1055.
