Journal of Computer Applications, 2020, Vol. 40, Issue (10): 2845-2849. DOI: 10.11772/j.issn.1001-9081.2020020280

• Artificial Intelligence •

Chinese-Vietnamese bilingual multi-document news opinion sentence recognition based on sentence association graph

WANG Jian1,2, TANG Shan1,2, HUANG Yuxin1,2, YU Zhengtao1,2

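  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming Yunnan 650500, China;
    2. Yunnan Key Laboratory of Artificial Intelligence (Kunming University of Science and Technology), Kunming Yunnan 650500, China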
  • Received: 2020-03-14  Revised: 2020-04-22  Issue date: 2020-10-10  Published online: 2020-05-18
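  • Corresponding author: YU Zhengtao
  • About the authors: WANG Jian (1976-), male, born in Changxing, Zhejiang, M. S., associate professor, research interests: natural language processing, machine learning, software process and evolution; TANG Shan (1993-), female, born in Yixian, Liaoning, M. S. candidate, research interests: natural language processing, cross-lingual sentiment analysis; HUANG Yuxin (1983-), male, born in Luoyang, Henan, Ph. D. candidate, CCF member, research interests: natural language processing, text summarization; YU Zhengtao (1970-), male, born in Qujing, Yunnan, Ph. D., professor, CCF member, research interests: natural language processing, machine translation, information retrieval.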
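  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61972186, 61762056, 61472168) and the High-tech Industry Special Project of Yunnan Province (201606).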

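Abstract: Traditional opinion sentence recognition mostly classifies a sentence using the emotional features inside that sentence. In cross-lingual multi-document opinion sentence recognition, however, sentences from different languages and different documents are closely associated with one another, and these association features provide useful support for recognizing opinion sentences. Therefore, a Chinese-Vietnamese bilingual multi-document news opinion sentence recognition method was proposed, built on a Bi-directional Long Short-Term Memory (Bi-LSTM) network framework and incorporating sentence association features. Firstly, emotional elements and event elements were extracted from the Chinese and Vietnamese sentences to construct a sentence association graph, and sentence association features were obtained with the TextRank algorithm. Secondly, the Chinese and Vietnamese news texts were encoded into the same semantic space based on bilingual word embeddings and Bi-LSTM. Finally, opinion sentences were recognized by jointly considering the sentence encoding features and the sentence association features. Theoretical analysis and simulation results show that incorporating the sentence association graph effectively improves the precision of multi-document opinion sentence recognition.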
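The abstract only names the graph-ranking step, so the following is a minimal Python sketch of that step under stated assumptions, not the authors' implementation. It assumes each sentence's emotional and event elements have already been extracted and normalized into shared, language-independent labels (so a Chinese and a Vietnamese sentence about the same event can share elements), and it uses Jaccard overlap of element sets as the edge weight; both that weighting and the helper names `build_association_graph` and `textrank` are hypothetical choices. The ranking itself follows the standard weighted TextRank update WS(V_i) = (1 - d) + d Σ_j (w_ji / Σ_k w_jk) WS(V_j), with the conventional damping factor d = 0.85.

```python
from itertools import combinations


def build_association_graph(elements):
    """Return a symmetric weight matrix for a sentence association graph.

    elements[i] is the set of emotional/event element labels of sentence i
    (assumed already normalized into shared, language-independent labels).
    Sentences sharing at least one element are linked; using the Jaccard
    overlap as the edge weight is an illustrative assumption.
    """
    n = len(elements)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = elements[i] & elements[j]
        if shared:
            w = len(shared) / len(elements[i] | elements[j])
            weights[i][j] = weights[j][i] = w
    return weights


def textrank(weights, damping=0.85, tol=1e-6, max_iter=100):
    """Weighted TextRank: WS(i) = (1 - d) + d * sum_j w_ji / out(j) * WS(j)."""
    n = len(weights)
    if n == 0:
        return []
    out_sum = [sum(row) for row in weights]  # total edge weight leaving each node
    scores = [1.0] * n  # arbitrary start; the fixed point does not depend on it
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if weights[j][i] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        delta = max(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Hypothetical element sets for four sentences drawn from Chinese and
    # Vietnamese news documents about the same event.
    sentence_elements = [
        {"tariff", "negative"},           # Chinese sentence
        {"tariff", "negative", "trade"},  # Vietnamese sentence
        {"trade", "positive"},            # Vietnamese sentence
        {"weather"},                      # unrelated sentence
    ]
    graph = build_association_graph(sentence_elements)
    for idx, score in enumerate(textrank(graph)):
        print(f"sentence {idx}: association score {score:.3f}")
```

Sentences that share elements with many others across documents and languages receive higher scores, while an isolated sentence keeps only the base value 1 - d. One plausible way to "jointly consider" this feature, again an assumption rather than the paper's stated design, is to append each sentence's score to its Bi-LSTM encoding before the classification layer.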

Key words: Chinese-Vietnamese bilingual news, opinion sentence recognition, sentence association graph, event element, emotional element
