Journal of Computer Applications (计算机应用) ›› 2015, Vol. 35 ›› Issue (5): 1314-1319. DOI: 10.11772/j.issn.1001-9081.2015.05.1314

• Artificial Intelligence •

Automatic Chinese sentence group segmentation method based on multiple discriminant analysis

王荣波1, 李杰1, 黄孝喜1, 周昌乐1,2, 谌志群1, 王小华1   

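  1. Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou 310018;
  2. Department of Intelligent Science and Technology, Xiamen University, Xiamen, Fujian 361005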
  • Received: 2014-12-05 Revised: 2014-12-24 Online: 2015-05-10 Published: 2015-05-14
  • Corresponding author: LI Jie
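  • About the authors: WANG Rongbo (born 1978), male, from Yiwu, Zhejiang, associate professor, Ph.D., CCF member, research interests: natural language processing, discourse analysis; LI Jie (born 1989), male, from Wenzhou, Zhejiang, master's student, research interest: Chinese information processing; HUANG Xiaoxi (born 1979), male, from Wenzhou, Zhejiang, lecturer, Ph.D., research interests: natural language processing, cognitive logic; ZHOU Changle (born 1959), male, from Taicang, Suzhou, professor, Ph.D., research interests: artificial intelligence, Chinese information processing; CHEN Zhiqun (born 1973), male, from Nanchang, Jiangxi, associate professor, M.S., research interests: Chinese information processing, language networks; WANG Xiaohua (born 1961), male, from Wenzhou, Zhejiang, professor, research interests: natural language processing, pattern recognition.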
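  • Supported by:

    National Natural Science Foundation of China (61202281, 61103101); Youth Fund of the Humanities and Social Sciences Research Project of the Ministry of Education (10YJCZH052, 12YJCZH201).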

Automatic Chinese sentences group method based on multiple discriminant analysis

WANG Rongbo1, LI Jie1, HUANG Xiaoxi1, ZHOU Changle1,2, CHEN Zhiqun1, WANG Xiaohua1   

  1. Institute of Cognitive and Intelligent Computing, Hangzhou Dianzi University, Hangzhou, Zhejiang 310018, China;
  2. Department of Intelligent Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
  • Received: 2014-12-05 Revised: 2014-12-24 Online: 2015-05-10 Published: 2015-05-14

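Abstract:

Current work on sentence group segmentation lacks supporting computational linguistics data and ignores discourse connectives, and current discourse analysis rarely studies the sentence group as a grammatical unit. To address these problems, an automatic Chinese sentence group segmentation method is proposed. Guided by Chinese sentence group theory, the method builds an annotated evaluation corpus for Chinese sentence group segmentation and designs a set of evaluation functions J based on Multiple Discriminant Analysis (MDA) to segment text into sentence groups automatically. Experimental results show that introducing the segment-length factor and the discourse-connective factor improves segmentation performance, and that the Skip-Gram model works better than the traditional Vector Space Model (VSM): the correct segmentation rate Pμ reaches 85.37% and the segmentation error rate WindowDiff falls to 24.08%. The method is also better suited to the sentence group segmentation task and segments sentence groups better than the traditional MDA method.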

Keywords: Chinese sentence group segmentation, multiple discriminant analysis, discourse analysis, Skip-Gram model, discourse cohesion

Abstract:

To address several problems in Chinese sentence grouping, namely the lack of supporting computational linguistics data, the neglect of discourse connectives, and the fact that the sentence group is rarely studied as a grammatical unit in discourse analysis, this paper proposes an automatic Chinese sentence grouping method based on Multiple Discriminant Analysis (MDA). An annotated evaluation corpus for Chinese sentence grouping was constructed based on Chinese sentence group theory, and a set of evaluation functions J was designed based on the MDA method to perform the grouping automatically. The experimental results show that taking the length of a segmented unit and the discourse connectives into account improves sentence grouping performance, and that the Skip-Gram model performs better than the traditional Vector Space Model (VSM): the correct segmentation rate Pμ reaches 85.37% and the segmentation error rate WindowDiff drops to 24.08%. The proposed method also groups sentences better than the original MDA method.

Key words: Chinese sentence grouping, Multiple Discriminant Analysis (MDA), discourse analysis, Skip-Gram model, discourse cohesion
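
The abstract reports WindowDiff as its segmentation error rate without spelling out how it is computed. The following is a minimal sketch of the commonly used definition, not the authors' evaluation code. The function name `window_diff`, the 0/1 boundary-indicator representation (one entry per gap between adjacent sentences), and the window size `k` are assumptions made for illustration; the window size used in the paper is not stated in the abstract, and implementations differ slightly in how they index windows.

```python
from typing import Sequence


def window_diff(reference: Sequence[int], hypothesis: Sequence[int], k: int) -> float:
    """Fraction of sliding windows in which the two segmentations disagree
    on the number of boundaries.

    Each sequence has one entry per gap between adjacent sentences:
    1 if a sentence-group boundary falls in that gap, 0 otherwise.
    (Hypothetical representation chosen for this sketch.)
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    n = len(reference)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(reference)")

    windows = n - k + 1
    errors = 0
    for i in range(windows):
        # Count boundaries inside the window of k consecutive gaps.
        ref_count = sum(reference[i:i + k])
        hyp_count = sum(hypothesis[i:i + k])
        if ref_count != hyp_count:
            errors += 1
    return errors / windows


if __name__ == "__main__":
    ref = [0, 0, 1, 0, 0, 1, 0, 0]
    hyp = [0, 1, 0, 0, 0, 1, 0, 0]  # first boundary placed one gap too early
    print(window_diff(ref, hyp, k=2))
```

A lower value is better: identical segmentations score 0, and a boundary that is only slightly misplaced is penalized in fewer windows than one that is missing altogether.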

CLC number: