Journal of Computer Applications (计算机应用), 2014, Vol. 34, Issue (5): 1345-1349. DOI: 10.11772/j.issn.1001-9081.2014.05.1345

• Artificial Intelligence •

Topic clause identification method based on fine-grained features

JIANG Yuru (蒋玉茹)¹,², SONG Rou (宋柔)¹,³

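  1. College of Computer Science, Beijing University of Technology, Beijing 100022, China;
    2. Computer School, Beijing Information Science and Technology University, Beijing 100101, China;
    3. College of Information Science, Beijing Language and Culture University, Beijing 100083, China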
  • Received: 2013-10-25 Revised: 2014-01-07 Publication date: 2014-05-01 Release date: 2014-05-30
  • Corresponding author: JIANG Yuru
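  • About the authors: JIANG Yuru (1978-), female, born in Shenyang, Liaoning; lecturer, Ph.D. candidate, CCF member; research interest: natural language processing. SONG Rou (1946-), male, born in Suzhou, Jiangsu; professor, CCF member; research interest: natural language processing.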
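  • Funding:

    National Natural Science Foundation of China; Scientific Research Program (General Project) of Beijing Municipal Education Commission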

Topic clause identification method based on specific features

JIANG Yuru¹,², SONG Rou¹,³

  1. College of Computer Science, Beijing University of Technology, Beijing 100022, China;
    2. Computer School, Beijing Information Science and Technology University, Beijing 100101, China;
    3. College of Information Science, Beijing Language and Culture University, Beijing 100083, China
  • Received: 2013-10-25 Revised: 2014-01-07 Online: 2014-05-01 Published: 2014-05-30
  • Contact: JIANG Yuru

Abstract:

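In topic clause (TC) identification, generating the candidate topic clauses (CTCs) of a punctuation clause by exhaustive enumeration hurts both the execution efficiency of the system and the accuracy of TC identification. A new CTC generation method is proposed that guides CTC generation with three kinds of features: the position of the punctuation clause in the text, the grammatical features of topics, and the adjacency between the topic string and the comment. Experimental results show that the method reduces the number of CTCs and improves system efficiency. Moreover, compared with TC identification based on the exhaustive CTC generation strategy, the method raises TC identification accuracy by 0.96 percentage points for single punctuation clauses and by 1.31 percentage points for punctuation clause sequences.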

Abstract:

When identifying the Topic Clause (TC) of a Punctuation Clause (PClause), generating Candidate Topic Clauses (CTCs) by brute-force enumeration makes the identification system slow and lowers its accuracy. A new CTC generation method was proposed, which guides CTC generation with specific features: the location of the PClause in the text, the grammatical features of the topic, and the adjacency between the topic string and its comment. The experimental results show that the method not only improves system efficiency by reducing the number of CTCs, but also raises TC identification accuracy by 0.96 percentage points for single PClauses and by 1.31 percentage points for PClause sequences, compared with TC identification based on brute-force CTC generation.
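The abstract contrasts brute-force CTC generation with feature-guided generation but does not spell out either procedure. The sketch below assumes, without confirmation from this abstract, that a CTC is formed by prefixing the PClause with a leading segment (the topic string) of the preceding topic clause, over word-segmented, POS-tagged tokens, and that the PClause alone is also a candidate. The names `generate_ctcs`, `cut_before_verb` and `text`, the tag set, and the example tokens are all hypothetical. `cut_before_verb` is a toy stand-in, not the paper's position, grammatical, or adjacency features; it only illustrates how rejecting cut points shrinks the set that brute force would enumerate.

```python
from typing import Callable, List, Tuple

# A token is a (word, POS tag) pair; the tag set is a toy assumption.
Token = Tuple[str, str]
# A filter sees the previous topic clause and a cut point and decides whether
# prev_tc[:cut] may serve as the topic string of the current PClause.
CutFilter = Callable[[List[Token], int], bool]


def generate_ctcs(prev_tc: List[Token], pclause: List[Token],
                  filters: List[CutFilter]) -> List[List[Token]]:
    """Candidate topic clauses for `pclause` (hypothetical sketch).

    The PClause alone is always kept (it may carry its own topic); every cut
    point of the previous topic clause that passes all filters contributes
    prev_tc[:cut] + pclause. An empty filter list gives brute-force enumeration.
    """
    candidates = [list(pclause)]
    for cut in range(1, len(prev_tc) + 1):
        if all(f(prev_tc, cut) for f in filters):
            candidates.append(prev_tc[:cut] + pclause)
    return candidates


def cut_before_verb(prev_tc: List[Token], cut: int) -> bool:
    """Hypothetical toy filter: the topic string must be followed by a verb."""
    return cut < len(prev_tc) and prev_tc[cut][1] == "v"


def text(tokens: List[Token]) -> str:
    """Join token words back into a string for display."""
    return "".join(word for word, _ in tokens)


# "鲁迅是浙江绍兴人，原名周樟寿" split into a previous TC and a topicless PClause.
prev_tc = [("鲁迅", "n"), ("是", "v"), ("浙江", "n"), ("绍兴", "n"), ("人", "n")]
pclause = [("原名", "n"), ("周樟寿", "n")]

brute = generate_ctcs(prev_tc, pclause, [])
pruned = generate_ctcs(prev_tc, pclause, [cut_before_verb])
print(len(brute))                    # 6
print([text(c) for c in pruned])     # ['原名周樟寿', '鲁迅原名周樟寿']
```

Brute force yields one candidate per cut point plus the bare PClause; the toy filter keeps only the cut before the verb 是, leaving two candidates. Passing filters as plain predicates keeps the brute-force baseline (an empty list) and any pruned variant in a single function.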
