Journal of Computer Applications, 2018, Vol. 38, Issue (3): 626-632. DOI: 10.11772/j.issn.1001-9081.2017082087

• Artificial Intelligence •

Recognition of temporal relation in Chinese electronic medical records

SUN Jian1, GAO Daqi1, RUAN Tong1, YIN Yichao2, GAO Ju2, WANG Qi1

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China;
  2. Shuguang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai 200021, China
  • Received: 2017-08-28 Revised: 2017-11-04 Online: 2018-03-10 Published: 2018-03-07
  • Corresponding author: WANG Qi
  • About the authors: SUN Jian (1993-), female, born in Lu'an, Anhui, M. S. candidate, CCF member, research interests: information extraction, knowledge graph; GAO Daqi (1957-), male, born in Xiangyang, Hubei, Ph. D., professor, CCF member, research interests: machine olfaction, intelligence theory, pattern recognition; RUAN Tong (1973-), female, born in Yangzhou, Jiangsu, Ph. D., professor, CCF member, research interests: information extraction, knowledge graph, data quality assessment; YIN Yichao (1983-), male, born in Shanghai, M. S., engineer, research interests: hospital informatization; GAO Ju (1966-), male, born in Shanghai, M. S., chief physician, research interests: hospital administration, integrated traditional Chinese and Western medicine treatment of hepatobiliary diseases; WANG Qi (1993-), male, born in Suzhou, Jiangsu, M. S. candidate, CCF member, research interests: information extraction, knowledge graph, machine translation.
  • Supported by:
    This work is partially supported by the National High Technology Research and Development Program (863 Program) of China (2015AA020107) and the National Key Technology Research and Development Program of China (2015BAH12F01-05).

Abstract: Temporal relations, or temporal links (denoted by the TLink tag), in Chinese Electronic Medical Records (EMRs) comprise within-sentence TLinks and between-sentence TLinks. Within-sentence TLinks include event/event and event/time TLinks, while between-sentence TLinks include only event/event TLinks. Temporal relation recognition in Chinese EMR text was cast as a classification problem over entity pairs. For within-sentence TLinks, high-accuracy heuristic rules were developed, and classifiers were trained on basic features, phrase-structure syntactic features, dependency features and other features to reduce recognition errors. For between-sentence TLinks, high-accuracy heuristic rules were likewise combined with classifiers trained on basic features, phrase-structure syntactic features and other features. The experimental results show that the proposed method performs best on within-sentence event/event, within-sentence event/time and between-sentence event/event TLinks when using Support Vector Machine (SVM), SVM and Random Forest (RF) respectively, with F1-scores of 84.0%, 85.6% and 63.5%.

Key words: temporal relation recognition, entity pair classification, within-sentence temporal relation, between-sentence temporal relation, linguistic feature
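The abstract casts temporal relation recognition as classification over entity pairs, with high-accuracy heuristic rules applied first and trained classifiers labelling the remaining pairs. The Python sketch below illustrates that general shape under stated assumptions; it is not the authors' implementation. The `Entity` layout, the restriction of between-sentence candidates to adjacent sentences, the rule `time_then_event_rule`, the `BEFORE`/`OVERLAP`/`NONE` labels, and `toy_model` (a stand-in for the trained SVM/RF classifiers) are all hypothetical, and the phrase-structure and dependency features are omitted. The sketch ends by scoring predictions with precision, recall and F1, assuming exact matching of (entity, entity, label) triples.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Entity:
    eid: str
    kind: str   # "EVENT" or "TIME"
    sent: int   # index of the sentence that contains the entity
    start: int  # character span [start, end) within that sentence
    end: int


Pair = Tuple[Entity, Entity]
Rule = Callable[[Entity, Entity, List[str]], Optional[str]]


def candidate_pairs(entities: List[Entity]) -> Dict[str, List[Pair]]:
    """Split entity pairs into the three TLink settings named in the abstract."""
    groups: Dict[str, List[Pair]] = {"intra_ee": [], "intra_et": [], "inter_ee": []}
    ordered = sorted(entities, key=lambda e: (e.sent, e.start))
    for a, b in combinations(ordered, 2):
        if a.sent == b.sent:
            kinds = {a.kind, b.kind}
            if kinds == {"EVENT"}:
                groups["intra_ee"].append((a, b))
            elif kinds == {"EVENT", "TIME"}:
                groups["intra_et"].append((a, b))
        elif a.kind == b.kind == "EVENT" and b.sent - a.sent == 1:
            # Assumption: cross-sentence candidates come from adjacent sentences only.
            groups["inter_ee"].append((a, b))
    return groups


def time_then_event_rule(a: Entity, b: Entity, sentences: List[str]) -> Optional[str]:
    """Hypothetical high-precision rule: a time expression followed by an event
    in the same sentence with no comma in between is labelled OVERLAP."""
    if a.sent == b.sent and a.kind == "TIME" and b.kind == "EVENT":
        between = sentences[a.sent][a.end:b.start]
        if "，" not in between and "," not in between:
            return "OVERLAP"
    return None


def basic_features(a: Entity, b: Entity, sentences: List[str]) -> Dict[str, object]:
    """A few illustrative 'basic' features; syntactic and dependency features are omitted."""
    feats: Dict[str, object] = {"kinds": f"{a.kind}-{b.kind}", "sent_gap": b.sent - a.sent}
    if a.sent == b.sent:
        between = sentences[a.sent][a.end:b.start]
        feats["char_gap"] = len(between)
        feats["comma_between"] = "，" in between or "," in between
    return feats


def label_pairs(pairs: List[Pair], sentences: List[str], rules: List[Rule],
                model: Callable[[Dict[str, object]], str]) -> Set[Tuple[str, str, str]]:
    """Rules are tried first; the classifier labels only the pairs no rule covers."""
    links = set()
    for a, b in pairs:
        label = None
        for rule in rules:
            label = rule(a, b, sentences)
            if label is not None:
                break
        if label is None:
            label = model(basic_features(a, b, sentences))
        if label != "NONE":  # "NONE" means no temporal link for this pair
            links.add((a.eid, b.eid, label))
    return links


def prf1(gold: Set[tuple], pred: Set[tuple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over (id_a, id_b, label) triples."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, (2 * p * r / (p + r) if p + r else 0.0)


def toy_model(feats: Dict[str, object]) -> str:
    # Stand-in for a trained SVM or random forest; NOT the paper's model.
    if feats["sent_gap"] == 1:
        return "BEFORE"
    return "OVERLAP" if feats.get("char_gap", 99) <= 5 else "NONE"


# Toy two-sentence record with made-up gold annotations.
sentences = ["2017年3月行胃镜检查，提示胃溃疡。", "予奥美拉唑治疗。"]
spans = [("T1", "TIME", 0, "2017年3月"), ("E1", "EVENT", 0, "胃镜检查"),
         ("E2", "EVENT", 0, "胃溃疡"), ("E3", "EVENT", 1, "奥美拉唑治疗")]
entities = [Entity(eid, kind, s, sentences[s].index(t), sentences[s].index(t) + len(t))
            for eid, kind, s, t in spans]
pred = set()
for pairs in candidate_pairs(entities).values():
    pred |= label_pairs(pairs, sentences, [time_then_event_rule], toy_model)
gold = {("T1", "E1", "OVERLAP"), ("T1", "E2", "OVERLAP"),
        ("E1", "E2", "OVERLAP"), ("E2", "E3", "BEFORE")}
print(prf1(gold, pred))  # (0.75, 0.75, 0.75)
```

Keeping rules and the classifier as separate, pluggable callables mirrors the division of labour the abstract describes: the rules decide the pairs they are confident about, and the classifier only sees what is left. In a fuller version, each of the three pair groups would get its own trained classifier and its own F1 score, corresponding to the three figures reported in the abstract.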
