Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (3): 751-757. DOI: 10.11772/j.issn.1001-9081.2016.03.751

Blocked person relation recognition system based on multiple features

ZHANG Zhihua1, WANG Jianxiang1, TIAN Junfeng1, WU Guoshun1, LAN Man1,2   

  1. Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China;
  2. Shanghai Key Laboratory of Multidimensional Information Processing (East China Normal University), Shanghai 200241, China
  • Received: 2015-08-17 Revised: 2015-10-18 Online: 2016-03-10 Published: 2016-03-17
  • Supported by:
    This work is supported by the Science and Technology Commission of Shanghai Municipality (14DZ2260800, 15ZR1410700) and the Shanghai Knowledge Service Platform Project (ZF1213).

基于多元特征的分块人物关系识别系统

张志华1, 王建祥1, 田俊峰1, 吴国顺1, 兰曼1,2   

  1. 华东师范大学 计算机科学技术系, 上海 200241;
  2. 上海市多维度信息处理重点实验室(华东师范大学), 上海 200241
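  • Corresponding author: LAN Man
  • About the authors: ZHANG Zhihua (born 1992), male, from Shanghai, M.S. candidate, research interest: sentiment analysis; WANG Jianxiang (born 1991), male, from Shanghai, M.S. candidate, research interest: discourse parsing; TIAN Junfeng (born 1993), male, from Shanghai, M.S. candidate, research interest: information retrieval; WU Guoshun (born 1990), male, from Shanghai, M.S. candidate, research interest: automatic question answering systems; LAN Man (born 1974), female, from Shanghai, Ph.D., associate professor and doctoral supervisor, research interest: natural language processing.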
  • 基金资助:
    上海市科委资助项目(14DZ2260800,15ZR1410700),上海高校知识服务平台项目(ZF1213)。

Abstract: With the rapid development of the Internet, a huge amount of textual information has become accessible online. Reliable extraction of person-person relations from Web pages has become an important research topic in the field of information extraction. To study this task for Chinese text, this work implemented a blocked person relation recognition system built on a rich pool of features, including bag-of-words, relevant frequency, Dependency Tree (DT) and Named Entity Recognition (NER) features. A series of experiments with several supervised machine learning classification algorithms was conducted to select the best-performing feature set and classifier for each relation type. The system was evaluated on the two tasks of the 2015 China Conference on Machine Learning Competition (CCML Competition): Task1, recognizing the relation between two given persons from a single Chinese news title, and Task2, recognizing it from a set of Chinese news titles. The system achieved MacroF1 scores of 75.68% on Task1 and 76.58% on Task2, ranking first on both tasks.

Key words: person relation recognition, information extraction, feature selection, classification algorithm, feature extraction
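
The MacroF1 scores reported in the abstract are macro-averaged F1 values: F1 is computed separately for each relation type and the per-type scores are then averaged with equal weight. The sketch below illustrates that standard definition only. The function name `macro_f1` and the relation labels in the usage example are hypothetical, and the exact scoring script used by the CCML Competition (for example, which labels it averages over) is not given in this abstract.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred.

    Illustrative only: the competition's official scorer is not described
    in the source and may differ in which labels it includes.
    """
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for label g
        else:
            fp[p] += 1      # p was predicted but wrong
            fn[g] += 1      # g was the true label but was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical relation labels, for illustration only.
gold = ["spouse", "spouse", "parent", "friend"]
pred = ["spouse", "parent", "parent", "friend"]
print(round(macro_f1(gold, pred), 4))
```

Because every relation type contributes equally to the average, a rare relation affects the score as much as a frequent one, which fits a system that tunes features and classifiers separately for each relation type.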

摘要: 随着互联网的飞速发展,大量的文本信息被分享到网上,如何在海量的网络信息中提取出可靠性较高的人物关系已成为信息抽取领域中的一个重要研究课题。为深入进行人物关系识别任务在中文方面的研究,提出了基于多元特征的分块人物关系识别系统,设计了较为完备的特征池,包括词袋特征、相关频率特征、依存树(DT)特征、命名实体识别(NER)特征等,为不同的关系从特征池中选择效果最佳的特征集合,并实验了多种基于有监督的机器学习分类算法。本系统在2015年中国机器学习会议竞赛(CCML Competition)举办的两个任务(Task1是从单个新闻标题中判定给定人物的关系;Task2是从多个新闻标题中判定人物的关系)的数据集上分别取得了75.68%和76.58%的MacroF1值,均位列参赛成绩的第一名。

关键词: 人物关系识别, 信息抽取, 特征选择, 分类算法, 特征抽取
