
Information Extraction Method of Financial Event Based on Lexical-Semantic Patterns

Ming LUO (罗明), Hailiang HUANG (黄海量)

  1. Shanghai University of Finance and Economics
  • Received: 2017-07-07 Revised: 2017-09-23 Online: 2017-09-23
  • Contact: Ming LUO

Abstract: Information extraction is one of the most important tasks in natural language processing, and the diversity, ambiguity, and structure of natural language make it difficult. To address this difficulty, this paper proposes a hierarchical lexical-semantic pattern method for extracting financial event information. The method first defines a financial event representation model, then applies a deep-learning-based word vector method to generate a gazetteer of synonymous concepts automatically, and finally uses hierarchical lexical-semantic rule patterns driven by a finite state machine to extract the information for each type of financial event automatically. Experiments show that the method extracts financial event information accurately from financial news text: across 26 types of financial events, recognition precision reaches 95.9% and the F1 score reaches 91%.

Key words: Lexical-Semantic Patterns, Information Extraction, Financial Event, Word Vector, Word List, Gazetteer

CLC number:
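
To make the idea of a finite-state-machine-driven lexical-semantic pattern more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tiny hand-written gazetteer (the paper generates its gazetteer from word vectors), a single flat pattern rather than a hierarchy of patterns, and English tokens for readability. The names `GAZETTEER`, `PATTERN`, `concept_of`, `match_event`, and the company names are all hypothetical.

```python
# Hypothetical concept gazetteer: concept label -> surface forms.
# In the paper this resource is generated from word vectors; here it is hand-written.
GAZETTEER = {
    "ACQUIRE": {"acquires", "buys", "purchases"},
    "COMPANY": {"AlphaCorp", "BetaBank"},
}

# Hypothetical pattern: COMPANY ACQUIRE COMPANY, each step paired with an event slot.
PATTERN = [("COMPANY", "acquirer"), ("ACQUIRE", "trigger"), ("COMPANY", "target")]


def concept_of(token):
    """Return the gazetteer concept for a token, or None if it has none."""
    for concept, words in GAZETTEER.items():
        if token in words:
            return concept
    return None


def match_event(tokens, pattern=PATTERN, max_gap=2):
    """Walk the tokens with a small state machine.

    State i waits for the concept in pattern[i]. Up to max_gap filler tokens
    are tolerated between two matched steps; beyond that the machine restarts.
    Returns a dict of filled slots, or None if the pattern never completes.
    """
    state, gap, slots = 0, 0, {}
    for tok in tokens:
        concept = concept_of(tok)
        if state > 0 and concept != pattern[state][0]:
            gap += 1
            if gap <= max_gap:
                continue  # tolerate a filler token
            state, gap, slots = 0, 0, {}  # too many fillers: restart
        if concept == pattern[state][0]:
            slots[pattern[state][1]] = tok
            state, gap = state + 1, 0
            if state == len(pattern):
                return slots
    return None


if __name__ == "__main__":
    print(match_event("AlphaCorp today acquires rival BetaBank".split()))
```

Because the pattern is written over concept labels rather than literal words, adding a synonym to the gazetteer extends coverage without touching the pattern itself, which is the motivation for pairing patterns with an automatically generated gazetteer.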