Journal of Computer Applications, 2018, Vol. 38, Issue (1): 84-90. DOI: 10.11772/j.issn.1001-9081.2017071678

• Artificial Intelligence •

Information extraction method of financial events based on lexical-semantic pattern

LUO Ming1, HUANG Hailiang1,2

  1. College of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China;
  2. Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, Shanghai 200433, China
  • Received: 2017-07-10 Revised: 2017-09-09 Online: 2018-01-10 Published: 2018-01-22
  • Corresponding author: HUANG Hailiang
  • About the authors: LUO Ming (罗明, born 1974), male, from Chongqing, senior engineer, Ph.D.; main research interests: data mining, natural language processing, artificial intelligence. HUANG Hailiang (黄海量, born 1975), male, from Wuxi, Jiangsu, professor, Ph.D.; main research interests: applications of big data and AI methods in finance and economics.
  • Supported by:
    This work is partially supported by the Shanghai Science and Technology Talents Project (14XD1421000) and the Shanghai Science and Technology Innovation Action Plan Project (16511102900).

Abstract: Information extraction is one of the most important tasks in natural language processing. To address the difficulty of information extraction caused by the diversity, ambiguity and structure of natural language, a hierarchical Lexical-Semantic Pattern (LSP) method for extracting financial event information was proposed. Firstly, a financial event representation model was defined. Secondly, a deep-learning-based word vector method was used to automatically generate a dictionary of synonymous concepts. Finally, hierarchical LSP rules driven by finite state machines were used to automatically extract information on various kinds of financial events. The experimental results show that the proposed method can accurately extract various kinds of financial event information from financial news text. Across 26 types of financial events, the micro-averaged precision reaches 93.9%, the micro-averaged recall 86.9%, and the micro-averaged F1 value 90.3%.

Key words: Lexical-Semantic Pattern (LSP), information extraction, financial event, word vector, word list, concept gazetteer
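
The micro-averaged figures quoted in the abstract pool the counts of all event types before computing precision, recall and F1. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name `micro_average`, the event type names and the counts are all hypothetical and are not data from the paper. The last lines only check that the reported F1 value is consistent with the reported precision and recall.

```python
from typing import Dict, Tuple


def micro_average(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1).

    `counts` maps an event type to (true positives, false positives,
    false negatives). Counts are summed over all types first, so frequent
    event types weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for two made-up event types (illustration only).
example_counts = {"merger": (45, 5, 10), "dividend": (30, 0, 5)}
p, r, f1 = micro_average(example_counts)
print(f"P={p:.4f} R={r:.4f} F1={f1:.4f}")

# Consistency check on the figures reported in the abstract:
# F1 is the harmonic mean of precision 0.939 and recall 0.869.
reported_f1 = 2 * 0.939 * 0.869 / (0.939 + 0.869)
print(round(reported_f1, 3))
```

Because F1 is the harmonic mean of precision and recall, the reported 90.3% follows directly from the reported 93.9% and 86.9%.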
