Information extraction method of financial events based on lexical-semantic pattern
LUO Ming1, HUANG Hailiang1,2
1. College of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai 200433, China; 2. Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, Shanghai 200433, China
Abstract: Information extraction is one of the most important tasks in natural language processing, but it is made difficult by the diversity, ambiguity and structure of natural language. To address this problem, a hierarchical Lexical-Semantic Pattern (LSP) method for extracting financial events was proposed. Firstly, a financial event representation model was defined. Secondly, a deep-learning-based word vector method was used to generate a dictionary of synonymous concepts automatically. Finally, hierarchical LSPs based on finite state machines were used to extract various kinds of financial events. The experimental results show that the proposed method can accurately extract various kinds of financial events from financial news text: for the recognition of 26 types of financial events, the micro-averaged precision is 93.9%, the micro-averaged recall is 86.9%, and the micro-averaged F1 value reaches 90.3%.
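The micro-averaged figures quoted above pool the counts of all event types before computing precision, recall and F1. The following is a minimal sketch of that standard calculation, not the authors' evaluation code: the function names, the event type labels and the per-type counts are hypothetical and serve only as illustration. Only the 93.9% precision and 86.9% recall come from the abstract.

```python
from typing import Dict, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1.

    `counts` maps an event type to (true positives, false positives,
    false negatives). Counts are summed over all types first, so
    frequent event types weigh more than rare ones.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical counts for two made-up event types (illustration only).
toy_counts = {"merger": (8, 2, 2), "dividend": (1, 0, 3)}
toy_p, toy_r, toy_f1 = micro_prf(toy_counts)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_score(0.939, 0.869)
print(f"F1 implied by the reported precision and recall: {reported_f1:.3f}")
```

The last line shows that the reported F1 value follows from the reported precision and recall, since F1 is their harmonic mean.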
LUO Ming, HUANG Hailiang. Information extraction method of financial events based on lexical-semantic pattern[J]. Journal of Computer Applications, 2018, 38(1): 84-90.