Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3513-3519.DOI: 10.11772/j.issn.1001-9081.2022010106

• ChinaService 2021

Personal event detection method based on text mining in social media

Rui XIAO, Mingyi LIU, Zhiying TU, Zhongjie WANG

  1. Faculty of Computing, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
  • Received: 2022-01-27 Revised: 2022-03-20 Accepted: 2022-04-02 Online: 2022-11-14 Published: 2022-11-10
  • Contact: Zhongjie WANG
  • About author:XIAO Rui, born in 1997, M. S. candidate. His research interests include service computing.
    LIU Mingyi, born in 1995, Ph. D. His research interests include service computing.
    TU Zhiying, born in 1983, Ph. D., associate professor. His research interests include software engineering, service computing, knowledge engineering, enterprise business modeling.
    WANG Zhongjie, born in 1978, Ph. D., professor. His research interests include service computing, software engineering.
  • Supported by:
    National Natural Science Foundation of China(61772155)

基于社交媒体文本挖掘的个人事件检测方法

肖锐, 刘明义, 涂志莹, 王忠杰

  1. 哈尔滨工业大学 计算学部,哈尔滨 150001
  • 通讯作者: 王忠杰
  • 作者简介:肖锐(1997—),男,重庆人,硕士研究生,主要研究方向:服务计算
    刘明义(1995—),男,安徽宣城人,博士,CCF会员,主要研究方向:服务计算
    涂志莹(1983—),男,福建龙岩人,副教授,博士,CCF会员,主要研究方向:软件工程、服务计算、知识工程、企业业务建模
    王忠杰(1978—),男,山东龙口人,教授,博士,CCF高级会员,主要研究方向:服务计算、软件工程。rainy@hit.edu.cn
  • 基金资助:
    国家自然科学基金资助项目(61772155)

Abstract:

Users' social media posts record their past personal experiences and reveal latent life patterns; studying these patterns is valuable for predicting users' future behavior and for personalized recommendation. Weibo data were collected, 11 types of events were defined, and a three-stage Pipeline system was proposed to detect personal events. Each stage uses a BERT (Bidirectional Encoder Representations from Transformers) pre-trained model: BERT+BiLSTM+Attention, BERT+FullConnect and BERT+BiLSTM+CRF, respectively. From each Weibo post, the system extracts whether the text contains a defined event, which event types it contains, and the elements of each event, namely Subject (subject of the event), Object (event element), Time (time at which the event occurred), Place (place where the event occurred) and Tense (tense of the event). These results make it possible to explore how events change along a user's personal event timeline and thereby to predict personal events. Comparative experiments against classification algorithms such as logistic regression, naive Bayes, random forest and decision tree were conducted on a collected dataset of real users' Weibo posts. Experimental results show that the BERT+BiLSTM+Attention, BERT+FullConnect and BERT+BiLSTM+CRF methods used in the three stages all achieve the highest F1-score, verifying the effectiveness of the proposed methods. Finally, a personal event timeline was built and visualized from the extracted events that carry time information.
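
The control flow of the three-stage pipeline described above can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the stage functions are passed in as plain callables, and the keyword-based stand-ins (`toy_contains_event`, `toy_classify_types`, `toy_extract_elements`), the `KEYWORDS` table and the two example event types are hypothetical placeholders for the BERT-based models. The sketch also assumes that a post may contain several event types and that the Time element is an ISO-formatted date string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PersonalEvent:
    """One detected event with the five elements named in the abstract."""
    event_type: str
    subject: Optional[str] = None
    object: Optional[str] = None
    time: Optional[str] = None   # assumed to be an ISO date string
    place: Optional[str] = None
    tense: Optional[str] = None


def run_pipeline(
    text: str,
    contains_event: Callable[[str], bool],
    classify_types: Callable[[str], List[str]],
    extract_elements: Callable[[str, str], Dict[str, Optional[str]]],
) -> List[PersonalEvent]:
    """Chain the three stages; later stages run only if earlier ones fire."""
    # Stage 1: does the post contain any defined event?
    if not contains_event(text):
        return []
    events = []
    # Stage 2: which event types are present?
    for event_type in classify_types(text):
        # Stage 3: extract the elements of each detected event.
        elements = extract_elements(text, event_type)
        events.append(PersonalEvent(event_type=event_type, **elements))
    return events


# Hypothetical keyword stand-ins for the trained models (illustration only).
KEYWORDS = {"travel": ["flew to", "trip"], "graduation": ["graduated"]}


def toy_contains_event(text: str) -> bool:
    lowered = text.lower()
    return any(k in lowered for ks in KEYWORDS.values() for k in ks)


def toy_classify_types(text: str) -> List[str]:
    lowered = text.lower()
    return [t for t, ks in KEYWORDS.items() if any(k in lowered for k in ks)]


def toy_extract_elements(text: str, event_type: str) -> Dict[str, Optional[str]]:
    # A real stage 3 would be a sequence labeller; this only guesses the subject.
    return {"subject": "I" if text.startswith("I ") else None}


def build_timeline(events: List[PersonalEvent]) -> List[PersonalEvent]:
    """Keep events that carry a time and order them chronologically."""
    dated = [e for e in events if e.time is not None]
    return sorted(dated, key=lambda e: e.time)
```

Passing the stages as callables keeps the control flow independent of the models, so each placeholder could be replaced by a trained classifier or sequence labeller without changing `run_pipeline`.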

Key words: social media, personal event, event detection, BERT (Bidirectional Encoder Representations from Transformers) model, personal event timeline

摘要:

用户的社交媒体中蕴含着他们过去的个人经历和潜在的生活规律,研究其规律对预测用户未来的行为以及对用户进行个性化推荐有很大的价值。通过收集微博数据,定义了11种类型的事件,并提出了一个三阶段的Pipeline的系统,利用BERT预训练模型,分别在三个阶段使用BERT+BiLSTM+Attention、BERT+FullConnect、BERT+BiLSTM+CRF方法进行个人事件检测。从微博文本中抽取出该文本是否包含定义的事件、包含的事件类型、每种事件包含的元素等信息,具体元素为Subject(事件主语)、Object(事件元素)、Time(事件发生时间)、Place(事件发生的地点)和Tense(事件发生的时态),从而探究用户个人时间轴上的事件变化规律来预测个人事件。在收集的真实用户微博数据集上进行实验,并与逻辑回归、朴素贝叶斯、随机森林、决策树等分类算法进行对比分析。实验结果表明,三个阶段中的BERT+BiLSTM+Attention、BERT+FullConnect和BERT+BiLSTM+CRF方法均取得了最高的F1值,验证了所提方法的有效性。最后根据所提方法抽取出的事件和其中的时间信息可视化地构建了用户的个人事件时间轴。

关键词: 社交媒体, 个人事件, 事件检测, BERT模型, 个人事件时间轴
