Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (6): 1565-1573. DOI: 10.11772/j.issn.1001-9081.2019101871

• Artificial intelligence •

Survey of sub-topic detection technology based on internet social media

LI Shanshan1, YANG Wenzhong2,3, WANG Ting1, WANG Lihua1   

  1. College of Software, Xinjiang University, Urumqi Xinjiang 830046, China
    2. College of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China
    3. National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data (China Academy of Electronics and Information Technology), Urumqi Xinjiang 830000, China
  • Received: 2019-11-01 Revised: 2019-12-12 Online: 2020-06-10 Published: 2020-06-18
  • Contact: YANG Wenzhong
  • About the authors: LI Shanshan, born in 1996, M.S. candidate. Her research interests include natural language processing, text data mining, and information security. YANG Wenzhong, born in 1971, Ph.D., associate professor. His research interests include Internet public opinion, intelligence analysis, information security, and wireless sensor networks. WANG Ting, born in 1996, M.S. candidate. Her research interests include natural language processing, text sentiment analysis, and information security. WANG Lihua, born in 1995, M.S. candidate. Her research interests include natural language processing and text intent detection.
  • Supported by:

    National Key Research and Development Program of China (2017YFC0820702-3), the National Natural Science Foundation of China (U1603115, U1435215), the Laboratory Director Foundation of National Engineering Laboratory for Public Safety Risk Perception and Control by Big Data.

基于网络社交媒体的子话题检测技术综述

理姗姗1, 杨文忠2,3, 王婷1, 王丽花1   

  1.新疆大学 软件学院,乌鲁木齐 830046
    2.新疆大学 信息科学与工程学院,乌鲁木齐 830046
    3.社会安全风险感知与防控大数据应用国家工程实验室(中国电子科学研究院),乌鲁木齐 830000
  • 通讯作者: 杨文忠(1971—)
  • 作者简介:理姗姗(1996—),女,河南周口人,硕士研究生,主要研究方向:自然语言处理、文本数据挖掘、信息安全。杨文忠(1971—),男,河南南阳人,副教授,博士,CCF会员,主要研究方向:网络舆情、情报分析、信息安全、无线传感器网络。王婷(1996—),女,新疆阿克苏人,硕士研究生,主要研究方向:自然语言处理、文本情感分析、信息安全。王丽花(1995—),女,河北邯郸人,硕士研究生,主要研究方向:自然语言处理、文本意图检测。
  • 基金资助:

    国家重点研发计划项目(2017YFC0820702-3);国家自然科学基金资助项目(U1603115,U1435215);社会安全风险感知与防控大数据应用国家工程实验室主任基金资助项目。

Abstract:

With the rise of many internet platforms, data in internet social media spreads faster, involves higher user participation, and covers content more comprehensively than data in traditional media. Such media carry numerous topics that people follow and comment on, and the information related to a single topic may contain deeper, finer-grained sub-topics. Sub-topic detection based on internet social media, which addresses this problem, is an emerging and developing research field. Obtaining topic and sub-topic information through social media and joining the discussion is changing people's lives comprehensively and profoundly. However, the technologies in this field are not yet mature, and the related research in China is still at an early stage. Firstly, the development background and basic concepts of sub-topic detection in internet social media were described. Secondly, the sub-topic detection technologies were divided into seven categories, each of which was introduced, compared and summarized. Thirdly, the sub-topic detection methods were divided into online and offline detection, the two were compared, and the general technologies as well as the technologies frequently used in each were listed. Finally, the current shortcomings and future development trends of this field were summarized.

Key words: sub-topic, Topic Detection and Tracking (TDT), internet social media, topic hierarchy, sub-event

摘要:

在当前多种平台崛起的互联网背景下,与传统媒体相比,网络社交媒体中的数据具有传递速度快、用户参与度高、内容覆盖全等特点,其中存在着人们关注并发布评论的众多话题,而一个话题的相关信息中可能存在更深层次、更细粒度的子话题,针对该问题进行基于网络社交媒体的子话题检测技术的研究,这是一个新兴且不断发展的研究领域。通过社交媒体获取话题及子话题信息并参与讨论,这一方式正全方位、深层次改变着人们的生活,但是该领域技术还不成熟,且相关研究在国内尚处于起步阶段。首先,简述网络社交媒体中子话题检测的发展背景和基本概念;其次,将子话题检测技术分为七大类,对每类方法均加以介绍、对比和总结;然后,将子话题检测方式分为在线检测和离线检测两种方式,并将这两种方式进行对比,列举通用技术及两种方式下的常用技术;最后,概括了该领域当前不足及未来发展趋势。

关键词: 子话题, 话题检测和追踪, 网络社交媒体, 话题层次, 子事件
