Study on SDTF*PDF algorithm implemented in system of topic retrieval from short Chinese passages

doi:10.3724/SP.J.1087.2005.00014

Journal of Computer Applications, 2005, Vol. 25, Issue 01, pp. 14-16

Section: Artificial intelligence


CHEN Ke, JIA Yan, YANG Shu-qiang, WANG Yong-heng

College of Computer Science, National University of Defense Technology

Published: 2005-01-01; online: 2011-04-22

Foundation item: National Natural Science Foundation of China (60003001)

Abstract

More and more information, especially text, is spreading across the Internet. To detect hot topics in large volumes of Chinese text, this paper presents a term weighting algorithm, SDTF*PDF (Short Document Term Frequency * Proportional Document Frequency), combined with a word semantic similarity measure. The algorithm targets a system that extracts topics from short Chinese passages; the system draws on many channels (information sources), and the passages in these channels are usually short. Test results indicate that the topic detection system based on this algorithm can fairly accurately extract the main topics of a given period, such as one day or one week (the period is configurable), from massive amounts of Chinese text.

Key words: short Chinese passages, topic detection, SDTF*PDF, word semantic similarity measure
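The abstract does not give the SDTF*PDF formula. As a point of reference only, the Python sketch below implements the general TF*PDF weighting that the algorithm's name indicates it builds on. For each channel c, a term's frequency F_jc is normalized by the Euclidean norm of all term frequencies in that channel and multiplied by exp(n_jc / N_c), where n_jc is the number of documents in c containing the term and N_c is the number of documents in c; the per-channel values are then summed. The paper's short-document (SD) adjustment and its word semantic similarity step are not reproduced here. The input format (channel, timestamp, pre-segmented tokens), the time-window filter, and the function names `tf_pdf_weights` and `hot_terms` are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime, timedelta


def tf_pdf_weights(docs):
    """General TF*PDF term weights (not the paper's SDTF variant).

    docs: iterable of (channel, tokens); tokens are assumed to be
    already segmented Chinese words (hypothetical input format).
    weight[j] = sum over channels c of
        F[j,c] / sqrt(sum_k F[k,c]^2) * exp(n[j,c] / N[c])
    """
    freq = {}          # channel -> Counter of raw term frequencies F[j,c]
    doc_freq = {}      # channel -> Counter of document frequencies n[j,c]
    n_docs = Counter() # channel -> number of documents N[c]
    for channel, tokens in docs:
        freq.setdefault(channel, Counter()).update(tokens)
        doc_freq.setdefault(channel, Counter()).update(set(tokens))
        n_docs[channel] += 1

    weights = Counter()
    for channel, f in freq.items():
        # Euclidean norm of the channel's term-frequency vector
        norm = math.sqrt(sum(v * v for v in f.values()))
        if norm == 0:
            continue  # channel contained only empty documents
        for term, count in f.items():
            pdf = math.exp(doc_freq[channel][term] / n_docs[channel])
            weights[term] += (count / norm) * pdf
    return weights


def hot_terms(docs, end, days=1, k=10):
    """Top-k terms among documents timestamped in (end - days, end].

    docs: iterable of (channel, timestamp, tokens). The window length
    stands in for the configurable "one day or one week" period.
    """
    start = end - timedelta(days=days)
    window = [(ch, toks) for ch, ts, toks in docs if start < ts <= end]
    return [term for term, _ in tf_pdf_weights(window).most_common(k)]


docs = [
    ("news", datetime(2005, 1, 1, 9), ["earthquake", "rescue"]),
    ("news", datetime(2005, 1, 1, 10), ["earthquake", "stocks"]),
    ("forum", datetime(2005, 1, 1, 11), ["earthquake", "discussion"]),
    ("forum", datetime(2004, 12, 20, 8), ["old_news"]),  # outside the window
]
print(hot_terms(docs, end=datetime(2005, 1, 1, 23), days=1, k=2))
# prints ['earthquake', 'discussion']
```

The exponential factor rewards terms that occur in many documents of the same channel, and summing over channels favours terms reported by several sources, which fits the multi-channel setting described in the abstract. Setting `days=7` would give the one-week view.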

CLC Number: TP391.1

CHEN Ke, JIA Yan, YANG Shu-qiang, WANG Yong-heng. Study on SDTF*PDF algorithm implemented in system of topic retrieval from short Chinese passages[J]. Journal of Computer Applications, 2005, 25(01): 14-16.

