Study on SDTF*PDF algorithm implemented in system of topic retrieval from short Chinese passages

Journal of Computer Applications, 2005, Vol. 25, Issue 01: 14-16. DOI: 10.3724/SP.J.1087.2005.00014

CHEN Ke, JIA Yan, YANG Shu-qiang, WANG Yong-heng

College of Computer Science, National University of Defense Technology

Online: 2011-04-22 Published: 2005-01-01
Funding: National Natural Science Foundation of China (60003001)

Abstract: With the rapid development of Internet technology, large volumes of information, especially text, now circulate online. Targeting two characteristics of a topic extraction system for massive collections of short Chinese passages, namely many information sources (channels) and short passage length, and incorporating a word semantic similarity measure, this paper proposes a term weighting algorithm, SDTF*PDF (Short Document Term Frequency * Proportional Document Frequency). Tests indicate that a topic detection system for short Chinese passages based on this algorithm can automatically and fairly accurately extract the main topics of a specified period (for example, a day or a week) from massive Chinese text collections.

Key words: short Chinese passages, topic detection, SDTF*PDF, word semantic similarity measure
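
This page names the weighting scheme but does not give its formula. For orientation only, the following is a minimal Python sketch of the earlier TF*PDF term weighting, which the name SDTF*PDF appears to extend. It is not the paper's algorithm: the short-document adjustment and the semantic similarity measure are not described here and are not modeled. The function name `tf_pdf`, the input layout (a list of channels, each a list of already segmented documents), and the sample tokens are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Classic TF*PDF weights (baseline sketch, NOT the paper's SDTF*PDF).

    channels: list of channels; each channel is a list of documents;
              each document is a list of tokens (assumed pre-segmented).
    Returns a dict mapping each term to its weight summed over channels.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue  # an empty channel contributes nothing
        # Frequency of each term across all documents of this channel.
        term_freq = Counter(tok for doc in docs for tok in doc)
        # Number of documents in this channel that contain each term.
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        # Normalize term frequencies by their Euclidean norm within the channel.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        for term, f in term_freq.items():
            # Proportional document frequency enters through an exponential.
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


if __name__ == "__main__":
    # Hypothetical sample: two channels of short, pre-segmented passages.
    sample = [
        [["quake", "rescue"], ["quake", "donation"]],
        [["quake", "match"]],
    ]
    for term, w in sorted(tf_pdf(sample).items(), key=lambda kv: -kv[1]):
        print(term, round(w, 3))
```

In this baseline, a term that appears in a large share of the documents of several channels accumulates weight from each channel, which is why it suits multi-source hot topic detection. Chinese input would first need word segmentation, which is outside this sketch.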

CLC number: TP391.1

CHEN Ke, JIA Yan, YANG Shu-qiang, WANG Yong-heng. Study on SDTF*PDF algorithm implemented in system of topic retrieval from short Chinese passages[J]. Journal of Computer Applications, 2005, 25(01): 14-16.
