Journal of Computer Applications
• Database Technology •
Yuan-yuan LI, Yong-qiang MA
李媛媛 马永强
Abstract: Latent Semantic Indexing (LSI) is a new document retrieval model developed over the last ten years. It is easy to compute and requires little human intervention. Term weighting, a difficult and important problem in LSI, was studied in detail. The most popular term weighting algorithm, TF-IDF, treats term frequency linearly, which is unreasonable, and cannot emphasize the key terms that contribute most to the content of a text. To address these shortcomings, a new weighting scheme based on the Sigmoid function and a location factor was proposed. The new method highlights the differing importance of terms in a document and is more favorable to constructing the latent semantic space. It was tested on an experimental platform named "Chinese LSI Retrieval Analysis System", and the results show that the new method improves the retrieval performance of LSI.
Key words: Latent Semantic Indexing, Sigmoid function, location factor, weighting algorithms
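Abstract (translated from the Chinese): Latent semantic indexing is highly computable and needs little human involvement. Weight calculation, an important optimization step in the model, was analyzed in depth. TF-IDF, currently the most widely used method, processes term frequency linearly, which is unreasonable, and has difficulty highlighting the features that play a key role in the content of a text. To address these shortcomings, a new weighting scheme based on the "Sigmoid function" and a "location factor" was proposed. It highlights the differing importance of the feature terms in a text and is more favorable to constructing the latent semantic space. Test results on the experimental platform "Chinese Latent Semantic Indexing Analysis System" show that this weighting method better improves the performance of retrieval based on latent semantics.
Key words (translated from the Chinese): latent semantic indexing, Sigmoid function, location factor, weighting algorithm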
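The abstract does not state the weighting formula, so the following Python sketch only illustrates the general idea it describes. It assumes, hypothetically, that raw term frequency is passed through a logistic (Sigmoid) curve instead of being used linearly, that the location factor is a simple boost for terms appearing in a document's title, and that the weighted term-by-document matrix is then reduced by a truncated SVD, as is usual in LSI. The function names and the `steepness`, `midpoint`, and `title_boost` parameters are illustrative assumptions, not the authors' definitions.

```python
import math
import numpy as np

def sigmoid_tf(tf, steepness=1.0, midpoint=2.0):
    """Squash a raw term frequency into (0, 1) with a logistic curve (assumed form)."""
    return 1.0 / (1.0 + math.exp(-steepness * (tf - midpoint)))

def weight(tf, df, n_docs, in_title, title_boost=1.5):
    """Hypothetical weight: sigmoid(tf) * idf * location factor."""
    if tf == 0:
        return 0.0
    idf = math.log(n_docs / df)                     # standard inverse document frequency
    location = title_boost if in_title else 1.0     # assumed location factor
    return sigmoid_tf(tf) * idf * location

def build_matrix(docs):
    """docs: list of (title_tokens, body_tokens). Returns (vocab, term-by-document matrix)."""
    vocab = sorted({t for title, body in docs for t in title + body})
    n_docs = len(docs)
    df = {t: sum(1 for title, body in docs if t in title or t in body) for t in vocab}
    A = np.zeros((len(vocab), n_docs))
    for j, (title, body) in enumerate(docs):
        tokens = title + body
        for i, t in enumerate(vocab):
            A[i, j] = weight(tokens.count(t), df[t], n_docs, t in title)
    return vocab, A

def lsi_project(A, k):
    """Rank-k document coordinates from a truncated SVD of A (one row per document)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T

# Toy corpus, invented for illustration only.
docs = [
    (["latent", "semantic"], ["latent", "semantic", "indexing", "retrieval", "latent"]),
    (["term", "weighting"], ["term", "weighting", "retrieval", "term"]),
    (["semantic", "space"], ["semantic", "space", "indexing", "term"]),
]
vocab, A = build_matrix(docs)
doc_vectors = lsi_project(A, k=2)
```

Under these assumptions, the logistic curve makes the weight saturate as a term's count grows instead of rising linearly, while the title boost raises terms in prominent positions. Whether the paper uses this exact combination cannot be determined from the abstract.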
Yuan-yuan LI, Yong-qiang MA. Text term weighting approach based on latent semantic indexing[J]. Journal of Computer Applications.
李媛媛 马永强. 基于潜在语义索引的文本特征词权重计算方法[J]. 计算机应用.
URL: http://www.joca.cn/EN/Y2008/V28/I6/1460