Exploratory text mining algorithm based on high-dimensional clustering

doi:10.3724/SP.J.1087.2013.00988

Journal of Computer Applications, 2013, Vol. 33, Issue (04): 988-990. DOI: 10.3724/SP.J.1087.2013.00988

• Artificial intelligence •

ZHANG Aike, FU Baolong

Electronic Information Engineering Department, Liuzhou Vocational Technological College, Liuzhou, Guangxi 545006, China

Received: 2012-11-05 Revised: 2012-11-29 Online: 2013-04-23 Published: 2013-04-01
Contact: ZHANG Aike

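About the authors: ZHANG Aike (born 1973), female (Zhuang ethnicity), from Guigang, Guangxi, associate professor; main research interests: data mining and evolutionary computation. FU Baolong (born 1978), male (Zhuang ethnicity), from Longzhou, Guangxi, associate professor; main research interests: data mining and evolutionary computation.
Supported by:
Scientific Research Project Fund of the Education Department of Guangxi (201106LX745, 201204LX593)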

Abstract

Abstract: Because free text is unstructured, text mining has become an important branch of data mining, and many kinds of text mining algorithms have emerged in recent years. This paper proposes an exploratory text mining algorithm based on high-dimensional clustering. The algorithm requires only a small number of iterations to produce favorable clusters from very large text collections. Mapping to other recorded data and recording the text to user groups further improves the algorithm's results. Tests on related data and analysis of the experimental results verify the feasibility and validity of the proposed method.

Key words: free text, high-dimensional clustering, data coverage, text mining, data mining
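The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea it describes: grouping free text in a high-dimensional term space with a small, capped number of iterations. It is not the authors' algorithm. It substitutes a standard technique (spherical k-means over TF-IDF vectors), and the function names, the seeding rule, and the `max_iter` default are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn raw strings into unit-length sparse TF-IDF vectors (term -> weight)."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n_docs = len(docs)
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed IDF so that every term keeps a positive weight.
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def normalised_mean(members):
    """Sum the member vectors and rescale to unit length (the spherical centroid)."""
    total = defaultdict(float)
    for vec in members:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def cluster(docs, k, max_iter=5):
    """Spherical k-means with a hard cap on iterations (hypothetical helper)."""
    vectors = tfidf_vectors(docs)
    # Deterministic seeding (an assumption): start from document 0, then keep
    # adding the document least similar to the centroids chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: stop early
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = normalised_mean(members)
    return labels


docs = [
    "printer jams when printing large files",
    "printer driver fails printing on windows",
    "invoice total is wrong after tax",
    "tax shown on invoice is wrong",
]
labels = cluster(docs, k=2)
print(labels)
```

On this toy input the two printer-related texts land in one cluster and the two invoice-related texts in the other. Sparse dictionaries are used because each text touches only a few of the many term dimensions, which keeps each iteration cheap even when the vocabulary is large.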

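Abstract (Chinese version): An exploratory text mining algorithm based on high-dimensional clustering is established, using the guiding role of text mining to carry out data mining in data-oriented text. The algorithm needs only a few iterations to produce good clusters from very large text collections; mapping to other data and recording the text to user groups can further improve the algorithm's results. Tests on related data and analysis of the experimental results confirm the feasibility and effectiveness of the method.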

ZHANG Aike, FU Baolong. Exploratory text mining algorithm based on high-dimensional clustering[J]. Journal of Computer Applications, 2013, 33(04): 988-990.
