Journal of Computer Applications, 2014, Vol. 34, Issue (10): 2820-2826. DOI: 10.11772/j.issn.1001-9081.2014.10.2820


MWARM-SRCCCI: efficient algorithm for mining matrix-weighted positive and negative association rules

ZHOU Xiumei¹, HUANG Mingxuan²

  1. Department of Mathematics and Computer Science, Nanning Prefecture Education College, Nanning Guangxi 530001, China;
  2. Office of Scientific Research Administration, Guangxi College of Education, Nanning Guangxi 530023, China
  • Received: 2014-04-15  Revised: 2014-06-12  Online: 2014-10-01  Published: 2014-10-30
  • Contact: HUANG Mingxuan

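  • About the authors: ZHOU Xiumei (born 1972), female, from Shanglin, Guangxi, associate professor; main research interest: data mining. HUANG Mingxuan (born 1966), male, from Leye, Guangxi, professor, CCF member; main research interests: data mining and information retrieval.
  • Supported by:

    National Natural Science Foundation of China; Natural Science Foundation of Guangxi; Scientific Research Project of the Education Department of Guangxi; Program for Excellent Talents in Guangxi Higher Education Institutions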

Abstract:

Existing weighted association rule mining algorithms cannot handle matrix-weighted data. To address this deficiency, this paper first gives a new itemset pruning strategy and introduces SRCCCI (Support-Relevancy-Correlation Coefficient-Confidence-Interest), an evaluation framework for matrix-weighted association patterns. It then proposes a novel mining algorithm, MWARM-SRCCCI (Matrix-Weighted Association Rules Mining based on SRCCCI), for mining matrix-weighted positive and negative patterns in databases. Using the new pruning technique and pattern evaluation standard, the algorithm overcomes the defects of existing mining techniques, mines valid matrix-weighted positive and negative association rules, and avoids generating ineffective and uninteresting patterns. In experiments on the Chinese Web test dataset CWT200g (Chinese Web Test collection with 200 GB of web pages), MWARM-SRCCCI reduced mining time by up to 74.74% compared with existing unweighted positive and negative association rule mining algorithms. Theoretical analysis and experimental results show that the proposed algorithm prunes effectively, markedly reducing the number of candidate itemsets and the mining time and thereby improving mining efficiency, and that its association patterns can provide reliable query expansion terms for information retrieval.
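
The abstract names the SRCCCI measures but does not define them, so the following Python sketch is only an illustration of the general idea, not the paper's method. It assumes a database in which each transaction maps items to weights in (0, 1] (for example, normalised term weights per document), and it assumes one simple form of matrix-weighted support: the summed weights of an itemset's items over the transactions containing all of them, divided by the number of transactions times the itemset size. The function names `mw_support`, `correlation_ratio` and `rule_kind`, the lift-style ratio, and the toy data are all hypothetical; the paper's pruning strategy is not shown.

```python
def mw_support(db, itemset):
    """Assumed matrix-weighted support (not the paper's definition).

    Sums the weights of the itemset's items over every transaction that
    contains all of them, then normalises by len(db) * len(itemset).
    """
    items = list(itemset)
    if not db or not items:
        raise ValueError("db and itemset must be non-empty")
    total = 0.0
    for row in db:
        # A transaction "contains" an item when its weight is positive.
        if all(row.get(item, 0.0) > 0.0 for item in items):
            total += sum(row[item] for item in items)
    return total / (len(db) * len(items))


def correlation_ratio(db, antecedent, consequent):
    """Lift-style ratio sup(A u B) / (sup(A) * sup(B)); None if undefined."""
    sup_a = mw_support(db, antecedent)
    sup_b = mw_support(db, consequent)
    if sup_a == 0.0 or sup_b == 0.0:
        return None
    sup_ab = mw_support(db, list(antecedent) + list(consequent))
    return sup_ab / (sup_a * sup_b)


def rule_kind(db, antecedent, consequent):
    """Label a candidate rule by the sign of its (assumed) correlation."""
    ratio = correlation_ratio(db, antecedent, consequent)
    if ratio is None:
        return "undefined"
    if ratio > 1.0:
        return "positive"   # candidate rule A -> B
    if ratio < 1.0:
        return "negative"   # candidate rule A -> not B
    return "independent"


# Toy matrix-weighted database: one dict of item weights per transaction.
db = [
    {"a": 0.8, "b": 0.6},
    {"a": 0.4, "c": 0.9},
    {"a": 0.6, "b": 0.2},
    {"c": 0.5},
]
```

On this toy data, `rule_kind(db, ["a"], ["b"])` labels the pair as positive, while `rule_kind(db, ["b"], ["c"])` labels it as negative because the two items never co-occur. A full evaluation framework would additionally apply support, confidence and interest thresholds before accepting a rule.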


CLC Number: