Journal of Computer Applications (计算机应用), 2014, Vol. 34, Issue (11): 3073-3077. DOI: 10.11772/j.issn.1001-9081.2014.11.3073

• Paper from the 2014 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2014)

Parallel fuzzy partition algorithm based on MapReduce

ZHANG Guangrong1, CHEN Qingkui1,2, ZHANG Gang2, ZHAO Haiyan1, GAO Liping1, HUO Huan1

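  1. School of Optical Electrical Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China;
    2. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China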
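  • Received: 2014-07-23 Revised: 2014-08-01 Online: 2014-11-01 Published: 2014-12-01
  • Corresponding author: ZHANG Guangrong
  • About the authors: ZHANG Guangrong (1989-), female, from Shanghai, master's candidate; research interest: large-scale data processing. CHEN Qingkui (1966-), male, from Shanghai, professor and doctoral supervisor; research interests: network computing, cloud computing, parallel computing. ZHANG Gang (1981-), male, from Shanghai, Ph.D. candidate; research interest: network computing. ZHAO Haiyan (1975-), female, from Wenxian, Henan, associate professor; research interest: service computing. GAO Liping (1980-), female, from Yantai, Shandong, associate professor; research interests: computer-supported cooperative work and collaborative computing. HUO Huan (1979-), female, from Shenyang, Liaoning, associate professor; research interest: XML data streams.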
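  • Funding:

    National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education (doctoral supervisor category); Key Innovation Project of Shanghai Municipal Education Commission; Shanghai First-Class Discipline Construction Project; Shanghai Engineering Center Construction Project; Shanghai Key Science and Technology Research Project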

Abstract:

In a large-scale project resource repository the project resources are disordered, so users cannot find the resources they need quickly and accurately. To solve this problem, a parallel fuzzy clustering partition algorithm based on MapReduce was proposed. First, the algorithm abstracted the characteristic attributes of the original project resources and standardized them. Second, it built a project similarity matrix from the standardized attributes and split the matrix into blocks. Third, it processed the matrix blocks with MapReduce and merged the results. Finally, it divided the projects into several ordered project groups according to a threshold. Comparative experiments against the K-means algorithm and a genetic algorithm show that the proposed algorithm achieves higher accuracy and recall, attains a higher speedup on large-scale data, and partitions project resources effectively and accurately.
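The abstract names the stages of the algorithm but not their formulas. The Python sketch below shows one plausible way those stages could fit together; it is not the authors' implementation. Several choices are assumptions: min-max standardization, an absolute-difference similarity r_ij = 1 - mean_k |x_ik - x_jk|, the premise that the blocked MapReduce step computes the max-min transitive closure of the fuzzy similarity matrix (the usual prerequisite for threshold-based fuzzy partitioning), and a λ-cut as the threshold rule. The map and reduce phases are simulated in plain Python. All function names, such as `map_phase` and `reduce_phase`, are hypothetical and are not Hadoop APIs.

```python
from itertools import product


def standardize(data):
    """Min-max scale every attribute (column) into [0, 1] (assumed scheme)."""
    cols = list(zip(*data))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in data]


def similarity(x):
    """Absolute-difference similarity r_ij = 1 - mean_k |x_ik - x_jk| (assumed)."""
    m = len(x[0])
    return [[1.0 - sum(abs(a - b) for a, b in zip(xi, xj)) / m for xj in x]
            for xi in x]


def split_blocks(r, b):
    """Cut the n x n matrix into tiles of at most b x b, keyed by (I, J)."""
    nb = (len(r) + b - 1) // b
    return {(bi, bj): [row[bj * b:(bj + 1) * b] for row in r[bi * b:(bi + 1) * b]]
            for bi in range(nb) for bj in range(nb)}


def maxmin_tile(a, c):
    """Max-min product of two tiles: out[i][j] = max_k min(a[i][k], c[k][j])."""
    return [[max(min(a[i][k], c[k][j]) for k in range(len(c)))
             for j in range(len(c[0]))] for i in range(len(a))]


def map_phase(tiles, nb):
    """Hypothetical mapper: emit ((I, J), partial tile) for each block triple (I, K, J)."""
    for i, k, j in product(range(nb), repeat=3):
        yield (i, j), maxmin_tile(tiles[(i, k)], tiles[(k, j)])


def reduce_phase(pairs):
    """Hypothetical reducer: combine partial tiles sharing a key by element-wise max."""
    out = {}
    for key, tile in pairs:
        if key in out:
            out[key] = [[max(p, q) for p, q in zip(r1, r2)]
                        for r1, r2 in zip(out[key], tile)]
        else:
            out[key] = tile
    return out


def merge_blocks(tiles, n, b):
    """Reassemble the full n x n matrix from its tiles."""
    m = [[0.0] * n for _ in range(n)]
    for (bi, bj), tile in tiles.items():
        for di, row in enumerate(tile):
            for dj, v in enumerate(row):
                m[bi * b + di][bj * b + dj] = v
    return m


def transitive_closure(r, b):
    """Repeat R <- R o R (blocked max-min product) until R stops changing."""
    n = len(r)
    nb = (n + b - 1) // b
    while True:
        r2 = merge_blocks(reduce_phase(map_phase(split_blocks(r, b), nb)), n, b)
        if r2 == r:  # exact compare is safe: max/min only copy existing values
            return r
        r = r2


def partition(t, lam):
    """Lambda-cut: items i and j share a group when t[i][j] >= lam."""
    groups, seen = [], set()
    for i in range(len(t)):
        if i not in seen:
            group = [j for j in range(len(t)) if t[i][j] >= lam]
            seen.update(group)
            groups.append(group)
    return groups


if __name__ == "__main__":
    data = [[1.0, 2.0], [1.1, 2.1], [5.0, 9.0], [5.2, 8.8]]  # toy attribute rows
    t = transitive_closure(similarity(standardize(data)), b=2)
    print(partition(t, 0.8))  # prints [[0, 1], [2, 3]]
```

The block size `b` only changes how the work is split. Because max and min are exact, the closure is the same for any `b`, so each tile product can run as an independent map task. Raising the threshold `lam` yields more, smaller groups, since the λ-cuts of a fuzzy equivalence matrix are nested partitions.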

中图分类号: