Parallel fuzzy partition algorithm based on MapReduce

Journal of Computer Applications, 2014, Vol. 34, Issue 11: 3073-3077. DOI: 10.11772/j.issn.1001-9081.2014.11.3073

Paper of the 2014 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2014)

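ZHANG Guangrong¹, CHEN Qingkui¹,², ZHANG Gang², ZHAO Haiyan¹, GAO Liping¹, HUO Huan¹

1. School of Optical Electrical Information and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China;
2. Business School, University of Shanghai for Science and Technology, Shanghai 200093, China

Received: 2014-07-23 Revised: 2014-08-01 Online: 2014-11-01 Published: 2014-12-01
Corresponding author: ZHANG Guangrong
About the authors: ZHANG Guangrong (born 1989), female, from Shanghai, M.S. candidate, research interest: large-scale data processing; CHEN Qingkui (born 1966), male, from Shanghai, professor and doctoral supervisor, research interests: network computing, cloud computing, parallel computing; ZHANG Gang (born 1981), male, from Shanghai, Ph.D. candidate, research interest: network computing; ZHAO Haiyan (born 1975), female, from Wenxian, Henan, associate professor, research interest: service computing; GAO Liping (born 1980), female, from Yantai, Shandong, associate professor, research interests: computer-supported cooperative work and collaborative computing; HUO Huan (born 1979), female, from Shenyang, Liaoning, associate professor, research interest: XML data streams.
Supported by:
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education (doctoral supervisor category); Innovation Key Project of the Shanghai Municipal Education Commission; Shanghai First-Class Discipline Construction Project; Shanghai Engineering Center Construction Project; Shanghai Key Science and Technology Project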

Abstract

In a large-scale project resource repository, resource information is unordered, so users cannot find the resources they need quickly and accurately. To address this problem, a parallel fuzzy clustering partition algorithm based on MapReduce was proposed. First, the algorithm abstracts the characteristic attributes of the original project resources and standardizes them. Second, it builds a project similarity matrix from the standardized attributes and splits that matrix using block matrix partitioning. Third, it processes the matrix blocks with MapReduce and merges the results. Finally, it applies a threshold to divide the projects into several ordered project groups. Comparison experiments against the K-means algorithm and a genetic algorithm show that the proposed algorithm achieves higher accuracy and recall, obtains a higher speedup on large-scale data, and partitions project resources effectively and accurately.
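The four stages named in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not specify the standardization, the similarity measure, or how the threshold produces groups, so min-max scaling, a similarity of one minus the mean absolute attribute difference, and grouping by connected components of the thresholded matrix are all assumptions made here for illustration. The function names (`standardize`, `map_block`, `merge_blocks`, `fuzzy_partition`) are hypothetical, and Python's built-in `map` and `functools.reduce` stand in for a real MapReduce cluster.

```python
from functools import reduce


def standardize(rows):
    """Min-max scale every attribute column to [0, 1] (assumed scheme)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def similarity(a, b):
    """Assumed measure: 1 minus the mean absolute attribute difference."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def map_block(task):
    """Map step: compute one row block of the similarity matrix."""
    start, block, data = task
    return [(start + i, [similarity(r, other) for other in data])
            for i, r in enumerate(block)]


def merge_blocks(acc, part):
    """Reduce step: merge (row index, row) pairs into one matrix."""
    acc.update(part)
    return acc


def fuzzy_partition(rows, threshold, block_size=2):
    data = standardize(rows)
    # Split the rows of the similarity matrix into independent blocks.
    tasks = [(s, data[s:s + block_size], data)
             for s in range(0, len(data), block_size)]
    matrix = reduce(merge_blocks, map(map_block, tasks), {})

    # Threshold step (assumed): link items whose similarity reaches the
    # threshold, then report connected components as groups (union-find).
    parent = list(range(len(data)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if matrix[i][j] >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(data)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because each block only needs its own rows plus the standardized data set, the blocks can be computed independently, which is what makes the similarity stage a natural fit for parallel map tasks. Raising the threshold yields more, smaller groups; lowering it merges groups.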

CLC number:

TP338.6

ZHANG Guangrong, CHEN Qingkui, ZHANG Gang, ZHAO Haiyan, GAO Liping, HUO Huan. Parallel fuzzy partition algorithm based on MapReduce[J]. Journal of Computer Applications, 2014, 34(11): 3073-3077.

