Journal of Computer Applications, 2013, Vol. 33, Issue 07: 1890-1893. DOI: 10.11772/j.issn.1001-9081.2013.07.1890

• Network and distributed techno

Auto-clustering algorithm based on compute unified device architecture and gene expression programming

DU Xin, LIU Dagang, ZHANG Kaihuo, SHEN Yuan, ZHAO Kang, NI Youcong

  1. Faculty of Software, Fujian Normal University, Fuzhou Fujian 350108, China
  • Received: 2013-01-04 Revised: 2013-03-01 Online: 2013-07-06 Published: 2013-07-01
  • Contact: LIU Dagang

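  • About the authors: DU Xin (杜欣, born 1979), female, from Shihezi, Xinjiang, associate professor, Ph.D.; research interests: evolutionary computation and distributed computing. LIU Dagang (刘大刚, born 1988), male, from Weifang, Shandong, master's student; research interests: evolutionary computation and distributed computing. ZHANG Kaihuo (张开活, born 1990), male, from Fuzhou, Fujian; research interests: distributed computing.
  • Supported by:

    Natural Science Foundation of Fujian Province (2011J05146, 2012J01250); Fujian Province Outstanding Young Scholars Cultivation Program (Fujian Provincial Department of Education [2011] No. 29); Fujian Normal University Young Backbone Teachers Cultivation Program (fjsdjk2012083); Major Science and Technology Program of Fujian Province (2011H6006); Open Fund of the State Key Laboratory of Software Engineering, Wuhan University (SKLSE2012-09-28); Science and Technology Projects of the Fujian Provincial Department of Education (JA12077, JA12080, JB11028, JB11029)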

Abstract: The GEP-Cluster algorithm has two inefficient steps: the screening and aggregation of clustering centers, and the calculation of distances between data objects and clustering centers. To address both, an auto-clustering algorithm based on Compute Unified Device Architecture (CUDA) and Gene Expression Programming (GEP), named CGEP-Cluster, was proposed. Specifically, the screening and aggregation of clustering centers was improved with the Gene Read & Compute Machine (GRCM) method, and CUDA was used to parallelize the calculation of distances between data objects and clustering centers. The experimental results show that, compared with GEP-Cluster, CGEP-Cluster achieves a speedup of almost eight times when the number of data objects is large. CGEP-Cluster can be used for automatic clustering when the number of clusters is unknown and the number of data objects is large.

Key words: Compute Unified Device Architecture (CUDA), Gene Expression Programming (GEP), clustering algorithm, GEP-Cluster, evolutionary algorithm
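
The distance step named in the abstract can be illustrated with a minimal CPU-side sketch. It is not the authors' implementation: the abstract does not state the distance metric or how work is mapped to GPU threads, so the Euclidean metric, the one-object-per-thread reading, and the names `nearest_center` and `assign_objects` are all assumptions made for illustration. The point is only that each data object's nearest-center search is independent of every other object's, which is what makes the step a candidate for parallelization.

```python
import math


def nearest_center(point, centers):
    """Return (index, distance) of the center closest to `point`.

    Hypothetical helper: this is the per-object work that, under the
    assumption stated above, a single GPU thread could perform.
    """
    best_index, best_dist = -1, math.inf
    for k, center in enumerate(centers):
        dist = math.dist(point, center)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index, best_dist


def assign_objects(points, centers):
    """Assign every data object to its nearest clustering center.

    The iterations share no mutable state, so they could run in any
    order or concurrently.
    """
    return [nearest_center(p, centers)[0] for p in points]


# Toy data, invented for this example only.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (4.8, 5.1)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_objects(points, centers)
```

In this sequential form the cost grows with the product of the number of data objects and the number of centers, which is consistent with the abstract's observation that the benefit of parallelization appears when the number of data objects is large.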

