Abstract: The GEP-Cluster algorithm has two inefficient steps: the screening and aggregation of clustering centers, and the calculation of distances between data objects and clustering centers. To address both, an auto-clustering algorithm based on Compute Unified Device Architecture (CUDA) and Gene Expression Programming (GEP), named CGEP-Cluster, is proposed. Specifically, the screening and aggregation of clustering centers is improved with the Gene Read & Compute Machine (GRCM) method, and CUDA is used to parallelize the calculation of distances between data objects and clustering centers. Experimental results show that, compared with GEP-Cluster, CGEP-Cluster runs almost eight times faster when the number of data objects is large. CGEP-Cluster can be used for automatic clustering when the number of clusters is unknown and the data set is large.
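To illustrate why the distance step suits parallel execution, the following is a minimal sketch in plain Python, not the CUDA kernel used in CGEP-Cluster. It assumes Euclidean distance, which the abstract does not specify, and the names `nearest_center` and `assign_all` are hypothetical. The point is that each data object's nearest-center search is independent of every other object's, so on a GPU one thread could plausibly handle one object.

```python
import math

def nearest_center(point, centers):
    """Return (index, distance) of the center closest to `point`.

    Euclidean distance is an assumption made for this sketch.
    """
    best_idx, best_dist = -1, math.inf
    for idx, center in enumerate(centers):
        dist = math.dist(point, center)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist

def assign_all(points, centers):
    """Label every point with the index of its nearest center.

    Each list element is computed independently of the others; this is
    the per-object work that a parallel version could give to one thread.
    """
    return [nearest_center(p, centers)[0] for p in points]

# Toy data: two obvious groups and two candidate centers.
points = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
centers = [(0.5, 0.5), (9.5, 9.5)]
labels = assign_all(points, centers)
```

In this sequential form the cost grows with the number of data objects times the number of centers, which is consistent with the abstract's observation that the speedup matters most when the number of data objects is large.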