Journal of Computer Applications (计算机应用) ›› 2005, Vol. 25 ›› Issue (06): 1388-1391. DOI: 10.3724/SP.J.1087.2005.1388

• Database and Data Mining •

Non-parameter clustering method for gene expression data

ZHAO Yu-hai1,2, WANG Guo-ren1, YIN Ying1

  1. Department of Computer Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China; 2. Department of Mathematics, Anshan Normal University, Anshan, Liaoning 114005, China
  • Online: 2011-04-06 Published: 2005-06-01
  • Supported by:

    National Natural Science Foundation of China (60273079, 60473074)

Abstract: This paper proposes a non-parametric algorithm for clustering gene expression data. The algorithm combines fuzzy clustering of multi-dimensional data with CTWC (coupled two-way clustering) and introduces a norm-based method to further improve and justify the approach. Applied to a real colon cancer gene expression dataset, the algorithm identified a signature combination of 8 genes that not only achieves a recognition rate of about 90% on colon cancer samples but also distinguishes subtypes of those samples. The experimental results confirm the feasibility of the algorithm.

Key words: gene expression data, two-way clustering, fuzzy clustering, norm, non-parametric clustering

CLC Number: