Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (08): 2188-2193.

• Database Technology •

Differential co-expression relative constant row bicluster mining algorithm

XIE Huabo, SHANG Xuequn, WANG Miao

  1. School of Computer Science, Northwestern Polytechnical University, Xi'an Shaanxi 710129, China
  • Received: 2013-03-05 Revised: 2013-04-15 Online: 2013-09-11 Published: 2013-08-01
  • Contact: XIE Huabo
  • About the authors: XIE Huabo (1987-), male, born in Yudu, Jiangxi, M.S. candidate; main research interests: biological data mining, differential co-expression;
    SHANG Xuequn (1973-), female, born in Xi'an, Shaanxi, professor, Ph.D.; main research interests: databases, data mining, bioinformatics;
    WANG Miao (1981-), male, born in Yima, Henan, Ph.D.; main research interests: data mining, bioinformatics.
  • Supported by:

    National Basic Research Program of China (973 Program); National Natural Science Foundation of China

Abstract: In bioinformatics, mining differential co-expression biclusters helps in studying biological processes that involve change, such as aging and canceration. Previous definitions of a differential co-expression bicluster measured the difference only from the perspective of a whole set of genes, so the resulting biclusters contained a lot of noise. To overcome this shortcoming, a new differential co-expression support measure named MiSupport was put forward, which refines the difference from the gene-set level to the level of individual genes. On the basis of MiSupport, an algorithm named MiCluster was proposed to mine maximal differential co-expression biclusters from two real gene chip datasets. MiCluster first constructs a differential weighted undirected sample-sample relational graph from the two real-valued gene expression datasets. It then mines the maximal differential co-expression biclusters from this graph by sample-growth and level-growth, using an accurate candidate generation method and efficient pruning strategies. The experimental results show that MiCluster is faster and more efficient than the existing methods. Furthermore, evaluation by Mean Square Error (MSE) score and Gene Ontology (GO) shows that the mined biclusters have greater statistical and biological significance.

Key words: gene chip, gene co-expression, bicluster, differential, constant row
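
The first step described in the abstract, building a weighted sample-sample graph from two expression datasets, can be illustrated with a minimal Python sketch. The abstract does not give the definition of MiSupport or of the edge weights, so everything below is an assumption made for illustration: the function names `rel_constant` and `differential_weight_graph`, the relative tolerance `eps`, the threshold `min_genes`, and the rule that a gene supports an edge when it is relatively constant across the two samples in the first dataset but not in the second. It is not the authors' method.

```python
from itertools import combinations


def rel_constant(x, y, eps):
    """Assumed test: x and y differ by at most eps relative to the larger magnitude."""
    return abs(x - y) <= eps * max(abs(x), abs(y))


def differential_weight_graph(data_a, data_b, eps=0.1, min_genes=1):
    """Build a hypothetical differential sample-sample graph.

    data_a and data_b are gene-by-sample matrices (lists of rows) covering
    the same genes and the same number of samples. Each edge (i, j) maps to
    the set of genes that are relatively constant across samples i and j in
    data_a but not in data_b; the edge weight is the size of that set.
    """
    n_samples = len(data_a[0])
    graph = {}
    for i, j in combinations(range(n_samples), 2):
        genes = {
            g
            for g, (row_a, row_b) in enumerate(zip(data_a, data_b))
            if rel_constant(row_a[i], row_a[j], eps)
            and not rel_constant(row_b[i], row_b[j], eps)
        }
        # Keep only edges supported by enough genes (assumed pruning rule).
        if len(genes) >= min_genes:
            graph[(i, j)] = genes
    return graph


# Toy data: gene 0 is near-constant in data_a but varies in data_b;
# gene 1 is near-constant in data_b, so it never supports an edge.
data_a = [
    [10.0, 10.5, 10.2],
    [5.0, 9.0, 5.1],
]
data_b = [
    [10.0, 20.0, 30.0],
    [5.0, 5.1, 5.2],
]
graph = differential_weight_graph(data_a, data_b)
```

In this toy input every sample pair is connected by an edge supported by gene 0 only. A sample-growth search of the kind the abstract mentions would then extend sample sets along such edges, intersecting the supporting gene sets as it goes.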
