Journal of Computer Applications, 2018, Vol. 38, Issue (6): 1826-1830. DOI: 10.11772/j.issn.1001-9081.2017112749
• Frontier, Interdisciplinary and Comprehensive Applications •
Received: 2017-11-22
Revised: 2018-01-16
Online: 2018-06-10
Published: 2018-06-13
Corresponding author: ZHANG Shaoqiang
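About the authors: FENG Yanxia (born 1991), female, born in Lüliang, Shanxi, M.S. candidate, CCF member, research interest: bioinformatics computing; ZHANG Zhihong (born 1991), female, born in Zhoukou, Henan, M.S. candidate, research interest: bioinformatics computing; ZHANG Shaoqiang (born 1976), male, born in Tianjin, Professor, Ph.D., CCF member, research interest: bioinformatics computing.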
FENG Yanxia, ZHANG Zhihong, ZHANG Shaoqiang
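Abstract: To address motif finding in chromatin immunoprecipitation sequencing (ChIP-Seq) datasets produced by next-generation sequencing (NGS), a motif finding algorithm based on Fisher's exact test, named FisherNet, was proposed. First, Fisher's exact test was used to compute the P-values of all k-mers, and the motif seeds were selected from them. Then, a position weight matrix (PWM) was constructed for each initial motif. Finally, all k-mers were scanned with the PWMs to form the final motifs. Validation on ChIP-Seq datasets from mouse embryonic stem cells (mESC), erythroid cells and a human lymphoblastoid cell line, as well as on data from the ENCODE database, shows that the proposed algorithm achieves higher accuracy and faster computation than other commonly used motif finding algorithms, and that it recovers more than 80% of the known core motifs of the transcription factors together with the motifs of their co-regulatory factors. The algorithm retains high accuracy while remaining applicable to large-scale sequencing datasets.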
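The three steps described in the abstract (Fisher's exact test on k-mers, PWM construction from seeds, PWM scanning) can be illustrated with a minimal Python sketch. It is not FisherNet itself: the abstract does not say how the contingency tables are built, how seeds become initial motifs, or which thresholds are used, so everything below is an assumption. Each 2x2 table here counts foreground (ChIP) versus background sequences that do or do not contain a k-mer, and a one-sided (enrichment) test is used; the PWM is built from foreground windows within one mismatch of the seed, with a 0.5 pseudocount and a uniform 0.25 background; scanning keeps windows scoring at least 70% of the maximum PWM score. The function names (`find_seeds`, `build_pwm`, `scan`), parameters and toy sequences are hypothetical, and reverse-complement strands are ignored for brevity.

```python
import math
from collections import Counter

BASES = "ACGT"  # only unambiguous bases are scored; windows containing N are skipped


def kmer_presence(seqs, k):
    """For each k-mer, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact P-value for enrichment in the 2x2 table
    [[a, b], [c, d]]: rows = foreground/background sequences,
    columns = contains / lacks the k-mer (hypergeometric upper tail)."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    denom = math.comb(n, col1)
    return sum(math.comb(row1, x) * math.comb(n - row1, col1 - x)
               for x in range(a, min(row1, col1) + 1)) / denom


def find_seeds(fg, bg, k, p_cutoff):
    """Step 1 (assumed form): keep k-mers enriched in fg with P < p_cutoff."""
    fg_counts, bg_counts = kmer_presence(fg, k), kmer_presence(bg, k)
    seeds = []
    for kmer, a in fg_counts.items():
        c = bg_counts.get(kmer, 0)
        p = fisher_right_tail(a, len(fg) - a, c, len(bg) - c)
        if p < p_cutoff:
            seeds.append((p, kmer))
    return sorted(seeds)  # most significant seed first


def build_pwm(seed, fg, max_mismatch=1, pseudocount=0.5):
    """Step 2 (assumed form): log2-odds PWM against a uniform background,
    built from foreground windows within max_mismatch of the seed."""
    k = len(seed)
    counts = [[pseudocount] * 4 for _ in range(k)]
    for s in fg:
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set(BASES) and sum(x != y for x, y in zip(w, seed)) <= max_mismatch:
                for j, base in enumerate(w):
                    counts[j][BASES.index(base)] += 1
    return [[math.log2(v / sum(row) / 0.25) for v in row] for row in counts]


def scan(pwm, seqs, frac=0.7):
    """Step 3 (assumed form): report (sequence, offset, window) for windows
    scoring >= frac * the maximum attainable PWM score."""
    k = len(pwm)
    threshold = frac * sum(max(row) for row in pwm)
    hits = []
    for si, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            w = s[i:i + k]
            if set(w) <= set(BASES):
                score = sum(pwm[j][BASES.index(b)] for j, b in enumerate(w))
                if score >= threshold:
                    hits.append((si, i, w))
    return hits


# Made-up toy data: every foreground sequence carries GATAA, no background one does.
fg = ["TTGATAAGC", "CGATAAGTT", "AGATAAGCA", "CCAGATAAT"]
bg = ["TTTTCCCCA", "ACACACACA", "GGGTTTCCC", "CATCATCAT"]

seeds = find_seeds(fg, bg, k=5, p_cutoff=0.05)  # one seed: 'GATAA' with P = 1/70
pwm = build_pwm(seeds[0][1], fg)
hits = scan(pwm, fg)
print(hits)  # [(0, 2, 'GATAA'), (1, 1, 'GATAA'), (2, 1, 'GATAA'), (3, 3, 'GATAA')]
```

With only four foreground and four background sequences, a 5-mer must appear in all four foreground sequences and in no background sequence to reach P = 1/70 < 0.05, which is why a single seed survives in this toy run. On real ChIP-Seq peaks the tables are far larger, and exact integer arithmetic via `math.comb` keeps the tail sum numerically stable before the final division.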
FENG Yanxia, ZHANG Zhihong, ZHANG Shaoqiang. Cis-regulatory motif finding algorithm in chromatin immunoprecipitation sequencing datasets[J]. Journal of Computer Applications, 2018, 38(6): 1826-1830.