Cis-regulatory motif finding algorithm in chromatin immunoprecipitation sequencing datasets

Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (6): 1826-1830. DOI: 10.11772/j.issn.1001-9081.2017112749

• Frontier, Interdisciplinary and Comprehensive Applications •

FENG Yanxia, ZHANG Zhihong, ZHANG Shaoqiang

College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China

Received: 2017-11-22 Revised: 2018-01-16 Online: 2018-06-13 Published: 2018-06-10
Corresponding author: ZHANG Shaoqiang
About the authors: FENG Yanxia (born 1991), female, from Lüliang, Shanxi, M.S. candidate, CCF member; research interest: bioinformatics computing. ZHANG Zhihong (born 1991), female, from Zhoukou, Henan, M.S. candidate; research interest: bioinformatics computing. ZHANG Shaoqiang (born 1976), male, from Tianjin, professor, Ph.D., CCF member; research interest: bioinformatics computing.
Supported by:
This work is partially supported by the National Natural Science Foundation of China (61572358) and the Natural Science Foundation of Tianjin (16JCYBJC23600).

Abstract

Abstract: To address the motif finding problem in Chromatin Immunoprecipitation Sequencing (ChIP-Seq) datasets produced by Next-Generation Sequencing (NGS), a motif finding algorithm based on Fisher's exact test, called FisherNet, was proposed. Firstly, Fisher's exact test was used to calculate the P values of all k-mers, some of which were selected as motif seeds. Secondly, the position weight matrix of the initial motif was constructed. Finally, the position weight matrix was used to scan all k-mers to form the final motif. The algorithm was verified on ChIP-Seq datasets from mouse Embryonic Stem Cells (mESC), mouse erythrocytes and human lymphoblastoid cell lines, as well as on data from the ENCODE database. The results show that the proposed algorithm is more accurate and faster than other common motif finding algorithms, and that it finds more than 80% of the core motifs of known transcription factors and the motifs of their co-factors. The proposed algorithm can be applied to large-scale sequencing datasets while maintaining high accuracy.

Key words: motif finding algorithm, cis-regulatory, eukaryote, Chromatin Immunoprecipitation Sequencing (ChIP-Seq), transcription factor
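The three steps named in the abstract (Fisher's exact test on k-mers, building a position weight matrix, scanning k-mers) can be illustrated with a minimal Python sketch. This is not the FisherNet implementation. The abstract does not say how the contingency table is formed or how the matrix is scored, so the sketch assumes a set of control sequences as background, a one-sided test, a log-odds matrix with a uniform base background, and a hypothetical threshold `alpha` and `pseudocount`. All function names are illustrative.

```python
from collections import Counter
from math import comb, log2


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    a / b: foreground sequences with / without the k-mer (assumed layout),
    c / d: background sequences with / without the k-mer.
    """
    n_fg, n_bg, hits = a + b, c + d, a + c
    total = comb(n_fg + n_bg, hits)
    upper = min(n_fg, hits)
    # Sum hypergeometric terms for tables at least as enriched as the observed one.
    tail = sum(comb(n_fg, x) * comb(n_bg, hits - x) for x in range(a, upper + 1))
    return tail / total


def kmers_in(seq, k):
    """Set of distinct k-mers occurring in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seed_kmers(foreground, background, k, alpha=0.05):
    """Return k-mers with P <= alpha, most significant first (alpha is an assumption)."""
    fg = Counter(km for s in foreground for km in kmers_in(s, k))
    bg = Counter(km for s in background for km in kmers_in(s, k))
    pvals = {
        km: fisher_right_tail(a, len(foreground) - a, bg[km], len(background) - bg[km])
        for km, a in fg.items()
    }
    return sorted((km for km, p in pvals.items() if p <= alpha), key=pvals.get)


def build_pwm(sites, pseudocount=1.0):
    """Log-odds position weight matrix, assuming a uniform 0.25 base background."""
    denom = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        pwm.append({b: log2((counts[b] + pseudocount) / denom / 0.25) for b in "ACGT"})
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds scores for one k-mer."""
    return sum(col[base] for col, base in zip(pwm, kmer))


# Toy data (invented for illustration): GATAAG occurs in every foreground sequence.
foreground = ["TTGATAAGCC", "ACGATAAGTT", "GGGATAAGAC", "CAGATAAGGT"]
background = ["ACGTACGTAC", "TTTTCCCCGG", "GGCCAATTGG", "CATGCATGCA"]

seeds = seed_kmers(foreground, background, k=6)
pwm = build_pwm(seeds)
all_kmers = sorted({km for s in foreground for km in kmers_in(s, 6)})
best = max(all_kmers, key=lambda km: score(pwm, km))
print(seeds, best)
```

Counting each k-mer once per sequence keeps the contingency table in units of sequences. Counting every occurrence instead would be an equally plausible reading of the abstract.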

CLC Number: TP301.6

FENG Yanxia, ZHANG Zhihong, ZHANG Shaoqiang. Cis-regulatory motif finding algorithm in chromatin immunoprecipitation sequencing datasets[J]. Journal of Computer Applications, 2018, 38(6): 1826-1830.

