Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (11): 3404-3412. DOI: 10.11772/j.issn.1001-9081.2021111956
Special topic: The 9th CCF Conference on Big Data (CCF Bigdata 2021)
Received:
2021-11-17
Revised:
2021-11-23
Accepted:
2021-12-06
Online:
2021-12-31
Published:
2022-11-10
Contact:
Tao JIA
About author:
LI Xin, born in 1997 in Mianyang, Sichuan, M. S. candidate, CCF member. Her research interests include data mining, bioinformatics, and machine learning.
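Abstract:
Existing methods that predict Differential Gene Expression (DGE) from large-scale Histone Modification (HM) data do not make proper use of two kinds of information, Cell-type Specificity (CS) and the similarities and differences between cell types, and they also suffer from large input size and high computational cost. To address these problems, a deep learning method named dcsDiff was proposed. Firstly, multiple AutoEncoders (AEs) and Bi-directional Long Short-Term Memory (Bi-LSTM) networks were used to reduce dimensionality and model the HM signals, yielding embedded representations. Then, multiple Convolutional Neural Networks (CNNs) were used to mine, separately, the combinatorial effects of HMs specific to each cell type and, between the two cell types, the similarity and difference information of each HM together with the joint effect of all HMs. Finally, the two kinds of information were fused to predict DGE between the two cell types. In experiments on 10 pairs of cell types from the REMC database, compared with DeepDiff, dcsDiff improved the Pearson Correlation Coefficient (PCC) of DGE prediction by up to 7.2% and by 3.9% on average, increased the number of accurately detected differentially expressed genes by up to 36 and by 17.6 on average, and reduced running time by 78.7%. Further component analysis experiments demonstrated the effectiveness of properly integrating the two kinds of information, and the parameters of the algorithm were determined experimentally. The experimental results show that dcsDiff can effectively improve the efficiency of DGE prediction.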
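The abstract reports prediction quality as the Pearson Correlation Coefficient (PCC) between predicted and observed DGE. As a minimal sketch of how that metric is computed (not the authors' evaluation code), the function below implements the standard PCC formula using only the Python standard library. The function name and the sample values are hypothetical, and treating DGE as one real-valued score per gene is an assumption made for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene DGE scores; these are NOT data from the paper.
observed = [1.2, -0.5, 0.3, 2.1, -1.7]
predicted = [1.0, -0.2, 0.1, 1.8, -1.5]
print(round(pearson_correlation(observed, predicted), 3))
```

A value near 1 indicates that the predicted scores rise and fall with the observed ones; the normalization constants cancel between numerator and denominator, so the unnormalized sums are sufficient.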
Xin LI, Tao JIA. Deep fusion model for predicting differential gene expression by histone modification data[J]. Journal of Computer Applications, 2022, 42(11): 3404-3412.
Histone modification | Related genome region |
---|---|
H3K4me1 | Enhancer |
H3K4me3 | Promoter |
H3K9me3 | Heterochromatin |
H3K27me3 | Polycomb repression |
H3K36me3 | Transcribed |
Tab. 1 Histone modifications and related genome regions
Tab. 2 Nine selected cell types and IDs in REMC database
Tab. 3 Chosen cell type pairs for experiments
No. | ||||
---|---|---|---|---|
1 | ||||
2 | ||||
3 | ||||
4 | ||||
5 | ||||
6 | ||||
7 | ||||
8 | ||||
9 | ||||
10 |
Tab. 4 Statistics of DE genes and correctly detected DE genes by dcsDiff and DeepDiff on each cell type pair
Tab. 5 Size of bin and corresponding number of bins
Tab. 6 Statistics of running time (min)