Journal of Computer Applications, 2021, Vol. 41, Issue (12): 3614-3619. DOI: 10.11772/j.issn.1001-9081.2021061082
Special Issue: The 18th China Conference on Machine Learning (CCML 2021)
Tengqi JI, Jun MENG, Siyuan ZHAO, Hehuan HU
Received: 2021-05-12
Revised: 2021-06-24
Accepted: 2021-07-21
Online: 2021-12-28
Published: 2021-12-10
Contact: Jun MENG
About author: JI Tengqi, born in 1996 in Yantai, Shandong, M. S. candidate. His research interests include bioinformatics and machine learning.
Tengqi JI, Jun MENG, Siyuan ZHAO, Hehuan HU. Prediction model of lncRNA-encoded short peptides based on representation learning and deep forest[J]. Journal of Computer Applications, 2021, 41(12): 3614-3619.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021061082
Actual result | Predicted result: Positive Class | Predicted result: Negative Class |
---|---|---|
Positive Class | TP | FN |
Negative Class | FP | TN |
Tab. 1 Meaning of classification results
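The four counts in Tab. 1 are the usual inputs to the ACC, P, R and F1 columns reported in the result tables. The sketch below uses the standard textbook definitions of these metrics. This page does not state the exact formulas or averaging scheme the authors used, so the names `ConfusionCounts` and `classification_metrics` are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts laid out as in Tab. 1 (hypothetical container, not from the paper)."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall and F1 as fractions in [0, 1].

    Standard binary definitions with the positive class as reference.
    A metric whose denominator is zero is reported as 0.0.
    """
    total = c.tp + c.fn + c.fp + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"ACC": accuracy, "P": precision, "R": recall, "F1": f1}


if __name__ == "__main__":
    # Made-up counts for illustration only; they are not taken from the paper.
    example = ConfusionCounts(tp=40, fn=10, fp=5, tn=45)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {100 * value:.2f}%")
```

If the reported F1 values were averaged over both classes (macro or weighted averaging), they will not equal the harmonic mean of the P and R columns, so these formulas should be read as definitions of the terms rather than a way to reproduce the tables.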
Model | ACC±SD/% | P±SD/% | R±SD/% | F1±SD/% |
---|---|---|---|---|
NB | 76.36±2.8 | 75.91±3.7 | 77.01±4.0 | 76.35±2.8 |
AE+NB | 76.77±2.6 | 74.00±4.9 | 78.79±5.0 | 76.80±2.6 |
DT | 83.02±1.9 | 84.80±3.4 | 80.31±2.8 | 83.00±1.9 |
AE+DT | 86.36±1.3 | 84.70±3.4 | 86.75±1.8 | 86.37±1.3 |
RF | 86.46±1.3 | 83.45±4.2 | 82.84±2.8 | 86.46±1.3 |
AE+RF | 87.50±1.6 | 85.46±2.9 | 88.17±2.4 | 87.50±1.6 |
DF | 87.92±1.6 | 85.77±3.5 | 89.17±3.0 | 87.93±1.6 |
Proposed model | 92.08±1.2 | 91.23±1.1 | 92.40±2.6 | 92.08±1.2 |
Tab. 2 Result comparison of the proposed model with traditional machine learning models, their combined models and DF on Arabidopsis thaliana dataset
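Each cell in Tab. 2 is written as mean±SD in percent. As a rough sketch of how such a cell can be produced from repeated evaluation scores, the snippet below formats a list of per-run percentages in the same style. This page does not say how many runs were used or whether the authors used the sample or population standard deviation; the sample form (`statistics.stdev`) and the helper name `format_mean_sd` are assumptions made here for illustration.

```python
import statistics


def format_mean_sd(scores_percent):
    """Format per-run scores (already in percent) as 'mean±SD'.

    Assumption: sample standard deviation (n - 1 denominator).
    Use statistics.pstdev instead for the population form.
    """
    if len(scores_percent) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    mean = statistics.mean(scores_percent)
    sd = statistics.stdev(scores_percent)
    # Two decimals for the mean and one for the SD, matching the table layout.
    return f"{mean:.2f}±{sd:.1f}"


if __name__ == "__main__":
    # Hypothetical per-run accuracies, not values from the paper.
    print(format_mean_sd([90.0, 92.0, 94.0]))
```

A smaller SD at a similar mean indicates more stable performance across runs, which is how the ±SD columns are typically read when comparing models.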
Model | ACC±SD/% | P±SD/% | R±SD/% | F1±SD/% |
---|---|---|---|---|
CNN | 90.42±2.2 | 91.62±3.3 | 88.64±2.6 | 90.42±2.2 |
AE+CNN | 91.04±1.9 | 88.75±3.3 | 92.95±2.1 | 91.05±1.9 |
RNN | 89.48±1.5 | 89.15±2.5 | 89.58±2.6 | 89.49±1.4 |
AE+RNN | 90.00±1.7 | 89.24±1.5 | 90.65±1.0 | 89.99±1.7 |
Proposed model | 92.08±1.2 | 91.23±1.1 | 92.40±2.6 | 92.08±1.2 |
Tab. 3 Result comparison of the proposed model with deep learning models and their combined models on Arabidopsis thaliana dataset
Dataset | ACC/% | P/% | R/% | F1/% |
---|---|---|---|---|
Glycine max | 78.16 | 79.65 | 75.63 | 78.14 |
Zea mays | 74.92 | 72.12 | 81.23 | 74.82 |
Tab. 4 Classification results of the proposed model on Glycine max and Zea mays datasets