Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (S1): 81-87.DOI: 10.11772/j.issn.1001-9081.2022081138
• Artificial intelligence •
Jiayi LIU1,2, Dongping CAO1,2, Yong ZHONG1,2
Received: 2022-08-22
Revised: 2022-10-26
Accepted: 2022-11-14
Online: 2023-07-04
Published: 2023-06-30
Contact: Yong ZHONG
About the author: Jiayi LIU (born 1996), male, from Neijiang, Sichuan, master's student. Main research interests: artificial intelligence and computer vision.
CLC Number:
Jiayi LIU, Dongping CAO, Yong ZHONG. End-to-end scene character detection and recognition algorithm based on differentiable architecture search[J]. Journal of Computer Applications, 2023, 43(S1): 81-87.
URL: http://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022081138
Network layer | Input tensor size | Output tensor size |
---|---|---|
Conv1 | N×3×32×32 | N×128×32×32 |
Maxpool1 | N×128×32×32 | N×128×16×16 |
MBASB1 | N×128×16×16 | N×128×16×16 |
Conv2 | N×128×16×16 | N×256×16×16 |
Maxpool2 | N×256×16×16 | N×256×8×8 |
MBASB2 | N×256×8×8 | N×256×8×8 |
Conv3 | N×256×8×8 | N×512×8×8 |
Maxpool3 | N×512×8×8 | N×512×4×4 |
MBASB3 | N×512×4×4 | N×512×4×4 |
Conv4 | N×512×4×4 | N×1024×4×4 |
Maxpool4 | N×1024×4×4 | N×1024×2×2 |
MBASB4 | N×1024×2×2 | N×1024×2×2 |
GAP | N×1024×2×2 | N×1024×1×1 |
FC | N×1024 | N×1 |
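The shape flow in the layer table above can be checked with a few lines of bookkeeping. The sketch below is not the authors' implementation: it tracks tensor shapes only, and it assumes (the table does not say) that each convolution preserves height and width, that each max-pool halves them, and that each MBASB block is shape-preserving. All function names are hypothetical.

```python
# Shape bookkeeping only: no weights and no real convolutions.
# Assumptions not stated in the table: convolutions keep H and W,
# max-pooling halves H and W, and MBASB blocks preserve the shape.

def conv(shape, out_channels):
    """Change the channel count, keep batch size and spatial size."""
    n, _, h, w = shape
    return (n, out_channels, h, w)

def maxpool(shape):
    """Halve height and width (assumed 2x2 pooling with stride 2)."""
    n, c, h, w = shape
    return (n, c, h // 2, w // 2)

def mbasb(shape):
    """Treat the searched block as a shape-preserving black box."""
    return shape

def gap(shape):
    """Global average pooling collapses H and W to 1."""
    n, c, _, _ = shape
    return (n, c, 1, 1)

def fc(shape, out_features):
    """Fully connected layer applied to the flattened N x C features."""
    return (shape[0], out_features)

def trace(batch_size):
    """Return (layer name, output shape) pairs for an N x 3 x 32 x 32 input."""
    shape = (batch_size, 3, 32, 32)
    steps = [("Input", shape)]
    for i, channels in enumerate((128, 256, 512, 1024), start=1):
        shape = conv(shape, channels)
        steps.append((f"Conv{i}", shape))
        shape = maxpool(shape)
        steps.append((f"Maxpool{i}", shape))
        shape = mbasb(shape)
        steps.append((f"MBASB{i}", shape))
    shape = gap(shape)
    steps.append(("GAP", shape))
    shape = fc(shape, 1)
    steps.append(("FC", shape))
    return steps

if __name__ == "__main__":
    for name, shape in trace(batch_size=8):
        print(f"{name:9s} {shape}")
```

Under these assumptions every output shape matches the corresponding row of the table, with the batch size standing in for N.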
MBASB name | Sub-branch type | Input tensor size | Output tensor size |
---|---|---|---|
MBASB1 | B5 | N×512×24×24 | N×256×24×24 |
MBASB2 | B2 | N×512×48×48 | N×256×48×48 |
MBASB3 | B2 | N×384×96×96 | N×128×96×96 |
MBASB4 | B2 | N×192×192×192 | N×64×192×192 |
MBASB5 | B5 | N×128×384×384 | N×128×384×384 |
MBASB name | Sub-branch type | Input tensor size | Output tensor size |
---|---|---|---|
MBASB1 | B2 | N×128×16×16 | N×128×16×16 |
MBASB2 | B2 | N×256×8×8 | N×256×8×8 |
MBASB3 | B2 | N×512×4×4 | N×512×4×4 |
MBASB4 | B1 | N×1024×2×2 | N×1024×2×2 |
Method | ICDAR13 (DetEval) recall/% | ICDAR13 (DetEval) precision/% | ICDAR15 recall/% | ICDAR15 precision/% | FPS |
---|---|---|---|---|---|
SegLink | 83.0 | 87.7 | 76.8 | 73.1 | 20.6 |
SSTD | 86.0 | 89.0 | 73.0 | 80.0 | 7.7 |
Mask TextSpotter | 88.1 | 94.1 | 81.2 | 85.8 | 4.8 |
R2CNN | 82.6 | 93.6 | 79.7 | 85.6 | 0.4 |
PixelLink | 87.5 | 88.6 | 82.0 | 85.5 | 3.0 |
Proposed method | 89.4 | 91.4 | 80.5 | 86.8 | 68.4 |
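The table above reports recall and precision separately, which makes rows hard to rank when one method leads on recall and another on precision. A common single-number summary is the F-measure, the harmonic mean of the two. The table does not report it, so the helper below is only an illustrative sketch, and the name `f_measure` is hypothetical.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (both given in percent).

    Returns 0.0 when both inputs are zero to avoid dividing by zero.
    """
    total = recall + precision
    if total == 0:
        return 0.0
    return 2.0 * recall * precision / total

if __name__ == "__main__":
    # Recall and precision copied from the ICDAR13 (DetEval) columns above.
    rows = {
        "Mask TextSpotter": (88.1, 94.1),
        "Proposed method": (89.4, 91.4),
    }
    for name, (recall, precision) in rows.items():
        print(f"{name}: F-measure = {f_measure(recall, precision):.1f}%")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a method cannot score well by trading away recall for precision or the reverse.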
Differentiable architecture search | ICDAR13 (DetEval) recall/% | ICDAR13 (DetEval) precision/% | ICDAR15 recall/% | ICDAR15 precision/% |
---|---|---|---|---|
Not used | 87.2 | 90.3 | 79.1 | 84.6 |
Used | 89.4 | 91.4 | 80.5 | 86.8 |