Journal of Computer Applications, 2022, Vol. 42, Issue (12): 3686-3691. DOI: 10.11772/j.issn.1001-9081.2021101749
Special Issue: Artificial intelligence
Zhida FENG1,2, Li CHEN1,2
Received: 2021-10-12
Revised: 2022-01-24
Accepted: 2022-01-24
Online: 2022-04-08
Published: 2022-12-10
Contact: Zhida FENG
About author: CHEN Li, born in 1977, a native of Wuhan, Hubei, Ph.D., professor. His research interests include computer vision and image processing.
Zhida FENG, Li CHEN. Single direction projected Transformer method for aliasing text detection[J]. Journal of Computer Applications, 2022, 42(12): 3686-3691.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2021101749
Tab.1 Comparison results of IoU50 benchmark on BDD-SynText dataset

Method | Backbone | P (%) | R (%) | F (%) |
---|---|---|---|---|
PSENet | ResNet18 | 88.04 | 73.50 | 79.10 |
PAN | ResNet18 | 76.58 | 68.31 | 71.11 |
DB | ResNet18 | 93.94 | 88.01 | 90.53 |
Proposed method | ResNet18 | 98.54 | 98.48 | 98.50 |
PSENet | ResNet50 | 89.89 | 75.23 | 80.91 |
PAN | ResNet50 | 74.35 | 67.07 | 69.55 |
DB | ResNet50 | 95.24 | 90.22 | 92.39 |
Proposed method | ResNet50 | 98.43 | 98.35 | 98.39 |
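The tables report precision (P), recall (R) and F-measure (F) under an "IoU50" and an "IoU75" benchmark. The evaluation script behind these numbers is not given here. The following is only a minimal sketch, assuming that IoU50/IoU75 mean IoU thresholds of 0.5/0.75, that detections are axis-aligned boxes `(x1, y1, x2, y2)`, that each prediction is matched greedily to its best still-unmatched ground-truth box, and that F is the harmonic mean of P and R. The names `iou` and `precision_recall_f` are illustrative, and real text benchmarks often use rotated boxes or polygons instead.

```python
from typing import List, Tuple

# Assumed box format: axis-aligned (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f(preds: List[Box], gts: List[Box],
                       thr: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching: a prediction counts as a true positive
    if its best IoU with a still-unmatched ground-truth box is >= thr."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground-truth box may be matched only once
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(0, 0, 10, 8), (20, 0, 26, 10)]  # IoUs with their targets: 0.8 and 0.6
print(precision_recall_f(preds, gts, 0.5))   # (1.0, 1.0, 1.0)
print(precision_recall_f(preds, gts, 0.75))  # (0.5, 0.5, 0.5)
```

The toy example shows why the stricter threshold can lower all three scores at once: the second prediction overlaps its target with IoU 0.6, which counts as a hit at 0.5 but as both a false positive and a missed ground truth at 0.75.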
Tab.2 Comparison results of IoU75 benchmark on BDD-SynText dataset

Method | Backbone | P (%) | R (%) | F (%) |
---|---|---|---|---|
PSENet | ResNet18 | 74.97 | 64.15 | 68.43 |
PAN | ResNet18 | 78.21 | 75.58 | 76.50 |
DB | ResNet18 | 65.63 | 61.62 | 63.33 |
Proposed method | ResNet18 | 92.27 | 92.19 | 92.23 |
PSENet | ResNet50 | 78.39 | 67.23 | 71.68 |
PAN | ResNet50 | 64.35 | 59.47 | 61.18 |
DB | ResNet50 | 63.44 | 60.25 | 61.64 |
Proposed method | ResNet50 | 93.07 | 93.00 | 93.04 |
Tab.3 Comparison results of IoU50 benchmark on RealText dataset

Method | Backbone | P (%) | R (%) | F (%) |
---|---|---|---|---|
PSENet | ResNet18 | 98.60 | 90.08 | 94.15 |
PAN | ResNet18 | 54.61 | 51.04 | 52.77 |
DB | ResNet18 | 41.49 | 60.23 | 49.14 |
Proposed method | ResNet18 | 97.25 | 97.33 | 97.29 |
PSENet | ResNet50 | 99.50 | 91.50 | 95.33 |
PAN | ResNet50 | 79.42 | 59.12 | 67.78 |
DB | ResNet50 | 45.92 | 72.80 | 56.32 |
Proposed method | ResNet50 | 97.42 | 97.51 | 97.47 |
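Assuming F is the harmonic mean of the reported P and R (the usual definition, not stated in this excerpt), a row can be checked by hand. For the DB row with the ResNet50 backbone in Table 3:

$$
F = \frac{2PR}{P + R} = \frac{2 \times 45.92 \times 72.80}{45.92 + 72.80} = \frac{6685.95}{118.72} \approx 56.32
$$

This reproduces the tabulated F of 56.32 to two decimal places.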
Tab.4 Comparison results of IoU75 benchmark on RealText dataset

Method | Backbone | P (%) | R (%) | F (%) |
---|---|---|---|---|
PSENet | ResNet18 | 84.34 | 77.05 | 80.53 |
PAN | ResNet18 | 42.34 | 39.57 | 40.91 |
DB | ResNet18 | 24.98 | 36.26 | 29.58 |
Proposed method | ResNet18 | 97.25 | 97.33 | 97.29 |
PSENet | ResNet50 | 82.83 | 76.17 | 79.36 |
PAN | ResNet50 | 64.20 | 47.80 | 54.80 |
DB | ResNet50 | 28.93 | 45.86 | 35.48 |
Proposed method | ResNet50 | 97.42 | 97.51 | 97.47 |
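Comparing Tables 3 and 4 row by row shows how much F each method loses on RealText when the benchmark tightens from IoU50 to IoU75 (read here as IoU thresholds of 0.5 and 0.75). The sketch below simply hard-codes the F columns of the two tables; the dictionary layout and the helper name `f_drop` are illustrative choices, not part of the original work.

```python
# F values (%) on RealText as (IoU50, IoU75), copied from Tables 3 and 4.
REALTEXT_F = {
    ("PSENet", "ResNet18"): (94.15, 80.53),
    ("PAN", "ResNet18"): (52.77, 40.91),
    ("DB", "ResNet18"): (49.14, 29.58),
    ("Proposed method", "ResNet18"): (97.29, 97.29),
    ("PSENet", "ResNet50"): (95.33, 79.36),
    ("PAN", "ResNet50"): (67.78, 54.80),
    ("DB", "ResNet50"): (56.32, 35.48),
    ("Proposed method", "ResNet50"): (97.47, 97.47),
}


def f_drop(table):
    """F lost, in percentage points, when moving from IoU50 to IoU75."""
    return {key: round(f50 - f75, 2) for key, (f50, f75) in table.items()}


for (method, backbone), drop in f_drop(REALTEXT_F).items():
    print(f"{method} ({backbone}): {drop:.2f} points")
```

Read this way, DB loses the most (19.56 points with ResNet18 and 20.84 with ResNet50), while the reported F of the proposed method is identical under both thresholds on RealText.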
Tab.5 Results of ablation experiment

Experiment ID | BM | SDPT | MTT | P (%) | R (%) | F (%) |
---|---|---|---|---|---|---|
1 | × | × | × | 87.12 | 88.43 | 87.77 |
2 | √ | × | × | 90.53 | 91.28 | 90.90 |
3 | × | √ | × | 90.57 | 90.50 | 90.53 |
4 | × | × | √ | 90.52 | 91.38 | 90.95 |
5 | √ | √ | × | 89.46 | 89.31 | 89.38 |
6 | √ | × | √ | 89.51 | 90.36 | 89.93 |
7 | × | √ | √ | 91.34 | 91.26 | 91.30 |
8 | √ | √ | √ | 92.27 | 92.19 | 92.23 |
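Table 5 is easiest to read as F gains over experiment 1, where none of the three modules is enabled. The sketch below hard-codes the F column and the √/× flags from the table and computes those differences; the data layout and the helper name `f_gain_over_baseline` are illustrative assumptions.

```python
# F values (%) from Table 5, keyed by experiment ID.
# Each entry is ((BM, SDPT, MTT), F); True corresponds to a √ in the table.
ABLATION = {
    1: ((False, False, False), 87.77),
    2: ((True, False, False), 90.90),
    3: ((False, True, False), 90.53),
    4: ((False, False, True), 90.95),
    5: ((True, True, False), 89.38),
    6: ((True, False, True), 89.93),
    7: ((False, True, True), 91.30),
    8: ((True, True, True), 92.23),
}
MODULES = ("BM", "SDPT", "MTT")


def f_gain_over_baseline(table, baseline_id=1):
    """Each experiment's F gain, in percentage points, over the baseline."""
    base_f = table[baseline_id][1]
    return {exp_id: round(f - base_f, 2) for exp_id, (_, f) in table.items()}


gains = f_gain_over_baseline(ABLATION)
for exp_id in sorted(gains):
    flags = ABLATION[exp_id][0]
    enabled = "+".join(m for m, on in zip(MODULES, flags) if on) or "none"
    print(f"exp {exp_id} ({enabled}): {gains[exp_id]:+.2f}")
```

The full configuration (experiment 8) gains 4.46 points of F over the baseline. Experiment 5 (BM and SDPT without MTT) gains 1.61 points, which is less than BM alone (3.13) or SDPT alone (2.76).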
1 | CHEN D T, BOURLARD H, THIRAN J P. Text identification in complex background using SVM[C]// Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2001: II-621-II-626. 10.1109/cvpr.2001.990916 |
2 | WU V, MANMATHA R, RISEMAN E M. TextFinder: an automatic system to detect and recognize text in images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(11): 1224-1229. 10.1109/34.809116 |
3 | SRIVASTAV A, KUMAR J. Text detection in scene images using stroke width and nearest-neighbor constraints[C]// Proceedings of the 2008 IEEE Region 10 Conference. Piscataway: IEEE, 2008: 1-5. 10.1109/tencon.2008.4766826 |
4 | MANCAS-THILLOU C, GOSSELIN B. Spatial and color spaces combination for natural scene text extraction[C]// Proceedings of the 2006 International Conference on Image Processing. Piscataway: IEEE, 2006: 985-988. 10.1109/icip.2006.312653 |
5 | LI M H, BAI M. Text detection from images with complex background by ant colony optimization algorithm[J]. Journal of Computer Applications, 2011, 31(7): 1844-1846. (in Chinese) |
6 | WANG W Q, FU L B, GAO W, et al. Text detection based on stroke features[J]. Journal on Communications, 2007, 28(12): 116-120. (in Chinese) 10.3321/j.issn:1000-436x.2007.12.019 |
7 | REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 91-99. |
8 | LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[C]// Proceedings of the 2016 European Conference on Computer Vision, LNCS 9905. Cham: Springer, 2016: 21-37. |
9 | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. 10.1109/cvpr.2016.91 |
10 | JIANG Y Y, ZHU X Y, WANG X B, et al. R2CNN: rotational region CNN for orientation robust scene text detection[EB/OL]. (2017-06-30) [2021-12-28].. 10.1109/icpr.2018.8545598 |
11 | MA J Q, SHAO W Y, YE H, et al. Arbitrary-oriented scene text detection via rotation proposals[J]. IEEE Transactions on Multimedia, 2018, 20(11): 3111-3122. 10.1109/tmm.2018.2818020 |
12 | HE P, HUANG W L, HE T, et al. Single shot text detector with regional attention[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 3066-3074. 10.1109/iccv.2017.331 |
13 | LIAO M H, SHI B G, BAI X, et al. TextBoxes: a fast text detector with a single deep neural network[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2017: 4161-4167. 10.1609/aaai.v31i1.11196 |
14 | LIAO M H, SHI B G, BAI X. TextBoxes++: a single-shot oriented scene text detector[J]. IEEE Transactions on Image Processing, 2018, 27(8): 3676-3690. 10.1109/tip.2018.2825107 |
15 | SHI B, BAI X, BELONGIE S. Detecting oriented text in natural images by linking segments[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2550-2558. 10.1109/cvpr.2017.371 |
16 | HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]// Proceedings of the 2017 IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 2980-2988. 10.1109/iccv.2017.322 |
17 | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[EB/OL]. (2016-06-07) [2021-12-28].. 10.1109/tpami.2017.2699184 |
18 | CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848. 10.1109/tpami.2017.2699184 |
19 | CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL]. (2017-12-05) [2021-12-28].. 10.1007/978-3-030-01234-2_49 |
20 | CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]// Proceedings of the 2018 European Conference on Computer Vision, LNCS 11211. Cham: Springer, 2018: 833-851. 10.1007/978-3-030-01234-2_49 |
21 | DENG D, LIU H F, LI X L, et al. PixelLink: detecting scene text via instance segmentation[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2018: 6773-6780. 10.1609/aaai.v32i1.12269 |
22 | WANG W H, XIE E Z, LI X, et al. Shape robust text detection with progressive scale expansion network[C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 9328-9337. 10.1109/cvpr.2019.00956 |
23 | LIAO M H, WAN Z Y, YAO C, et al. Real-time scene text detection with differentiable binarization[C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Palo Alto, CA: AAAI Press, 2020: 11474-11481. 10.1609/aaai.v34i07.6812 |
24 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook, NY: Curran Associates Inc., 2017: 6000-6010. |
25 | LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 936-944. 10.1109/cvpr.2017.106 |
26 | HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
27 | YU F, CHEN H F, WANG X, et al. BDD100K: a diverse driving video database with scalable annotation tooling[C]// Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 2633-2642. 10.1109/cvpr42600.2020.00271 |
28 | WANG W, XIE E, SONG X, et al. Efficient and accurate arbitrary-shaped text detection with pixel aggregation network[C]// Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 8440-8449. 10.1109/iccv.2019.00853 |