Journal of Computer Applications, 2023, Vol. 43, Issue (S1): 81-87. DOI: 10.11772/j.issn.1001-9081.2022081138

• Artificial Intelligence

End-to-end scene character detection and recognition algorithm based on differentiable architecture search

Jiayi LIU1,2, Dongping CAO1,2, Yong ZHONG1,2

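  1. Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu Sichuan 610041, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China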
  • Received: 2022-08-22  Revised: 2022-10-26  Accepted: 2022-11-14  Online: 2023-07-04  Published: 2023-06-30
  • Corresponding author: Yong ZHONG
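  • About the authors: Jiayi LIU (1996-), male, born in Neijiang, Sichuan, M.S. candidate. His research interests include artificial intelligence and computer vision.
    Dongping CAO (1992-), male, born in Chengdu, Sichuan, Ph.D. His research interests include image processing and pattern recognition.
    Yong ZHONG (1966-), male, born in Yuechi, Sichuan, research fellow, Ph.D., CCF member. His research interests include big data, artificial intelligence, and software process technologies and methods. Email: zhongyong@casit.com.cn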
  • Funding:
    Sichuan Province Science and Technology Achievement Transformation Program (2020ZHZY0002)


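Abstract:

In natural scene text detection and recognition, most existing methods treat text detection and text recognition as relatively independent processes, which slows them down; in addition, their training and inference procedures are complex, and designing a reasonable architecture by hand is difficult. To address these problems, a Multi-Branch Automatic Selection Network (MBASNet) was proposed based on differentiable architecture search. The network consists of several Multi-Branch Automatic Selection Blocks (MBASBs). Each MBASB automatically searches for the sub-branch structure with better detection and recognition performance without significantly increasing the computational cost, and multiple MBASBs are combined to form the whole detection and recognition network. MBASNet trains the detection subnetwork and the recognition subnetwork simultaneously, which reduces the difficulty of training and inference in text detection and recognition tasks and increases detection and recognition speed. MBASNet achieved 89.4% precision and 91.4% recall on the ICDAR2013 dataset and 80.5% precision and 86.8% recall on the ICDAR15 dataset, running at 68 Frames Per Second (FPS).

Key words: deep learning, Convolutional Neural Network (CNN), text detection, character recognition, differentiable architecture search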
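The abstract describes the MBASB only at a high level. As a rough illustration of the general differentiable architecture search idea it builds on (the continuous relaxation popularized by DARTS), the pure-Python sketch below runs every candidate branch, mixes their outputs with softmax weights over learnable architecture parameters, takes gradient steps on those parameters, and finally keeps the highest-weighted branch. The class `MixedBranchBlock`, the toy branches, and the squared-error loss are hypothetical. The sketch updates only the architecture parameters and omits the bilevel optimization of network weights on separate training and validation data, so it is not the authors' MBASB.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Branch = Callable[[Vector], Vector]


def softmax(alphas: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


class MixedBranchBlock:
    """Hypothetical DARTS-style block (not the paper's MBASB): every candidate
    branch runs on the input and the outputs are mixed with softmax(alphas)."""

    def __init__(self, branches: List[Branch]) -> None:
        self.branches = branches
        self.alphas = [0.0] * len(branches)  # architecture parameters, start uniform

    def forward(self, x: Vector) -> Vector:
        weights = softmax(self.alphas)
        outputs = [branch(x) for branch in self.branches]
        return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]

    def loss(self, x: Vector, target: Vector) -> float:
        """Squared-error loss 0.5 * ||forward(x) - target||^2."""
        return 0.5 * sum((y - t) ** 2 for y, t in zip(self.forward(x), target))

    def step_alphas(self, x: Vector, target: Vector, lr: float) -> None:
        """One gradient-descent step on the alphas only (branches have no weights here)."""
        weights = softmax(self.alphas)
        outputs = [branch(x) for branch in self.branches]
        y = [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]
        residual = [yi - ti for yi, ti in zip(y, target)]
        # g[k] = dLoss/dweight_k = <residual, output of branch k>
        g = [sum(r * o for r, o in zip(residual, out)) for out in outputs]
        mean_g = sum(w * gk for w, gk in zip(weights, g))
        # Softmax chain rule: dLoss/dalpha_j = w_j * (g_j - sum_k w_k * g_k)
        grads = [w * (gk - mean_g) for w, gk in zip(weights, g)]
        self.alphas = [a - lr * d for a, d in zip(self.alphas, grads)]

    def derive(self) -> int:
        """Discretize after search: keep only the branch with the largest alpha."""
        return max(range(len(self.alphas)), key=lambda k: self.alphas[k])


if __name__ == "__main__":
    block = MixedBranchBlock([
        lambda v: list(v),                # identity branch
        lambda v: [2.0 * e for e in v],   # "scale by 2" branch
        lambda v: [0.0 for _ in v],       # zero branch
    ])
    x, target = [1.0, 2.0], [2.0, 4.0]
    start = block.loss(x, target)
    for _ in range(200):
        block.step_alphas(x, target, lr=0.5)
    print(block.derive(), block.loss(x, target) < start)  # prints: 1 True
```

Because the target is exactly twice the input, the search drives the weight of the "scale by 2" branch toward 1 and `derive()` selects it. Only that single branch would then be kept at inference time, which is the general mechanism by which this kind of search avoids paying for every candidate branch after training.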

