Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3927-3932. DOI: 10.11772/j.issn.1001-9081.2022121887

• Multimedia Computing and Computer Simulation •

Book spine segmentation algorithm based on improved DeepLabv3+ network

Xiaofei JI, Kexin ZHANG, Lirong TANG

  1. School of Automation, Shenyang Aerospace University, Shenyang Liaoning 110136, China
  • Received: 2022-12-22 Revised: 2023-03-21 Accepted: 2023-03-22 Online: 2023-04-03 Published: 2023-12-10
  • Contact: Xiaofei JI
  • About author: ZHANG Kexin, born in 1996, a native of Jinzhou, Liaoning, M. S. candidate. Her research interests include image processing, video analysis and processing.
    TANG Lirong, born in 2000, a native of Nanchong, Sichuan, M. S. candidate. His research interests include image processing, video analysis and processing.
  • Supported by:
    Key Projects of Liaoning Provincial Department of Education (LJKZZ20220033)

Abstract:

Book localization is one of the key technologies for making libraries intelligent, and accurate book spine segmentation is a major obstacle to achieving it. To address this, a book spine segmentation algorithm based on an improved DeepLabv3+ network was proposed to overcome the difficulties of segmenting book spines that are densely arranged, tilted at an angle, or extremely similar in texture. Firstly, to extract denser pyramid features from book images, the Atrous Spatial Pyramid Pooling (ASPP) module in the original DeepLabv3+ network was replaced with a DenseASPP (Dense Atrous Spatial Pyramid Pooling) module that uses multiple dilation rates and scales. Secondly, because the original DeepLabv3+ network is insensitive to the segmentation boundaries of objects with large aspect ratios, a Strip Pooling (SP) module was added to a branch of the DenseASPP module to enhance the elongated strip features of book spines. Finally, drawing on the Multi-Head Self-Attention (MHSA) mechanism in ViT (Vision Transformer), a global information-enhanced self-attention module was proposed to strengthen the network's ability to capture long-range features. The proposed algorithm was evaluated in comparative experiments on an open-source dataset. The results show that, compared with the original DeepLabv3+ segmentation algorithm, the proposed algorithm improves the Mean Intersection over Union (MIoU) by 1.8 percentage points on the nearly vertical book spine dataset and by 4.1 percentage points on the skewed book spine dataset, where it reaches 93.3%. These results confirm that the proposed algorithm accurately segments book spines that are tilted, densely arranged, and have large aspect ratios.
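The claim that DenseASPP extracts denser multi-scale features than ASPP can be illustrated with receptive-field arithmetic. A 3×3 convolution with dilation rate d enlarges the receptive field by 2d, so a cascade of stride-1 dilated convolutions has a receptive field of 1 + 2·Σd. In ASPP the dilated branches run in parallel, giving one scale per rate; in DenseASPP each layer also receives the outputs of all earlier layers, so every ordered subset of layers forms a path and many more scales appear. The rates (3, 6, 12, 18, 24) below are an illustrative assumption, since the abstract does not state the rates used, and the helper names `dilated_rf` and `dense_scales` are hypothetical. This is a counting sketch, not the authors' network.

```python
from itertools import combinations
from typing import List, Sequence, Set


def dilated_rf(rates: Sequence[int], kernel: int = 3) -> int:
    """Receptive field (in feature-map pixels) of a cascade of stride-1
    dilated convolutions; each layer with dilation d adds (kernel - 1) * d."""
    rf = 1
    for d in rates:
        rf += (kernel - 1) * d
    return rf


def dense_scales(rates: Sequence[int], kernel: int = 3) -> List[int]:
    """Distinct receptive-field sizes reachable under dense connections.

    Assumption: every layer sees the outputs of all earlier layers, so any
    non-empty subset of layers (kept in order) is a valid path to the output.
    """
    scales: Set[int] = set()
    for k in range(1, len(rates) + 1):
        for subset in combinations(rates, k):
            scales.add(dilated_rf(subset, kernel))
    return sorted(scales)


if __name__ == "__main__":
    rates = (3, 6, 12, 18, 24)  # hypothetical example rates
    print("parallel (ASPP-style):", [dilated_rf([d]) for d in rates])
    print("dense (DenseASPP-style):", dense_scales(rates))
```

With these rates, the five parallel branches cover five receptive fields (7 to 49 pixels), while the dense cascade reaches 21 distinct sizes from 7 up to 127 pixels, which is the sense in which the pyramid features are "denser".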

Key words: book spine segmentation, intelligent library, DeepLabv3+ network, DenseASPP (Dense Atrous Spatial Pyramid Pooling), self-attention mechanism
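Strip pooling, used above to strengthen the elongated features of book spines, pools along whole rows and whole columns instead of over square windows. The sketch below is a simplified illustration on a single feature channel stored as a list of lists; it is an assumption-laden sketch, not the authors' SP module. In particular, the learned 1-D convolutions and the 1×1 fusion convolution of a full strip pooling block are omitted, and the function name `strip_pool_gate` is hypothetical. Averaging each row gives an H×1 strip and averaging each column gives a 1×W strip; both are expanded back to H×W, added, passed through a sigmoid, and used to re-weight the input.

```python
import math
from typing import List

Matrix = List[List[float]]


def strip_pool_gate(x: Matrix) -> Matrix:
    """Simplified strip pooling on one H x W channel (learned convs omitted)."""
    h, w = len(x), len(x[0])
    # Horizontal strip: mean of each row over the width -> H values (H x 1).
    row_mean = [sum(row) / w for row in x]
    # Vertical strip: mean of each column over the height -> W values (1 x W).
    col_mean = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    out: Matrix = []
    for i in range(h):
        out_row = []
        for j in range(w):
            # Expand both strips to H x W and fuse them by addition.
            fused = row_mean[i] + col_mean[j]
            # Sigmoid gate in (0, 1) re-weights the original activation.
            gate = 1.0 / (1.0 + math.exp(-fused))
            out_row.append(x[i][j] * gate)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # A 4 x 4 map with one tall, narrow "spine" in column 1.
    feature = [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]
    for row in strip_pool_gate(feature):
        print([round(v, 3) for v in row])
```

On this toy map every pixel of the vertical stripe receives the same gate, sigmoid(0.25 + 1.0) ≈ 0.777, because the column statistic spans the stripe's full length; a square pooling window would only see a local patch of it.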

CLC number:
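The results in the abstract are reported as Mean Intersection over Union (MIoU): for each class, IoU = TP / (TP + FP + FN), and MIoU is the mean over classes. Because the gains are absolute, a 4.1-percentage-point improvement that reaches 93.3% implies a baseline MIoU of 89.2% on the skewed dataset. The sketch below assumes a two-class labeling (0 = background, 1 = book spine) and flattened label lists; the function names are hypothetical and this is not the authors' evaluation code.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """cm[t][p] counts pixels with ground-truth class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int) -> float:
    """Mean of per-class IoU = TP / (TP + FP + FN)."""
    cm = confusion_matrix(pred, target, num_classes)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # truth c, predicted other
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # predicted c, truth other
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    target = [0, 0, 1, 1, 1, 0]  # 0 = background, 1 = spine (assumed labels)
    pred = [0, 1, 1, 1, 0, 0]
    print(mean_iou(pred, target, num_classes=2))
```

In this toy example each class has two correct pixels, one false positive and one false negative, so each IoU is 2/4 and the MIoU is 0.5.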