Book spine segmentation algorithm based on improved DeepLabv3+ network

Journal of Computer Applications, 2023, Vol. 43, Issue (12): 3927-3932. DOI: 10.11772/j.issn.1001-9081.2022121887

Section: Multimedia computing and computer simulation

Xiaofei JI, Kexin ZHANG, Lirong TANG

School of Automation, Shenyang Aerospace University, Shenyang, Liaoning 110136, China

Received: 2022-12-22 Revised: 2023-03-21 Accepted: 2023-03-22 Online: 2023-04-03 Published: 2023-12-10
Corresponding author: Xiaofei JI
About the authors: ZHANG Kexin, born in 1996 in Jinzhou, Liaoning, M. S. candidate. Her research interests include image processing and video analysis and processing.
TANG Lirong, born in 2000 in Nanchong, Sichuan, M. S. candidate. His research interests include image processing and video analysis and processing.
Supported by: Key Projects of Liaoning Provincial Department of Education (LJKZZ20220033)

Abstract

Book localization is one of the key technologies for the intelligent development of libraries, and accurate book spine segmentation is a major challenge in achieving it. To address this, a book spine segmentation algorithm based on an improved DeepLabv3+ network is proposed, targeting the difficulties caused by densely arranged books, skewed books, and book spines with extremely similar textures. Firstly, to extract denser pyramid features from book images, the Atrous Spatial Pyramid Pooling (ASPP) module in the original DeepLabv3+ network is replaced by the DenseASPP (Dense Atrous Spatial Pyramid Pooling) module, which has multiple dilation rates and multiple scales. Secondly, because the original DeepLabv3+ network is insensitive to the segmentation boundaries of objects with large aspect ratios, a Strip Pooling (SP) module is added to the branch of the DenseASPP module to enhance the strip-shaped features of book spines. Finally, drawing on the Multi-Head Self-Attention (MHSA) mechanism in ViT (Vision Transformer), a self-attention module with enhanced global information is proposed to strengthen the network's ability to capture long-distance features. The proposed algorithm was compared with other algorithms on open-source databases. Experimental results show that, compared with the original DeepLabv3+ segmentation algorithm, the proposed algorithm improves the Mean Intersection over Union (MIoU) by 1.8 percentage points on the nearly vertical book spine database and by 4.1 percentage points on the skewed book spine database, reaching 93.3% on the latter. These results indicate that the proposed algorithm accurately segments book spines that are skewed, densely arranged, and of large aspect ratio.

Key words: book spine segmentation, intelligent library, DeepLabv3+ network, DenseASPP (Dense Atrous Spatial Pyramid Pooling), self-attention mechanism
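
The MIoU figures quoted in the abstract follow the usual definition: the intersection over union is computed per class and then averaged over classes. A minimal pure-Python sketch is given below, assuming integer class labels in flattened prediction and ground-truth masks. The function name `mean_iou` and the toy masks are illustrative and are not taken from the paper.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over the classes present in pred or target.

    pred, target: equal-length sequences of integer class labels
    (flattened masks). Classes absent from both masks are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either mask")
    return sum(ious) / len(ious)


# Toy 2x4 masks, flattened: 0 = background, 1 = book spine (hypothetical data).
target = [0, 1, 1, 0,
          0, 1, 1, 0]
pred = [0, 1, 1, 1,
        0, 1, 0, 0]
miou = mean_iou(pred, target, num_classes=2)
print(f"MIoU = {miou:.3f}")
```

In this toy case each class has 3 pixels in the intersection and 5 in the union, so both per-class IoU values and their mean are 0.6.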

CLC Number: TP391.1

Cite this article:

Xiaofei JI, Kexin ZHANG, Lirong TANG. Book spine segmentation algorithm based on improved DeepLabv3+ network[J]. Journal of Computer Applications, 2023, 43(12): 3927-3932.

Figures and tables (10)

Fig.1 Network structure of the proposed algorithm

Fig.2 Process of strip pooling
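
Strip pooling averages a feature map along long, narrow windows rather than square ones, which suits elongated objects such as book spines. The sketch below is a minimal single-channel illustration, assuming the common formulation in which a horizontal strip (one mean per row) and a vertical strip (one mean per column) are broadcast back to the full map, summed, and passed through a sigmoid to gate the input. The learned 1D and 1×1 convolutions of a real strip pooling module are omitted, the function names are hypothetical, and the module used in the paper may differ.

```python
import math


def strip_pool(feature):
    """Return (row_means, col_means) of a 2D feature map given as a list of rows."""
    h, w = len(feature), len(feature[0])
    # Horizontal strips: one average per row (an H x 1 descriptor).
    row_means = [sum(row) / w for row in feature]
    # Vertical strips: one average per column (a 1 x W descriptor).
    col_means = [sum(feature[i][j] for i in range(h)) / h for j in range(w)]
    return row_means, col_means


def strip_pooling_gate(feature):
    """Gate each position by sigmoid(row strip + column strip)."""
    row_means, col_means = strip_pool(feature)
    gated = []
    for i, row in enumerate(feature):
        gated.append([
            x / (1.0 + math.exp(-(row_means[i] + col_means[j])))
            for j, x in enumerate(row)
        ])
    return gated


# A tall, thin "spine": column 1 is active in every row (hypothetical data).
feature = [[0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0]]
rows, cols = strip_pool(feature)
gated = strip_pooling_gate(feature)
```

In this example the column descriptor is 1.0 for the spine column and 0.0 elsewhere, while every row descriptor is the same, so the vertical strip is what isolates the elongated structure.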

Fig.3 Self-attention module
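
Multi-head self-attention lets every position attend to every other position, which is how it captures long-distance context. The following is a minimal pure-Python sketch of the mechanism on a short token sequence, assuming identity query, key, and value projections; a real MHSA layer learns these projections and an output projection. All names are hypothetical, and this is not the paper's module, whose global information enhancement is not detailed on this page.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(tokens):
    """Scaled dot-product attention weights with Q = K = tokens."""
    d = len(tokens[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens])
        for q in tokens
    ]


def multi_head_self_attention(tokens, num_heads):
    """Split channels into heads, attend within each head, concatenate the results."""
    d = len(tokens[0])
    if d % num_heads != 0:
        raise ValueError("channel count must be divisible by num_heads")
    head_dim = d // num_heads
    out = [[] for _ in tokens]
    for h in range(num_heads):
        # Channel slice belonging to this head.
        head = [t[h * head_dim:(h + 1) * head_dim] for t in tokens]
        weights = attention_weights(head)
        for i, w_row in enumerate(weights):
            # Weighted sum of value vectors (V = head, identity projection).
            mixed = [sum(w * v[c] for w, v in zip(w_row, head))
                     for c in range(head_dim)]
            out[i].extend(mixed)
    return out


# Three tokens with four channels each (hypothetical data), two heads.
tokens = [[1.0, 0.0, 0.5, 0.5],
          [0.0, 1.0, 0.5, 0.5],
          [1.0, 1.0, 0.0, 1.0]]
output = multi_head_self_attention(tokens, num_heads=2)
weights = attention_weights(tokens)
```

Each row of `weights` sums to 1, so every output token is a weighted average over all input tokens regardless of how far apart they are in the sequence.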

Fig.4 Loss-epoch relationship

Tab.1 Influence of number of network layers of DenseASPP module on segmentation effect

Number of network layers	MIoU/%
3	79.4
4	88.5
5	91.2
6	89.9
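
Tab.1 varies the number of atrous layers in the DenseASPP module. One effect of depth is a larger receptive field: a k×k convolution with dilation rate d has an effective kernel size of (k-1)d+1, and stride-1 layers in cascade each add their effective kernel size minus one. The sketch below works this out. The dilation rates (3, 6, 12, 18, 24) are an assumption taken from a common DenseASPP configuration, not values reported on this page, and the function names are hypothetical.

```python
def effective_kernel(kernel_size, dilation):
    """Effective kernel size of a dilated (atrous) convolution."""
    return (kernel_size - 1) * dilation + 1


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field of stride-1 dilated convolutions applied in cascade."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(kernel_size, d) - 1
    return rf


# Assumed dilation rates, for illustration only.
ASSUMED_RATES = (3, 6, 12, 18, 24)

# Receptive field of the longest cascade path when the first n layers are used.
fields = {
    n: stacked_receptive_field(3, ASSUMED_RATES[:n])
    for n in range(1, len(ASSUMED_RATES) + 1)
}
for n, rf in fields.items():
    print(f"{n} layer(s): receptive field {rf} x {rf}")
```

Under these assumed rates the longest path grows from 7 with one layer to 127 with five. The calculation says nothing about accuracy on its own; Tab.1 reports that MIoU peaked at five layers and fell slightly at six.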

Tab.2 Comparison of experimental results before and after introduction of self-attention module

Backbone network	Self-attention module introduced	MIoU/%
Xception	Yes	92.7
Xception	No	92.2
MobileNetV2	Yes	93.8
MobileNetV2	No	93.1

Fig.5 Visual comparison of features before and after introduction of self-attention module

Fig.6 Visual comparison of features before and after introduction of strip pooling module

Tab.3 Test results of different network segmentation algorithms on open-source database

Database	Algorithm	Batch size	Backbone network	MIoU/%
Nearly vertical book spine database	Mask R-CNN algorithm*	2	ResNet50	87.5
Nearly vertical book spine database	Improved Mask R-CNN algorithm*	2	ResNet50	85.3
Nearly vertical book spine database	DeepLabv3+ algorithm*	4	MobileNetV2	92.3
Nearly vertical book spine database	Proposed algorithm	4	MobileNetV2	94.1
Skewed book spine database	Mask R-CNN algorithm*	2	ResNet50	81.3
Skewed book spine database	Improved Mask R-CNN algorithm*	2	ResNet50	93.5
Skewed book spine database	DeepLabv3+ algorithm*	4	MobileNetV2	89.2
Skewed book spine database	Proposed algorithm	4	MobileNetV2	93.3
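
The percentage-point gains quoted in the abstract can be recomputed from the DeepLabv3+ and proposed-algorithm rows of Tab.3. The short check below hard-codes those four MIoU values; the variable names are illustrative.

```python
# MIoU values (%) copied from Tab.3.
miou_table = {
    "nearly vertical": {"DeepLabv3+": 92.3, "proposed": 94.1},
    "skewed": {"DeepLabv3+": 89.2, "proposed": 93.3},
}

# Gains are absolute differences in percentage points, not relative changes.
gains = {
    database: round(scores["proposed"] - scores["DeepLabv3+"], 1)
    for database, scores in miou_table.items()
}
for database, gain in gains.items():
    print(f"{database}: +{gain} percentage points")
```

The differences come out to 1.8 and 4.1 percentage points, matching the abstract.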

Fig.7 Segmentation effect of different algorithms

