Book spine segmentation algorithm based on improved DeepLabv3+ network

doi:10.11772/j.issn.1001-9081.2022121887

Abstract

Book localization is one of the key technologies for the intelligent development of libraries, and accurate book spine segmentation has become a major challenge in achieving this goal. To address this, a book spine segmentation algorithm based on an improved DeepLabv3+ network was proposed. It targets the difficulties in book spine segmentation caused by dense arrangement, skewed books, and highly similar spine textures. Firstly, to extract denser pyramid features from book images, the Atrous Spatial Pyramid Pooling (ASPP) module in the original DeepLabv3+ network was replaced by a multi-dilation-rate, multi-scale DenseASPP (Dense Atrous Spatial Pyramid Pooling) module. Secondly, because the original DeepLabv3+ network is insensitive to the segmentation boundaries of objects with large aspect ratios, a Strip Pooling (SP) module was added to a branch of the DenseASPP module to enhance the strip-like features of book spines. Finally, based on the Multi-Head Self-Attention (MHSA) mechanism in ViT (Vision Transformer), a self-attention module with global information enhancement was proposed to strengthen the network's ability to capture long-distance features. The proposed algorithm was tested and compared with other algorithms on an open-source database. Experimental results show that, compared with the original DeepLabv3+ segmentation algorithm, the proposed algorithm improves the Mean Intersection over Union (MIoU) by 1.8 percentage points on the nearly vertical book spine database and by 4.1 percentage points on the skewed book spine database, where it reaches an MIoU of 93.3%. These results confirm that the proposed algorithm accurately segments book spines that are densely arranged, have large aspect ratios, and are skewed at certain angles.
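
The strip pooling idea mentioned above can be illustrated with a minimal sketch in plain Python. It assumes a single-channel feature map stored as nested lists and follows the general strip pooling recipe (pool along rows and along columns, expand both strips back to the full map, fuse them, and use a sigmoid gate to rescale the input). The function name `strip_pool_gate` is hypothetical, and the learnable convolutions of a full SP module are deliberately left out, so this is not the authors' implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def strip_pool_gate(x: Matrix) -> Matrix:
    """Simplified strip pooling applied to one H x W feature channel.

    Each row is average-pooled into an H x 1 strip and each column into a
    1 x W strip; both are expanded back to H x W, summed, passed through a
    sigmoid, and used to rescale the input. The learnable 1-D convolutions
    and the 1x1 fusion convolution of a full strip pooling module are omitted.
    """
    h, w = len(x), len(x[0])
    row_means = [sum(row) / w for row in x]                             # H x 1 strip
    col_means = [sum(x[i][j] for i in range(h)) / h for j in range(w)]  # 1 x W strip
    out: Matrix = []
    for i in range(h):
        out_row = []
        for j in range(w):
            fused = row_means[i] + col_means[j]    # expand both strips and add
            gate = 1.0 / (1.0 + math.exp(-fused))  # sigmoid gate in (0, 1)
            out_row.append(x[i][j] * gate)
        out.append(out_row)
    return out


# A 4 x 3 map with one bright vertical strip, loosely mimicking an upright spine.
spine = [[0.0, 1.0, 0.0] for _ in range(4)]
for row in strip_pool_gate(spine):
    print([round(v, 3) for v in row])
```

Because the column average along the bright strip is high, every position in that long, thin region receives the same large gate, which is the intuition behind using strip-shaped rather than square pooling windows for objects with large aspect ratios.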

Key words: book spine segmentation, intelligent library, DeepLabv3+ network, DenseASPP (Dense Atrous Spatial Pyramid Pooling), self-attention mechanism

CLC Number: TP391.1

Xiaofei JI, Kexin ZHANG, Lirong TANG. Book spine segmentation algorithm based on improved DeepLabv3+ network[J]. Journal of Computer Applications, 2023, 43(12): 3927-3932.

Figures and Tables

Fig.1 Network structure of the proposed algorithm

Fig.2 Process of strip pooling

Fig.3 Self-attention module

Fig.4 Loss-epoch relationship

Tab.1 Influence of the number of network layers in the DenseASPP module on segmentation performance

Number of layers	MIoU/%
3	79.4
4	88.5
5	91.2
6	89.9
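
Tab.1 varies the number of layers in the DenseASPP module. One way to see what each extra layer adds is to compute the receptive field of the deepest densely connected path, using the rule that stacking layers with receptive fields R1 and R2 gives R1 + R2 - 1. The sketch below assumes 3x3 kernels and the dilation rates 3, 6, 12, 18 and 24 from the original DenseASPP design; the rates used for each configuration in Tab.1 are not stated here, so the numbers are illustrative only, and the function names are hypothetical.

```python
from typing import Sequence


def dilated_kernel_size(kernel: int, rate: int) -> int:
    """Effective size of a k x k convolution with the given dilation rate."""
    return rate * (kernel - 1) + 1


def stacked_receptive_field(rates: Sequence[int], kernel: int = 3) -> int:
    """Receptive field of the deepest path through stacked dilated convolutions.

    Stacking two layers with receptive fields R1 and R2 gives R1 + R2 - 1;
    in DenseASPP the dense connections let the last layer see this full stack.
    """
    field = 1
    for rate in rates:
        field += dilated_kernel_size(kernel, rate) - 1
    return field


# Assumed rates from the original DenseASPP design, not necessarily the paper's.
RATES = [3, 6, 12, 18, 24]
for n in range(3, len(RATES) + 1):
    print(n, stacked_receptive_field(RATES[:n]))
```

Under these assumptions the receptive field grows from 43 to 127 feature-map positions as the module goes from three to five layers. Tab.1 shows that MIoU falls again at six layers, so a larger receptive field alone does not guarantee better segmentation.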

Tab.2 Comparison of experimental results before and after introducing the self-attention module

Backbone	Self-attention module	MIoU/%
Xception	Yes	92.7
Xception	No	92.2
MobileNetV2	Yes	93.8
MobileNetV2	No	93.1
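
Tab.2 compares networks with and without the self-attention module, which builds on ViT-style Multi-Head Self-Attention (MHSA). The following sketch shows plain MHSA over the H×W positions of a feature map, written in pure Python for readability. It assumes identity query, key, value and output projections and does not model the global information enhancement proposed in the paper; `multi_head_self_attention` and the toy feature map are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(tokens: Matrix, num_heads: int) -> Matrix:
    """ViT-style multi-head self-attention over N tokens of dimension C.

    Learned Q/K/V and output projections are replaced by identity maps, so
    each head attends over its own C / num_heads slice of the channels.
    """
    n, c = len(tokens), len(tokens[0])
    assert c % num_heads == 0, "channel count must divide evenly into heads"
    d = c // num_heads
    out = [[0.0] * c for _ in range(n)]
    for h in range(num_heads):
        sl = slice(h * d, (h + 1) * d)
        head = [t[sl] for t in tokens]  # Q = K = V for this head
        for i in range(n):
            # Scaled dot-product scores of token i against every token.
            scores = [sum(a * b for a, b in zip(head[i], head[j])) / math.sqrt(d)
                      for j in range(n)]
            weights = softmax(scores)  # global: each position sees all positions
            for k in range(d):
                out[i][h * d + k] = sum(weights[j] * head[j][k] for j in range(n))
    return out


# Flatten a hypothetical C x H x W map (C = 4, H = W = 2) into H * W tokens of length C.
fmap = [[[float(c + i + j) for j in range(2)] for i in range(2)] for c in range(4)]
tokens = [[fmap[c][i][j] for c in range(4)] for i in range(2) for j in range(2)]
attended = multi_head_self_attention(tokens, num_heads=2)
print(len(attended), len(attended[0]))
```

Each output position is a weighted average over all positions in the map, which is what gives self-attention its long-distance reach compared with the local windows of convolution.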

Fig.5 Visual comparison of features before and after introduction of self-attention module

Fig.6 Visual comparison of features before and after introduction of strip pooling module

Tab.3 Test results of different network segmentation algorithms on open-source database

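Database	Algorithm	Batch size	Backbone	MIoU/%
Near-vertical book spine database	Mask R-CNN*	2	ResNet50	87.5
Near-vertical book spine database	Improved Mask R-CNN*	2	ResNet50	85.3
Near-vertical book spine database	DeepLabv3+*	4	MobileNetV2	92.3
Near-vertical book spine database	Proposed algorithm	4	MobileNetV2	94.1
Skewed book spine database	Mask R-CNN*	2	ResNet50	81.3
Skewed book spine database	Improved Mask R-CNN*	2	ResNet50	93.5
Skewed book spine database	DeepLabv3+*	4	MobileNetV2	89.2
Skewed book spine database	Proposed algorithm	4	MobileNetV2	93.3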
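
The MIoU values in Tab.3 average the per-class Intersection over Union, IoU = TP / (TP + FP + FN), over the classes. A minimal sketch on flattened label maps follows; the function name and the toy two-class example are assumptions for illustration. The last lines recompute the percentage-point gains quoted in the abstract from the DeepLabv3+* and proposed-algorithm rows of Tab.3.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over flattened per-pixel class labels (0 .. num_classes - 1)."""
    # conf[t][p] counts pixels whose ground truth is t and prediction is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1
    ious: List[float] = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[t][c] for t in range(num_classes)) - tp  # predicted c, truth differs
        fn = sum(conf[c]) - tp                                   # truth c, predicted otherwise
        union = tp + fp + fn
        if union > 0:  # ignore classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy two-class example (0 = background, 1 = spine): the IoUs are 1/2 and 2/3.
print(mean_iou(pred=[0, 1, 1, 1], target=[0, 0, 1, 1], num_classes=2))

# Percentage-point gains recomputed from the MIoU/% values in Tab.3.
tab3 = {"near-vertical": (92.3, 94.1), "skewed": (89.2, 93.3)}  # (DeepLabv3+*, proposed)
for name, (baseline, proposed) in tab3.items():
    print(name, round(proposed - baseline, 1))
```

Rounding to one decimal place matters here, since subtracting the floating-point values directly leaves small representation errors.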

Fig.7 Segmentation effect of different algorithms
