Parking space detection method based on self-supervised learning HOG prediction auxiliary task

Journal of Computer Applications, 2023, Vol. 43, Issue 12: 3933-3940. DOI: 10.11772/j.issn.1001-9081.2022111687

• Multimedia computing and computer simulation •

Lei LIU¹, Peng WU¹, Kai XIE¹˒², Beizhi CHENG¹, Guanqun SHENG³

1. School of Electronic Information, Yangtze University, Jingzhou, Hubei 434023, China
2. Western Research Institute, Yangtze University, Karamay, Xinjiang 834000, China
3. College of Computer and Information Technology, China Three Gorges University, Yichang, Hubei 443002, China

Received: 2022-11-10 Revised: 2023-05-23 Accepted: 2023-05-29 Online: 2023-07-26 Published: 2023-12-10
Corresponding author: Peng WU
About the authors: LIU Lei, born in 2002, a native of Qingdao, Shandong. His research interests include image processing and artificial intelligence.
XIE Kai, born in 1974, a native of Jingzhou, Hubei, Ph.D., professor. His research interests include signal and information processing, image processing, and artificial intelligence.
CHENG Beizhi, born in 2002, a native of Huanggang, Hubei. Her research interests include image processing and artificial intelligence.
SHENG Guanqun, born in 1987, a native of Dongying, Shandong, Ph.D., associate professor. His research interests include artificial intelligence and signal and information processing.
Supported by:
National Natural Science Foundation of China (42204111)

Abstract:

In intelligent parking space management systems, factors such as illumination changes and parking space occlusion reduce the accuracy and effectiveness of parking space prediction. To address this problem, a parking space detection method based on a self-supervised learning Histogram of Oriented Gradient (HOG) prediction auxiliary task is proposed. First, a self-supervised learning auxiliary task is designed to predict the HOG features of the occluded parts of an image. MobileViTBlock (light-weight, general-purpose, and Mobile-friendly Vision Transformer Block) is used to integrate the global information of the image, so that the model learns the visual representation of the image more fully and its feature extraction ability improves. Second, the SE (Squeeze-and-Excitation) attention mechanism is improved so that the model matches or even exceeds the effect of the original SE attention mechanism at a lower computational cost. Finally, the feature extraction part trained by the auxiliary task is applied to the downstream classification task for parking space status prediction. Experiments were carried out on a mixed dataset of PKLot and CNRPark. The results show that the proposed model reaches an accuracy of 97.49% on the test set; compared with RepVGG, the accuracy of occlusion prediction improves by 5.46 percentage points, a clear improvement over other parking space detection algorithms.

Key words: intelligent parking system, self-supervised learning, Histogram of Oriented Gradient (HOG), auxiliary task, parking space status prediction
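
The auxiliary task in the abstract predicts HOG features for occluded image regions. As a rough illustration only, the sketch below computes an L2-normalised orientation histogram for one grayscale patch, which is the kind of target such a task could regress for each masked patch. The function name `hog_histogram`, the 9 bins, the unsigned angle range, and the hard binning are assumptions made for this example; this page does not state the authors' settings.

```python
import math

def hog_histogram(patch, n_bins=9):
    """Return an L2-normalised histogram of unsigned gradient orientations.

    `patch` is a list of rows of grayscale intensities. The settings used
    here (9 bins, angles in [0, 180), hard binning) are illustrative
    assumptions, not the paper's configuration.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(h):
        for x in range(w):
            # Central differences, clamping indices at the patch border.
            gx = patch[y][min(x + 1, w - 1)] - patch[y][max(x - 1, 0)]
            gy = patch[min(y + 1, h - 1)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Fold the signed angle into an unsigned orientation.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist

# A patch with a vertical edge: intensity changes only along x.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
# A patch with a horizontal edge: intensity changes only along y.
horizontal_edge = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

target_v = hog_histogram(vertical_edge)
target_h = hog_histogram(horizontal_edge)
```

With these inputs, all gradient energy of the vertical edge falls in the first bin (orientation near 0°) and all energy of the horizontal edge falls in the middle bin (orientation near 90°). Because the histogram depends on gradient direction rather than absolute brightness, a target of this kind is less sensitive to illumination changes than raw pixels.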

CLC Number: TP389.1

Cite this article:

Lei LIU, Peng WU, Kai XIE, Beizhi CHENG, Guanqun SHENG. Parking space detection method based on self-supervised learning HOG prediction auxiliary task[J]. Journal of Computer Applications, 2023, 43(12): 3933-3940.

Figures and tables (17)

Fig. 1 Schematic diagrams of parking spaces under two conditions

Fig. 2 Schematic diagrams of parking space samples

Fig. 3 Schematic diagrams of block structures in the feature extraction part

Fig. 4 Structure of the CSE attention mechanism

Fig. 5 Comparison of the effects of CSE and SE

Fig. 6 Overall flowchart of the auxiliary task

Fig. 7 Network structure of the inference process

Fig. 8 Attention visualization results

Tab. 1 Computer hardware and software configuration

Item	Details
CPU	AMD Ryzen 7 5800H with Radeon Graphics
Memory	16 GB
GPU	NVIDIA GeForce RTX 3060 Laptop GPU
Operating system	Windows 10
Software	Python 3.8, OpenCV, PyTorch 1.11.0+cu115

Fig. 9 Samples of the Light-Change dataset

Tab. 2 Comparison of detection performance of two models under C1 and C2 conditions (%)

Model	C1 accuracy	C1 F1 score	C2 accuracy	C2 F1 score
Model 1	97.54	97.44	97.99	98.16
Model 2	99.24	99.22	98.09	98.23

Tab. 3 Comparison of the effects of two attention mechanisms applied to the proposed model under C1 and C2 conditions

Attention	Test condition	Accuracy/%	F1 score/%	Parameters/10³	Computation/10³
SE	C1	98.47	97.58	412.672	431 447
SE	C2	97.72	97.91	412.672	431 447
CSE	C1	99.24	99.22	404.382	431 442
CSE	C2	98.09	98.23	404.382	431 442
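
Table 3 compares the original SE attention with the modified CSE variant. This page does not describe how CSE differs, so the sketch below shows only the original SE mechanism as commonly defined: squeeze by global average pooling, excitation by two fully connected layers with ReLU and sigmoid, then channel-wise rescaling. The function name `se_attention`, the toy weights, and the omission of biases are assumptions for illustration.

```python
import math

def se_attention(feature_map, w1, w2):
    """Apply original Squeeze-and-Excitation channel attention.

    feature_map: list of C channels, each a list of rows (H x W).
    w1: (C/r) x C weights of the reduction layer.
    w2: C x (C/r) weights of the expansion layer.
    Biases are omitted for brevity (an assumption of this sketch).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate in (0, 1).
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]

# Two 2x2 channels with hypothetical weights (reduction ratio r = 2).
x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]    # 1 x 2 reduction weights
w2 = [[0.0], [1.0]]  # 2 x 1 expansion weights
y = se_attention(x, w1, w2)
```

In this toy case the first channel receives a gate of 0.5 (sigmoid of 0) and the second a larger gate, so the second channel is emphasised. According to Table 3, the CSE variant uses 8.29×10³ fewer parameters than SE (404.382 versus 412.672) while reporting higher accuracy and F1 score under both test conditions.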

Tab. 4 Comparison of detection performance of different models under C1 and C2 conditions (%)

Test condition	Model	Accuracy	Precision	Recall	F1 score
C1	VGG16	96.35	99.56	92.93	96.12
C1	ResNet18	96.08	99.80	92.15	95.82
C1	MCNN	96.33	98.72	93.63	96.11
C1	RepVGG	97.63	98.64	96.46	97.53
C1	mAlexNet	96.53	99.07	94.34	96.64
C1	Proposed model	99.24	99.67	98.78	99.22
C2	VGG16	97.21	98.29	96.55	97.41
C2	ResNet18	97.81	99.28	96.67	97.95
C2	MCNN	97.76	98.59	96.68	97.62
C2	RepVGG	97.82	98.15	97.82	97.98
C2	mAlexNet	97.29	97.60	97.21	97.40
C2	Proposed model	98.09	98.87	97.60	98.23
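
The F1 score column in Table 4 can be related to the precision and recall columns through the standard harmonic-mean definition. The short sketch below recomputes it for the proposed model; the helper name `f1_score` is chosen for this example, and the inputs are the rounded percentages printed in the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in %)."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall of the proposed model under C1 and C2, from Table 4.
f1_c1 = f1_score(99.67, 98.78)
f1_c2 = f1_score(98.87, 97.60)
print(round(f1_c1, 2), round(f1_c2, 2))
```

Both values round to the figures reported in the table (99.22 and 98.23). For other rows, recomputing from the rounded precision and recall may differ from the published F1 score in the last digit.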

Tab. 5 Accuracy comparison of different models under different occlusion conditions on the test set (%)

Model	Accuracy	Light occlusion	Moderate occlusion	Heavy occlusion
MCNN	89.93	93.24	82.33	75.15
mAlexNet	88.05	90.43	84.53	77.41
RepVGG	92.03	95.63	89.58	84.48
Proposed model	97.49	97.96	92.28	88.63
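
The gain over RepVGG quoted in the abstract is stated in percentage points, which is an absolute difference between two accuracies rather than a relative improvement. The sketch below makes the distinction explicit using the accuracy column of Table 5; the variable names are chosen for this example.

```python
# Accuracy column (%) from Table 5.
accuracy = {"MCNN": 89.93, "mAlexNet": 88.05, "RepVGG": 92.03, "Proposed": 97.49}

# Absolute gain of the proposed model over each baseline, in percentage points.
gain_pp = {name: round(accuracy["Proposed"] - acc, 2)
           for name, acc in accuracy.items() if name != "Proposed"}

# Relative gain over RepVGG, as a percentage of RepVGG's own accuracy.
relative_gain = 100 * (accuracy["Proposed"] - accuracy["RepVGG"]) / accuracy["RepVGG"]
```

The absolute gain over RepVGG is 5.46 percentage points, matching the abstract, while the relative gain is slightly larger because it is measured against RepVGG's 92.03% baseline.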

Tab. 6 Comparison of detection performance of different models on the LC dataset (%)

Model	Accuracy	Precision	Recall	F1 score
MCNN	75.03	75.49	80.95	78.12
RepVGG	79.83	81.16	83.65	82.38
mAlexNet	78.95	78.91	83.37	81.18
Proposed model	82.54	82.19	87.26	84.64

Tab. 7 Comparison of computational cost, parameter count, and inference time of different models

Model	Computation/10⁶	Parameters/10⁴	Inference time/ms
VGG16	15 483.86	13 836	6.49
ResNet18	1 819.07	1 169	2.90
MCNN	24.40	6	1.84
RepVGG	1 362.03	703	2.69
mAlexNet	21.22	3	1.27
Proposed model	431.44	40	3.96
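
One way to read Table 7 is to express each cost of the proposed model as a fraction of a baseline's cost. The sketch below does this for RepVGG, the strongest baseline in Tables 4 to 6; the dictionary layout and the helper name `relative_cost` are assumptions of this example, and the numbers are copied from the table.

```python
# (computation /10^6, parameters /10^4, inference time /ms) from Table 7.
costs = {
    "VGG16": (15483.86, 13836, 6.49),
    "ResNet18": (1819.07, 1169, 2.90),
    "MCNN": (24.40, 6, 1.84),
    "RepVGG": (1362.03, 703, 2.69),
    "mAlexNet": (21.22, 3, 1.27),
    "Proposed": (431.44, 40, 3.96),
}

def relative_cost(model, baseline):
    """Return each cost of `model` divided by the same cost of `baseline`."""
    return tuple(m / b for m, b in zip(costs[model], costs[baseline]))

compute_ratio, param_ratio, time_ratio = relative_cost("Proposed", "RepVGG")
```

By these figures the proposed model needs roughly a third of RepVGG's computation and under a tenth of its parameters, yet its measured inference time is longer, so lower computation and parameter counts do not translate directly into lower latency on the hardware in Table 1.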

Fig. 10 Detection results of different models for one surveillance camera in a parking lot

