Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (10): 2952-2958. DOI: 10.11772/j.issn.1001-9081.2020122037

Special topic: Multimedia computing and computer simulation


Semantic segmentation method of power line on mobile terminals based on encoder-decoder structure

HUANG Juting, GAO Hongli, DAI Zhikun

  1. School of Mechanical Engineering, Southwest Jiaotong University, Chengdu Sichuan 610031, China
  • Received: 2020-12-25 Revised: 2021-03-21 Online: 2021-10-10 Published: 2021-07-14
  • Corresponding author: GAO Hongli
  • About the authors: HUANG Juting (born 1996), male, from Wenzhou, Zhejiang, M.S. candidate; research interests: deep learning, machine vision. GAO Hongli (born 1971), male, from Luoyang, Henan, professor, Ph.D.; research interests: intelligent condition monitoring, fault diagnosis. DAI Zhikun (born 1996), male, from Taizhou, Zhejiang, M.S. candidate; research interests: image processing, image segmentation.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61801402, 51775452).

Abstract: Traditional vision algorithms detect long, slender power lines in complex scenes with low accuracy and are strongly affected by environmental factors, while existing deep-learning-based power line detection algorithms are inefficient. To address these problems, an end-to-end fully convolutional neural network model suitable for power line detection on mobile terminals was proposed. Firstly, a symmetrical encoder-decoder structure was adopted: the encoder used max-pooling layers for down-sampling to extract multi-scale features, and the decoder used non-linear up-sampling based on max-pooling indices to fuse the multi-scale features layer by layer and restore image details. Secondly, a weighted loss function was adopted to train the model, addressing the imbalance between power line pixels and background pixels. Finally, a power line dataset with complex backgrounds and pixel-level labels was constructed to train and evaluate the model, and a public power line dataset was relabeled as a test set from a different source. Compared with Dilated ConvNet, an existing power line semantic segmentation model for mobile devices, the proposed model doubled the prediction speed for 512×512 resolution images on the GPU of the NVIDIA Jetson TX2 mobile device, reaching 8.2 frame/s. On the same-source test set, the proposed model achieved a mean Intersection over Union (mIoU) of 0.857 3, an F1 score of 0.844 7 and an Average Precision (AP) of 0.927 9, increases of 0.011, 0.014 and 0.008 respectively; on the public test set, it achieved an mIoU of 0.724 4, an F1 score of 0.634 1 and an AP of 0.664 4, increases of 0.004, 0.007 and 0.032 respectively. Experimental results show that the proposed model offers better real-time power line segmentation performance on mobile terminals.

Key words: deep learning, Convolutional Neural Network (CNN), semantic segmentation, Unmanned Aerial Vehicle (UAV), power transmission line
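
The two mechanisms named in the abstract, up-sampling with max-pooling indices and a weighted loss for the pixel imbalance, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the 2×2 window, the binary cross-entropy form of the loss and the weight value 10.0 are all assumptions made for illustration; the abstract does not specify the layer configuration or the exact loss.

```python
import math

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D list (even height and width assumed).
    Returns the pooled map and the flat index of each maximum in the input."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        prow, irow = [], []
        for j in range(0, w, 2):
            # Each candidate is (value, flat index into the input map).
            window = [(x[i + di][j + dj], (i + di) * w + (j + dj))
                      for di in (0, 1) for dj in (0, 1)]
            value, index = max(window, key=lambda t: t[0])
            prow.append(value)
            irow.append(index)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices

def max_unpool_2x2(pooled, indices, h, w):
    """Place each pooled value back at its recorded position; all other cells are 0."""
    out = [[0.0] * w for _ in range(h)]
    for prow, irow in zip(pooled, indices):
        for value, index in zip(prow, irow):
            out[index // w][index % w] = value
    return out

def weighted_bce(probs, labels, w_line, w_bg=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for power line (label 1)
    and background (label 0) pixels. The weights are hypothetical."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= w_line * y * math.log(p) + w_bg * (1 - y) * math.log(1.0 - p)
    return total / len(probs)

feature = [[1, 2, 0, 0],
           [3, 4, 0, 5],
           [0, 0, 7, 0],
           [6, 0, 0, 0]]
pooled, indices = max_pool_2x2(feature)
restored = max_unpool_2x2(pooled, indices, 4, 4)
loss = weighted_bce([0.5, 0.5], [1, 0], w_line=10.0)
```

Because the decoder reuses the positions recorded by the encoder, thin structures such as power lines are restored at their original pixel locations rather than being smeared by interpolation. Weighting the rarer power line class more heavily keeps the dominant background pixels from swamping the loss.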

CLC number: