Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (9): 2720-2725. DOI: 10.11772/j.issn.1001-9081.2020111815

Special topic: Multimedia Computing and Computer Simulation


  • Corresponding author: HU Fuyuan
  • About the authors: XU Jianglang, born in 1996 in Suqian, Jiangsu, is an M.S. candidate and a CCF member; his research interests include image processing, deep learning and scene recognition. LI Linyan, born in 1983 in Yueyang, Hunan, is a senior engineer holding a master's degree; her research interests include multimodal information processing. WAN Xinjun, born in 1996 in Lianyungang, Jiangsu, is an M.S. candidate and a CCF member; his research interests include image processing and deep learning. HU Fuyuan, born in 1978 in Yueyang, Hunan, is a professor holding a Ph.D. and a CCF member; his research interests include image processing, pattern recognition and information security.

Indoor scene recognition method combined with object detection

XU Jianglang1, LI Linyan2, WAN Xinjun1, HU Fuyuan1   

  1. School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China;
    2. School of Information Technology, Suzhou Institute of Trade and Commerce, Suzhou, Jiangsu 215009, China
  • Received: 2020-11-19  Revised: 2020-12-26  Online: 2021-09-10  Published: 2021-05-12
  • Supported by:
    This work was partially supported by the General Program of the National Natural Science Foundation of China (61876121) and the Natural Science Research Program of Higher Education Institutions of Jiangsu Province (19KJB520054).



Abstract: In methods that combine an Object detection Network (ObjectNet) with a scene recognition network, the object features extracted by the ObjectNet and the scene features extracted by the scene network are inconsistent in dimensionality and nature, and the object features contain redundant information that interferes with scene judgment, resulting in low scene recognition accuracy. To solve this problem, an improved indoor scene recognition method combined with object detection was proposed. First, the Class Conversion Matrix (CCM) was introduced into the ObjectNet to convert the object features output by ObjectNet, so that the dimension of the object features became consistent with that of the scene features, thereby reducing the information loss caused by the dimension mismatch. Then, the Context Gating (CG) mechanism was used to suppress the redundant information in the features, reducing the weight of irrelevant information and increasing the contribution of the object features to scene recognition. The proposed method achieves a recognition accuracy of 90.28% on the MIT Indoor67 dataset, which is 0.77 percentage points higher than that of the Spatial-layout-maintained Object Semantics Features (SOSF) method, and 81.15% on the SUN397 dataset, which is 1.49 percentage points higher than that of the Hierarchy of Alternating Specialists (HoAS) method. Experimental results show that the proposed method improves the accuracy of indoor scene recognition.
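
The two components named in the abstract, a class conversion matrix that projects object features onto the scene-feature dimension and a context gate that down-weights redundant components, can be illustrated with a minimal PyTorch sketch. The class count (80), feature dimension (512), and the additive fusion with the scene features are assumptions for illustration only, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ClassConversionMatrix(nn.Module):
    """Hypothetical sketch of a CCM: a learned linear mapping that projects
    ObjectNet's object-class features onto the scene-feature dimension."""
    def __init__(self, num_object_classes: int, scene_dim: int):
        super().__init__()
        self.proj = nn.Linear(num_object_classes, scene_dim, bias=False)

    def forward(self, object_features: torch.Tensor) -> torch.Tensor:
        return self.proj(object_features)

class ContextGating(nn.Module):
    """Context Gating: an element-wise sigmoid gate that down-weights
    feature components irrelevant to the scene decision."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.gate(x))

# Toy usage with assumed sizes: 80 object classes, 512-d scene features.
object_logits = torch.randn(4, 80)
scene_features = torch.randn(4, 512)
ccm = ClassConversionMatrix(80, 512)
cg = ContextGating(512)
fused = cg(ccm(object_logits)) + scene_features  # assumed additive fusion
print(fused.shape)  # torch.Size([4, 512])
```

A single linear layer followed by an element-wise sigmoid gate is the standard form of context gating; in this sketch the gated, dimension-matched object features are simply added to the scene features before a scene classifier would be applied.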

Key words: deep learning, Convolutional Neural Network (CNN), indoor scene recognition, object detection, Class Conversion Matrix (CCM), Context Gating (CG)
