Journal of Computer Applications, 2018, Vol. 38, Issue (7): 1853-1856. DOI: 10.11772/j.issn.1001-9081.2017122920

Fine-grained image classification method based on multi-feature combination

ZOU Chengming1,2, LUO Ying1,2, XU Xiaolong1,2   

  1. Hubei Key Laboratory of Transportation Internet of Things (Wuhan University of Technology), Wuhan, Hubei 430070, China;
  2. College of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China
  • Received: 2017-12-13; Revised: 2018-02-08; Online: 2018-07-10; Published: 2018-07-12
  • Supported by:
    This work is partially supported by the Fundamental Research Funds for the Central Universities (2017-zy-084).

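  • Corresponding author: LUO Ying
  • About the authors: ZOU Chengming (born 1975), male, from Xuwen, Guangdong; professor, Ph.D., CCF member; research interests: computer vision, embedded systems, and software theory and methods. LUO Ying (born 1993), female, from Yiyang, Hunan; M.S. candidate; research interests: graphics and image processing. XU Xiaolong (born 1995), male, from Suzhou, Anhui; M.S. candidate; research interests: graphics and image processing.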

Abstract: Because a single feature representation limits the accuracy of fine-grained image classification, a multi-feature combination representation based on Convolutional Neural Network (CNN) and Scale Invariant Feature Transform (SIFT) is proposed. Features are extracted at three levels: the entire target, its key parts, and its key points. First, two CNN models are trained on the target-entirety regions and the head-only regions of the fine-grained image library, respectively, and are used to extract the target-entirety and head-only CNN features. Second, SIFT key points are extracted from all target-entirety regions in the image library, and a codebook is generated by K-means clustering. The SIFT descriptors of each target-entirety region are then encoded against the codebook into a single feature vector using the Vector of Locally Aggregated Descriptors (VLAD). Finally, the features are combined into the final representation, and a Support Vector Machine (SVM) classifies the fine-grained images. The method was evaluated on the CUB-200-2011 dataset and compared with single-feature representations. The experimental results show that the proposed method improves classification accuracy by 13.31% over classification based on a single CNN feature, which demonstrates the positive effect of multi-feature combination on fine-grained image classification.

Key words: Convolutional Neural Network (CNN), Scale Invariant Feature Transform (SIFT), K-means clustering, Vector of Locally Aggregated Descriptors (VLAD), fine-grained image classification
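
The codebook, VLAD encoding, and feature-combination steps named in the abstract can be illustrated with a minimal plain-Python sketch. This is an illustration under assumptions, not the authors' implementation: the function names (`kmeans`, `vlad`, `combine`) are hypothetical, the L2 normalization is a common VLAD convention that the abstract does not specify, toy 2-D points stand in for 128-D SIFT descriptors, and the CNN features are placeholder lists. CNN training and the SVM classifier are omitted.

```python
import math
import random


def nearest(codebook, x):
    """Index of the codeword closest to descriptor x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))


def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns k codewords (the codebook)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(centers, d)].append(d)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def vlad(descriptors, codebook):
    """Sum residuals (descriptor - nearest codeword) per codeword, flatten, L2-normalize."""
    dim = len(codebook[0])
    acc = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        j = nearest(codebook, d)
        for i in range(dim):
            acc[j][i] += d[i] - codebook[j][i]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


def combine(whole_cnn, head_cnn, vlad_vec):
    """Concatenate the three feature vectors into one representation (assumed scheme)."""
    return list(whole_cnn) + list(head_cnn) + list(vlad_vec)


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
    toy = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [9.0, 10.0], [10.0, 9.0], [10.0, 10.0]]
    codebook = kmeans(toy, k=2)
    feature = combine([0.2, 0.7], [0.5, 0.1], vlad(toy[:3], codebook))
    print(len(feature))  # 8 = 2 + 2 + (2 codewords x 2 dims)
```

In practice the combined vector for each image would be passed to an SVM, and the descriptors and clustering would typically come from an image-processing or machine-learning library rather than hand-written loops.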
