Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (6): 1972-1978. DOI: 10.11772/j.issn.1001-9081.2021040647

Special Issue: Multimedia computing and computer simulation

• Multimedia computing and computer simulation •

Joint 1-2-order pooling network learning for remote sensing scene classification

Xiaoyong BIAN1,2,3, Xiongjun FEI1, Chunfang CHEN1, Dongdong KAN1, Sheng DING1,2,3

  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan Hubei 430065, China
  2. Institute of Big Data Science and Engineering, Wuhan University of Science and Technology, Wuhan Hubei 430065, China
  3. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System (Wuhan University of Science and Technology), Wuhan Hubei 430065, China
  • Received: 2021-04-23 Revised: 2021-07-30 Accepted: 2021-08-05 Online: 2022-06-22 Published: 2022-06-10
  • Contact: Xiaoyong BIAN
  • About author: FEI Xiongjun, born in 1992, M. S. candidate. His research interests include high-order pooling.
    CHEN Chunfang, born in 1992, M. S. candidate. Her research interests include deep multi-instance learning.
    KAN Dongdong, born in 1998, M. S. candidate. His research interests include high-order pooling.
    DING Sheng, born in 1975, Ph. D., associate professor. His research interests include object detection and deep learning.
  • Supported by:
    National Natural Science Foundation of China (61972299); Graduate Innovation Foundation of Wuhan University of Science and Technology (JCX201927)


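  • About author: BIAN Xiaoyong, born in 1976, male, from Ji'an, Jiangxi, Ph. D., associate professor. His research interests include machine learning and remote sensing scene classification.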


Most existing pooling methods aggregate feature information from either a first-order pooling layer or a second-order pooling layer alone, ignoring the combined representation capability that multiple pooling strategies offer for scenes, which limits scene recognition performance. To address this problem, a remote sensing scene classification model that jointly learns first- and second-order pooling networks was proposed. Firstly, the convolutional layers of the residual network ResNet-50 were used to extract initial features from the input images. Then, a second-order pooling approach based on feature-vector similarity was proposed: weight coefficients derived from the similarity between feature vectors were used to modulate the information distribution of the feature values, and effective second-order feature information was computed. Meanwhile, an approximate method for solving the square root of the covariance matrix was introduced to obtain a second-order feature representation carrying high-level semantic information. Finally, the entire network was trained with a combined loss function composed of cross-entropy and class-distance weighting, yielding a discriminative classification model. The proposed method was tested on the AID (50% training proportion), NWPU-RESISC45 (20% training proportion), CIFAR-10 and CIFAR-100 datasets and achieved classification accuracies of 96.32%, 93.38%, 96.51% and 83.30% respectively, which are 1.09, 0.55, 1.05 and 1.57 percentage points higher, respectively, than those of iterative matrix SQuare RooT normalization of COVariance pooling (iSQRT-COV). Experimental results show that the proposed method effectively improves the performance of remote sensing scene classification.
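
The pooling pipeline summarized in the abstract can be illustrated with a minimal sketch. It assumes plain (unweighted) covariance pooling and the coupled Newton-Schulz iteration used by the iSQRT-COV baseline; the abstract does not spell out the paper's similarity-based weighting or its exact square-root approximation, so neither is reproduced here. All function names (`first_order_pool`, `covariance_pool`, `newton_schulz_sqrt`, `joint_descriptor`) and the concatenation used to fuse the two branches are hypothetical choices for illustration, written in dependency-free Python.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def first_order_pool(features):
    """First-order pooling: mean of the N feature vectors (each of length C)."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


def covariance_pool(features):
    """Second-order pooling: C x C covariance matrix of the N feature vectors."""
    n = len(features)
    mean = first_order_pool(features)
    centered = [[x - m for x, m in zip(f, mean)] for f in features]
    c = len(mean)
    return [[sum(row[i] * row[j] for row in centered) / n for j in range(c)]
            for i in range(c)]


def newton_schulz_sqrt(cov, num_iters=5):
    """Approximate the square root of a symmetric positive definite matrix.

    Uses the coupled Newton-Schulz iteration (an assumption borrowed from
    iSQRT-COV, not necessarily the approximation used in the paper).
    Requires a nonzero trace.
    """
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))
    # Pre-normalize by the trace so the iteration converges.
    y = [[v / trace for v in row] for row in cov]
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(num_iters):
        zy = matmul(z, y)
        # t = 0.5 * (3I - Z Y)
        t = [[0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(n)]
             for i in range(n)]
        y, z = matmul(y, t), matmul(t, z)
    # Post-compensate for the trace normalization.
    scale = trace ** 0.5
    return [[v * scale for v in row] for row in y]


def joint_descriptor(features, num_iters=5):
    """Hypothetical fusion: mean vector followed by the upper triangle of the root."""
    root = newton_schulz_sqrt(covariance_pool(features), num_iters)
    c = len(root)
    upper = [root[i][j] for i in range(c) for j in range(i, c)]
    return first_order_pool(features) + upper


# Toy input: N = 4 spatial positions, C = 2 channels.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
descriptor = joint_descriptor(features)
print(len(descriptor))  # C + C * (C + 1) / 2 = 5 values
```

In a real network the feature vectors would come from the last convolutional layer of ResNet-50 and the matrix operations would run as batched tensor operations on a GPU; the list-based version above only makes the arithmetic explicit.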

Key words: remote sensing scene classification, deep learning, first-order pooling, second-order pooling, square root of covariance matrix




CLC Number: