Journal of Computer Applications


Human parsing method with aggregation of generalized contextual features

  

  • Received: 2024-10-28  Revised: 2025-01-06  Accepted: 2025-01-07  Online: 2025-01-22  Published: 2025-01-22
  • Supported by:
    National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities


YUAN Jiaqi1, HUANG Rong1,2*, DONG Aihua1,2, ZHOU Shubo1,2, LIU Hao1,2

  1. College of Information Science and Technology, Donghua University, Shanghai 201620, China;
    2. Engineering Research Center of Digitized Textile and Apparel Technology, Ministry of Education (Donghua University), Shanghai 201620, China


  • Corresponding author: HUANG Rong

Abstract: Human parsing aims to achieve fine-grained segmentation of human body parts in images. Some human parsing methods enhance part representations by aggregating contextual features, but the scope of such aggregation is often limited. To address this issue, a human parsing method with aggregation of generalized contextual features was proposed. Guided by the prior knowledge of human topological structure, the proposed method not only aggregates contextual features globally within the current image but also extends the aggregation scope to other images; this extended scope was defined as the generalized context. For the current image, a Cross-Stripe Attention Module (CSAM) was designed to aggregate global contextual features within the image. In this module, the part distribution was used to characterize the human topological structure prior within the image, which in turn guided the aggregation of contextual features along horizontal and vertical stripes. For other images, a Region-aware Batch Attention Module (RBAM) was designed to aggregate inter-image contextual features at the batch level. Owing to the constraints of human topological structure, the positional deviations of similar parts across the images in a batch fall within a limited range, which allows the module to learn the spatial offsets between similar parts in different human images and, based on these offsets, aggregate features from the similar part regions of other images along the batch dimension. Quantitative comparison results show that the proposed method improves the mean Intersection over Union (mIoU) on the LIP dataset by 0.42 percentage points compared with DTML (Dual-Task Mutual Learning). Visualization experiments demonstrate that the proposed method effectively aggregates both global features from the current image and part features from other images within the generalized context.
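
To make the two attention mechanisms concrete, a minimal PyTorch sketch of cross-stripe attention and batch-axis attention is given below. The module names, tensor shapes, reduction ratio, and offset branch are illustrative assumptions for exposition, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossStripeAttention(nn.Module):
    """Each pixel aggregates contextual features from its own horizontal
    and vertical stripes (a criss-cross-style attention pattern)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, 1)
        self.k = nn.Conv2d(channels, channels // reduction, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                    # x: (B, C, H, W)
        B, C, H, W = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        scale = q.size(1) ** 0.5
        # Horizontal stripe: every pixel attends over its own row.
        qh = q.permute(0, 2, 3, 1).reshape(B * H, W, -1)
        kh = k.permute(0, 2, 1, 3).reshape(B * H, -1, W)
        vh = v.permute(0, 2, 3, 1).reshape(B * H, W, C)
        row = torch.bmm(F.softmax(torch.bmm(qh, kh) / scale, dim=-1), vh)
        row = row.reshape(B, H, W, C)
        # Vertical stripe: every pixel attends over its own column.
        qv = q.permute(0, 3, 2, 1).reshape(B * W, H, -1)
        kv = k.permute(0, 3, 1, 2).reshape(B * W, -1, H)
        vv = v.permute(0, 3, 2, 1).reshape(B * W, H, C)
        col = torch.bmm(F.softmax(torch.bmm(qv, kv) / scale, dim=-1), vv)
        col = col.reshape(B, W, H, C).permute(0, 2, 1, 3)
        return x + (row + col).permute(0, 3, 1, 2)           # residual aggregation

class RegionAwareBatchAttention(nn.Module):
    """Every spatial location attends along the batch axis, so each image
    aggregates features of similar part regions from the other images in
    the batch; a small learned flow field coarsely aligns parts first."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // reduction, 1)
        self.k = nn.Conv2d(channels, channels // reduction, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.offset = nn.Conv2d(channels, 2, 3, padding=1)   # stand-in for learned part offsets

    def forward(self, x):                                    # x: (B, C, H, W)
        B, C, H, W = x.shape
        # Warp features by a small predicted offset field (within [-0.1, 0.1]
        # in normalized coordinates), reflecting the bounded positional
        # deviation of similar parts across images.
        flow = torch.tanh(self.offset(x)) * 0.1
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(B, H, W, 2) + flow.permute(0, 2, 3, 1)
        aligned = F.grid_sample(x, grid, align_corners=True)
        q = self.q(x).permute(2, 3, 0, 1).reshape(H * W, B, -1)        # (HW, B, C')
        k = self.k(aligned).permute(2, 3, 1, 0).reshape(H * W, -1, B)  # (HW, C', B)
        v = self.v(aligned).permute(2, 3, 0, 1).reshape(H * W, B, C)   # (HW, B, C)
        a = F.softmax(torch.bmm(q, k) / q.size(-1) ** 0.5, dim=-1)     # (HW, B, B) cross-image weights
        out = torch.bmm(a, v).reshape(H, W, B, C).permute(2, 3, 0, 1)
        return x + out                                       # residual aggregation

# Quick shape check: both modules preserve (B, C, H, W).
x = torch.randn(4, 64, 32, 24)
print(CrossStripeAttention(64)(x).shape)        # torch.Size([4, 64, 32, 24])
print(RegionAwareBatchAttention(64)(x).shape)   # torch.Size([4, 64, 32, 24])

Both modules are residual, so they can be inserted into an existing parsing backbone. In the batch-attention sketch, the softmax over the batch axis lets every image weight features from all images in the batch, and the small learned flow field stands in for the inter-image part offsets described in the abstract.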

Key words: generalized contextual feature, feature aggregation, batch-dimension attention, cross-stripe attention, human parsing


