Journal of Computer Applications ›› 2025, Vol. 45 ›› Issue (10): 3170-3178.DOI: 10.11772/j.issn.1001-9081.2024101527

• Artificial intelligence •

Human parsing method with aggregation of generalized contextual features

Jiaqi YUAN1, Rong HUANG1,2, Aihua DONG1,2, Shubo ZHOU1,2, Hao LIU1,2

  1. College of Information Science and Technology, Donghua University, Shanghai 201620, China
    2. Engineering Research Center of Digitized Textile and Apparel Technology, Ministry of Education (Donghua University), Shanghai 201620, China
  • Received: 2024-10-15 Revised: 2025-01-06 Accepted: 2025-01-07 Online: 2025-01-22 Published: 2025-10-10
  • Contact: Rong HUANG (rong.huang@dhu.edu.cn)
  • About author: YUAN Jiaqi, born in 1999, M. S. candidate, CCF member. His research interests include deep learning and human parsing.
    HUANG Rong, born in 1985, Ph. D., associate professor, CCF member. His research interests include artificial intelligence and human parsing.
    DONG Aihua, born in 1970, Ph. D., associate professor. Her research interests include smart textiles and clothing.
    ZHOU Shubo, born in 1988, Ph. D., lecturer. His research interests include deep learning and machine vision.
    LIU Hao, born in 1977, Ph. D., associate professor, CCF member. His research interests include deep learning and machine vision.
  • Supported by:
    National Natural Science Foundation of China(62001099);Fundamental Research Funds for the Central Universities(2232023D-30)


Abstract:

Human parsing aims at fine-grained part segmentation of human images. Some human parsing methods enhance part representations by aggregating contextual features, but the scope of such aggregation is often limited. To address this issue, a human parsing method based on the aggregation of generalized contextual features was proposed. Guided by prior knowledge of human topological structure, this method aggregated contextual features not only globally from the current image but also from other images; this extended scope was defined as the generalized context. For the current image, a Cross-Stripe Attention Module (CSAM) was designed to aggregate global intra-image contextual features: in this module, the human topological prior within the image was described through the part distribution, and, guided by this prior, contextual features were aggregated along horizontal and vertical stripes. For other images, a Region-aware Batch Attention Module (RBAM) was designed to aggregate inter-image contextual features at batch level. Owing to the constraints of human topological structure, the positional deviations of similar parts across a batch of human images fall within a limited range, which enables RBAM to learn the spatial offsets between similar parts of different human images and, based on these offsets, to aggregate features from the similar part regions of other images along the batch dimension. Quantitative comparisons show that the proposed method improves the mean Intersection over Union (mIoU) by 0.43 percentage points over Dual-Task Mutual Learning (DTML) on the LIP (Look Into Person) dataset. Visualization results demonstrate that the proposed method aggregates both the global features of the current image and the part features of other images from the generalized context.
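The stripe-wise aggregation behind CSAM can be illustrated with a minimal NumPy sketch: for each spatial position, features are gathered from that position's horizontal and vertical stripes via attention. This is a simplified criss-cross-style sketch only; the function names are illustrative, and the part-distribution guidance and learned projections of the actual CSAM are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_stripe_attention(feat):
    # feat: (H, W, C) feature map of the current image.
    # For each position, attend over its row (horizontal stripe)
    # and column (vertical stripe), then take the weighted sum.
    H, W, C = feat.shape
    out = np.empty_like(feat)
    for i in range(H):
        for j in range(W):
            q = feat[i, j]                               # query, (C,)
            keys = np.concatenate([feat[i, :, :],        # row stripe, (W, C)
                                   feat[:, j, :]], 0)    # column stripe, (H, C)
            attn = softmax(keys @ q / np.sqrt(C))        # attention weights, (W+H,)
            out[i, j] = attn @ keys                      # aggregated context, (C,)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
y = cross_stripe_attention(x)
print(y.shape)  # (8, 8, 16)
```

Stacking such a stripe pass twice lets information from every position reach every other position, which is why stripe attention approximates full global aggregation at lower cost.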

Key words: generalized contextual feature, feature aggregation, batch-dimension attention, cross-stripe attention, human parsing
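The batch-level aggregation described for RBAM can likewise be sketched in NumPy: for each position of each image, features are gathered from the offset-shifted positions of the other images in the batch. The fixed integer `offsets` array stands in for RBAM's learned spatial offsets between similar parts; the function name and the absence of region-awareness masks are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def batch_region_attention(feat, offsets):
    # feat: (B, H, W, C) batch of feature maps.
    # offsets: (B, B, 2) integer (dy, dx) per image pair, standing in
    # for the learned offsets between similar parts of different images.
    B, H, W, C = feat.shape
    out = np.empty_like(feat)
    for b in range(B):
        for i in range(H):
            for j in range(W):
                q = feat[b, i, j]                    # query, (C,)
                keys = []
                for b2 in range(B):
                    if b2 == b:
                        continue                     # only OTHER images
                    dy, dx = offsets[b, b2]
                    ii = int(np.clip(i + dy, 0, H - 1))
                    jj = int(np.clip(j + dx, 0, W - 1))
                    keys.append(feat[b2, ii, jj])    # shifted similar-part feature
                keys = np.stack(keys)                # (B-1, C)
                attn = softmax(keys @ q / np.sqrt(C))
                out[b, i, j] = attn @ keys           # aggregate along batch dim
    return out

rng = np.random.default_rng(1)
feat = rng.standard_normal((3, 4, 4, 8))
offsets = rng.integers(-1, 2, size=(3, 3, 2))
out = batch_region_attention(feat, offsets)
print(out.shape)  # (3, 4, 4, 8)
```

Because human topology bounds how far a given part can drift between images, small bounded offsets like these suffice to align similar part regions across the batch.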

