Journal of Computer Applications, 2024, Vol. 44, Issue 7: 2047-2054. DOI: 10.11772/j.issn.1001-9081.2023081172

• Artificial intelligence •

Gaze estimation model based on multi-scale aggregation and shared attention

Sailong SHI1,2,3, Zhiwen FANG1,2,3

  1. School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
  2. Guangdong Provincial Key Laboratory of Medical Image Processing (Southern Medical University), Guangzhou, Guangdong 510515, China
  3. Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology (Southern Medical University), Guangzhou, Guangdong 510515, China
  • Received: 2023-09-01 Revised: 2023-11-15 Accepted: 2023-11-24 Online: 2024-07-18 Published: 2024-07-10
  • Contact: Zhiwen FANG
  • About the authors: SHI Sailong, born in 2000, M.S. candidate. His research interests include computer vision and gaze analysis.
    Primary contact: FANG Zhiwen, born in 1983, Ph.D., associate professor. His research interests include anomaly analysis, behavior analysis and evaluation, gaze analysis, and medical image segmentation.
  • Supported by:
    National Natural Science Foundation of China (62371219); Guangdong Basic and Applied Basic Research Foundation (2023A1515011260); Science and Technology Program of Guangzhou (202201011672)

基于多尺度聚合和共享注意力的注视估计模型

施赛龙1,2,3, 方智文1,2,3

  1. 南方医科大学 生物医学工程学院, 广州 510515
    2.广东省医学图像处理重点实验室(南方医科大学), 广州 510515
    3.广东省医学成像与诊断技术工程实验室(南方医科大学), 广州 510515
  • 通讯作者: 方智文
  • 作者简介:施赛龙(2000—),男,江苏南通人,硕士研究生,主要研究方向:计算机视觉、注视分析;
    第一联系人:方智文(1983—),男,湖南娄底人,副教授,博士,主要研究方向:异常分析、行为分析与评价、注视分析、医学图像分割。
  • 基金资助:
    国家自然科学基金资助项目(62371219);广东省基础与应用基础研究基金资助项目(2023A1515011260);广州市科技计划项目(202201011672)

Abstract:

Gaze estimation predicts 3D gaze directions from face images. The eye details that relate directly to gaze are concentrated within the face image and have a significant impact on the estimate. However, existing gaze estimation models overlook small-scale eye details and are easily overwhelmed by gaze-irrelevant information in the image features. To address this, a model based on multi-scale aggregation and shared attention is proposed to enhance the representational power of the features. First, shunted self-attention aggregates eye and face information at different scales within an image and guides the model to learn the correlation between objects at different scales, which addresses the model's omission of eye details. Second, shared attention captures the features shared between images, which reduces the attention paid to gaze-irrelevant features. Finally, multi-scale aggregation and shared attention are combined to further improve the accuracy of gaze estimation. On the public datasets MPIIFaceGaze, Gaze360, Gaze360_Processed, and GAFA-Head, the average angular errors of the proposed model are 5.74%, 4.09%, 4.82%, and 10.55% lower, respectively, than those of GazeTR (Gaze TRansformer). On the difficult Gaze360 images in which the subject faces away from the camera, the average angular error of the proposed model is 4.70% lower than that of GazeTR. The experimental results show that the proposed model effectively aggregates multi-scale gaze information and shared attention, improving the accuracy and robustness of gaze estimation.

Key words: gaze estimation, shared attention, multi-scale aggregation, shared feature, computer vision
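
The abstract reports results as average angular error and as percentage reductions relative to GazeTR, but does not spell out either computation. The sketch below shows the conventional definition of angular error (the angle between the predicted and ground-truth 3D gaze vectors) and one plausible reading of the percentages as relative reductions. The function names and the example vectors are hypothetical and are not taken from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze vectors (need not be unit length)."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    cos_angle = dot / (norm_pred * norm_target)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(preds, targets):
    """Average angular error over paired predictions and ground-truth vectors."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


def relative_reduction_pct(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error (assumed reading)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Made-up vectors for illustration only; these are not data from the paper.
    preds = [(0.0, 0.0, -1.0), (0.1, 0.0, -1.0)]
    targets = [(0.0, 0.0, -1.0), (0.0, 0.0, -1.0)]
    print(mean_angular_error_deg(preds, targets))
```

Under this reading, a baseline error of 10.0 degrees and a model error of 9.0 degrees would correspond to a 10% reduction. The absolute errors behind the percentages in the abstract are not given here, so none are filled in.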

摘要:

注视估计是从人脸图像中估计3D注视方向的方法,其中与注视直接相关的眼睛细节信息在人脸图像中集中且对注视估计具有显著影响。然而现有的注视估计模型忽略了小尺度的眼睛细节,且容易被图像特征中与注视无关的信息淹没。为此,提出一种基于多尺度聚合和共享注意力的模型以增强特征的表达能力。首先,使用分流自注意力聚合图像中不同尺度的眼睛和人脸信息,并引导模型学习不同尺度对象之间的相关性,以此处理模型对图像中眼睛细节的遗漏问题;其次,通过建立共享注意力来捕获图像之间的共享特征,减少对注视无关特征的关注;最后,结合多尺度聚合和共享注意力,进一步提高注视估计的精度。在公开数据集MPIIFaceGaze、Gaze360、Gaze360_Processed和GAFA-Head上,所提模型的平均角度误差比GazeTR (Gaze TRansformer)降低了5.74%、4.09%、4.82%和10.55%。在Gaze360背对相机的困难图像上,所提模型的平均角度误差比GazeTR降低了4.70%。实验结果表明,所提模型能有效聚合多尺度的注视信息和共享注意力,提高注视估计的准确性和鲁棒性。

关键词: 注视估计, 共享注意力, 多尺度聚合, 共享特征, 计算机视觉
