Gaze estimation model based on multi-scale aggregation and shared attention

Journal of Computer Applications, 2024, Vol. 44, Issue 7: 2047-2054. DOI: 10.11772/j.issn.1001-9081.2023081172

• Artificial intelligence •

Sailong SHI^1,2,3, Zhiwen FANG^1,2,3

^1. School of Biomedical Engineering, Southern Medical University, Guangzhou, Guangdong 510515, China
^2. Guangdong Provincial Key Laboratory of Medical Image Processing (Southern Medical University), Guangzhou, Guangdong 510515, China
^3. Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology (Southern Medical University), Guangzhou, Guangdong 510515, China

Received: 2023-09-01 Revised: 2023-11-15 Accepted: 2023-11-24 Online: 2024-07-18 Published: 2024-07-10
Corresponding author: Zhiwen FANG
About the authors:
SHI Sailong, born in 2000 in Nantong, Jiangsu, M.S. candidate. His research interests include computer vision and gaze analysis.
FANG Zhiwen, born in 1983 in Loudi, Hunan, Ph.D., associate professor. His research interests include anomaly analysis, behavior analysis and evaluation, gaze analysis, and medical image segmentation.
Supported by: National Natural Science Foundation of China (62371219); Guangdong Basic and Applied Basic Research Foundation (2023A1515011260); Science and Technology Program of Guangzhou (202201011672)

Abstract:

Gaze estimation predicts 3D gaze directions from face images. The eye details that relate directly to gaze are concentrated in the face image and strongly influence the estimate. However, existing gaze estimation models overlook these small-scale eye details, and the gaze-relevant signal is easily overwhelmed by gaze-irrelevant information in the image features. To address this, a model based on multi-scale aggregation and shared attention is proposed to make the features more representative. First, to keep the model from missing eye details, shunted self-attention aggregates eye and face information at different scales within an image and guides the model to learn the correlations between objects at different scales. Second, shared attention is established to capture the features shared between images, which reduces the attention paid to gaze-irrelevant features. Finally, multi-scale aggregation and shared attention are combined to further improve the accuracy of gaze estimation. On the public datasets MPIIFaceGaze, Gaze360, Gaze360_Processed, and GAFA-Head, the average angular errors of the proposed model are 5.74%, 4.09%, 4.82%, and 10.55% lower, respectively, than those of GazeTR (Gaze TRansformer). On the difficult Gaze360 images in which the subject faces away from the camera, the average angular error of the proposed model is 4.70% lower than that of GazeTR. The experimental results show that the proposed model effectively combines multi-scale gaze information with shared attention, improving the accuracy and robustness of gaze estimation.
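
The abstract reports its results as average angular error without stating the formula. The sketch below shows the definition commonly used for 3D gaze estimation, assuming that each prediction and each ground-truth label is a 3D direction vector and that the error is the angle between them in degrees. The function names are illustrative and do not come from the paper.

```python
import math


def angular_error_deg(pred, target):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    norm_target = math.sqrt(sum(t * t for t in target))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_pred * norm_target)))
    return math.degrees(math.acos(cosine))


def average_angular_error_deg(preds, targets):
    """Mean of the per-sample angular errors over a dataset."""
    errors = [angular_error_deg(p, t) for p, t in zip(preds, targets)]
    return sum(errors) / len(errors)


# Toy data: one perfect prediction and one that is orthogonal to the label.
preds = [(0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
targets = [(0.0, 0.0, -1.0), (0.0, 1.0, 0.0)]
mean_error = average_angular_error_deg(preds, targets)
print(f"average angular error: {mean_error:.2f} degrees")
```

Because the vectors are normalized inside the function, the metric depends only on direction, not on vector length.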

Key words: gaze estimation, shared attention, multi-scale aggregation, shared feature, computer vision

CLC Number: TP391.4

Sailong SHI, Zhiwen FANG. Gaze estimation model based on multi-scale aggregation and shared attention[J]. Journal of Computer Applications, 2024, 44(7): 2047-2054.

References

1	EMERY N J. The eyes have it: the neuroethology, function and evolution of social gaze [J]. Neuroscience & Biobehavioral Reviews, 2000, 24(6): 581-604.
2	TERZIOĞLU Y, MUTLU B, ŞAHIN E. Designing social cues for collaborative robots: the role of gaze and breathing in human-robot collaboration [C]// Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction. New York: ACM, 2020: 343-357.
3	TOPAL C, GUNAL S, KOÇDEVIREN O, et al. A low-computational approach on gaze estimation with eye touch system [J]. IEEE Transactions on Cybernetics, 2014, 44(2): 228-239.
4	HU W T, ZHOU X Z, SHENG Y, et al. On implementation mechanism of intelligent interface based on gaze tracking [J]. Computer Applications and Software, 2016, 33(1): 134-137. (in Chinese)
5	CHONG E, CLARK-WHITNEY E, SOUTHERLAND A, et al. Detection of eye contact with deep neural networks is as accurate as human experts [J]. Nature Communications, 2020, 11(1): 6386.
6	LI J, CHEN Z, ZHONG Y, et al. Appearance-based gaze estimation for ASD diagnosis [J]. IEEE Transactions on Cybernetics, 2022, 52(7): 6504-6517.
7	GUO A H, PAN X P. Eye tracking research on Alzheimer's disease [J]. Guangdong Medical Journal, 2021, 42(9): 1132-1135. (in Chinese)
8	VINNIKOV M, ALLISON R S, FERNANDES S. Gaze-contingent auditory displays for improved spatial attention in virtual reality [J]. ACM Transactions on Computer-Human Interaction, 2017, 24(3): No. 19.
9	PATNEY A, SALVI M, KIM J, et al. Towards foveated rendering for gaze-tracked virtual reality [J]. ACM Transactions on Graphics, 2016, 35(6): No. 179.
10	HOU S M, JIA C L, ZHANG M M. Review of eye movement-based interaction techniques for virtual reality systems [J]. Journal of Computer Applications, 2022, 42(11): 3534-3543. (in Chinese)
11	LIU Y, ZHOU L, BAI X, et al. Goal-oriented gaze estimation for zero-shot learning [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 3793-3802.
12	ZHANG C, CHI J N, ZHANG Z H, et al. A novel eye gaze tracking technique based on pupil center cornea reflection technique [J]. Chinese Journal of Computers, 2010, 33(7): 1272-1285. (in Chinese)
13	XIONG C S, HUANG L, LIU C P. A novel gaze estimation method with one-point calibration [J]. Acta Automatica Sinica, 2014, 40(3): 459-470. (in Chinese)
14	GOU C, ZHUO Y, WANG K, et al. Research advances and prospects of eye tracking [J]. Acta Automatica Sinica, 2022, 48(5): 1173-1192. (in Chinese)
15	ZHANG X, SUGANO Y, FRITZ M, et al. Appearance-based gaze estimation in the wild [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 4511-4520.
16	WANG K, ZHAO R, JI Q. A hierarchical generative model for eye image synthesis and eye gaze estimation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 440-448.
17	REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis [C]// Proceedings of the 33rd International Conference on Machine Learning. New York: ACM, 2016: 1060-1069.
18	LIU G, YU Y, MORA K A F, et al. A differential approach for gaze estimation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(3): 1092-1099.
19	SUN Y, ZENG J, SHAN S, et al. Cross-encoder for unsupervised gaze representation learning [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 3682-3691.
20	ZHANG X, SUGANO Y, FRITZ M, et al. It's written all over your face: full-face appearance-based gaze estimation [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2017: 2299-2308.
21	CHENG Y, HUANG S, WANG F, et al. A coarse-to-fine adaptive network for appearance-based gaze estimation [C]// Proceedings of the 34th AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2020: 10623-10630.
22	ZHANG X, SUGANO Y, BULLING A, et al. Learning-based region selection for end-to-end gaze estimation [C]// Proceedings of the 31st British Machine Vision Conference. Nottingham, UK: BMVA Press, 2020: No. 86.
23	KELLNHOFER P, RECASENS A, STENT S, et al. Gaze360: physically unconstrained gaze estimation in the wild [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6911-6920.
24	KOTHARI R, DE MELLO S, IQBAL U, et al. Weakly-supervised physically unconstrained gaze estimation [C]// Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2021: 9975-9984.
25	NONAKA S, NOBUHARA S, NISHINO K. Dynamic 3D gaze from afar: deep gaze estimation from temporal eye-head-body coordination [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 2182-2191.
26	WU Y, LI G, LIU Z, et al. Gaze estimation via modulation-based adaptive network with auxiliary self-learning [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(8): 5510-5520.
27	CHEN Z, SHI B E. Towards high performance low complexity calibration in appearance based gaze estimation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 1174-1188.
28	CHENG Y, LU F. Gaze estimation using transformer [C]// Proceedings of the 2022 26th International Conference on Pattern Recognition. Piscataway: IEEE, 2022: 3341-3347.
29	OH J O, CHANG H J, CHOI S-I. Self-attention with convolution and deconvolution for efficient eye gaze estimation from a full face image [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Piscataway: IEEE, 2022: 4988-4996.
30	NAGPURE V, OKUMA K. Searching efficient neural architecture with multi-resolution fusion transformer for appearance-based gaze estimation [C]// Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision. Piscataway: IEEE, 2023: 890-899.
31	REN S, ZHOU D, HE S, et al. Shunted self-attention via multi-scale token aggregation [C]// Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2022: 10843-10852.
32	DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale [EB/OL]. (2021-06-03) [2022-10-14].
33	CHENG Y, WANG H, BAO Y, et al. Appearance-based gaze estimation with deep learning: a review and benchmark [EB/OL]. (2021-04-26) [2023-08-22].

Average angular error (°) of each model on the four datasets:
Model	MPIIFaceGaze	Gaze360	Gaze360_Processed	GAFA-Head
FullFace [20]	4.93	22.06	14.99	33.02
Gaze360 [23]	4.06	15.60	11.04	27.78
CA-Net [21]	4.27	N/A	11.20	N/A
MANet [26]	4.30	N/A	13.20	N/A
GEDDnet [27]	4.50	N/A	N/A	N/A
GazeTR [28]	4.18	15.39	11.00	28.53
CADSE [29]	4.04	N/A	10.70	N/A
GazeNas-ETH [30]	3.96	N/A	10.52	N/A
Proposed model	3.94	14.76	10.47	25.52

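Average angular error (°) by gaze angle range on Gaze360 and GAFA-Head:
Model	Gaze360, all angles	Gaze360, front 180°	Gaze360, front facing	Gaze360, back 180°	GAFA-Head, all angles	GAFA-Head, front 180°	GAFA-Head, front facing	GAFA-Head, back 180°
FullFace [20]	22.06	17.82	18.44	37.33	33.02	22.87	21.20	41.59
Gaze360 [23]	15.60	13.40	13.40	23.50	27.78	19.75	19.15	34.55
GazeTR [28]	15.39	13.27	13.60	23.00	28.53	21.31	20.71	34.63
Proposed model	14.76	12.68	12.78	21.92	25.52	18.87	18.64	31.15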
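
The percentages quoted in the abstract can be reproduced from the GazeTR and proposed-model rows of the two tables above. The sketch below assumes the reported figure is the relative reduction, (baseline - proposed) / baseline. The helper name is illustrative, and the numbers are copied from the tables.

```python
def relative_reduction_pct(baseline, proposed):
    """Relative reduction of the proposed error versus the baseline, in percent."""
    return (baseline - proposed) / baseline * 100.0


# (GazeTR error, proposed-model error), copied from the tables above.
rows = {
    "MPIIFaceGaze": (4.18, 3.94),
    "Gaze360": (15.39, 14.76),
    "Gaze360_Processed": (11.00, 10.47),
    "GAFA-Head": (28.53, 25.52),
    "Gaze360, back 180°": (23.00, 21.92),
}

reductions = {
    name: round(relative_reduction_pct(baseline, proposed), 2)
    for name, (baseline, proposed) in rows.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.2f}% lower than GazeTR")
```

Under that assumption the five values come out as 5.74%, 4.09%, 4.82%, 10.55%, and 4.70%, matching the abstract.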

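Average angular error (AAE, °) with and without shunted self-attention and shared attention:
Shunted self-attention	Shared attention	MPIIFaceGaze	Gaze360	Gaze360_Processed	GAFA-Head
No	No	4.06	15.60	11.21	27.78
Yes	No	3.98	15.31	10.70	27.45
No	Yes	4.03	15.33	10.79	26.21
Yes	Yes	3.94	14.76	10.47	25.52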
