Salient object detection-driven viewport prediction for 360-degree live video streaming

doi:10.11772/j.issn.1001-9081.2026030348

Journal of Computer Applications

CHEN Xiaolei, AN Qianqian

School of Microelectronics Industry-education Integration, Lanzhou University of Technology

Received: 2026-04-07; Revised: 2026-04-26; Online: 2026-05-25; Published: 2026-05-25
About the authors: CHEN Xiaolei, born in 1979, Ph.D., professor. His research interests include artificial intelligence and computer vision. AN Qianqian, born in 1997, M.S. candidate. Her research interests include computer vision.
Supported by: National Natural Science Foundation of China (61967012)

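Corresponding author: CHEN Xiaolei
Author profiles: CHEN Xiaolei (born 1979), male, from Lingbao, Henan; professor, Ph.D., CCF member; main research interests: artificial intelligence and computer vision. AN Qianqian (born 1997), female, from Qingyang, Gansu; M.S. candidate; main research interest: computer vision.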

Abstract

In 360-degree panoramic live video streaming, viewport prediction is constrained by the unpredictability of future video content and by strict latency requirements, so traditional methods that rely on long-term historical information and prior knowledge of user trajectories struggle to achieve satisfactory performance. To address this issue, SOD-VP360, a viewport prediction approach for 360-degree live video streaming based on salient object detection and tile classification, was developed. First, visually salient regions in video frames were extracted through salient object detection to provide content-guided information for viewport prediction. Then, a tile-embedded Transformer encoder was constructed to jointly model the visual features of spatial tiles and their temporal dynamics. Furthermore, a lightweight network combined with spherical convolution was adopted to model the geometric distortion of panoramic video while reducing computational complexity. Meanwhile, a reinforcement learning mechanism was incorporated to select and classify tiles dynamically, improving the accuracy of tile importance evaluation. Experiments were conducted on a dataset containing 18 virtual reality videos and head-motion trajectories from 48 users. The results show that SOD-VP360 achieves good overall performance in prediction accuracy, error duration, and bandwidth utilization while meeting the real-time processing requirements of 360-degree live video streaming.

Keywords: viewport prediction; 360-degree video; live video streaming; deep learning; salient object detection
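To make the idea of saliency-guided tile selection concrete, the following is a minimal sketch and not the SOD-VP360 pipeline itself, which according to the abstract uses a Transformer encoder, spherical convolution, and reinforcement learning. It assumes an equirectangular frame, a precomputed 2-D saliency map, and a fixed tile grid; the function names `tile_saliency_scores` and `select_tiles` are hypothetical. Each tile is scored by its mean saliency weighted by cos(latitude), since pixels near the poles of an equirectangular frame cover less area on the sphere.

```python
import numpy as np


def tile_saliency_scores(saliency: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Return a (rows, cols) grid of area-weighted mean saliency per tile.

    Assumption: `saliency` is a 2-D map over an equirectangular frame.
    """
    height, width = saliency.shape
    if height % rows or width % cols:
        raise ValueError("frame size must be divisible by the tile grid")
    # Latitude of each pixel-row centre, from near +pi/2 (top) to near -pi/2 (bottom).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    # A pixel's area on the sphere is proportional to cos(latitude).
    weights = np.repeat(np.cos(lat)[:, None], width, axis=1)
    th, tw = height // rows, width // cols
    weighted = (saliency * weights).reshape(rows, th, cols, tw).sum(axis=(1, 3))
    norm = weights.reshape(rows, th, cols, tw).sum(axis=(1, 3))
    return weighted / norm


def select_tiles(scores: np.ndarray, k: int) -> list[tuple[int, int]]:
    """Return the (row, col) indices of the k highest-scoring tiles."""
    flat = np.argsort(scores, axis=None)[::-1][:k]
    return [tuple(int(i) for i in np.unravel_index(idx, scores.shape)) for idx in flat]
```

In a streaming client, the tiles returned by `select_tiles` would be candidates for higher-bitrate delivery; a learned model such as the one described in the abstract would replace this fixed top-k rule.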

CLC Number: TP391.4

CHEN Xiaolei, AN Qianqian. Salient object detection-driven viewport prediction for 360-degree live video streaming[J]. Journal of Computer Applications, DOI: 10.11772/j.issn.1001-9081.2026030348.
