Journal of Computer Applications


Salient object detection-driven viewport prediction for 360-degree live video streaming

CHEN Xiaolei, AN Qianqian   

  1. School of Microelectronics Industry-education Integration, Lanzhou University of Technology
  • Received: 2026-04-07; Revised: 2026-04-26; Online: 2026-05-25; Published: 2026-05-25
  • About the authors: CHEN Xiaolei, born in 1979, Ph.D., professor. His research interests include artificial intelligence and computer vision. AN Qianqian, born in 1997, M.S. candidate. Her research interests include computer vision.
  • Supported by:
    National Natural Science Foundation of China (61967012)

显著目标检测驱动的360度视频直播视口预测

陈晓雷,安倩倩   

  1. 兰州理工大学 微电子现代产业学院
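  • Corresponding author: CHEN Xiaolei
  • About the authors: CHEN Xiaolei (born 1979), male, from Lingbao, Henan, professor, Ph.D., CCF member; main research interests: artificial intelligence and computer vision. AN Qianqian (born 1997), female, from Qingyang, Gansu, M.S. candidate; main research interest: computer vision.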
  • 基金资助:
    国家自然科学基金资助项目(61967012)

Abstract: In 360-degree panoramic live video streaming, viewport prediction is constrained by the unpredictability of future video content and by strict latency requirements, so traditional methods that rely on long-term historical information and prior knowledge of user trajectories struggle to achieve satisfactory performance. To address this issue, SOD-VP360, a viewport prediction approach for 360-degree live video streaming based on salient object detection and tile classification, was developed. First, visually salient regions in video frames were extracted through salient object detection to provide content-guided information for viewport prediction. Then, a tile-embedded Transformer encoder was constructed to jointly model the visual features of spatial tiles and their temporal dynamics. Furthermore, a lightweight network combined with spherical convolution was adopted to model the geometric distortion of panoramic video while reducing computational complexity. Meanwhile, a reinforcement learning mechanism was incorporated for dynamic tile selection and classification to improve the accuracy of tile importance evaluation. Experiments were conducted on a dataset containing 18 virtual reality videos and head-motion trajectories from 48 users. The results show that SOD-VP360 achieves good overall performance in terms of prediction accuracy, error duration, and bandwidth utilization, while meeting the real-time processing requirements of 360-degree live video streaming.
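
To make the idea of saliency-guided tile classification concrete, the following is a minimal sketch and not the SOD-VP360 pipeline itself, which relies on a Transformer encoder, spherical convolution, and reinforcement learning. It assumes an equirectangular saliency map with values in [0, 1], averages it per tile with a cos(latitude) weight to offset the oversampling of polar regions, and assigns each tile an importance class. The function names `tile_saliency` and `classify_tiles`, the tile grid size, and the thresholds are all hypothetical choices made for illustration.

```python
import math


def tile_saliency(saliency, rows, cols):
    """Return the cos(latitude)-weighted mean saliency of each tile.

    `saliency` is a 2D list (height x width) for an equirectangular frame.
    The rows x cols tile grid is an illustrative assumption.
    """
    height, width = len(saliency), len(saliency[0])
    if rows > height or cols > width:
        raise ValueError("tile grid must not be finer than the pixel grid")
    sums = [[0.0] * cols for _ in range(rows)]
    weights = [[0.0] * cols for _ in range(rows)]
    for y in range(height):
        # Latitude of the pixel-row centre: +pi/2 at the top, -pi/2 at the bottom.
        lat = (0.5 - (y + 0.5) / height) * math.pi
        w = math.cos(lat)  # area weight of an equirectangular pixel row
        r = y * rows // height
        for x in range(width):
            c = x * cols // width
            sums[r][c] += w * saliency[y][x]
            weights[r][c] += w
    return [[sums[r][c] / weights[r][c] for c in range(cols)]
            for r in range(rows)]


def classify_tiles(scores, high=0.5, low=0.1):
    """Label each tile 'high', 'medium', or 'low' (hypothetical thresholds)."""
    return [["high" if s >= high else "medium" if s >= low else "low"
             for s in row]
            for row in scores]


if __name__ == "__main__":
    # Toy 4x8 saliency map with one salient patch in the top-left corner.
    frame = [[0.0] * 8 for _ in range(4)]
    for y in (0, 1):
        for x in (0, 1):
            frame[y][x] = 1.0
    print(classify_tiles(tile_saliency(frame, rows=2, cols=4)))
```

In a streaming setting, class labels of this kind could be mapped to per-tile bitrates, with higher quality for tiles more likely to fall inside the viewport; that mapping is likewise an assumption here rather than a detail stated in the abstract.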

Key words: viewport prediction, 360-degree video, live video streaming, deep learning, salient object detection

摘要: 在360度全景视频直播场景中,受未来视频内容不可预知和严格时延约束影响,传统依赖长时历史信息与用户轨迹先验的视口预测方法难以获得理想性能。针对该问题,设计了一种基于显著目标检测与Tiles分类的360度视频直播视口预测方法SOD-VP360。首先,利用显著目标检测提取视频中的视觉显著区域,为视口预测提供内容引导信息;其次,构建Tiles嵌入式Transformer编码器,对空间图块的视觉特征及时间动态进行联合建模;进一步结合轻量级网络与球形卷积,实现对全景视频几何畸变的建模并降低计算复杂度;同时,引入强化学习机制对Tiles进行动态选择与分类,以提升重要性评估的准确性。在包含18个虚拟现实视频和48名用户头部运动轨迹的数据集上进行了实验验证。实验结果表明,SOD-VP360在预测精度、错误持续时间和带宽占用率等指标上取得了较好的综合表现,并满足了360度视频直播场景下的实时处理要求。

关键词: 视口预测;360度视频;视频直播;深度学习;显著目标检测 
