Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (6): 1621-1626. DOI: 10.11772/j.issn.1001-9081.2019101802

• Artificial intelligence •

Video translation model from virtual to real driving scenes based on generative adversarial dual networks

LIU Shihao, HU Xuemin, JIANG Bohou, ZHANG Ruohan, KONG Li   

  1. School of Computer Science and Information Engineering, Hubei University, Wuhan Hubei 430062, China
  • Received:2019-10-24 Revised:2019-12-11 Online:2020-06-10 Published:2020-06-18
  • Contact: HU Xuemin, born in 1985, Ph. D., associate professor. His research interests include computer vision and machine learning.
  • About the authors: LIU Shihao, born in 1999. His research interests include computer vision. HU Xuemin, born in 1985, Ph. D., associate professor. His research interests include computer vision and machine learning. JIANG Bohou, born in 1999. His research interests include computer vision. ZHANG Ruohan, born in 1997, M. S. candidate. Her research interests include machine learning. KONG Li, born in 1995, M. S. candidate. His research interests include deep learning.
  • Supported by:
    National Natural Science Foundation of China (61806076), the Natural Science Foundation of Hubei Province (2018CFB158), the Undergraduate Innovation and Entrepreneurship Training Plan of Hubei Province (S201910512026), and the Student Science Research Project of Chucai Honors College of Hubei University (20182211006).


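  • Author biographies: LIU Shihao (born in 1999), male, a native of Tianmen, Hubei. His research interests include computer vision. HU Xuemin (born in 1985), male, a native of Yueyang, Hunan, Ph. D., associate professor. His research interests include computer vision and machine learning. JIANG Bohou (born in 1999), male, a native of Wuhan, Hubei. His research interests include computer vision. ZHANG Ruohan (born in 1997), female, a native of Xiangyang, Hubei, M. S. candidate. Her research interests include machine learning. KONG Li (born in 1995), male, a native of Xianning, Hubei, M. S. candidate. His research interests include deep learning.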

Abstract: To address the lack of paired training samples and the inconsistency between frames when translating virtual driving scenes into real ones, a video translation model based on Generative Adversarial Networks (GAN) was proposed. To cope with the shortage of paired samples, the model adopted a “dual networks” architecture in which the semantic segmentation scene served as an intermediate transition between a front-part network and a back-part network. In the front-part network, a convolution and deconvolution framework was adopted, and an optical flow network was used to extract the dynamic information between consecutive frames, achieving continuous video translation from virtual scenes to semantic segmentation scenes. In the back-part network, a conditional generative adversarial network framework was adopted, in which a generator, an image discriminator and a video discriminator were designed and combined with the optical flow network, achieving continuous video translation from semantic segmentation scenes to real scenes. Data collected from an autonomous driving simulator and a public dataset were used for training and testing. The model achieved virtual-to-real scene translation in a variety of driving scenarios, and its translation quality was significantly better than that of the comparison algorithms. Experimental results show that the proposed model effectively handles the discontinuity between frames and the blurring of moving objects, producing smoother translated videos and adapting to a variety of complex driving scenarios.

Key words: virtual to real, video translation, Generative Adversarial Networks (GAN), optical flow network, driving scene
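
The data flow of the “dual networks” architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `translate_video`, `toy_front` and `toy_back` are hypothetical, the thresholding and averaging rules are placeholders for the convolution and deconvolution network and the conditional GAN generator, and passing each stage its own previous output is only a stand-in for the temporal information that the paper extracts with an optical flow network. The discriminators, which in a GAN act during training, are omitted.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[List[int]]  # toy frame: rows of integer pixel values or labels

# A stage maps (current input frame, its own previous output or None) to an output frame.
Stage = Callable[[Frame, Optional[Frame]], Frame]


def translate_video(frames: Sequence[Frame], front: Stage, back: Stage) -> List[Frame]:
    """Chain two stages frame by frame: virtual -> semantic segmentation -> real."""
    prev_semantic: Optional[Frame] = None
    prev_real: Optional[Frame] = None
    outputs: List[Frame] = []
    for frame in frames:
        semantic = front(frame, prev_semantic)  # front-part network (assumed interface)
        real = back(semantic, prev_real)        # back-part generator (assumed interface)
        outputs.append(real)
        prev_semantic, prev_real = semantic, real
    return outputs


def toy_front(frame: Frame, prev: Optional[Frame]) -> Frame:
    """Placeholder front stage: threshold pixels into two class labels (ignores prev)."""
    return [[1 if px >= 128 else 0 for px in row] for row in frame]


def toy_back(semantic: Frame, prev: Optional[Frame]) -> Frame:
    """Placeholder back stage: render labels as intensities, then average with the
    previous output as a crude substitute for flow-based temporal consistency."""
    rendered = [[200 if label else 50 for label in row] for row in semantic]
    if prev is None:
        return rendered
    return [[(a + b) // 2 for a, b in zip(r, p)] for r, p in zip(rendered, prev)]


if __name__ == "__main__":
    video = [[[0, 255]], [[255, 255]]]  # two 1x2 toy frames
    print(translate_video(video, toy_front, toy_back))
```

Splitting the translation at the semantic segmentation scene means each stage needs paired data only for its own half (virtual to semantic, semantic to real), which is how the abstract motivates the intermediate transition.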

