Journal of Computer Applications
Cheng Shuaibo (程帅博), Yan Jia (颜佳)
Abstract: Short videos have rich content and complex structure, which makes their quality difficult to assess effectively. To address this, a short video quality assessment method based on the Contrastive Language-Image Pre-Training (CLIP) model is proposed. First, in view of the special form of short videos, an efficient structural feature extraction module is designed to capture their textual and layout characteristics. Next, to enhance the representation of global features, a multi-feature extractor is developed to capture quality features along three dimensions (spatiotemporal quality, structural quality, and perceptual quality), covering both semantic information and distortion characteristics. Finally, a text input template is constructed so that the CLIP features extracted from video frames guide the fusion of the quality features. Experimental results on four benchmark datasets show that the proposed method achieves higher accuracy and stability than the compared methods. On the KVQ short video dataset, the Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank-order Correlation Coefficient (SRCC) reach 0.922 and 0.919, respectively. On the TaoLive livestream dataset, the two metrics improve by 1.1% on average over the second-best method. In terms of generalization, cross-dataset performance improves by 1.6% on average, indicating that the method is suitable for a wide range of application scenarios.
Key words: short video, video quality assessment, structural feature, Contrastive Language-Image Pre-Training (CLIP) model, human visual system
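The abstract reports accuracy as PLCC and SRCC between predicted quality scores and subjective scores. As a reading aid, the following is a minimal sketch of how these two coefficients are conventionally computed. The function names and the sample scores are hypothetical and are not taken from the paper. In video quality assessment, PLCC is often computed after a nonlinear (for example, logistic) mapping of the predictions; this sketch omits that step, and the abstract does not state whether the authors apply it.

```python
import math


def plcc(pred, mos):
    """Pearson linear correlation between predictions and subjective scores.

    Assumes both inputs have the same length and are not constant
    (a constant input would make the denominator zero).
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_m = sum(mos) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(pred, mos))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    return cov / math.sqrt(var_p * var_m)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1, whose mean is (i+j)/2 + 1.
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def srcc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(pred), average_ranks(mos))


if __name__ == "__main__":
    # Hypothetical predicted scores and mean opinion scores for five videos.
    predicted = [2.1, 3.4, 1.2, 4.8, 3.9]
    subjective = [2.5, 3.0, 1.0, 4.5, 4.1]
    print(f"PLCC = {plcc(predicted, subjective):.3f}")
    print(f"SRCC = {srcc(predicted, subjective):.3f}")
```

SRCC depends only on the ordering of the scores, so it measures monotonicity of the prediction, while PLCC measures linear agreement; reporting both is why a monotone but nonlinear predictor can reach an SRCC of 1 with a PLCC below 1.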
CLC Number: TP391
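Cheng Shuaibo, Yan Jia. Short video quality assessment based on the CLIP model (基于CLIP模型的短视频质量评价)[J]. Journal of Computer Applications (《计算机应用》), DOI: 10.11772/j.issn.1001-9081.2025020201.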
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025020201