A convolutional neural network approach for objective video quality assessment

Cited by: 128
Authors
Le Callet, Patrick [1]
Viard-Gaudin, Christian [1]
Barba, Dominique [1]
Affiliations
[1] Univ Nantes, Inst Rech Commun & Cybernet Nantes, F-44306 Nantes, France
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2006, Vol. 17, No. 5
Keywords
convolutional neural network (CNN); MPEG-2; temporal pooling; video quality assessment;
DOI
10.1109/TNN.2006.879766
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes an application of neural networks to objective measurement methods designed to automatically assess the perceived quality of digital videos. This challenging task aims to emulate human judgment and to replace very complex and time-consuming subjective quality assessment. Several metrics have been proposed in the literature to tackle this issue. They are based on a general framework that combines different stages, each of them addressing complex problems. The ambition of this paper is not to present a globally perfect quality metric but rather to focus on an original way to use neural networks in such a framework in the context of a reduced reference (RR) quality metric. In particular, we point out the value of such a tool for combining features and pooling them in order to compute quality scores. The proposed approach solves some problems inherent to objective metrics that should predict subjective quality scores obtained using the single stimulus continuous quality evaluation (SSCQE) method. The latter has been adopted by the Video Quality Experts Group (VQEG) in its recently finalized reduced reference and no reference (RRNR-TV) test plan. The originality of this approach, compared to previous attempts to use neural networks for quality assessment, lies in the use of a convolutional neural network (CNN) that allows continuous-time scoring of the video. Objective features are extracted on a frame-by-frame basis from both the reference and the distorted sequences; they are derived from a perceptual-based representation and integrated along the temporal axis using a time-delay neural network (TDNN). Experiments conducted on different MPEG-2 videos, with bit rates ranging from 2 to 6 Mb/s, show the effectiveness of the proposed approach in obtaining a plausible model of temporal pooling from the human visual system (HVS) point of view. More specifically, a linear correlation of up to 0.92 between objective and subjective scores has been obtained on a set of typical TV videos.
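To illustrate two ideas named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a single time-delay layer that slides one shared set of weights over a window of consecutive per-frame feature vectors, and it computes the Pearson linear correlation between objective and subjective score sequences. The function names (`tdnn_layer`, `pearson`), the window length, the `tanh` activation, and all numeric values are hypothetical choices made for illustration; the abstract does not specify the network architecture.

```python
import math


def tdnn_layer(frames, weights, bias):
    """Apply one time-delay layer (hypothetical sketch).

    frames:  list of per-frame feature vectors (lists of floats)
    weights: window_len x n_features weights, shared across time positions
    bias:    scalar bias added at every time position
    Returns one activation per valid window position.
    """
    window_len = len(weights)
    outputs = []
    for t in range(len(frames) - window_len + 1):
        total = bias
        # Weighted sum over the window of consecutive frames starting at t.
        for d in range(window_len):
            for w, x in zip(weights[d], frames[t + d]):
                total += w * x
        outputs.append(math.tanh(total))  # tanh is an assumed activation
    return outputs


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two score sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy data: three frames with one feature each, window of two frames.
frames = [[1.0], [2.0], [3.0]]
weights = [[0.5], [0.5]]
objective = tdnn_layer(frames, weights, bias=0.0)

# Made-up subjective scores, one per window position.
subjective = [0.8, 0.9]
correlation = pearson(objective, subjective)
```

Because the same weights are reused at every time position, the layer yields one score per window, which is what makes a continuous-time output possible for comparison with SSCQE-style ratings.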
Pages: 1316-1327
Number of pages: 12