A fast block-matching algorithm based on adaptive search area and its VLSI architecture for H.264/AVC

Cited by: 24
Authors
Xi, Ying-Lai [1]
Hao, Chong-Yang [1]
Fan, Yang-Yu [1]
Hu, Hong-Qi [1]
Affiliation
[1] Northwestern Polytech Univ, Inst Elect & Informat Engn, Xian 710072, Peoples R China
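Keywords
H.264/AVC; VLSI; motion estimation; adaptive search area; early termination; pipelined
DOI
10.1016/j.image.2006.05.001
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline code
0808 [Electrical Engineering]; 0809 [Electronic Science and Technology]
Abstract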
In this paper, we propose a fast block-matching algorithm based on search-center prediction and early search termination, called the center-prediction and early-termination based motion search algorithm (CPETS). CPETS is designed to deliver both high performance and an efficient VLSI implementation. It exploits the spatial and temporal correlation of motion vector (MV) fields and the properties of all-zero blocks to accelerate the search. CPETS operates at three levels. At the coarsest level, used when center prediction fails, the search area covers the entire original search range. At the middle level, the search area is a 7 × 7-pel square around the predicted center. At the finest level, a 5 × 5-pel search area around the predicted center is used. Every level uses a uniformly distributed nine-point search pattern. Experimental results show that CPETS reduces encoding time by 95.67% on average compared with full search, with negligible peak signal-to-noise ratio (PSNR) loss and bitrate increase. CPETS is also clearly more efficient than popular fast algorithms such as three-step search, new three-step search, and four-step search.

This paper also describes an efficient four-way pipelined VLSI architecture based on CPETS for H.264/AVC coding. Using 4:1 subsampling, the architecture divides the current block and the search area each into four subregions and processes them in parallel. Each subregion is handled by a pipelined structure so that the nine candidate points are searched simultaneously. With the early-termination strategy, the architecture computes one MV for a 16 × 16 block in 81 clock cycles in the best case and 901 clock cycles in the worst case. The architecture has been designed and simulated in VHDL.
Simulation results show that the proposed architecture achieves high performance for real-time motion estimation. Using 36 processing elements, it requires only 47.3K gates and 1624 × 8 bits of on-chip RAM for a search range of (-15, +15) with three reference frames and four candidate block modes. (C) 2006 Elsevier B.V. All rights reserved.
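The three-level, nine-point search with early termination described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' CPETS: the component-wise median predictor, the spread-based rule in the hypothetical `choose_area` helper for picking the level, the three-step-search-like step halving inside each area, and the SAD threshold `zero_thresh` standing in for the all-zero-block test are all illustrative choices, since the abstract does not specify them. The caller is assumed to guarantee that the whole ±`search_range` window lies inside the reference frame (for example, by padding).

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the motion vector (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def choose_area(predictors, search_range):
    """HYPOTHETICAL level rule (not from the paper).

    Returns (center, half_width). The center is the component-wise median
    of the candidate MVs; the level is picked from how much they disagree.
    """
    xs = sorted(p[0] for p in predictors)
    ys = sorted(p[1] for p in predictors)
    center = (xs[len(xs) // 2], ys[len(ys) // 2])
    spread = max(xs[-1] - xs[0], ys[-1] - ys[0])
    if spread == 0:
        return center, 2             # finest level: 5 x 5 area
    if spread <= 1:
        return center, 3             # middle level: 7 x 7 area
    return (0, 0), search_range      # coarsest level: prediction "failed"


def cpets_like_search(cur, ref, bx, by, n, predictors,
                      search_range=15, zero_thresh=0):
    """Nine-point search inside the chosen area, with early termination.

    Assumes the whole +/-search_range window lies inside `ref`.
    `zero_thresh` is a HYPOTHETICAL SAD threshold standing in for the
    paper's all-zero-block test.
    """
    (cx, cy), half = choose_area(predictors, search_range)
    cx = max(-search_range, min(search_range, cx))   # keep center in range
    cy = max(-search_range, min(search_range, cy))
    lo_x, hi_x = max(cx - half, -search_range), min(cx + half, search_range)
    lo_y, hi_y = max(cy - half, -search_range), min(cy + half, search_range)

    best, best_cost = (cx, cy), sad(cur, ref, bx, by, cx, cy, n)
    step = max(1, half)
    # Early termination: stop as soon as the best SAD is "good enough".
    while best_cost > zero_thresh:
        ox, oy = best
        # Nine uniformly spaced points (3 x 3 grid) around the current best.
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = ox + dx, oy + dy
                if (dx or dy) and lo_x <= mx <= hi_x and lo_y <= my <= hi_y:
                    cost = sad(cur, ref, bx, by, mx, my, n)
                    if cost < best_cost:
                        best, best_cost = (mx, my), cost
        if step == 1:
            break
        step = (step + 1) // 2       # step halving (assumption)
    return best, best_cost
```

For example, when all predictors agree on the true motion, `cpets_like_search` returns after a single SAD evaluation at the predicted center. Checking the termination condition once per round of nine candidates, rather than after every point, is meant to mirror hardware that evaluates all nine points in parallel.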
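One way to read the four-way split with 4:1 subsampling is as a 2 × 2 polyphase decomposition: each subregion keeps the pixels of one row/column parity, so a 16 × 16 block yields four 8 × 8 subregions whose partial SADs add up to the full-block SAD. The sketch below assumes that interpretation, and its helper names (`polyphase_split`, `block_sad`, `sad_via_subregions`) are hypothetical; the abstract does not state the exact pixel mapping. Under the same reading, four subregions times nine simultaneously evaluated candidates gives 4 × 9 = 36, consistent with the 36 processing elements quoted, although the abstract does not spell out this mapping either.

```python
def polyphase_split(block):
    """Split a 2-D block into four 4:1-subsampled subregions, one per
    (row parity, column parity) pair: (0,0), (0,1), (1,0), (1,1)."""
    return [[row[px::2] for row in block[py::2]]
            for py in (0, 1) for px in (0, 1)]


def block_sad(a, b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def sad_via_subregions(cur_block, ref_block):
    """Full-block SAD computed as four partial SADs, one per subregion.

    In hardware, each partial SAD could be produced by a separate parallel
    unit (hypothetical mapping); summing them recovers the full-block SAD
    because the four subregions partition the pixels.
    """
    partial = [block_sad(c, r)
               for c, r in zip(polyphase_split(cur_block),
                               polyphase_split(ref_block))]
    return sum(partial), partial
```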
Pages: 626-646
Number of pages: 21