What are textons?

Cited by: 132
Authors
Zhu, SC [1]
Guo, CE [1]
Wang, YZ [1]
Xu, ZJ [1]
Affiliation
[1] Univ Calif Los Angeles, Dept Comp Sci & Stat, Los Angeles, CA 90095 USA
Keywords
textons; motons; lightons; transformed component analysis; textures;
DOI
10.1007/s11263-005-4638-1
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Textons refer to fundamental micro-structures in natural images (and videos) and are considered the atoms of pre-attentive human visual perception (Julesz, 1981). Unfortunately, the word "texton" remains a vague concept in the literature for lack of a good mathematical model. In this article, we first present a three-level generative image model for learning textons from texture images. In this model, an image is a superposition of a number of image bases selected from an over-complete dictionary including various Gabor and Laplacian of Gaussian functions at various locations, scales, and orientations. These image bases are, in turn, generated by a smaller number of texton elements, selected from a dictionary of textons. By analogy to the waveform-phoneme-word hierarchy in speech, the pixel-base-texton hierarchy presents an increasingly abstract visual description and leads to dimension reduction and variable decoupling. By fitting the generative model to observed images, we can learn the texton dictionary as parameters of the generative model. Then the paper proceeds to study the geometric, dynamic, and photometric structures of the texton representation by further extending the generative model to account for motion and illumination variations. (1) For the geometric structures, a texton consists of a number of image bases with deformable spatial configurations. The geometric structures are learned from static texture images. (2) For the dynamic structures, the motion of a texton is characterized by a Markov chain model in time, which can sometimes switch geometric configurations during the movement. We call the moving textons "motons". The dynamic models are learned using the trajectories of the textons inferred from video sequences. (3) For the photometric structures, a texton represents the set of images of a 3D surface element under varying illuminations and is called a "lighton" in this paper. We adopt an illumination-cone representation where a lighton is a texton triplet. For a given light source, a lighton image is generated as a linear sum of the three texton bases. We present a sequence of experiments for learning the geometric, dynamic, and photometric structures from images and videos, and we also present some comparison studies with K-means clustering, sparse coding, independent component analysis, and transformed component analysis. We shall discuss how general textons can be learned from generic natural images.
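As a rough illustration of the forward (synthesis) direction of the pixel-base-texton hierarchy described in the abstract, the sketch below builds an image as a superposition of Gabor bases grouped into textons, and forms a lighton image as a linear sum of three texton images. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`gabor_base`, `place_texton`, `synthesize`, `lighton_image`), the fixed offsets and coefficients, and the use of Gabor bases only (no Laplacian of Gaussian, no deformation, no learning step) are all hypothetical choices made for illustration.

```python
import numpy as np

def gabor_base(size, sigma, theta, wavelength):
    """Return a zero-mean, unit-norm Gabor cosine patch of shape (size, size)."""
    half = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size] - half
    # Coordinate along the orientation of the sinusoidal carrier.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    patch = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    patch -= patch.mean()
    return patch / np.linalg.norm(patch)

def place_texton(canvas, center, texton):
    """Add one texton to canvas in place.

    A texton is assumed here to be a list of (dy, dx, coeff, patch) tuples:
    a few image bases at fixed offsets from the texton center.
    """
    for dy, dx, coeff, patch in texton:
        h, w = patch.shape
        top = center[0] + dy - h // 2
        left = center[1] + dx - w // 2
        canvas[top:top + h, left:left + w] += coeff * patch

def synthesize(shape, texton, centers, noise_std=0.0, seed=0):
    """Image = superposition of bases generated by textons, plus optional noise."""
    rng = np.random.default_rng(seed)
    image = np.zeros(shape)
    for center in centers:
        place_texton(image, center, texton)
    return image + noise_std * rng.standard_normal(shape)

def lighton_image(texton_triplet, light):
    """Linear sum of three texton images (3, H, W) weighted by a light vector (3,)."""
    return np.tensordot(np.asarray(light, dtype=float), texton_triplet, axes=1)

# Hypothetical texton: two Gabor bases at small vertical offsets.
texton = [
    (-2, 0, 1.0, gabor_base(9, 2.0, 0.0, 4.0)),
    (2, 0, -0.5, gabor_base(9, 2.0, np.pi / 2, 4.0)),
]
image = synthesize((32, 32), texton, centers=[(10, 10), (20, 20)])
```

Learning, as the abstract describes it, runs in the opposite direction: the texton dictionary is treated as the parameters of the generative model and is fitted to observed images. That inference step is not shown here.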
Pages: 121-143
Number of pages: 23