The difficulty of identifying genes in anonymous vertebrate sequences

Cited by: 20
Authors
Claverie, JM
Poirot, O
Lopez, F
Affiliations
[1] Struct. and Genetic Info. Laboratory, C.N.R.S.-E.P. 91, Inst. Struct. Biol. and Microbiol., Marseille 13402
Source
COMPUTERS & CHEMISTRY | 1997 / Vol. 21 / Issue 04
DOI
10.1016/S0097-8485(96)00039-3
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
The identification of genes in newly determined vertebrate genomic sequences can range from a trivial to an impossible task. In a statistical preamble, we show how "insignificant" the individual features are on which gene identification can be rigorously based: promoter signals, splice sites, open reading frames, etc. The practical identification of genes thus ultimately depends on their resemblance to those already present in sequence databases or incorporated into training sets. The inherent conservatism of the currently popular methods (database similarity search, GRAIL) will greatly limit our capacity for making unexpected biological discoveries from increasingly abundant genomic data. Beyond a very limited subset of trivial cases, the automated interpretation of genomic data (i.e. without experimental validation) is still a myth. On the other hand, characterizing the 60 000 to 100 000 genes thought to be hidden in the human genome by means of individual experiments is not feasible. Thus, it appears that our only hope of turning genome data into genome information must rely on drastic progress in the way we identify and analyse genes in silico. © 1997 Elsevier Science Ltd.
Pages: 203-214
Page count: 12
Related papers
60 in total
[1]   COMPLEMENTARY-DNA SEQUENCING - EXPRESSED SEQUENCE TAGS AND HUMAN GENOME PROJECT [J].
ADAMS, MD ;
KELLEY, JM ;
GOCAYNE, JD ;
DUBNICK, M ;
POLYMEROPOULOS, MH ;
XIAO, H ;
MERRIL, CR ;
WU, A ;
OLDE, B ;
MORENO, RF ;
KERLAVAGE, AR ;
MCCOMBIE, WR ;
VENTER, JC .
SCIENCE, 1991, 252 (5013) :1651-1656
[2]   ISSUES IN SEARCHING MOLECULAR SEQUENCE DATABASES [J].
ALTSCHUL, SF ;
BOGUSKI, MS ;
GISH, W ;
WOOTTON, JC .
NATURE GENETICS, 1994, 6 (02) :119-129
[3]  
ALTSCHUL SF, 1990, J MOL BIOL, V215, P403, DOI 10.1006/jmbi.1990.9999
[4]   Detection of eukaryotic promoters using Markov transition matrices [J].
Audic, S ;
Claverie, JM .
COMPUTERS & CHEMISTRY, 1997, 21 (04) :223-227
[5]  
AUDIC S, 1997, UNPUB
[6]  
BAIROCH A, 1994, NUCLEIC ACIDS RES, V22, P3578
[7]   Mammalian X-chromosome inactivation and the XIST gene [J].
Ballabio, Andrea ;
Willard, Huntington F. .
CURRENT OPINION IN GENETICS & DEVELOPMENT, 1992, 2 (03) :439-447
[8]   Visualizing the spatial relationships between defined DNA sequences and the axial region of extracted metaphase chromosomes [J].
Bickmore, WA ;
Oghene, K .
CELL, 1996, 84 (01) :95-104
[9]   DBEST - DATABASE FOR EXPRESSED SEQUENCE TAGS [J].
BOGUSKI, MS ;
LOWE, TMJ ;
TOLSTOSHEV, CM .
NATURE GENETICS, 1993, 4 (04) :332-333
[10]   GENE DISCOVERY IN DBEST [J].
BOGUSKI, MS ;
TOLSTOSHEV, CM ;
BASSETT, DE .
SCIENCE, 1994, 265 (5181) :1993-1994