Assembly, annotation, and integration of UNIGENE clusters into the human genome draft

Cited by: 46
Authors
Zhuo, D
Zhao, WD
Wright, FA
Yang, HY
Wang, JP
Sears, R
Baer, T
Kwon, DH
Gordon, D
Gibbs, S
Dai, D
Yang, Q
Spitzner, J
Krahe, R
Stredney, D
Stutz, A
Yuan, B [1]
Affiliations
[1] Ohio State Univ, James Canc Hosp & Solove Res Inst, Bioinformat Grp, Columbus, OH 43210 USA
[2] Ohio State Univ, James Canc Hosp & Solove Res Inst, Div Human Canc Genet, Columbus, OH 43210 USA
[3] Ohio Supercomp Ctr, Columbus, OH 43212 USA
[4] Labbook Com, Columbus, OH 43229 USA
DOI
10.1101/gr.GR-1645R
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The recent release of the first draft of the human genome provides an unprecedented opportunity to integrate human genes and their functions in a complete positional context. However, at least three significant technical hurdles remain: first, to assemble a complete and nonredundant human transcript index; second, to accurately place the individual transcript indices on the human genome; and third, to functionally annotate all human genes. Here, we report the extension of the UNIGENE database through the assembly of its sequence clusters into nonredundant sequence contigs. Each resulting consensus was aligned to the human genome draft. A unique location for each transcript within the human genome was determined by the integration of the restriction fingerprint, assembled genomic contig, and radiation hybrid (RH) maps. A total of 59,500 UNIGENE clusters were mapped on the basis of at least three independent criteria as compared with the 30,000 human genes/ESTs currently mapped in Genemap'99. Finally, the extension of the human transcript consensus in this study enabled a greater number of putative functional assignments than the 11,000 annotated entries in UNIGENE. This study reports a draft physical map with annotations for a majority of the human transcripts, called the Human Index of Nonredundant Transcripts (HINT). Such information can be immediately applied to the discovery of new genes and the identification of candidate genes for positional cloning.
Pages: 904-918
Number of pages: 15