Comparing vertebrate whole-genome shotgun reads to the human genome

Times cited: 14
Authors
Chen, R [1]
Bouck, JB [1]
Weinstock, GM [1]
Gibbs, RA [1]
Affiliation
[1] Baylor Coll Med, Human Genome Sequencing Ctr, Dept Mol & Human Genet, Houston, TX 77030 USA
DOI
10.1101/gr.203601
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to remain incomplete. A challenge is to use these fragmented data to understand the human genome. This paper describes methods for using cross-species whole-genome shotgun (WGS) sequence for genome annotation. About half a million high-quality rat WGS reads (covering 7.5% of the rat genome), generated at the Baylor College of Medicine Human Genome Sequencing Center, were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence, and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted ~42,500 genes in the human genome, slightly more than reported previously.
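The abstract describes two calculations: using random reads as a negative control to fix a BLAST score threshold, and expressing conserved hits as a coding-versus-noncoding fold enrichment. The following is a minimal Python sketch of both, assuming each read is summarized by a single best-hit score (for example a BLAST bit score) and that enrichment is measured as hits per base pair. The function names, the false-positive target, and the toy numbers are hypothetical illustrations, not the authors' actual parameters, criteria, or data.

```python
import math
from bisect import bisect_left
from typing import Sequence


def choose_score_cutoff(control_scores: Sequence[float], max_fp_rate: float) -> float:
    """Smallest score cutoff at which at most `max_fp_rate` of the
    negative-control (random) reads still score at or above the cutoff.
    Hypothetical helper: the paper's real acceptance criteria may differ."""
    if not control_scores:
        raise ValueError("need at least one negative-control score")
    if not 0.0 <= max_fp_rate <= 1.0:
        raise ValueError("max_fp_rate must lie in [0, 1]")
    ranked = sorted(control_scores)
    n = len(ranked)
    # Walk candidate cutoffs upward; the false-positive fraction never increases.
    for cutoff in ranked:
        at_or_above = n - bisect_left(ranked, cutoff)  # tied scores count as hits
        if at_or_above / n <= max_fp_rate:
            return cutoff
    # No observed score is strict enough: use the next float above the maximum,
    # so that no control read passes.
    return math.nextafter(ranked[-1], math.inf)


def count_conserved(read_scores: Sequence[float], cutoff: float) -> int:
    """Number of real reads whose best hit scores at or above the cutoff."""
    return sum(1 for score in read_scores if score >= cutoff)


def fold_enrichment(hits_coding: int, bp_coding: int,
                    hits_noncoding: int, bp_noncoding: int) -> float:
    """Ratio of conserved-hit density (hits per base pair) in coding versus
    noncoding sequence. Assumed definition, for illustration only."""
    return (hits_coding / bp_coding) / (hits_noncoding / bp_noncoding)


if __name__ == "__main__":
    # Toy numbers only; these are not data from the paper.
    control = [10.0, 12.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
    cutoff = choose_score_cutoff(control, max_fp_rate=0.2)
    print(cutoff)                                              # 45.0
    print(count_conserved([5.0, 44.0, 45.0, 60.0], cutoff))    # 2
    print(round(fold_enrichment(230, 1_000_000, 100, 10_000_000), 6))  # 23.0
```

Sorting the control scores once and counting with `bisect_left` keeps the threshold search simple; the only criterion here is an empirical false-positive rate on random reads, whereas a real pipeline might also require a minimum alignment length or percent identity.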
Pages: 1807-1816
Page count: 10