Anopheles gambiae genome reannotation through synthesis of ab initio and comparative gene prediction algorithms

Times cited: 17
Authors
Li, J
Riehle, MM
Zhang, Y
Xu, JN
Oduol, F
Gomez, SM
Eiglmeier, K
Ueberheide, BM
Shabanowitz, J
Hunt, DF
Ribeiro, JMC
Vernick, KD [1]
Affiliations
[1] Univ Minnesota, Ctr Microbial & Plant Genom, St Paul, MN 55108 USA
[2] Univ Minnesota, Dept Microbiol, St Paul, MN 55108 USA
[3] Inst Pasteur, Unite Biochim & Biol Mol Insectes, F-75724 Paris 15, France
[4] Inst Pasteur, CNRS, FRE 2849, F-75724 Paris 15, France
[5] Univ Virginia, Dept Chem, Charlottesville, VA 22904 USA
[6] NIAID, Lab Malaria & Vector Res, Bethesda, MD 20892 USA
DOI
10.1186/gb-2006-7-3-r24
Chinese Library Classification
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Complete genome annotation is a necessary tool as Anopheles gambiae researchers probe the biology of this potent malaria vector.
Results: We reannotate the A. gambiae genome by synthesizing comparative and ab initio sets of predicted coding sequences (CDSs) into a single set, using an exon-gene-union algorithm followed by an open-reading-frame-selection algorithm. The reannotation predicts 20,970 CDSs supported by at least two lines of evidence and lowers the proportion of CDSs lacking start and/or stop codons to approximately 4%. The reannotated CDS set includes 4,681 novel CDSs that are not represented in the Ensembl annotation but have EST support, and another 4,031 Ensembl-supported genes that undergo major structural, and therefore probably functional, changes. The quality and accuracy of the reannotation were assessed by comparison with end sequences from 20,249 full-length cDNA clones. In addition, mass spectrometry peptide hit rates from an A. gambiae shotgun proteomic dataset confirm that the reannotated CDSs offer a high-quality protein database for proteomics. We provide a functional proteomics annotation, ReAnoXcel, obtained by analyzing the new CDSs through the AnoXcel pipeline, which allows functional comparison of the CDS sets within the same bioinformatic platform. CDS data are available for download.
Conclusion: Comprehensive A. gambiae genome reannotation is achieved through a combination of comparative and ab initio gene prediction algorithms.
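The abstract names two steps, an exon-gene-union merge of predictions followed by open-reading-frame selection, without describing them in detail. The sketch below is a hypothetical, heavily simplified illustration of that general idea and is not the authors' implementation. It assumes forward-strand exons with half-open [start, end) coordinates on a single sequence, takes the union of overlapping exons from two predictors, splices the merged exons into a transcript, and then keeps the longest complete ATG-to-stop ORF. The names `Exon`, `union_exons`, `splice`, `longest_orf`, and `is_complete_cds` are illustrative only. The completeness check mirrors the abstract's "lacking start and/or stop codons" criterion.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class Exon:
    """Hypothetical exon: half-open interval [start, end) on the forward strand."""
    start: int
    end: int


def union_exons(*prediction_sets: Iterable[Exon]) -> List[Exon]:
    """Merge exons from several predictors into their interval union (simplified assumption)."""
    intervals = sorted((e.start, e.end) for s in prediction_sets for e in s)
    merged: List[List[int]] = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting exon: extend the current merged block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [Exon(s, e) for s, e in merged]


def splice(genome: str, exons: List[Exon]) -> str:
    """Concatenate exon sequences in coordinate order to form a transcript."""
    return "".join(genome[e.start:e.end] for e in sorted(exons, key=lambda e: e.start))


def longest_orf(seq: str) -> str:
    """Return the longest complete ATG...stop ORF in the three forward frames ("" if none)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop downstream, so no complete ORF remains in this frame
            if j + 3 - i > len(best):
                best = seq[i:j + 3]
            i = j + 3  # ATGs nested inside this ORF would only give shorter ORFs
    return best


def is_complete_cds(cds: str) -> bool:
    """True if cds starts with ATG, ends with a stop codon, and has no internal in-frame stop."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    if not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        return False
    return all(cds[k:k + 3] not in STOP_CODONS for k in range(0, len(cds) - 3, 3))


if __name__ == "__main__":
    # Toy example: two hypothetical predictors disagree on the exon boundaries of one gene.
    genome = "ATGAAACCCGGG" + "TTTTTTTT" + "TTTGCATAA" + "CC"
    predictor_a = [Exon(0, 9), Exon(20, 29)]
    predictor_b = [Exon(5, 12), Exon(20, 26)]
    exons = union_exons(predictor_a, predictor_b)
    print(exons)  # [Exon(start=0, end=12), Exon(start=20, end=29)]
    cds = longest_orf(splice(genome, exons))
    print(cds, is_complete_cds(cds))  # ATGAAACCCGGGTTTGCATAA True
```

In this toy setup, taking the union keeps every exon that either predictor reported, which favors sensitivity. The ORF step then trims the merged transcript back to a single coding frame with both a start and a stop codon. A real pipeline would also need to handle the reverse strand, splice-site consistency, and conflicting gene models, all of which are omitted here.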
Pages: 12