Iterative gene prediction and pseudogene removal improves genome annotation

Cited by: 44
Authors
van Baren, MJ [1]
Brent, MR [1]
Affiliations
[1] Washington Univ, Dept Comp Sci, Lab Computat Genom, St Louis, MO 63130 USA
DOI
10.1101/gr.4766206
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010; 081704;
Abstract
Correct gene prediction is impaired by the presence of processed pseudogenes: nonfunctional, intronless copies of real genes found elsewhere in the genome. Gene prediction programs frequently mistake processed pseudogenes for real genes or exons, leading to biologically irrelevant gene predictions. While methods exist to identify processed pseudogenes in genomes, no attempt has been made to integrate pseudogene removal with gene prediction, or even to provide a freestanding tool that identifies such erroneous gene predictions. We have created PPFINDER (for Processed Pseudogene finder), a program that integrates several methods of processed pseudogene finding in mammalian gene annotations. We used PPFINDER to remove pseudogenes from N-SCAN gene predictions, and show that gene prediction improves substantially when gene prediction and pseudogene masking are interleaved. In addition, we used PPFINDER with gene predictions as a parent database, eliminating the need for libraries of known genes. This allows us to run the gene prediction/PPFINDER procedure on newly sequenced genomes for which few genes are known.
Pages: 678-685
Page count: 8
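The interleaving of gene prediction and pseudogene masking described in the abstract can be illustrated with a minimal sketch. This is not PPFINDER or N-SCAN: the function names (`predict_with_masking`, `mask_intervals`, `toy_predict`, `toy_find_pseudogenes`), the interval representation, the stopping rule, and the toy data are all assumptions made for illustration. The sketch only shows the control flow: predict, flag suspected pseudogenes, mask them, and predict again until nothing new is flagged.

```python
import re
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates (assumed convention)


def mask_intervals(seq: str, intervals: List[Interval]) -> str:
    """Replace each flagged interval with 'N' so the predictor can no longer use it."""
    chars = list(seq)
    for start, end in intervals:
        chars[start:end] = "N" * (end - start)
    return "".join(chars)


def predict_with_masking(
    seq: str,
    predict: Callable[[str], List[Interval]],
    find_pseudogenes: Callable[[str, List[Interval]], List[Interval]],
    max_rounds: int = 5,  # hypothetical safety limit, not taken from the paper
) -> Tuple[List[Interval], str]:
    """Alternate prediction and masking until no prediction is flagged."""
    masked = seq
    for _ in range(max_rounds):
        predictions = predict(masked)
        flagged = find_pseudogenes(masked, predictions)
        if not flagged:
            return predictions, masked
        masked = mask_intervals(masked, flagged)
    # Round limit reached: return predictions on the most recently masked sequence.
    return predict(masked), masked


def toy_predict(seq: str) -> List[Interval]:
    """Stand-in predictor: every uppercase ACGT run counts as a 'gene'."""
    return [m.span() for m in re.finditer(r"[ACGT]+", seq)]


def toy_find_pseudogenes(seq: str, predictions: List[Interval]) -> List[Interval]:
    """Stand-in detector: pretend that 'TTTT' is a known processed pseudogene."""
    return [(s, e) for s, e in predictions if seq[s:e] == "TTTT"]


genome = "ACGTACGTnnTTTTnnGGCC"
final_predictions, masked_genome = predict_with_masking(
    genome, toy_predict, toy_find_pseudogenes
)
print(final_predictions)
print(masked_genome)
```

In this toy run the first round yields three predictions, the middle one is flagged and masked, and the second round returns only the two remaining intervals. A real pipeline would replace both stand-in functions with an actual gene predictor and pseudogene detector.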