Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++

Cited by: 1199

Authors:

Davydov, Eugene V. [1]

Goode, David L. [2]

Sirota, Marina [3]

Cooper, Gregory M. [4,5]

Sidow, Arend [2,6]

Batzoglou, Serafim [1]

Affiliations:

[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA

[2] Stanford Univ, Dept Genet, Sch Med, Stanford, CA 94305 USA

[3] Stanford Univ, Biomed Informat Program, Stanford, CA 94305 USA

[4] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA

[5] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA

[6] Stanford Univ, Sch Med, Dept Pathol, Stanford, CA 94305 USA

Source:

PLOS COMPUTATIONAL BIOLOGY | 2010 / Volume 6 / Issue 12

Funding:

U.S. National Science Foundation;

Keywords:

MAXIMUM-LIKELIHOOD; SEQUENCES; ELEMENTS; UCSC; DNA; IDENTIFICATION; 1-PERCENT; BROWSER; PROJECT;

DOI:

10.1371/journal.pcbi.1001025

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach by computing constraint scores for individual positions of a multiple alignment and then defining constrained elements as segments of contiguous, highly scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool to provide both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
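To make the bottom-up idea in the abstract concrete, the sketch below finds the highest-scoring contiguous run in a list of per-position scores using a simple dynamic programming recurrence (Kadane's maximum-subarray method). This is an illustration only and is not the GERP++ element-finding algorithm, which evaluates many candidate breakpoints and ranks them by statistical significance. The function name `best_segment` and the toy scores are hypothetical.

```python
from typing import List, Tuple


def best_segment(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end_exclusive, total) of the highest-scoring contiguous run.

    Illustrative stand-in for element calling on per-position constraint
    scores; NOT the GERP++ method (assumption: one segment, no significance test).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    best_start, best_end, best_total = 0, 1, scores[0]
    cur_start, cur_total = 0, scores[0]
    for i in range(1, len(scores)):
        if cur_total <= 0:
            # A non-positive running total cannot help; restart the run here.
            cur_start, cur_total = i, scores[i]
        else:
            cur_total += scores[i]
        if cur_total > best_total:
            best_start, best_end, best_total = cur_start, i + 1, cur_total
    return best_start, best_end, best_total


# Hypothetical per-position scores (positive = more constrained).
toy_scores = [-1.0, 2.0, 3.0, -0.5, 4.0, -6.0, 1.0]
print(best_segment(toy_scores))  # (1, 5, 8.5)
```

Note that the best run tolerates the mildly negative position at index 3 because the surrounding positions score highly, which is the kind of behavior that lets a segment-level method annotate longer elements than a strict per-position threshold would.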


Pages: 13
