Identifying and Seeing beyond Multiple Sequence Alignment Errors Using Intra-Molecular Protein Covariation
Cited by: 17
Authors:
Dickson, Russell J. [1]
Wahl, Lindi M. [2]
Fernandes, Andrew D. [1,2]
Gloor, Gregory B. [1]
Affiliations:
[1] Univ Western Ontario, Dept Biochem, London, ON, Canada
[2] Univ Western Ontario, Dept Appl Math, London, ON N6A 5B9, Canada
Background: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correctness of gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.

Methodology/Principal Findings: We show that current criteria are insufficient to build alignments for use with covariation analyses, because systematic sequence alignment errors are present even in hand-curated, structure-based alignment datasets such as those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those previously described in the literature.

Conclusions/Significance: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors because they emphasize pairwise covariation over group covariation, providing increased confidence in contact prediction when analyzing alignments with erroneous regions.
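As a minimal illustration of why a covariation score can be sensitive to misalignment, the Python sketch below computes plug-in mutual information (MI) between two alignment columns. It then compares a correct alignment with one in which a gap is misplaced. MI serves only as a familiar example of a non-parametric covariation statistic. It is not one of the two statistics proposed in this work. The toy alignment, the helper names `column` and `mutual_information`, and the choice to treat gaps as an ordinary symbol are all hypothetical assumptions made for illustration.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Plug-in mutual information (natural log) between two alignment columns.

    Assumption: gaps ('-') are counted as an ordinary symbol; real pipelines
    often filter gappy columns or sequences before scoring.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy family: column 0 (A/R) and column 3 (E/K) covary perfectly.
# Sequences 4 and 5 carry a genuine single-residue deletion at column 2.
correct = [
    "AGLE",
    "AGVE",
    "RGLK",
    "RGVK",
    "AG-E",
    "RG-K",
]

# Same raw sequences, but the gap in sequences 4 and 5 is misplaced at the
# C-terminus, so their final residue is shifted into column 2.
misaligned = correct[:4] + ["AGE-", "RGK-"]

if __name__ == "__main__":
    for name, aln in (("correct", correct), ("misaligned", misaligned)):
        mi = mutual_information(column(aln, 0), column(aln, 3))
        print(f"{name:>10}: MI(col0, col3) = {mi:.3f}")
```

Running it prints 0.693 (that is, ln 2) for the correct alignment and 0.462 for the misaligned one. A single misplaced gap in two of six sequences therefore lowers the score of a truly covarying pair by one third. This is the kind of misalignment sensitivity the abstract describes, shown here on invented data only.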