Profile Hidden Markov Models for the Detection of Viruses within Metagenomic Sequence Data

Cited by: 134
Authors
Skewes-Cox, Peter [1, 2, 3, 7]
Sharpton, Thomas J. [4]
Pollard, Katherine S. [4, 5, 6]
DeRisi, Joseph L. [2, 3, 7]
Affiliations
[1] Univ Calif San Francisco, Biol & Med Informat Grad Program, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Dept Med Biochem & Biophys, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Dept Microbiol, San Francisco, CA 94143 USA
[4] Univ Calif San Francisco, J David Gladstone Inst, San Francisco, CA 94143 USA
[5] Univ Calif San Francisco, Inst Human Genet, San Francisco, CA 94143 USA
[6] Univ Calif San Francisco, Div Biostat, San Francisco, CA 94143 USA
[7] Howard Hughes Med Inst, Bethesda, MD 20817 USA
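Source
PLOS ONE | 2014 / Vol. 9 / Issue 8
Funding
U.S. National Science Foundation
Keywords
PROTEIN FAMILIES DATABASE; HOMOLOGY DETECTION; IDENTIFICATION; ALIGNMENT; VIROLOGY; ILLNESS; TRENDS; GENE; RNA
DOI
10.1371/journal.pone.0105067
Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09
Abstract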
Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR-based techniques. While massively parallel sequencing promises to be more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to the detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply by aligning sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs ("vFams") to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format.
We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).
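The abstract's point that sensitivity can be gained "by leveraging position-specific information" can be illustrated with a deliberately simplified sketch: an ungapped profile made only of match-state emission distributions (no insert or delete states, no transition probabilities), scored as a sum of per-position log-odds in bits. This is not HMMER3's scoring algorithm or the vFam pipeline; the function names `build_profile` and `log_odds_score`, the toy alignment, the pseudocount of 1, and the uniform background are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical illustration: only the 20 standard amino acids are modeled.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Estimate position-specific emission probabilities from an ungapped alignment.

    Each alignment column becomes one match-state distribution. Pseudocounts
    keep residues never observed in a column from getting zero probability.
    """
    length = len(alignment[0])
    for seq in alignment:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        if any(aa not in AMINO_ACIDS for aa in seq):
            raise ValueError("only the 20 standard amino acids are supported")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def log_odds_score(profile, query, background=None):
    """Score an ungapped query as the sum of per-position log2 odds (bits).

    Positive scores mean the query fits the profile better than the background
    model. A uniform background distribution is assumed by default.
    """
    if len(query) != len(profile):
        raise ValueError("query length must equal profile length (gaps are not modeled)")
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    return sum(math.log2(column[aa] / background[aa]) for column, aa in zip(profile, query))


if __name__ == "__main__":
    # Toy alignment invented for illustration; it is not vFam or RefSeq data.
    profile = build_profile(["MKVL", "MKIL", "MRVL"])
    for query in ("MKVL", "MRIL", "WWWW"):
        print(query, round(log_odds_score(profile, query), 2))
```

Because each column has its own distribution, a residue that is conserved at one position contributes more than the same residue at a variable position, which a single position-independent substitution matrix cannot express. A full profile HMM, such as those produced with HMMER3, additionally models insertions and deletions with position-specific transition probabilities.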
Pages: 12