CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

Cited by: 36
Authors
Borozan, Ivan [1]
Wilson, Shane [1]
Blanchette, Paola [4]
Laflamme, Philippe [1]
Watt, Stuart N. [1]
Krzyzanowski, Paul M. [2, 3]
Sircoulomb, Fabrice [2, 3]
Rottapel, Robert [2, 3]
Branton, Philip E. [4, 5, 6]
Ferretti, Vincent [1]
Affiliations
[1] Ontario Inst Canc Res, MaRS Ctr, Toronto, ON M5G 0A3, Canada
[2] Univ Toronto, Ontario Canc Inst, Toronto, ON M5G 1L7, Canada
[3] Univ Toronto, Campbell Family Canc Res Inst, Toronto, ON M5G 1L7, Canada
[4] McGill Univ, Dept Biochem, Montreal, PQ H3G 1Y6, Canada
[5] McGill Univ, Dept Oncol, Montreal, PQ H3G 1Y6, Canada
[6] McGill Univ, Goodman Canc Res Ctr, Montreal, PQ H3G 1Y6, Canada
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
Canadian Institutes of Health Research;
Keywords
SUBTRACTION; DATABASE; BROWSER; SEARCH; TOOL; DNA;
DOI
10.1186/1471-2105-13-206
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010; 081704;
Abstract
Background: It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools.
Results: Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage.
Conclusions: To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID's predictions were successfully validated in vitro.
Pages: 11