PPLine: An Automated Pipeline for SNP, SAP, and Splice Variant Detection in the Context of Proteogenomics

Times cited: 63
Authors
Krasnov, George Sergeevich [1 ,2 ,3 ]
Dmitriev, Alexey Alexandrovich [1 ]
Kudryavtseva, Anna Viktorovna [1 ,4 ]
Shargunov, Alexander Valerievich [2 ,3 ]
Karpov, Dmitry Sergeevich [1 ,2 ]
Uroshlev, Leonid Andreevich [1 ]
Melnikova, Natalya Vladimirovna [1 ]
Blinov, Vladimir Mikhailovich [2 ,3 ]
Poverennaya, Ekaterina Vladimirovna [2 ]
Archakov, Alexander Ivanovich [2 ]
Lisitsa, Andrey Valerievich [2 ]
Ponomarenko, Elena Alexandrovna [2 ]
Affiliations
[1] Russian Acad Sci, Engelhardt Inst Mol Biol, Moscow 111991, Russia
[2] Russian Acad Med Sci, Orekhov Inst Biomed Chem, Moscow 119121, Russia
[3] Mechnikov Res Inst Vaccines & Sera, Moscow 105064, Russia
[4] Minist Healthcare Russian Fed, Herzen Moscow Canc Res Inst, Moscow 125284, Russia
Funding
Russian Science Foundation
Keywords
C-HPP; RNA-seq; SNP; SAP; indel; alternative reading frames; alternative splicing; proteotypic peptides; HUMAN PROTEOME PROJECT; SEQUENCING DATA; CHROMOSOME-18; TRANSCRIPTOME; MISSING PROTEINS; DEPLETED PLASMA; DEEP PROTEOME; LIVER-TISSUE; HEPG2; CELLS; HUMAN COLON; RNA;
DOI
10.1021/acs.jproteome.5b00490
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The fundamental mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to study human proteome diversity, including rare variants. Liver tissue, HepG2 cells, and plasma were selected as major objects of C-HPP studies. The proteogenomic approach, a recently introduced technique, is a powerful method for predicting and validating proteoforms arising from alternative splicing, mutations, and transcript editing. We developed PPLine, a Python-based proteogenomic pipeline that provides automated discovery of single-amino-acid polymorphisms (SAPs), indels, and alternatively spliced variants from raw transcriptome and exome sequencing data, annotation and filtering of single-nucleotide polymorphisms (SNPs), and prediction of proteotypic peptides (available at https://sourceforge.net/projects/ppline). In this work, we performed deep transcriptome sequencing of HepG2 cells and liver tissue on two platforms: Illumina HiSeq and Applied Biosystems SOLiD. Using PPLine, we revealed 7756 SAPs and indels in HepG2 cells and liver (including 659 variants not annotated in dbSNP). We found 17 indels in transcripts associated with the translation of alternative reading frames (ARFs) longer than 300 bp. The ARF products of two genes, SLMO1 and TMEM8A, show signatures of a caspase-binding domain and a Gcn5-related N-acetyltransferase. Alternative splicing analysis predicted novel proteoforms encoded by 203 (liver) and 475 (HepG2) genes, supported by both Illumina and SOLiD data. The results of the present work provide a basis for subsequent proteomic studies by the C-HPP consortium.
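The abstract names SAP and indel discovery as core PPLine tasks. The Python sketch below is a hypothetical illustration only, not PPLine's code. It assumes a plain A/C/G/T coding sequence that begins at the start codon, 0-based positions, VCF-like ref/alt alleles, and the standard genetic code. Every function name in it is invented for illustration. The sketch shows how a single-nucleotide variant can be mapped to a SAP (or to a synonymous, nonsense, or stop-loss change), how an indel's length decides whether it shifts the reading frame, and how far translation runs in the shifted frame before reaching a stop codon. That last quantity could be compared against a length cutoff such as the 300 bp ARF threshold in the abstract.

```python
"""Hypothetical sketch (not PPLine's code): label coding variants as SAPs,
synonymous/nonsense/stop-loss changes, or in-frame vs frameshift indels."""

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an A/C/G/T sequence in frame 0; a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def apply_variant(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace `ref` at 0-based `pos` with `alt`, checking that `ref` matches."""
    seq, ref, alt = seq.upper(), ref.upper(), alt.upper()
    if seq[pos:pos + len(ref)] != ref:
        raise ValueError(f"reference allele {ref!r} not found at position {pos}")
    return seq[:pos] + alt + seq[pos + len(ref):]


def classify_variant(cds: str, pos: int, ref: str, alt: str) -> str:
    """Label a variant at 0-based `pos` of a CDS that starts at the start codon."""
    mutated = apply_variant(cds, pos, ref, alt)
    if len(ref) != len(alt):
        # A length change that is not a multiple of 3 shifts the downstream frame.
        return "frameshift" if (len(alt) - len(ref)) % 3 else "in-frame indel"
    if len(ref) != 1:
        raise NotImplementedError("multi-nucleotide substitutions are not handled here")
    start = pos - pos % 3                  # first base of the affected codon
    ref_aa = CODON_TABLE[cds.upper()[start:start + 3]]
    alt_aa = CODON_TABLE[mutated[start:start + 3]]
    aa_pos = start // 3 + 1                # 1-based residue number, HGVS-like
    if ref_aa == alt_aa:
        return "synonymous"
    if ref_aa == "*":
        return f"stop-loss p.*{aa_pos}{alt_aa}"
    if alt_aa == "*":
        return f"nonsense p.{ref_aa}{aa_pos}*"
    return f"SAP p.{ref_aa}{aa_pos}{alt_aa}"


def shifted_frame_length_bp(transcript: str, pos: int, ref: str, alt: str) -> int:
    """Length (bp) read from the codon containing `pos` to the first in-frame stop
    (inclusive) after applying the variant; if no stop occurs, the length of the
    complete codons up to the end of `transcript` (ideally CDS plus 3' UTR)."""
    mutated = apply_variant(transcript, pos, ref, alt)
    start = pos - pos % 3
    for i in range(start, len(mutated) - 2, 3):
        if CODON_TABLE[mutated[i:i + 3]] == "*":
            return i + 3 - start
    return (len(mutated) - start) // 3 * 3


if __name__ == "__main__":
    cds = "ATGAAATGGTAA"                        # encodes M K W *
    print(translate(cds))                       # MKW*
    print(classify_variant(cds, 3, "A", "G"))   # SAP p.K2E
    print(classify_variant(cds, 8, "G", "A"))   # nonsense p.W3*
    print(classify_variant(cds, 3, "A", "AT"))  # frameshift
```

Comparing translated codons, rather than only checking codon position, keeps synonymous changes out of the SAP list. A real pipeline would also need transcript coordinates, strand handling, and reads supporting each allele, none of which this sketch attempts.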
Pages: 3729-3737
Number of pages: 9