PPLine: An Automated Pipeline for SNP, SAP, and Splice Variant Detection in the Context of Proteogenomics

Cited by: 57
Authors
Krasnov, George Sergeevich [1, 2, 3]
Dmitriev, Alexey Alexandrovich [1]
Kudryavtseva, Anna Viktorovna [1, 4]
Shargunov, Alexander Valerievich [2, 3]
Karpov, Dmitry Sergeevich [1, 2]
Uroshlev, Leonid Andreevich [1]
Melnikova, Natalya Vladimirovna [1]
Blinov, Vladimir Mikhailovich [2, 3]
Poverennaya, Ekaterina Vladimirovna [2]
Archakov, Alexander Ivanovich [2]
Lisitsa, Andrey Valerievich [2]
Ponomarenko, Elena Alexandrovna [2]
Affiliations
[1] Russian Acad Sci, Engelhardt Inst Mol Biol, Moscow 111991, Russia
[2] Russian Acad Med Sci, Orekhov Inst Biomed Chem, Moscow 119121, Russia
[3] Mechnikov Res Inst Vaccines & Sera, Moscow 105064, Russia
[4] Minist Healthcare Russian Fed, Herzen Moscow Canc Res Inst, Moscow 125284, Russia
Funding
Russian Science Foundation
Keywords
C-HPP; RNA-seq; SNP; SAP; indel; alternative reading frames; alternative splicing; proteotypic peptides; HUMAN PROTEOME PROJECT; SEQUENCING DATA; CHROMOSOME-18; TRANSCRIPTOME; MISSING PROTEINS; DEPLETED PLASMA; DEEP PROTEOME; LIVER-TISSUE; HEPG2; CELLS; HUMAN COLON; RNA;
DOI
10.1021/acs.jproteome.5b00490
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The fundamental mission of the Chromosome-Centric Human Proteome Project (C-HPP) is the study of human proteome diversity, including rare variants. Liver tissues, HepG2 cells, and plasma were selected as major objects for C-HPP studies. The proteogenomic approach, a recently introduced technique, is a powerful method for predicting and validating proteoforms arising from alternative splicing, mutations, and transcript editing. We developed PPLine, a Python-based proteogenomic pipeline that provides automated discovery of single-amino-acid polymorphisms (SAPs), indels, and alternatively spliced variants from raw transcriptome and exome sequence data; single-nucleotide polymorphism (SNP) annotation and filtration; and prediction of proteotypic peptides (available at https://sourceforge.net/projects/ppline). In this work, we performed deep transcriptome sequencing of HepG2 cells and liver tissues using two platforms: Illumina HiSeq and Applied Biosystems SOLiD. Using PPLine, we revealed 7756 SAPs and indels for HepG2 cells and liver (including 659 variants not annotated in dbSNP). We found 17 indels in transcripts associated with the translation of alternate reading frames (ARFs) longer than 300 bp. The ARF products of two genes, SLMO1 and TMEM8A, demonstrate signatures of a caspase-binding domain and a Gcn5-related N-acetyltransferase. Alternative splicing analysis predicted novel proteoforms encoded by 203 (liver) and 475 (HepG2) genes according to both Illumina and SOLiD data. The results of the present work represent a basis for subsequent proteomic studies by the C-HPP consortium.
Pages: 3729-3737
Page count: 9