Proteomics Standards Initiative Extended FASTA Format

Cited by: 25
Authors
Binz, Pierre-Alain [1]
Shofstahl, Jim [2]
Vizcaino, Juan Antonio [3]
Barsnes, Harald [4]
Chalkley, Robert J. [5]
Menschaert, Gerben [6]
Alpi, Emanuele [3]
Clauser, Karl [7]
Eng, Jimmy K. [8]
Lane, Lydie [9, 17]
Seymour, Sean L. [10]
Sanchez, Luis Francisco Hernandez [11, 16]
Mayer, Gerhard [18]
Eisenacher, Martin [18]
Perez-Riverol, Yasset [3]
Kapp, Eugene A. [12, 13]
Mendoza, Luis [14]
Baker, Peter R. [5]
Collins, Andrew [15, 19]
Van den Bossche, Tim [20]
Deutsch, Eric W. [14]
Affiliations
[1] CHUV Ctr Hosp Univ Vaudois, CH-1011 Lausanne 14, Switzerland
[2] Thermo Fisher Sci, 355 River Oaks Pkwy, San Jose, CA 95134 USA
[3] European Mol Biol Lab, European Bioinformat Inst, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
[4] Univ Bergen, Dept Biomed, Prote Unit, N-5009 Bergen, Norway
[5] Univ Calif San Francisco, San Francisco, CA 94143 USA
[6] Univ Ghent, Dept Data Anal & Math Modelling, Biobix, B-9000 Ghent, Belgium
[7] Broad Inst, Cambridge, MA 02142 USA
[8] Univ Washington, Seattle, WA 98195 USA
[9] SIB Swiss Inst Bioinformat, CH-1211 Geneva 4, Switzerland
[10] Seymour Data Sci LLC, San Francisco, CA 95000 USA
[11] Univ Bergen, Dept Clin Sci, KG Jebsen Ctr Diabet Res, N-5021 Bergen, Norway
[12] Walter & Eliza Hall Inst Med Res, Melbourne, Vic 3052, Australia
[13] Univ Melbourne, Melbourne, Vic 3052, Australia
[14] Inst Syst Biol, Seattle, WA 98109 USA
[15] Univ Bergen, Dept Informat, Computat Biol Unit, N-5008 Bergen, Norway
[16] Haukeland Hosp, Ctr Med Genet & Mol Med, N-5021 Bergen, Norway
[17] Univ Geneva, Dept Microbiol & Mol Med, Fac Med, CH-1211 Geneva 4, Switzerland
[18] Ruhr Univ Bochum, Fac Med, Med Proteom Ctr, D-44801 Bochum, Germany
[19] Univ Liverpool, Inst Integrated Biol, Dept Funct & Comparat Genom, Liverpool L69 7ZB, Merseyside, England
[20] Univ Ghent, VIB UGent Ctr Med Biotechnol, B-9000 Ghent, Belgium
Funding
Wellcome Trust (UK); European Research Council; US National Institutes of Health
Keywords
PEFF; Proteomics Standards Initiative; PSI; file formats; standards; mass spectrometry; FASTA; proteomics; proteogenomics; TANDEM MASS-SPECTROMETRY; PROTEIN; REPRESENTATION; SEARCH; IDENTIFICATION; ORGANIZATION; PROTEOFORM; TOOLS
DOI
10.1021/acs.jproteome.9b00064
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Mass-spectrometry-based proteomics enables the high-throughput identification and quantification of proteins, including sequence variants and post-translational modifications (PTMs), in biological samples. However, most workflows require that such variations be included in the search space used to analyze the data, and doing so remains challenging with most analysis tools. To facilitate the search for known sequence variants and PTMs, the Proteomics Standards Initiative (PSI) has designed and implemented the PSI extended FASTA format (PEFF). PEFF is based on the very popular FASTA format but adds a uniform mechanism for encoding substantially more metadata about the sequence collection as well as individual entries, including support for encoding known sequence variants, PTMs, and proteoforms. The format is very nearly backward compatible, and as such, existing FASTA parsers will require few or no changes to be able to read PEFF files as FASTA files, although without supporting any of the extra capabilities of PEFF. PEFF is defined by a full specification document, controlled vocabulary terms, a set of example files, software libraries, and a file validator. Popular software and resources are starting to support PEFF, including the sequence search engine Comet and the knowledge bases neXtProt and UniProtKB. Widespread implementation of PEFF is expected to further enable proteogenomics and top-down proteomics applications by providing a standardized mechanism for encoding protein sequences and their known variations. All the related documentation, including the detailed file format specification and example files, is available at http://www.psidev.info/peff.
Pages: 2686-2692
Number of pages: 7