Experiments in searching small proteins in unannotated large eukaryotic genomes

Cited by: 11
Authors
Colinge, J [1]
Cusin, I [1]
Reffas, S [1]
Mahé, E [1]
Niknejad, A [1]
Rey, PA [1]
Mattou, H [1]
Moniatte, M [1]
Bougueleret, L [1]
Affiliation
[1] GeneProt Inc, CH-1217 Meyrin, Switzerland
Keywords
bioinformatics; genome; identification; eukaryote; small protein; ion trap; exon; intron
DOI
10.1021/pr049811i
CLC number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
There is growing interest in using mass spectrometry data to search genome sequences directly. Previous work by other authors demonstrated that this approach can correct and complement available genome annotations. We discuss the practical difficulties of searching large eukaryotic genomes with peptide ion trap tandem mass spectra of small proteins (<40 kDa). The challenging problem of automatically identifying peptides that span exon/intron boundaries is explored for the first time using experimental data. In a human genome search, we find that roughly 30% of the peptides are missed, for various reasons, compared with a Swiss-Prot search. We show that this percentage is significantly reduced with improved parent mass accuracy. Finally, we provide several examples of predicted gene structures that could be improved by proteomics data, in particular by peptides spanning exon/intron boundaries.
Pages: 167-174
Page count: 8