Detection of spurious interruptions of protein-coding regions in cloned cDNA sequences by GeneMark analysis

Cited by: 18
Authors
Hirosawa, M [1 ]
Ishikawa, K [1 ]
Nagase, T [1 ]
Ohara, O [1 ]
Affiliation
[1] Kazusa DNA Res Inst, Chiba 2920812, Japan
DOI
10.1101/gr.129500
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
cDNA is an artificial copy of mRNA and, therefore, no cDNA can be completely free from suspicion of cloning errors. Because overlooking these cloning errors results in serious misinterpretation of cDNA sequences, development of an alerting system targeting spurious sequences in cloned cDNAs is an urgent requirement for massive cDNA sequence analysis. We describe here the application of a modified GeneMark program, originally designed for prokaryotic gene finding, for detection of artifacts in cDNA clones. This program serves to provide a warning when any spurious split of protein-coding regions is detected through statistical analysis of cDNA sequences based on Markov models. In this study, 817 cDNA sequences deposited in public databases by us were subjected to analysis using this alerting system to assess its sensitivity and specificity. The results indicated that any spurious split of protein-coding regions in cloned cDNAs could be sensitively detected and systematically revised by means of this system after the experimental validation of the alerts. Furthermore, this study offered us, for the first time, statistical data regarding the rates and types of errors causing protein-coding splits in cloned cDNAs obtained by conventional cloning methods.
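The kind of alert described in the abstract can be illustrated with a toy Python sketch. It does not reproduce the modified GeneMark program or its Markov model statistics. As a plainly simpler stand-in, it looks for long stop-codon-free stretches in the three forward reading frames and raises an alert when two such stretches in different frames sit next to each other, which is the pattern a frameshift inside a coding region would tend to leave. The function names and the thresholds `min_codons` and `max_gap` are hypothetical choices made for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_stretches(seq, frame, min_codons):
    """Return (start, end) spans in `frame` holding at least `min_codons`
    consecutive codons without a stop codon (forward strand only)."""
    spans = []
    start = frame
    pos = frame
    while pos + 3 <= len(seq):
        if seq[pos:pos + 3] in STOP_CODONS:
            if (pos - start) // 3 >= min_codons:
                spans.append((start, pos))
            start = pos + 3  # next stretch begins after the stop codon
        pos += 3
    if (pos - start) // 3 >= min_codons:
        spans.append((start, pos))
    return spans


def flag_possible_split(seq, min_codons=50, max_gap=30):
    """Return pairs of long open stretches that lie in different frames and
    are staggered and adjacent (or overlapping) along the sequence.

    Each stretch is reported as (start, end, frame). This is a crude
    heuristic, not a statistical coding-potential model.
    """
    seq = seq.upper()
    stretches = sorted(
        (s, e, f)
        for f in range(3)
        for (s, e) in open_stretches(seq, f, min_codons)
    )
    alerts = []
    for i, (s1, e1, f1) in enumerate(stretches):
        for s2, e2, f2 in stretches[i + 1:]:
            staggered = s1 < s2 and e1 < e2
            if f1 != f2 and staggered and s2 - e1 <= max_gap:
                alerts.append(((s1, e1, f1), (s2, e2, f2)))
    return alerts


if __name__ == "__main__":
    # Synthetic repeat unit: open in frame 0, stop codons in frames 1 and 2.
    unit = "CTGACTGCTAAC"
    intact = unit * 40
    shifted = unit * 20 + "A" + unit * 20  # one inserted base mimics a cloning error
    print(flag_possible_split(intact))
    print(flag_possible_split(shifted))
```

On the synthetic input, the intact sequence yields no alert and the sequence with one inserted base yields a single pair of stretches in different frames. Real cDNA would need both strands, start-codon handling, and a statistical coding model of the kind the abstract describes to keep false alerts low.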
Pages: 1333-1341
Page count: 11