CORRUPTION OF GENOMIC DATABASES WITH ANOMALOUS SEQUENCE

Cited by: 26
Authors
LAMPERTI, ED
KITTELBERGER, JM
SMITH, TF
VILLAKOMAROFF, L
Affiliations
[1] HARVARD UNIV, CHILDRENS HOSP,SCH MED,DEPT NEUROL,ENDERS 250, 300 LONGWOOD AVE, BOSTON, MA 02115 USA
[2] HARVARD UNIV, SCH PUBL HLTH, DANA FARBER CANC INST, BOSTON, MA 02115 USA
DOI
10.1093/nar/20.11.2741
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
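As a rough illustration of the kind of screen the abstract describes, the sketch below flags positions in a database entry that share exact k-mers with a vector sequence, and computes percent identity over a pair of already aligned strings. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the k-mer length of 12, and the toy sequences are all hypothetical, and a real screen would use a proper alignment search rather than exact matching. For scale, 0.23% of 20,000 sequences corresponds to about 46 entries.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def vector_hits(entry, vector, k=12):
    """Return start positions in entry whose k-mer also occurs in vector.

    Exact matching only; k=12 is an arbitrary, illustrative choice.
    """
    vector_kmers = kmers(vector, k)
    entry = entry.upper()
    return [i for i in range(len(entry) - k + 1)
            if entry[i:i + k] in vector_kmers]


def percent_identity(aligned_a, aligned_b):
    """Percent identity over two equal-length, pre-aligned strings.

    Gap characters ('-') never count as matches; the denominator is the
    full alignment length.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


# Toy data (hypothetical): a short polylinker-like string stands in for a vector.
toy_vector = "GAATTCGAGCTCGGTACCCGGGGATCC"
toy_entry = "ATGGCC" + toy_vector[:15] + "TTTAAA"

hits = vector_hits(toy_entry, toy_vector)
identity = percent_identity("ACGTACGTAC", "ACGTACGAAC")
```

Here `hits` lists the entry positions covered by the embedded vector fragment, and `identity` is the share of matching columns in the toy alignment. Exact k-mer matching would miss the imperfect matches the abstract reports (94.8% to 96.8% identity), which is why an alignment-based search is the more realistic choice.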
Pages: 2741-2747
Page count: 7