CORRUPTION OF GENOMIC DATABASES WITH ANOMALOUS SEQUENCE

Cited by: 26
Authors
LAMPERTI, ED
KITTELBERGER, JM
SMITH, TF
VILLAKOMAROFF, L
Affiliations
[1] HARVARD UNIV, CHILDRENS HOSP,SCH MED,DEPT NEUROL,ENDERS 250, 300 LONGWOOD AVE, BOSTON, MA 02115 USA
[2] HARVARD UNIV, SCH PUBL HLTH, DANA FARBER CANC INST, BOSTON, MA 02115 USA
Keywords
DOI
10.1093/nar/20.11.2741
CLC classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
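The two kinds of figures quoted in the abstract (a percent identity over an alignment, and a match rate over a set of screened entries) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `percent_identity` and `flagged_count` are hypothetical, the input is assumed to be an already-aligned pair of strings with `-` marking gaps, and counting a gap column as a mismatch is an assumption made here for simplicity.

```python
def percent_identity(aligned_entry: str, aligned_vector: str) -> float:
    """Percent of aligned columns where entry and vector carry the same base.

    Assumption: both strings come from an existing pairwise alignment,
    have equal length, and use '-' for gaps. Columns with a gap on one
    side count as mismatches; columns with gaps on both sides are ignored.
    """
    if len(aligned_entry) != len(aligned_vector):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_entry.upper(), aligned_vector.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def flagged_count(rate_percent: float, total_entries: int) -> int:
    """Number of entries implied by a match rate given in percent."""
    return round(total_entries * rate_percent / 100.0)


if __name__ == "__main__":
    # Toy alignment: 5 of 6 columns agree.
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
    # The abstract's 0.23% of 20,000 screened sequences.
    print(flagged_count(0.23, 20000))
```

Under these assumptions, the 0.23% rate over 20,000 sequences corresponds to about 46 entries with matches to vector.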
Pages: 2741-2747
Page count: 7