Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining

Cited by: 50
Authors
Dauter, Zbigniew [1]
Wlodawer, Alexander [2]
Minor, Wladek [3, 4, 5, 6, 7]
Jaskolski, Mariusz [8, 9]
Rupp, Bernhard [10, 11]
Affiliations
[1] NCI, Synchrotron Radiat Res Sect, Macromol Crystallog Lab, Argonne Natl Lab, Argonne, IL 60439 USA
[2] NCI, Prot Struct Sect, Macromol Crystallog Lab, Frederick, MD 21702 USA
[3] Univ Virginia, Dept Mol Physiol & Biol Phys, Charlottesville, VA 22908 USA
[4] Midwest Ctr Struct Genom, Argonne, IL USA
[5] New York Struct Genom Consortium, New York, NY USA
[6] Ctr Struct Genom Infect Dis, Seattle, WA USA
[7] Enzyme Funct Initiat, Urbana, IL USA
[8] Adam Mickiewicz Univ, Fac Chem, Dept Crystallog, Poznan, Poland
[9] Polish Acad Sci, Inst Bioorgan Chem, Ctr Biocrystallog Res, Poznan, Poland
[10] KK Hofkristallamt, Vista, CA 92084 USA
[11] Med Univ Innsbruck, Dept Genet Epidemiol, A-6020 Innsbruck, Austria
Source
IUCrJ | 2014 / Volume 1
Funding
US National Institutes of Health (NIH)
Keywords
macromolecular crystallography; model validation; Protein Data Bank; PROTEIN DATA-BANK; VALIDATION TASK-FORCE; CRYSTAL-STRUCTURE; CRYSTALLOGRAPHIC ANALYSIS; POWDER DIFFRACTION; DATA QUALITY; UNIT-CELL; Z-DNA; RESOLUTION; BINDING;
DOI
10.1107/S2052252514005442
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Whereas the vast majority of the more than 85 000 crystal structures of macromolecules currently deposited in the Protein Data Bank are of high quality, some suffer from a variety of imperfections. Although this fact has been pointed out in the past, it is still worth periodic updates so that the metadata obtained by global analysis of the available crystal structures, as well as the utilization of the individual structures for tasks such as drug design, should be based on only the most reliable data. Here, selected abnormal deposited structures have been analysed based on the Bayesian reasoning that the correctness of a model must be judged against both the primary evidence as well as prior knowledge. These structures, as well as information gained from the corresponding publications (if available), have emphasized some of the most prevalent types of common problems. The errors are often perfect illustrations of the nature of human cognition, which is frequently influenced by preconceptions that may lead to fanciful results in the absence of proper validation. Common errors can be traced to negligence and a lack of rigorous verification of the models against electron density, creation of non-parsimonious models, generation of improbable numbers, application of incorrect symmetry, illogical presentation of the results, or violation of the rules of chemistry and physics. Paying more attention to such problems, not only in the final validation stages but during the structure-determination process as well, is necessary not only in order to maintain the highest possible quality of the structural repositories and databases but most of all to provide a solid basis for subsequent studies, including large-scale data-mining projects. 
For many scientists PDB deposition is a rather infrequent event, so the need for proper training and supervision is emphasized, as well as the need for constant alertness of reason and critical judgment as absolutely necessary safeguarding measures against such problems. Ways of identifying more problematic structures are suggested so that their users may be properly alerted to their possible shortcomings.
Pages: 179-193
Page count: 15