Data capture in bioinformatics: requirements and experiences with Pedro

Cited by: 8
Authors
Jameson, Daniel [2]
Garwood, Kevin [1]
Garwood, Chris
Booth, Tim [3]
Alper, Pinar [1]
Oliver, Stephen G. [4, 5]
Paton, Norman W. [1]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Univ Manchester, Sch Chem, Manchester Interdisciplinary Bioctr, Manchester M1 7DN, Lancs, England
[3] NERC, Ctr Ecol & Hydrol, Oxford OX1 3SR, England
[4] Univ Manchester, Fac Life Sci, Manchester M13 9PT, Lancs, England
[5] Univ Cambridge, Dept Biochem, Cambridge CB2 1GA, England
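Funding
UK Natural Environment Research Council; UK Engineering and Physical Sciences Research Council; UK Biotechnology and Biological Sciences Research Council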
DOI
10.1186/1471-2105-9-183
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: The systematic capture of appropriately annotated experimental data is a prerequisite for most bioinformatics analyses. Data capture is required not only for submission of data to public repositories, but also to underpin integrated analysis, archiving, and sharing - both within laboratories and in collaborative projects. The widespread requirement to capture data means that data capture and annotation are taking place at many sites, but the small scale of the literature on tools, techniques and experiences suggests that there is work to be done to identify good practice and reduce duplication of effort.

Results: This paper reports on experience gained in the deployment of the Pedro data capture tool in a range of representative bioinformatics applications. The paper makes explicit the requirements that have recurred when capturing data in different contexts, indicates how these requirements are addressed in Pedro, and describes case studies that illustrate where the requirements have arisen in practice.

Conclusion: Data capture is a fundamental activity for bioinformatics; all biological data resources build on some form of data capture activity, and many require a blend of import, analysis and annotation. Recurring requirements in data capture suggest that model-driven architectures can be used to construct data capture infrastructures that can be rapidly configured to meet the needs of individual use cases. We have described how one such model-driven infrastructure, namely Pedro, has been deployed in representative case studies, and discussed the extent to which the model-driven approach has been effective in practice.
Pages: 15