Automating data sharing through authoring tools

Cited by: 1
Authors
Kitchin J.R. [1]
Van Gulick A.E. [2, 3]
Zilinski L.D. [2]
Affiliations
[1] Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213
[2] University Libraries, Carnegie Mellon University, Pittsburgh, PA 15213
[3] Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh, PA
Keywords
Authoring; Data sharing; Embedding; Org-mode
DOI
10.1007/s00799-016-0173-7
Abstract
In the current scientific publishing landscape, there is a need for an authoring workflow that easily integrates data and code into manuscripts and that enables the data and code to be published in reusable form. Automated embedding of data and code into published output will enable superior communication and data archiving. In this work, we demonstrate a proof of concept for a workflow, org-mode, which successfully provides this authoring capability and workflow integration. We illustrate this concept in a series of examples for potential uses of this workflow. First, we use data on citation counts to compute the h-index of an author, and show two code examples for calculating the h-index. The source for each example is automatically embedded in the PDF during the export of the document. We demonstrate how data can be embedded in image files, which themselves are embedded in the document. Finally, metadata about the embedded files can be automatically included in the exported PDF, and accessed by computer programs. In our customized export, we embedded metadata about the attached files in the PDF in an Info field. A computer program could parse this output to get a list of embedded files and carry out analyses on them. Authoring tools such as Emacs + org-mode can greatly facilitate the integration of data and code into technical writing. These tools can also automate the embedding of data into document formats intended for consumption. © 2016, Springer-Verlag Berlin Heidelberg.
Pages: 93-98
Page count: 5
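
The h-index calculation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code (the paper's own two examples are not reproduced in this record); it assumes the standard definition, the largest h such that h papers each have at least h citations. The function name `h_index` and the citation counts are hypothetical, chosen only for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations.
            h = rank
        else:
            # Counts only decrease from here, so no later rank can qualify.
            break
    return h


# Hypothetical citation counts, invented for illustration (not the paper's data).
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))  # prints 3
```

Sorting first keeps the logic easy to check by hand: the fourth-ranked paper above has only 3 citations, so the index stops at 3.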