The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool

Cited by: 126
Authors
Kery, Mary Beth [1]
Radensky, Marissa [2]
Arya, Mahima [1]
John, Bonnie E. [3]
Myers, Brad A. [1]
Affiliations
[1] Carnegie Mellon Univ, Human Comp Interact Inst, Pittsburgh, PA 15213 USA
[2] Amherst Coll, Amherst, MA 01002 USA
[3] Bloomberg LP, New York, NY USA
Source
PROCEEDINGS OF THE 2018 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS (CHI 2018) | 2018
Funding
National Science Foundation (USA)
Keywords
Literate Programming; Exploratory Programming; Data Science; End-User Programmers (EUP); End-User Software Engineering (EUSE)
DOI
10.1145/3173574.3173748
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Literate programming tools are used by millions of programmers today, and are intended to facilitate presenting data analyses in the form of a narrative. We interviewed 21 data scientists to study coding behaviors in a literate programming environment and how data scientists kept track of variants they explored. For participants who tried to keep a detailed history of their experimentation, both informal and formal versioning attempts led to problems, such as reduced notebook readability. During iteration, participants actively curated their notebooks into narratives, although primarily through cell structure rather than markdown explanations. Next, we surveyed 45 data scientists and asked them to envision how they might use their past history in a future version control system. Based on these results, we give design guidance for future literate programming tools, such as providing history search based on how programmers recall their explorations, through contextual details including images and parameters.
Pages: 11