CAVA: A Visual Analytics System for Exploratory Columnar Data Augmentation Using Knowledge Graphs

Cited by: 13
Authors
Cashman, Dylan [1]
Xu, Shenyu [2]
Das, Subhajit [2]
Heimerl, Florian [3]
Liu, Cong [1]
Humayoun, Shah Rukh [4]
Gleicher, Michael [3]
Endert, Alex [2]
Chang, Remco [1]
Affiliations
[1] Tufts Univ, Medford, MA 02155 USA
[2] Georgia Tech, Atlanta, GA USA
[3] Univ Wisconsin, Madison, WI USA
[4] San Francisco State Univ, San Francisco, CA 94132 USA
Funding
National Science Foundation (USA);
Keywords
Task analysis; Visual analytics; Machine learning; Data models; Data visualization; Google; Training; Visual Analytics; Information Foraging; Data Augmentation; SEARCH;
DOI
10.1109/TVCG.2020.3030443
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Most visual analytics systems assume that all foraging for data happens before the analytics process; once analysis begins, the set of data attributes considered is fixed. Such separation of data construction from analysis precludes iteration that can enable foraging informed by the needs that arise in-situ during the analysis. The separation of the foraging loop from the data analysis tasks can limit the pace and scope of analysis. In this paper, we present CAVA, a system that integrates data curation and data augmentation with the traditional data exploration and analysis tasks, enabling information foraging in-situ during analysis. Identifying attributes to add to the dataset is difficult because it requires human knowledge to determine which available attributes will be helpful for the ensuing analytical tasks. CAVA crawls knowledge graphs to provide users with a broad set of attributes drawn from external data to choose from. Users can then specify complex operations on knowledge graphs to construct additional attributes. CAVA shows how visual analytics can help users forage for attributes by letting users visually explore the set of available data, and by serving as an interface for query construction. It also provides visualizations of the knowledge graph itself to help users understand complex joins such as multi-hop aggregations. We assess the ability of our system to enable users to perform complex data combinations without programming in a user study over two datasets. We then demonstrate the generalizability of CAVA through two additional usage scenarios. The results of the evaluation confirm that CAVA is effective in helping the user perform data foraging that leads to improved analysis outcomes, and offer evidence in support of integrating data augmentation as a part of the visual analytics pipeline.
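To make the abstract's notion of a "multi-hop aggregation" concrete, the following is a minimal sketch. It is not CAVA's implementation. It assumes a toy in-memory knowledge graph stored as (subject, predicate, object) triples, and every entity, value, and function name (`build_index`, `available_attributes`, `follow_path`, `augment`) is hypothetical and chosen only for illustration. It lists the attributes reachable from a column of entities, then adds a new column by following a two-hop path and aggregating the values found at its end.

```python
from collections import defaultdict
from statistics import mean

# Toy knowledge graph as (subject, predicate, object) triples.
# All entities and numbers are invented for illustration.
TRIPLES = [
    ("StateA", "borders", "StateB"),
    ("StateA", "borders", "StateC"),
    ("StateB", "borders", "StateA"),
    ("StateA", "population", 10),
    ("StateB", "population", 20),
    ("StateC", "population", 60),
]


def build_index(triples):
    """Index triples as subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        index[subject][predicate].append(obj)
    return index


def available_attributes(index, entities):
    """Predicates reachable in one hop from any entity in the column."""
    return sorted({p for e in entities for p in index.get(e, {})})


def follow_path(index, entity, path):
    """Follow a sequence of predicates from one entity; return end values."""
    frontier = [entity]
    for predicate in path:
        frontier = [
            obj
            for node in frontier
            for obj in index.get(node, {}).get(predicate, [])
        ]
    return frontier


def augment(rows, key, index, path, aggregate, new_column):
    """Add new_column to each row by aggregating values found along path."""
    augmented_rows = []
    for row in rows:
        values = follow_path(index, row[key], path)
        # Rows with no reachable values get None instead of an aggregate.
        value = aggregate(values) if values else None
        augmented_rows.append({**row, new_column: value})
    return augmented_rows


index = build_index(TRIPLES)
rows = [{"state": "StateA"}, {"state": "StateB"}, {"state": "StateC"}]

# Step 1: which attributes could be joined onto the "state" column?
attributes = available_attributes(index, [r["state"] for r in rows])

# Step 2: two-hop aggregation, mean population of bordering states.
augmented = augment(
    rows, "state", index,
    path=["borders", "population"],
    aggregate=mean,
    new_column="mean_neighbor_population",
)
```

In this sketch the path `["borders", "population"]` plays the role of a two-hop join, and `mean` is the aggregation applied to the values at the end of the path. A row whose entity has no values along the path receives `None`, which keeps missing data visible rather than silently filling it.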
Pages: 1731-1741
Page count: 11