Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows

Handle URI:
http://hdl.handle.net/10754/599412
Title:
Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows
Authors:
Ikeda, Robert; Cho, Junsang; Fang, Charlie; Salihoglu, Semih; Torikai, Satoshi; Widom, Jennifer
Abstract:
Panda (for Provenance and Data) is a system that supports the creation and execution of data-oriented workflows, with automatic provenance generation and built-in provenance tracing operations. Workflows in Panda are arbitrary acyclic graphs containing both relational (SQL) processing nodes and opaque processing nodes programmed in Python. For both types of nodes, Panda generates logical provenance (provenance information stored at the processing-node level) and uses the generated provenance to support record-level backward tracing and forward tracing operations. In our demonstration we use Panda to integrate, process, and analyze actual education data from multiple sources. We specifically demonstrate how Panda's provenance generation and tracing capabilities are useful both for workflow debugging and for drilling down on specific results of interest. © 2012 IEEE.
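This record does not describe Panda's actual interfaces, so the following is only a minimal, hypothetical Python sketch of the idea in the abstract. It assumes that each node's logical provenance can be modeled as a predicate relating an output record to the input records that contributed to it; backward tracing then applies these predicates recursively down to the source records, and forward tracing finds the output records whose backward trace contains a given source record. The names `Node`, `Workflow`, `backward`, `forward`, and the school-score data are invented for illustration and are not Panda's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, object]
# A record reference: (node or source name, index of the record in that output).
Ref = Tuple[str, int]


@dataclass
class Node:
    """Hypothetical workflow node: a transformation plus a provenance predicate."""
    name: str
    inputs: List[str]
    transform: Callable[[List[List[Record]]], List[Record]]
    # prov(out, inp) is True when input record `inp` contributed to output `out`.
    prov: Callable[[Record, Record], bool]


@dataclass
class Workflow:
    nodes: Dict[str, Node] = field(default_factory=dict)
    outputs: Dict[str, List[Record]] = field(default_factory=dict)

    def add_source(self, name: str, records: List[Record]) -> None:
        self.outputs[name] = list(records)

    def add(self, node: Node) -> None:
        self.nodes[node.name] = node

    def run(self) -> None:
        # Assumes nodes were added in a topological order of the acyclic graph.
        for node in self.nodes.values():
            self.outputs[node.name] = node.transform([self.outputs[i] for i in node.inputs])

    def backward(self, ref: Ref) -> Set[Ref]:
        """Source records that the referenced record transitively derives from."""
        name, idx = ref
        if name not in self.nodes:  # a source table: the record is its own provenance
            return {ref}
        node = self.nodes[name]
        out = self.outputs[name][idx]
        result: Set[Ref] = set()
        for in_name in node.inputs:
            for j, rec in enumerate(self.outputs[in_name]):
                if node.prov(out, rec):
                    result |= self.backward((in_name, j))
        return result

    def forward(self, source: Ref, target: str) -> List[int]:
        """Indices of records in `target` whose backward trace contains `source`."""
        return [i for i in range(len(self.outputs[target]))
                if source in self.backward((target, i))]


def avg_by_school(ins: List[List[Record]]) -> List[Record]:
    # Group the single input table by school and average the scores.
    groups: Dict[str, List[float]] = {}
    for r in ins[0]:
        groups.setdefault(r["school"], []).append(r["score"])
    return [{"school": s, "avg": sum(v) / len(v)} for s, v in sorted(groups.items())]


wf = Workflow()
wf.add_source("scores", [{"school": "A", "score": 80}, {"school": "A", "score": 50},
                         {"school": "B", "score": 90}, {"school": "B", "score": 70}])
# Filter node: each output record derives from the identical input record.
wf.add(Node("passing", ["scores"],
            lambda ins: [r for r in ins[0] if r["score"] >= 60],
            prov=lambda out, inp: out == inp))
# Aggregation node: each output record derives from inputs with the same school.
wf.add(Node("avg", ["passing"], avg_by_school,
            prov=lambda out, inp: out["school"] == inp["school"]))
wf.run()
print(sorted(wf.backward(("avg", 1))))  # [('scores', 2), ('scores', 3)]
```

As a debugging example, tracing school A's average backward returns only `("scores", 0)`, which shows that the score of 50 was dropped by the filter node; correspondingly, forward tracing from `("scores", 1)` to `"avg"` returns an empty list. Computing forward tracing by re-running backward traces keeps the sketch short but is inefficient; a real system would more likely index or invert the provenance.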
Citation:
Ikeda R, Cho J, Fang C, Salihoglu S, Torikai S, et al. (2012) Provenance-Based Debugging and Drill-Down in Data-Oriented Workflows. 2012 IEEE 28th International Conference on Data Engineering. Available: http://dx.doi.org/10.1109/icde.2012.118.
Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Journal:
2012 IEEE 28th International Conference on Data Engineering
Issue Date:
Apr-2012
DOI:
10.1109/icde.2012.118
Type:
Conference Paper
Sponsors:
This work is supported by the National Science Foundation (IIS-0904497) and a KAUST research grant.
Appears in Collections:
Publications Acknowledging KAUST Support

Date Accessioned:
2016-02-28T05:50:39Z
Date Available:
2016-02-28T05:50:39Z
Institution:
Stanford University, Palo Alto, United States
All Items in KAUST are protected by copyright, with all rights reserved, unless otherwise indicated.