Show simple item record

dc.contributor.author: Ikeda, Robert
dc.contributor.author: Salihoglu, Semih
dc.contributor.author: Widom, Jennifer
dc.date.accessioned: 2016-02-28T05:50:40Z
dc.date.available: 2016-02-28T05:50:40Z
dc.date.issued: 2011
dc.identifier.citation: Ikeda R, Salihoglu S, Widom J (2011) Provenance-based refresh in data-oriented workflows. Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM ’11. Available: http://dx.doi.org/10.1145/2063576.2063816.
dc.identifier.doi: 10.1145/2063576.2063816
dc.identifier.uri: http://hdl.handle.net/10754/599413
dc.description.abstract: We consider a general workflow setting in which input data sets are processed by a graph of transformations to produce output results. Our goal is to perform efficient selective refresh of elements in the output data, i.e., compute the latest values of specific output elements when the input data may have changed. We explore how data provenance can be used to enable efficient refresh. Our approach is based on capturing one-level data provenance at each transformation when the workflow is run initially. Then at refresh time provenance is used to determine (transitively) which input elements are responsible for given output elements, and the workflow is rerun only on that portion of the data needed for refresh. Our contributions are to formalize the problem setting and the problem itself, to specify properties of transformations and provenance that are required for efficient refresh, and to provide algorithms that apply to a wide class of transformations and workflows. We have built a prototype system supporting the features and algorithms presented in the paper. We report preliminary experimental results on the overhead of provenance capture, and on the crossover point between selective refresh and full workflow recomputation. © 2011 ACM.
dc.description.sponsorship: This work is supported by the National Science Foundation under grants IIS-0414762 and IIS-0904497 and by a KAUST research grant.
dc.publisher: Association for Computing Machinery (ACM)
dc.subject: provenance
dc.title: Provenance-based refresh in data-oriented workflows
dc.type: Conference Paper
dc.identifier.journal: Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11
dc.contributor.institution: Stanford University, Palo Alto, United States
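
The selective refresh idea summarized in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' prototype or their algorithm as published: the function names (run_step, run_workflow, trace_to_inputs, refresh), the two toy transformations, and the restriction to a linear chain of steps are all assumptions made for illustration. The sketch also assumes that changed inputs still contribute to the same output element, which is the kind of property the abstract says must hold for refresh to be correct; the code does not check it.

```python
from collections import defaultdict

# Hypothetical toy transformations. Each returns a list of
# (output_value, positions_of_contributing_inputs) pairs.
def square_each(data):
    return [(x * x, [i]) for i, x in enumerate(data)]

def sum_by_parity(data):
    groups = defaultdict(list)
    for i, x in enumerate(data):
        groups[x % 2].append(i)
    # One output per parity group, even group first.
    return [(sum(data[i] for i in idxs), idxs)
            for _, idxs in sorted(groups.items())]

def run_step(transform, data):
    """Run one transformation and capture its one-level provenance."""
    results = transform(data)
    values = [value for value, _ in results]
    prov = [set(sources) for _, sources in results]
    return values, prov

def run_workflow(steps, inputs):
    """Run a linear chain of steps, keeping provenance per step."""
    data, provs = inputs, []
    for step in steps:
        data, prov = run_step(step, data)
        provs.append(prov)
    return data, provs

def trace_to_inputs(provs, out_pos):
    """Follow one-level provenance backwards, step by step, to find
    the input positions responsible for one output element."""
    needed = {out_pos}
    for prov in reversed(provs):
        needed = set().union(*(prov[p] for p in needed))
    return sorted(needed)

def refresh(steps, provs, new_inputs, out_pos):
    """Rerun the workflow only on the inputs behind one output."""
    positions = trace_to_inputs(provs, out_pos)
    subset = [new_inputs[p] for p in positions]
    data, _ = run_workflow(steps, subset)
    return data

steps = [square_each, sum_by_parity]
outputs, provs = run_workflow(steps, [1, 2, 3, 4])
print(outputs)                                  # [20, 10]
print(trace_to_inputs(provs, 0))                # [1, 3]
print(refresh(steps, provs, [1, 6, 3, 4], 0))   # [52]
```

In this toy run, refreshing output element 0 after the second input changes from 2 to 6 reruns the two steps on only two of the four inputs, and the result matches what a full recomputation would give for that element.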


