dc.contributor.author: Bilal, Muhammad
dc.contributor.author: Canini, Marco
dc.contributor.author: Rodrigues, Rodrigo
dc.date.accessioned: 2020-11-19T12:27:26Z
dc.date.available: 2020-11-19T12:27:26Z
dc.date.issued: 2020-10-12
dc.identifier.citation: Bilal, M., Canini, M., & Rodrigues, R. (2020). Finding the right cloud configuration for analytics clusters. Proceedings of the 11th ACM Symposium on Cloud Computing. doi:10.1145/3419111.3421305
dc.identifier.isbn: 9781450381376
dc.identifier.doi: 10.1145/3419111.3421305
dc.identifier.uri: http://hdl.handle.net/10754/666044
dc.description.abstract: Finding good cloud configurations for deploying a single distributed system is already a challenging task, and it becomes substantially harder when a data analytics cluster is formed by multiple distributed systems since the search space becomes exponentially larger. In particular, recent proposals for single system deployments rely on benchmarking runs that become prohibitively expensive as we shift to joint optimization of multiple systems, as users have to wait until the end of a long optimization run to start the production run of their job. We propose Vanir, an optimization framework designed to operate in an ecosystem of multiple distributed systems forming an analytics cluster. To deal with this large search space, Vanir takes the approach of quickly finding a good enough configuration and then attempts to further optimize the configuration during production runs. This is achieved by combining a series of techniques in a novel way, namely a metrics-based optimizer for the benchmarking runs, and a Mondrian forest-based performance model and transfer learning during production runs. Our results show that Vanir can find deployments that perform comparably to the ones found by state-of-the-art single-system cloud configuration optimizers while spending 2X fewer benchmarking runs. This leads to an overall search cost that is 1.3 - 24X lower compared to the state-of-the-art. Additionally, when transfer learning can be used, Vanir can minimize the benchmarking runs even further, and use online optimization to achieve a performance comparable to the deployments found by today's single-system frameworks.
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.url: https://dl.acm.org/doi/10.1145/3419111.3421305
dc.rights: Archived with thanks to ACM
dc.title: Finding the right cloud configuration for analytics clusters
dc.type: Conference Paper
dc.contributor.department: Computer Science Program
dc.contributor.department: Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division
dc.conference.date: 2020-10-19 to 2020-10-21
dc.conference.name: 11th ACM Symposium on Cloud Computing, SoCC 2020
dc.conference.location: Virtual, Online, USA
dc.eprint.version: Pre-print
dc.contributor.institution: UCLouvain and IST(ULisboa)/INESC-ID
dc.contributor.institution: IST(ULisboa)/INESC-ID
dc.identifier.pages: 208-222
kaust.person: Canini, Marco
dc.identifier.eid: 2-s2.0-85095439102
refterms.dateFOA: 2020-11-19T12:29:35Z
dc.date.published-online: 2020-10-12
dc.date.published-print: 2020-10-12


Files in this item

Name: 3419111.3421305.pdf
Size: 337.3Kb
Format: PDF
Description: Published version
