    Tackling the Communication Bottlenecks of Distributed Deep Learning Training Workloads

Name: Tackling_the_Communication_Bottlenecks_of_Distributed_Deep_Learning_Training_Workloads.pdf
Size: 12.64 MB
Format: PDF
Description: Ph.D. Dissertation
Type: Dissertation
Authors: Ho, Chen-Yu
Advisors: Canini, Marco
Committee members: Keyes, David E.; Fahmy, Suhaib A.; Park, KyoungSoo
Program: Computer Science
KAUST Department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Date: 2023-08
Permanent link to this record: http://hdl.handle.net/10754/693744
    
Abstract

Deep Neural Networks (DNNs) find widespread applications across various domains, including computer vision, recommendation systems, and natural language processing. Despite their versatility, training DNNs can be time-consuming, and accommodating large models and datasets on a single machine is often impractical. To tackle these challenges, distributed deep learning (DDL) training workloads have gained increasing significance. However, DDL training introduces synchronization requirements among nodes, and the mini-batch stochastic gradient descent algorithm places a heavy burden on network connections. This dissertation proposes, analyzes, and evaluates three solutions addressing the communication bottleneck in DDL training workloads.

The first solution, SwitchML, introduces an in-network aggregation (INA) primitive that accelerates DDL workloads. By aggregating model updates from multiple workers within the network, SwitchML reduces the volume of exchanged data. This approach, which spans switch processing, end-host protocols, and deep learning frameworks, improves training speed by up to 5.5 times on real-world benchmark models.

The second solution, OmniReduce, is an efficient streaming aggregation system designed for sparse collective communication. It optimizes performance for parallel computing applications such as distributed training of large-scale recommendation systems and natural language processing models. OmniReduce achieves maximum effective bandwidth utilization by transmitting only nonzero data blocks and leveraging fine-grained parallelization and pipelining. It outperforms state-of-the-art TCP/IP and RDMA network solutions by 3.5 to 16 times, delivering significantly better performance for network-bottlenecked DNNs even at 100 Gbps.

The third solution, CoInNetFlow, addresses congestion in shared data centers, where multiple DNN training jobs compete for bandwidth on the same nodes. The study explores the feasibility of coflow scheduling methods in hierarchical and multi-tenant in-network aggregation communication patterns. CoInNetFlow presents an innovative application of the Sincronia priority assignment algorithm. Through packet-level simulation of DDL jobs, the research demonstrates that appropriate weighting functions, transport-layer priority scheduling, and gradient compression on low-priority tensors can improve the median Job Completion Time Inflation by over 70%.

Collectively, this dissertation contributes to mitigating the network communication bottleneck in distributed deep learning. The proposed solutions can enhance the efficiency and speed of distributed deep learning systems, ultimately improving the performance of DNN training across various domains.
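To make the sparse-aggregation idea concrete, the sketch below is a minimal Python/NumPy illustration of summing gradients block by block while skipping all-zero blocks, which is the core intuition behind OmniReduce's bandwidth savings. The block size, function names, and in-memory accumulation are illustrative assumptions, not the system's actual protocol; the real system streams blocks over the network with fine-grained parallelization and pipelining.

    import numpy as np

    def nonzero_blocks(grad, block_size=256):
        """Yield (block_index, block) pairs for fixed-size blocks of a
        flat gradient that contain at least one nonzero value."""
        flat = np.asarray(grad).ravel()
        for start in range(0, flat.size, block_size):
            block = flat[start:start + block_size]
            if np.any(block != 0):
                yield start // block_size, block

    def sparse_aggregate(worker_grads, block_size=256):
        """Sum gradients from all workers, accumulating only the blocks
        each worker reports as nonzero (all-zero blocks are skipped)."""
        first = np.asarray(worker_grads[0])
        total = np.zeros(first.size)
        for grad in worker_grads:
            for idx, block in nonzero_blocks(grad, block_size):
                start = idx * block_size
                total[start:start + block.size] += block
        return total.reshape(first.shape)

    # Toy usage: two workers with mostly-zero gradients; only the two
    # touched blocks are ever "transmitted" or accumulated.
    g1 = np.zeros(1024); g1[10] = 1.5
    g2 = np.zeros(1024); g2[700] = -0.5
    agg = sparse_aggregate([g1, g2])
    print(np.count_nonzero(agg))  # -> 2

In the toy run, only two of the eight 256-element blocks are nonzero, so only those would need to cross the network; a dense all-reduce would move all 2048 values.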
DOI: 10.25781/KAUST-2V95Q
Collections: PhD Dissertations; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
