High Performance Distributed Deep Learning: A Beginner's Guide
Time: Monday, July 29, 8:30am - 12pm
Description: The current wave of advances in Deep Learning (DL) has led to many exciting
challenges and opportunities for Computer Science and Artificial Intelligence
researchers alike. DL frameworks like TensorFlow, PyTorch, Caffe, and several
others have emerged that offer ease of use and flexibility to describe, train,
and deploy various types of Deep Neural Networks (DNNs). In this tutorial, we
will provide an overview of interesting trends in DNN design and how
cutting-edge hardware architectures are playing a key role in moving the field
forward. We will also present an overview of different DNN architectures and DL
frameworks. Most DL frameworks started with a single-node/single-GPU design.
However, approaches to parallelize the process of DNN training are also being
actively explored. The DL community has pursued several different distributed
training designs that exploit communication runtimes like gRPC, MPI, and NCCL.
In this context, we highlight new challenges and opportunities for communication
runtimes to efficiently support distributed DNN training. We also highlight some
of our co-design efforts to utilize CUDA-Aware MPI for large-scale DNN training
on modern GPU clusters. Finally, we include hands-on exercises that enable
attendees to gain first-hand experience of running distributed DNN training
experiments on a modern GPU cluster.