Deploying a Top-100 Supercomputer for Large Parallel Workloads: the Niagara Supercomputer
Event Type
Cluster Management
Time
Thursday, August 1, 11am - 11:30am
Location
Crystal A
Description
Niagara is currently the fastest supercomputer accessible to academics in Canada.
It was deployed at the beginning of 2018 and has been serving the research community ever since.
This homogeneous 60,000-core cluster, owned by the University of Toronto and operated by SciNet, was intended to enable large parallel jobs and has a measured performance
of 3.02 petaflops, which put it at #53 in the June 2018 TOP500 list.
It was designed to optimize the throughput of a range of scientific codes running at scale,
along with energy efficiency and network and storage performance and capacity.
It replaces two systems that SciNet operated for over eight years: the Tightly Coupled
System (TCS) and the General Purpose Cluster (GPC).
In this paper we describe the transition from these two systems, the procurement and
deployment processes, and the unique features that make Niagara
a one-of-a-kind machine in Canada.
We believe that there are important lessons and knowledge that can be transferred
and applied at other supercomputing centers like ours.