From BigDog to BigDawg: Transitioning an HPC Cluster for Sustainability
Time: Tuesday, July 30, 6:30pm - 8:30pm
Location: Crystal Foyer and Crystal B
Description: This paper relates the experience of managing the transition of a high performance computing (HPC) cluster from Rocks, SGE, and Cisco to OpenHPC, SLURM, and Dell. The transition was made because of sustainability issues related to security, software, and hardware. The BigDog HPC cluster was placed in production at Southern Illinois University Carbondale (SIUC) in December 2015, with 40 Cisco servers containing 800 CPU cores, running the Rocks cluster management software and the SGE job scheduler. BigDog ran into sustainability issues, especially in keeping software updated for security reasons, and the decision was made to update the cluster. This was done by replacing some of the Cisco servers with Dell hardware, Rocks with OpenHPC, and SGE with SLURM. The cluster was renamed BigDawg, in keeping with the southern pronunciation and the spelling of the names of the university mascots, Grey Dawg and Brown Dawg.