Approaches to High Resolution Network Telemetry & Analytics with Machine Learning In Support of High Performance Computing
Time: Tuesday, July 30, 6:30pm - 8:30pm
Location: Crystal Foyer and Crystal B
Description: Network measurement and analytics are key to the operation, planning, and root-cause analysis of issues within network infrastructure. Higher line rates (100 Gb/s and beyond), together with the rapid growth in data transfer needs for scientific datasets, have changed what it takes to monitor network infrastructure effectively. Network measurement based on 5-minute polling intervals with binary threshold-based alerting has proven unreliable for accurate measurement of, and alerting on, critical network systems. This presentation will discuss ongoing efforts at NCSA to gather high-resolution network telemetry data (collection interval < 10 s) using SNMP and streaming telemetry, with machine learning used to analyze the collected data and generate alerts.
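To illustrate why polling resolution matters, the following minimal sketch compares a 5-minute average with 10-second samples derived from the same interface octet counter. It is not the NCSA pipeline: the traffic pattern, the 80% utilization threshold, and the helper names (`rate_bps`, `rates_from_samples`) are assumptions made for the example, and the counter values are synthetic rather than read from a device.

```python
# Sketch only: synthetic counter readings, hypothetical threshold and helpers.

COUNTER_MAX = 2**64  # 64-bit octet counters (e.g. IF-MIB ifHCInOctets) wrap here


def rate_bps(prev_octets, curr_octets, interval_s):
    """Average bits per second between two counter readings (tolerates one wrap)."""
    delta = (curr_octets - prev_octets) % COUNTER_MAX
    return delta * 8 / interval_s


def rates_from_samples(samples, interval_s):
    """Per-interval rates from a list of evenly spaced counter readings."""
    return [rate_bps(a, b, interval_s) for a, b in zip(samples, samples[1:])]


LINK_BPS = 100e9            # 100 Gb/s link
THRESHOLD = 0.8 * LINK_BPS  # assumed alert threshold: 80% utilization
SAMPLE_S = 10               # high-resolution collection interval
WINDOW_S = 300              # traditional 5-minute polling interval

# Assumed traffic: 10 Gb/s background with a 20-second burst at 95 Gb/s.
per_interval_bps = [10e9] * (WINDOW_S // SAMPLE_S)
per_interval_bps[12] = per_interval_bps[13] = 95e9

# Build the counter readings a poller would see every 10 seconds.
counter = [0]
for bps in per_interval_bps:
    counter.append((counter[-1] + int(bps * SAMPLE_S / 8)) % COUNTER_MAX)

fine = rates_from_samples(counter, SAMPLE_S)         # thirty 10 s rates
coarse = rate_bps(counter[0], counter[-1], WINDOW_S)  # one 5-minute rate

fine_alerts = [i for i, r in enumerate(fine) if r > THRESHOLD]
coarse_alert = coarse > THRESHOLD

print(f"5-minute average: {coarse / 1e9:.2f} Gb/s, alert={coarse_alert}")
print(f"10 s intervals above threshold: {fine_alerts}")
```

In this synthetic case the 5-minute average stays far below the threshold while two 10-second intervals exceed it, so a coarse poller would report nothing. A fixed threshold on the fine-grained series is still a binary rule; the presentation's approach replaces that step with machine learning, which this sketch does not attempt to reproduce.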