HPC Working Group

Mission

Create MLPerf™ HPC training benchmarks based on science applications to run on large-scale supercomputers.

Purpose

The MLPerf HPC benchmark suite includes scientific applications that use machine learning (ML), especially deep learning (DL), at HPC scale. These benchmarks will help project future system performance and assist in the design and specification of future HPC systems. The benchmark suite aims to evaluate behavior unique to HPC applications and to improve our understanding across several dimensions. First, we explore model-system interactions. Second, we characterize and optimize deep learning workloads and identify potential bottlenecks. Last, we quantify the scalability of different deep learning methods, frameworks, and metrics on HPC systems with diverse hardware.

Deliverables

MLPerf HPC Training benchmarks with rules and definitions
Reference implementations of the MLPerf HPC Training benchmarks
Release roadmap for future versions
Publish benchmark results annually at the Supercomputing (SC) conference

Meeting Schedule

Weekly, alternating between Monday 8:00-9:00 AM Pacific and Monday 3:00-4:00 PM Pacific.

How to Join

Request to join the group and mailing list, and receive the meeting invite, through the HPC Google Group.
Requests are manually reviewed, so please be patient.

Working Group Resources

Shared documents and meeting minutes:
1. Associate a Google account with your e-mail address.
2. Ask to join our Public Google Group.
3. Once approved, go to the HPC folder in our Public Google Drive.
GitHub (public):
1. If you want to contribute code, please sign our CLA first.
2. Visit the GitHub repository.

Working Group Chair Emails

Murali Emani (memani@anl.gov)

Steve Farrell (sfarrell@lbl.gov)

Working Group Chair Bios

Murali Emani is a Computer Scientist in the Data Science group with the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. His research interests include scalable machine learning, high-performance computing, and emerging HPC and AI architectures. Previously, he was a Postdoctoral Research Staff Member at Lawrence Livermore National Laboratory, US. He obtained his PhD from the University of Edinburgh, UK. He was recently awarded a DOE ASCR grant to develop ‘HPC-FAIR’, a framework to manage datasets and AI models for analyzing and optimizing scientific applications.

Steven Farrell is a Machine Learning Engineer at the NERSC supercomputing center. He supports scientific deep learning workflows on HPC systems through software development, benchmarking, user support, and training. His research interests include applications of deep learning to high energy physics, generative modeling, and learning on structured data such as graphs. He was a member of the ATLAS experiment at CERN for many years, first during his PhD studies at UC Irvine, working on searches for electroweak supersymmetry, and then as a postdoc at Berkeley Lab, working on software development and machine learning applications for analysis and simulation.