High Performance Computing (B.Sc.)

The seminar will take place during the following time slot: Fri, 12PM - 2PM. All of our meetings will be face-to-face in room 3220, EAP2. Seminar participants must register in Friedolin. We have one presentation per week, so the number of participants is limited to 11.

Format

The seminar is divided into two parts. In the first part, we will read the book Efficient Processing of Deep Neural Networks. An electronic version of the book is available from thulb. The first chapter will be presented by the teaching staff, later chapters can be chosen by students. The second part discusses recent research papers in the area of High Performance Computing (HPC). Students may also choose any of the papers listed below as their seminar topic.

The general format of the seminar is similar to a reading group. That is, all participants read the book chapter or paper before attending the respective sessions. One person, either a student or teaching staff, becomes the expert on the topic. This person presents the topic for 30 minutes and then leads the discussion.

Student Papers

All participants will write a scientific paper on their chosen seminar topic. The paper is due via email four weeks after the respective topic was discussed in the seminar. Use the ACM proceedings template with the sigconf option for your paper. The paper should be 4-6 pages in length (excluding references). You can write your paper in English or German.

Supervision

Preparing presentations and writing papers is hard. You can always ask for advice! Start early and keep in touch with your advisor!

Two meetings with your advisor are required:

The first meeting should be at least one week before your presentation.
The second meeting should be at least one week before your paper is due.

Schedule

Date	What?	Chosen?
04/11	Kickoff	–
04/25	Introduction (Ch. 1)	–
05/02	Overview of Deep Neural Networks (Ch. 2)	✔
05/09	Key Metrics and Design Objectives (Ch. 3)	✔
05/16	Kernel Computation (Ch. 4)	✔
05/23	Designing DNN Accelerators (Ch. 5)	–
06/06	Operation Mapping on Specialized Hardware (Ch. 6)	–
06/13	Reducing Precision (Ch. 7)	✔
06/27	Exploiting Sparsity (Ch. 8)	✔
07/04	Roller: Fast and Efficient Tensor Compilation for Deep Learning (paper)	–

Topics

Select one of the following chapters/papers as your seminar topic. You may also suggest a topic/paper. Topics will be assigned on a first-come, first-served basis.

Efficient Processing of Deep Neural Networks (book):
- Overview of Deep Neural Networks (Ch. 2)
- Key Metrics and Design Objectives (Ch. 3)
- Kernel Computation (Ch. 4)
- Designing DNN Accelerators (Ch. 5)
- Operation Mapping on Specialized Hardware (Ch. 6)
- Reducing Precision (Ch. 7)
- Exploiting Sparsity (Ch. 8)
- Designing Efficient DNN Models (Ch. 9)
- Advanced Technologies (Ch. 10)
IPDPS23 (proceedings):
- GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets (paper)
- Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph (paper)
- Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication (paper)
- Accelerating CNN inference on long vector architectures via co-design (paper)
- Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU (paper)
- An Efficient 2D Method for Training Super-Large Deep Learning Models (paper)
- Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation (paper)
- Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training (paper)
- A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices (paper)
- A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication (paper)
- Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms (paper)
- Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits (paper)
MLSYS23:
- Reducing Activation Recomputation in Large Transformer Models (paper)
- Efficiently Scaling Transformer Inference (paper)
- On Optimizing the Communication of Model Parallelism (preprint)
CGO23:
- oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation (preprint)
- JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication (preprint)
ASPLOS23:
- BaCO: A Fast and Portable Bayesian Compiler Optimization Framework (paper)
- Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores (paper)
- Finding Unstable Code via Compiler-Driven Differential Testing (paper)
SC23:
- Application Performance Modeling via Tensor Completion (paper)
SC22:
- HammingMesh: A Network Topology for Large-Scale Deep Learning (paper)
- CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication (paper)
- DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (paper)
- SpDISTAL: Compiling Distributed Sparse Tensor Computations (paper)
- STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training (paper)
- Lessons Learned on MPI+Threads Communication (paper)

Generative AI

The use of generative AI in any capacity is strictly prohibited. Write the paper by yourself!