High Performance Computing (B.Sc.)

The seminar takes place in the following time slot: Mon, 10:00 AM - 12:00 PM. All meetings are held in person in room 3220, EAP2. Seminar participants must register in Friedolin.

Format

The seminar has two parts. In the first part we read selected chapters of the book Introduction to High Performance Computing for Scientists and Engineers; an electronic version of the book is available through the ThULB. The earlier chapters are presented by the teaching staff, while the later chapters can be selected as student topics. The second part discusses recent research papers in the domain of High Performance Computing (HPC). Students may also choose any of the papers listed below as their seminar topic.

The general format of the seminar is similar to that of a reading group: all participants read the book chapter(s) or paper before attending the respective session. A single person, either a student or a member of the teaching staff, becomes the expert on the topic. This person presents the topic in 30 minutes and then leads the discussion.

Student Papers

All participants write a scientific paper about their chosen seminar topic. The paper has to be submitted via email four weeks after the respective topic was discussed in the seminar. Use the ACM proceedings template with the sigconf option. The paper should be 4-6 pages long (excluding references) and may be written in either English or German.
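A minimal LaTeX skeleton using the acmart class with the sigconf option might look as follows; title, author, affiliation, and the references.bib file are placeholders to be replaced with your own details:

    % Minimal acmart skeleton with the sigconf option (placeholders only).
    \documentclass[sigconf]{acmart}

    \begin{document}

    \title{Your Seminar Topic}
    \author{Your Name}
    \affiliation{%
      \institution{Your University}
      \city{Your City}
      \country{Your Country}}

    % In acmart, the abstract is placed before \maketitle.
    \begin{abstract}
      A one-paragraph summary of your paper.
    \end{abstract}

    \maketitle

    \section{Introduction}
    The body of your paper (4--6 pages excluding references).

    % Assumes a BibTeX file named references.bib next to the main file.
    \bibliographystyle{ACM-Reference-Format}
    \bibliography{references}

    \end{document}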

Supervision

Preparing presentations and writing scientific papers is hard. You may ask for advice at any time! Start early and keep in touch with your advisor!

Two meetings with your advisor are mandatory:
  • The first meeting should be at least one week before your presentation.

  • The second meeting should be at least one week before your paper submission deadline.

Schedule

Date   What?                                                                       Chosen?
04/08  Kickoff
04/15  Deadline for choosing a topic
04/15  Modern Processors (Ch. 1)
04/22  Basic optimization techniques for serial code (Ch. 2)
04/29  Data access optimization (Ch. 3)
05/06  Parallel Computers (Ch. 4)
05/13  Basics of Parallelization (Ch. 5)
05/27  OpenMP (Ch. 6 and 7)
06/03  Locality and NUMA (Ch. 8)
06/10  Message Passing Interface (Ch. 9 and 10)
06/17  Lessons Learned on MPI+Threads Communication
06/24  An Efficient 2D Method for Training Super-Large Deep Learning Models
07/01  Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores

Topics

Select any of the following chapters/papers as your seminar topic. You may also suggest a topic or paper of your own. Topics are assigned on a first-come, first-served basis.

  • Introduction to High Performance Computing for Scientists and Engineers (book):

    • Basic optimization techniques for serial code (Ch. 2)

    • Data Access Optimization (Ch. 3)

    • Parallel Computers (Ch. 4)

    • Basics of Parallelization (Ch. 5)

    • OpenMP (Ch. 6 and 7)

    • Locality and NUMA (Ch. 8)

    • Message Passing Interface (Ch. 9 and 10)

  • IPDPS23 (proceedings):

    • GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets (paper)

    • Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph (paper)

    • Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication (paper)

    • Accelerating CNN inference on long vector architectures via co-design (paper)

    • Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU (paper)

    • An Efficient 2D Method for Training Super-Large Deep Learning Models (paper)

    • Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training (paper)

    • A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices (paper)

    • A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication (paper)

    • Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms (paper)

    • Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits (paper)

  • MLSYS23:

    • Reducing Activation Recomputation in Large Transformer Models (paper)

    • Efficiently Scaling Transformer Inference (paper)

    • On Optimizing the Communication of Model Parallelism (preprint)

  • CGO23:

    • oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation (preprint)

    • JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication (preprint)

  • ASPLOS23:

    • BaCO: A Fast and Portable Bayesian Compiler Optimization Framework (paper)

    • Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores (paper)

    • Finding Unstable Code via Compiler-Driven Differential Testing (paper)

  • SC23:

    • Application Performance Modeling via Tensor Completion (paper)

  • SC22:

    • HammingMesh: A Network Topology for Large-Scale Deep Learning (paper)

    • CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication (paper)

    • DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (paper)

    • SpDISTAL: Compiling Distributed Sparse Tensor Computations (paper)

    • STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training (paper)

    • Lessons Learned on MPI+Threads Communication (paper)

Generative AI

The use of generative AI in any capacity is prohibited. Write the paper yourself!