High Performance Computing (B.Sc.)
The seminar takes place Mondays, 10:00 AM to 12:00 PM. All meetings are face-to-face in room 3220, EAP2. Participants must register for the seminar in Friedolin.
Format
The seminar has two parts. In the first part, we read selected chapters of the book Introduction to High Performance Computing for Scientists and Engineers. An electronic version of the book is available through ThULB. The teaching staff presents the earlier chapters; later chapters can be selected as student topics. The second part discusses recent research papers in the domain of High Performance Computing (HPC). Students may also choose any of the papers listed below as their seminar topic.
The general format of the seminar is similar to that of a reading group: all participants read the book chapter(s) or paper before attending the respective session. A single person, either a student or a member of the teaching staff, becomes the expert on the topic. This person presents the topic in 30 minutes and leads the discussion afterwards.
Student Papers
All participants write a scientific paper about their chosen seminar topic. The paper is due via email four weeks after the respective topic was discussed in the seminar. Use the ACM proceedings template with the sigconf option for your paper. The paper should be 4 to 6 pages long (excluding references). You may write your paper in either English or German.
Supervision
Preparing presentations and writing scientific papers is hard. You may ask for advice at any time! Start early and keep in touch with your advisor!
Two meetings with your advisor are mandatory:
The first meeting should be at least one week before your presentation.
The second meeting should be at least one week before your paper submission deadline.
Schedule
Date | What? | Chosen?
04/08 | Kickoff | –
04/15 | Deadline for choosing a topic | –
04/15 | Modern Processors (Ch. 1) | –
04/22 | Basic optimization techniques for serial code (Ch. 2) | ✔
04/29 | Data access optimization (Ch. 3) | ✔
05/06 | Parallel Computers (Ch. 4) | ✔
05/13 | Basics of Parallelization (Ch. 5) | ✔
05/27 | OpenMP (Ch. 6 and 7) | ✔
06/03 | Locality and NUMA (Ch. 8) | ✔
06/10 | Message Passing Interface (Ch. 9 and 10) | ✔
06/17 | Lessons Learned on MPI+Threads Communication | ✔
06/24 | Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation | ✔
06/25 | Get-together, 6 PM, room 3220 | –
07/01 | Occamy: Elastically Sharing a SIMD Coprocessor across Multiple CPU Cores | ✔
Topics
Select any of the following chapters or papers as your seminar topic. You may also suggest a topic or paper of your own. Topics are assigned on a first-come, first-served basis.
Introduction to High Performance Computing for Scientists and Engineers (book):
Basic optimization techniques for serial code (Ch. 2)
Data Access Optimization (Ch. 3)
Parallel Computers (Ch. 4)
Basics of Parallelization (Ch. 5)
OpenMP (Ch. 6 and 7)
Locality and NUMA (Ch. 8)
Message Passing Interface (Ch. 9 and 10)

GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets (paper)
Accurate and Efficient Distributed COVID-19 Spread Prediction based on a Large-Scale Time-Varying People Mobility Graph (paper)
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication (paper)
Accelerating CNN inference on long vector architectures via co-design (paper)
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU (paper)
An Efficient 2D Method for Training Super-Large Deep Learning Models (paper)
Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation (paper)
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training (paper)
A Novel Triangular Space-Filling Curve for Cache-Oblivious In-Place Transposition of Square Matrices (paper)
A Machine Learning Approach Towards Runtime Optimisation of Matrix Multiplication (paper)
Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms (paper)
Neural Network Compiler for Parallel High-Throughput Simulation of Digital Circuits (paper)
SC23:
Application Performance Modeling via Tensor Completion (paper)
SC22:
HammingMesh: A Network Topology for Large-Scale Deep Learning (paper)
CA3DMM: A New Algorithm Based on a Unified View of Parallel Matrix Multiplication (paper)
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale (paper)
SpDISTAL: Compiling Distributed Sparse Tensor Computations (paper)
STRONGHOLD: Fast and Affordable Billion-Scale Deep Learning Model Training (paper)
Lessons Learned on MPI+Threads Communication (paper)
Generative AI
The use of generative AI in any capacity is prohibited. Write the paper yourself!