Reading Group

The lab runs an informal reading group in which we study research papers of interest to us. If you would like to join, contact us for details.

2024 Schedule

Date    Topic
11/11   RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs (preprint)
11/04   Understanding the Limitations of Mathematical Reasoning in Large Language Models (preprint)
10/28   The Llama 3 Herd of Models (up to and including 3.3) (paper)
10/21   TCP: A Tensor Contraction Processor for AI Workloads (paper)
10/14   SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile (paper)
09/16   The MLIR Transform Dialect. Your compiler is more powerful than you think (preprint)
09/10   Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search (paper)
08/27   nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training (paper)
08/19   PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation (paper)
08/05   MLIR-Based Code Generation for GPU Tensor Cores (paper)
07/30   Optimal Kernel Orchestration for Tensor Programs with Korch (paper)
07/23   Harnessing Discrete Representations for Continual Reinforcement Learning (preprint)
07/16   Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions (paper)
07/08   A Code Generator for High-Performance Tensor Contractions on GPUs (paper)
07/02   An Efficient 2D Method for Training Super-Large Deep Learning Models (paper)
06/24   JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication (paper)
06/18   A Generalized Packing Analysis and Transformation (paper)
06/11   YOLOv10: Real-Time End-to-End Object Detection (preprint)
06/04   A Machine Learning Approach Towards Runtime Optimization of Matrix Multiplication (paper)
05/28   With Shared Microexponents, A Little Shifting Goes a Long Way (paper)
05/21   Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models (preprint)
05/14   Classical Simulation of Quantum Supremacy Circuits (preprint)
05/07   Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations (preprint)
05/02   MLP-Mixer: An all-MLP Architecture for Vision (paper)
04/25   Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization (paper)
04/18   Spectre Attacks: Exploiting Speculative Execution (paper)
04/11   The Deep Learning Compiler: A Comprehensive Survey (paper)
04/04   Large Language Models for Compiler Optimization (preprint)
03/27   FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (paper)
03/20   Peer Review Session
03/14   TensorIR: An Abstraction for Automatic Tensorized Program Optimization (paper)
03/06   FP8 Quantization: The Power of the Exponent (paper)
02/28   Novel adaptive quantization methodology for 8-bit floating-point DNN training (paper)
02/21   A Tensor Compiler for Unified Machine Learning Prediction Serving (paper)
02/14   LoopTune: Optimizing Tensor Computations with Reinforcement Learning (preprint)
02/07   A massively parallel tensor contraction framework for coupled-cluster computations (paper)
01/31   LoopStack: a Lightweight Tensor Algebra Compiler Stack (preprint)
01/24   Towards an efficient use of the BLAS library for multilinear tensor contractions (paper)
01/17   Chapter 5.7: Efficient Processing of Deep Neural Networks (book)

2023 Schedule

Date    Topic
12/19   oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation (preprint)
12/12   Chapter 5.1 - 5.6: Efficient Processing of Deep Neural Networks (book)
12/05   RISC-V Composable Extensions for MX Microscaling Data Formats for AI Tensors: Part One: Introduction to MX Data (blog post)
11/28   Chapter 4: Efficient Processing of Deep Neural Networks (book)
11/22   Chapter 3: Efficient Processing of Deep Neural Networks (book)
11/14   HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity (preprint)
11/07   Higher-dimensional processing using a photonic tensor core with continuous-time data (paper)
11/01   Toward Matrix Multiplication for Deep Learning Inference on the Xilinx Versal (paper)
10/24   Optimizing Direct Convolutions on ARM Multi-Cores (preprint)
08/28   Hot Chips 2023 watch party (program)
07/12   DGEMM on Integer Matrix Multiplication Unit (preprint)
07/05   A Design of a High-Performance GEMM-like Tensor-Tensor Multiplication (preprint, paper)
06/28   High-Performance Tensor Contraction without Transposition (paper)
06/21   Can Computers Learn Common Sense? (article)
06/14   Dynamo: Amazon’s highly available key-value store (paper)
06/07   A White Paper on Neural Network Quantization (white paper)
05/31   LazyTensor: combining eager execution with domain-specific compilers (preprint)
05/24   Neural Galerkin Scheme with Active Learning for High-Dimensional Evolution Equations (preprint)
05/17   Architecture and Performance of Devito, a System for Automated Stencil Computation (paper)
05/10   Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures (paper)
05/03   BLIS: A Framework for Rapidly Instantiating BLAS Functionality (paper)
04/26   Anatomy of High-Performance Matrix Multiplication (preprint)
04/19   Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures (preprint)
04/12   Tensor Contractions Tutorial (tutorial)
03/20   Speculative Vectorisation with Selective Replay (paper)
03/14   An Attack on The Speculative Vectorization: Leakage from Higher Dimensional Speculation (preprint)
03/07   DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration (paper)
02/24   MLPerf Mobile Inference Benchmark (preprint)
02/03   Massively parallel universal linear transformations using a wavelength-multiplexed diffractive optical network (paper)
01/27   Efficient Quantized Sparse Matrix Operations on Tensor Cores (paper)