Reading Group

The lab runs an informal reading group where we study research contributions of interest to us. Anyone interested is welcome to join! Simply get in touch for details.

2024 Schedule

Date   Topic
04/25  Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization (paper)
04/18  Spectre Attacks: Exploiting Speculative Execution (paper)
04/11  The Deep Learning Compiler: A Comprehensive Survey (paper)
04/04  Large Language Models for Compiler Optimization (preprint)
03/27  FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (paper)
03/20  Peer Review Session
03/14  TensorIR: An Abstraction for Automatic Tensorized Program Optimization (paper)
03/06  FP8 Quantization: The Power of the Exponent (paper)
02/28  Novel adaptive quantization methodology for 8-bit floating-point DNN training (paper)
02/21  A Tensor Compiler for Unified Machine Learning Prediction Serving (paper)
02/14  LoopTune: Optimizing Tensor Computations with Reinforcement Learning (preprint)
02/07  A massively parallel tensor contraction framework for coupled-cluster computations (paper)
01/31  LoopStack: a Lightweight Tensor Algebra Compiler Stack (preprint)
01/24  Towards an efficient use of the BLAS library for multilinear tensor contractions (paper)
01/17  Chapter 5.7: Efficient Processing of Deep Neural Networks (book)

2023 Schedule

Date   Topic
12/19  oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation (preprint)
12/12  Chapters 5.1-5.6: Efficient Processing of Deep Neural Networks (book)
12/05  RISC-V Composable Extensions for MX Microscaling Data Formats for AI Tensors: Part One: Introduction to MX Data (blog post)
11/28  Chapter 4: Efficient Processing of Deep Neural Networks (book)
11/22  Chapter 3: Efficient Processing of Deep Neural Networks (book)
11/14  HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity (preprint)
11/07  Higher-dimensional processing using a photonic tensor core with continuous-time data (paper)
11/01  Toward Matrix Multiplication for Deep Learning Inference on the Xilinx Versal (paper)
10/24  Optimizing Direct Convolutions on ARM Multi-Cores (preprint)
08/28  Hot Chips 2023 watch party (program)
07/12  DGEMM on Integer Matrix Multiplication Unit (preprint)
07/05  A Design of a High-Performance GEMM-like Tensor-Tensor Multiplication (preprint, paper)
06/28  High-Performance Tensor Contraction without Transposition (paper)
06/21  Can Computers Learn Common Sense? (article)
06/14  Dynamo: Amazon's Highly Available Key-value Store (paper)
06/07  A White Paper on Neural Network Quantization (white paper)
05/31  LazyTensor: combining eager execution with domain-specific compilers (preprint)
05/24  Neural Galerkin Scheme with Active Learning for High-Dimensional Evolution Equations (preprint)
05/17  Architecture and Performance of Devito, a System for Automated Stencil Computation (paper)
05/10  Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures (paper)
05/03  BLIS: A Framework for Rapidly Instantiating BLAS Functionality (paper)
04/26  Anatomy of High-Performance Matrix Multiplication (preprint)
04/19  Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures (preprint)
04/12  Tensor Contractions Tutorial (tutorial)
03/20  Speculative Vectorisation with Selective Replay (paper)
03/14  An Attack on the Speculative Vectorization: Leakage from Higher Dimensional Speculation (preprint)
03/07  DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration (paper)
02/24  MLPerf Mobile Inference Benchmark (preprint)
02/03  Massively parallel universal linear transformations using a wavelength-multiplexed diffractive optical network (paper)
01/27  Efficient Quantized Sparse Matrix Operations on Tensor Cores (paper)