Reading Group
The lab runs an informal reading group where we study research papers of interest to us. If you would like to join, just contact us for details!
2024 Schedule
Date | Topic
---|---
12/09 | ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability (preprint)
12/05 | SpinQuant: LLM Quantization with Learned Rotations (preprint)
11/25 | M^3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs (paper)
11/18 | LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation (paper)
11/11 | RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs (preprint)
11/04 | Understanding the Limitations of Mathematical Reasoning in Large Language Models (preprint)
10/28 | The Llama 3 Herd of Models (up to and including 3.3) (paper)
10/21 | TCP: A Tensor Contraction Processor for AI Workloads (paper)
10/14 | SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile (paper)
09/16 | The MLIR Transform Dialect: Your Compiler Is More Powerful Than You Think (preprint)
09/10 | Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search (paper)
08/27 | nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training (paper)
08/19 | PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation (paper)
08/05 | MLIR-Based Code Generation for GPU Tensor Cores (paper)
07/30 | Optimal Kernel Orchestration for Tensor Programs with Korch (paper)
07/23 | Harnessing Discrete Representations for Continual Reinforcement Learning (preprint)
07/16 | Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions (paper)
07/08 | A Code Generator for High-Performance Tensor Contractions on GPUs (paper)
07/02 | An Efficient 2D Method for Training Super-Large Deep Learning Models (paper)
06/24 | JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication (paper)
06/18 | A Generalized Packing Analysis and Transformation (paper)
06/11 | YOLOv10: Real-Time End-to-End Object Detection (preprint)
06/04 | A Machine Learning Approach Towards Runtime Optimization of Matrix Multiplication (paper)
05/28 | With Shared Microexponents, A Little Shifting Goes a Long Way (paper)
05/21 | Fine-mixing: Mitigating Backdoors in Fine-tuned Language Models (preprint)
05/14 | Classical Simulation of Quantum Supremacy Circuits (preprint)
05/07 | Triton: An Intermediate Language and Compiler for Tiled Neural Network Computations (preprint)
05/02 | MLP-Mixer: An all-MLP Architecture for Vision (paper)
04/25 | Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization (paper)
04/18 | Spectre Attacks: Exploiting Speculative Execution (paper)
04/11 | The Deep Learning Compiler: A Comprehensive Survey (paper)
04/04 | Large Language Models for Compiler Optimization (preprint)
03/27 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (paper)
03/20 | Peer Review Session
03/14 | TensorIR: An Abstraction for Automatic Tensorized Program Optimization (paper)
03/06 | FP8 Quantization: The Power of the Exponent (paper)
02/28 | Novel adaptive quantization methodology for 8-bit floating-point DNN training (paper)
02/21 | A Tensor Compiler for Unified Machine Learning Prediction Serving (paper)
02/14 | LoopTune: Optimizing Tensor Computations with Reinforcement Learning (preprint)
02/07 | A massively parallel tensor contraction framework for coupled-cluster computations (paper)
01/31 | LoopStack: a Lightweight Tensor Algebra Compiler Stack (preprint)
01/24 | Towards an efficient use of the BLAS library for multilinear tensor contractions (paper)
01/17 | Chapter 5.7: Efficient Processing of Deep Neural Networks (book)
2023 Schedule
Date | Topic
---|---
12/19 | oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation (preprint)
12/12 | Chapters 5.1–5.6: Efficient Processing of Deep Neural Networks (book)
12/05 | RISC-V Composable Extensions for MX Microscaling Data Formats for AI Tensors: Part One: Introduction to MX Data (blog post)
11/28 | Chapter 4: Efficient Processing of Deep Neural Networks (book)
11/22 | Chapter 3: Efficient Processing of Deep Neural Networks (book)
11/14 | HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity (preprint)
11/07 | Higher-dimensional processing using a photonic tensor core with continuous-time data (paper)
11/01 | Toward Matrix Multiplication for Deep Learning Inference on the Xilinx Versal (paper)
10/24 | Optimizing Direct Convolutions on ARM Multi-Cores (preprint)
08/28 | Hot Chips 2023 watch party (program)
07/12 | DGEMM on Integer Matrix Multiplication Unit (preprint)
07/05 | A Design of a High-Performance GEMM-like Tensor–Tensor Multiplication (paper)
06/28 | High-Performance Tensor Contraction without Transposition (paper)
06/21 | Can Computers Learn Common Sense? (article)
06/14 | Dynamo: Amazon's Highly Available Key-value Store (paper)
06/07 | A White Paper on Neural Network Quantization (white paper)
05/31 | LazyTensor: combining eager execution with domain-specific compilers (preprint)
05/24 | Neural Galerkin Scheme with Active Learning for High-Dimensional Evolution Equations (preprint)
05/17 | Architecture and Performance of Devito, a System for Automated Stencil Computation (paper)
05/10 | Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures (paper)
05/03 | BLIS: A Framework for Rapidly Instantiating BLAS Functionality (paper)
04/26 | Anatomy of High-Performance Matrix Multiplication (preprint)
04/19 | Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures (preprint)
04/12 | Tensor Contractions Tutorial (tutorial)
03/20 | Speculative Vectorisation with Selective Replay (paper)
03/14 | An Attack on The Speculative Vectorization: Leakage from Higher Dimensional Speculation (preprint)
03/07 | DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration (paper)
02/24 | MLPerf Mobile Inference Benchmark (preprint)
02/03 | Massively parallel universal linear transformations using a wavelength-multiplexed diffractive optical network (paper)
01/27 | Efficient Quantized Sparse Matrix Operations on Tensor Cores (paper)