TeAAL and HiFiber: Precise and Concise Descriptions of (Sparse) Tensor Algebra Accelerators

Overview

This tutorial (hosted in conjunction with MICRO 2024) will show how to distill the variety seen in efficient implementations of tensor algebra kernels (in both hardware and software) into a small set of common abstractions. The tutorial will consist of a series of talks by the organizers, with references to specific code examples that participants can explore afterwards. The key learning objective is to teach participants a new language for precisely and concisely describing accelerators in media such as research papers.

Motivation

Tensor algebra workloads have exploded in popularity over the past few years, with applications ranging from deep learning to graph algorithms to physical simulations. This surge has been accompanied by a corresponding rise in proposals for custom hardware to accelerate common kernels, e.g., matrix multiply or convolution. However, executing tensor algebra kernels efficiently is difficult, so implementations of these kernels often look quite different from one another. Because the features that comprise a design (the algorithm, dataflow, tensor formats, and so on) are all entangled, each accelerator can seem like a one-off, exotic technique. Without a separation of concerns, it is difficult to perform apples-to-apples comparisons between existing designs or to evaluate the impact of proposed design changes.
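To make the entanglement concrete, consider the same matrix-multiply Einsum, Z[m, n] = sum_k A[m, k] * B[k, n], implemented two ways. This is a purely illustrative Python sketch (not TeAAL or HiFiber code): note how the choice of tensor format (dense vs. CSR) and the loop order (dataflow) are bound together inside each implementation.

```python
def matmul_dense(A, B):
    """Dense inputs; an M-N-K (output-stationary) loop order."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    Z = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                Z[m][n] += A[m][k] * B[k][n]
    return Z

def matmul_csr(A_csr, B, N):
    """A stored in CSR (row pointers, column ids, values). The
    compressed format only supports efficient iteration over A's
    nonzeros, which forces an M-K-N (row-wise) loop order."""
    ptr, idx, val = A_csr
    M = len(ptr) - 1
    Z = [[0] * N for _ in range(M)]
    for m in range(M):
        for p in range(ptr[m], ptr[m + 1]):
            k, a = idx[p], val[p]
            for n in range(N):
                Z[m][n] += a * B[k][n]
    return Z

# The same A, once dense and once in CSR form, gives the same result.
A = [[1, 0], [0, 2]]
A_csr = ([0, 1, 2], [0, 1], [1, 2])  # (row pointers, column ids, values)
B = [[3, 4], [5, 6]]
print(matmul_dense(A, B))       # [[3, 4], [10, 12]]
print(matmul_csr(A_csr, B, 2))  # [[3, 4], [10, 12]]
```

Neither function states the shared algorithm separately from its format and dataflow choices; abstractions that pull these concerns apart are exactly what the tutorial targets.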

Key Learning Objectives

Participants will learn a new language for precisely and concisely describing (sparse) tensor algebra accelerators.

As part of the tutorial, we provide an accelerator zoo: a list of recent accelerator proposals, their TeAAL specifications, and compiler invocations that automatically generate the corresponding HiFiber code from each specification.
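The separation the zoo relies on can be previewed with a rough analogy (this is NumPy, not TeAAL or HiFiber syntax): an Einsum names only the algorithm, saying nothing about loop order, tiling, or whether the operands are stored dense or compressed. Those mapping decisions are what a specification supplies separately.

```python
import numpy as np

# Purely illustrative, NOT TeAAL/HiFiber code: the einsum string below
# captures only the algorithm, Z[m, n] = sum_k A[m, k] * B[k, n].
# Dataflow and format choices are left to whoever maps it to hardware.
A = np.array([[1, 0], [0, 2]])
B = np.array([[3, 4], [5, 6]])

Z = np.einsum("mk,kn->mn", A, B)
print(Z)  # [[ 3  4]
          #  [10 12]]
```

A specification language can then describe the mapping (loop order, partitioning, and formats) alongside the Einsum, rather than burying those choices in the implementation.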

Agenda

The slide decks for all talks can be found here.

Organizers

Nandeeka Nayak is a Computer Science PhD student at the University of California, Berkeley, advised by Chris Fletcher. She works on understanding efficient implementations of domain-specific kernels, with a focus on building abstractions that unify a wide variety of kernels and accelerator designs into a small set of primitives.

Toluwanimi O. Odemuyiwa (Toluwa) is an Electrical and Computer Engineering PhD Candidate at UC Davis, advised by John Owens. Her work focuses on exploring tensor algebra-based abstractions for graph algorithms (and other domains) in order to succinctly describe and explore the algorithmic and implementation space.

Yingchen Wang is a Postdoc at the University of California, Berkeley, mentored by Chris Fletcher. During her PhD at UT Austin, she worked on microarchitectural side-channel attacks. She is now transitioning into domain-specific accelerators and exploring efficient mappings of different kernels onto them.

Joel S. Emer received B.S. (Hons.) and M.S. degrees in electrical engineering from Purdue University in 1974 and 1975, respectively, and a Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign in 1979. He is a Professor of the Practice in the Electrical Engineering and Computer Science Department at MIT and a Senior Distinguished Research Scientist at NVIDIA.

Michael Pellauer is a Principal Research Scientist at Nvidia's Architecture Research Group (ARG). His research focuses on domain-specific hardware accelerators and how lessons from them can be integrated into a programmable substrate such as a GPU. His current focus is sparse tensor algebra acceleration for deep learning. He has a PhD in Computer Science from MIT, a Master of Science from Chalmers University of Technology, and a double Bachelor's from Brown University in Computer Science and English. He previously worked in Intel Corporation's Versatile Systems and Simulation Advanced Development (VSSAD) group as a senior architect.

Christopher W. Fletcher (Chris) is an Associate Professor in Computer Science at the University of California, Berkeley. He has broad interests spanning Computer Architecture, Security, and High-Performance Computing, from theory to practice.

Resources

Tutorial Artifacts

Background Reading
