TeAAL and HiFiber: Precise and Concise Descriptions of (Sparse) Tensor Algebra Accelerators

Overview

This tutorial (hosted in conjunction with MICRO 2025) will show how to distill the variety we see in efficient implementations of tensor algebra kernels (in both hardware and software) into a small set of common abstractions. The tutorial will consist of a series of talks by the organizers, with references to specific code examples that participants can explore afterwards. The key learning objective is to teach participants a new language for precisely and concisely describing accelerators in media such as research papers.

Motivation

Tensor algebra workloads have exploded in popularity over the past few years, with applications ranging from deep learning to graph algorithms to physical simulations. This surge has been accompanied by a corresponding rise in proposals for custom hardware to service common kernels, e.g., matrix multiply or convolution. However, performing tensor algebra kernels efficiently can be difficult, so implementations of these kernels often look quite different from one another. Because the details of the algorithm, dataflow, tensor formats, and so on are all entangled within each design, every accelerator can seem like a one-off, exotic technique. Without a separation of concerns, it is difficult to perform apples-to-apples comparisons between existing designs or to evaluate the impact of proposed design changes.

Key Learning Objectives

Participants will learn:

As part of the tutorial, we provide an accelerator zoo—a list of recent accelerator proposals, their TeAAL specifications, and compiler invocations that automatically generate the corresponding HiFiber code from each TeAAL specification.
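To give a flavor of the kind of abstraction the tutorial builds on, the sketch below expresses the Einsum Z_mn = A_mk * B_kn over fibertree-style sparse tensors, modeled here as plain nested Python dictionaries mapping coordinates to payloads. This is an illustrative sketch only, not actual TeAAL or HiFiber syntax; all names are hypothetical and chosen for exposition.

```python
# Hedged sketch (not real HiFiber): sparse matrix multiply over
# "fibertree"-style nested dicts, where each level of the tree maps a
# coordinate to a payload (a sub-fiber or a scalar value).

def matmul(A, B):
    """A: {m: {k: val}}, B: {k: {n: val}} -> Z: {m: {n: val}}."""
    Z = {}
    for m, a_m in A.items():              # traverse rank M of A
        z_m = {}
        for k, a_val in a_m.items():      # traverse rank K of A's row
            b_k = B.get(k)                # intersect with rank K of B
            if b_k is None:
                continue                  # k absent in B: contributes nothing
            for n, b_val in b_k.items():  # traverse rank N of B's row
                z_m[n] = z_m.get(n, 0) + a_val * b_val
        if z_m:                           # keep the output sparse
            Z[m] = z_m
    return Z

# Only nonzero coordinates are stored.
A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {0: 4}, 1: {0: 5}, 2: {1: 6}}
print(matmul(A, B))  # -> {0: {0: 4, 1: 12}, 1: {0: 15}}
```

Note how the loop order (M, then K, then N), the intersection on the shared rank K, and the sparse tensor representation are each visible as separate, swappable pieces of the code—this separation of concerns is the kind of property the TeAAL and HiFiber abstractions make precise.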

Agenda

The slide decks for all talks can be found here.

Organizers

Nandeeka Nayak is a Computer Science PhD student at University of California, Berkeley, advised by Chris Fletcher. She works on understanding efficient implementations of domain-specific kernels with a focus on building abstractions that unify a wide variety of kernels and accelerator designs into a small set of primitives.

Toluwanimi O. Odemuyiwa (Toluwa) is an Electrical and Computer Engineering PhD Candidate at UC Davis, advised by John Owens. Her work focuses on exploring tensor algebra-based abstractions for graph algorithms (and other domains) in order to succinctly describe and explore the algorithmic and implementation space.

Yan Zhu is an EECS Ph.D. student at the University of California, Berkeley, advised by Chris Fletcher. Her research interests lie in domain-specific acceleration, with a particular focus on optimizing sparse computing applications. Her current work centers on accelerating applications with inherent sparsity, such as RTL simulation, by generalizing existing sparse tensor algebra analysis and optimization techniques. Before joining UC Berkeley, she received a B.A.S. in Engineering Science from the University of Toronto.

Michael Pellauer is a Principal Research Scientist at Nvidia’s Architecture Research Group (ARG). His research focuses on domain-specific hardware accelerators and how lessons learned from them can be integrated into a programmable substrate like a GPU. His current focus is on sparse tensor algebra acceleration for deep learning. He has a PhD from MIT in Computer Science, a Master of Science from Chalmers University of Technology, and a double Bachelor’s from Brown University in Computer Science and English. He previously worked at Intel Corporation’s Versatile Systems and Simulation Advanced Development (VSSAD) group as a senior architect.

Christopher W. Fletcher (Chris) is an Associate Professor in Computer Science at the University of California, Berkeley. He has broad interests spanning Computer Architecture, Security, and High-Performance Computing, ranging from theory to practice.

Joel S. Emer received B.S. (Hons.) and M.S. degrees in electrical engineering from Purdue University in 1974 and 1975, respectively, and the Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign in 1979. He is a Professor of the Practice in the Electrical Engineering and Computer Science Department at MIT and a Senior Distinguished Research Scientist at NVIDIA.

Contributors

We would also like to extend a special thank you to the following people, who have contributed to this tutorial by adding accelerators to the accelerator zoo:

Resources

Tutorial Artifacts

Background Reading

MIT Accessibility Statement