TeAAL and HiFiber: Precise and Concise Descriptions of (Sparse) Tensor Algebra Accelerators

Overview

This tutorial (hosted in conjunction with MICRO 2024) will show how to distill the variety seen in efficient implementations of tensor algebra kernels (in both hardware and software) into a small set of common abstractions. The tutorial will consist of a series of talks by the organizers, with references to specific code examples that participants can explore afterwards. The key learning objective is to teach participants a new language for precisely and concisely describing accelerators in media such as research papers.

Motivation

Tensor algebra workloads have exploded in popularity over the past few years, with applications ranging from deep learning to graph algorithms to physical simulations. This surge has been accompanied by a corresponding rise in proposals for custom hardware to accelerate common kernels, e.g., matrix multiply or convolution. However, executing tensor algebra kernels efficiently is difficult, so implementations of these kernels often look quite different from one another. Because the features that comprise a design (the algorithm, dataflow, tensor formats, and so on) are all entangled, each accelerator can seem like a one-off, exotic technique. Without a separation of concerns, it is difficult to perform apples-to-apples comparisons between existing designs or to evaluate the impact of proposed design changes.
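To make the entanglement concrete, consider the same matrix-multiply Einsum, Z[m, n] = sum_k A[m, k] * B[k, n], implemented two ways. This is a purely illustrative Python sketch (not TeAAL or HiFiber code): note how the choice of tensor format (dense vs. CSR) and the loop order (dataflow) are bound together inside each implementation.

```python
def matmul_dense(A, B):
    """Dense inputs; an M-N-K (output-stationary) loop order."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    Z = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):
                Z[m][n] += A[m][k] * B[k][n]
    return Z

def matmul_csr(A_csr, B, N):
    """A stored in CSR (row pointers, column ids, values). The
    compressed format only supports efficient iteration over A's
    nonzeros, which forces an M-K-N (row-wise) loop order."""
    ptr, idx, val = A_csr
    M = len(ptr) - 1
    Z = [[0] * N for _ in range(M)]
    for m in range(M):
        for p in range(ptr[m], ptr[m + 1]):
            k, a = idx[p], val[p]
            for n in range(N):
                Z[m][n] += a * B[k][n]
    return Z

# The same A, once dense and once in CSR form, gives the same result.
A = [[1, 0], [0, 2]]
A_csr = ([0, 1, 2], [0, 1], [1, 2])  # (row pointers, column ids, values)
B = [[3, 4], [5, 6]]
print(matmul_dense(A, B))       # [[3, 4], [10, 12]]
print(matmul_csr(A_csr, B, 2))  # [[3, 4], [10, 12]]
```

Neither function states the shared algorithm separately from its format and dataflow choices; abstractions that pull these concerns apart are exactly what the tutorial targets.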

Key Learning Objectives

Participants will learn a new language for precisely and concisely describing (sparse) tensor algebra accelerators.

As part of the tutorial, we provide an accelerator zoo: a list of recent accelerator proposals, their TeAAL specifications, and compiler invocations that automatically generate the corresponding HiFiber code from each specification.
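The separation the zoo relies on can be previewed with a rough analogy (this is NumPy, not TeAAL or HiFiber syntax): an Einsum names only the algorithm, saying nothing about loop order, tiling, or whether the operands are stored dense or compressed. Those mapping decisions are what a specification supplies separately.

```python
import numpy as np

# Purely illustrative, NOT TeAAL/HiFiber code: the einsum string below
# captures only the algorithm, Z[m, n] = sum_k A[m, k] * B[k, n].
# Dataflow and format choices are left to whoever maps it to hardware.
A = np.array([[1, 0], [0, 2]])
B = np.array([[3, 4], [5, 6]])

Z = np.einsum("mk,kn->mn", A, B)
print(Z)  # [[ 3  4]
          #  [10 12]]
```

A specification language can then describe the mapping (loop order, partitioning, and formats) alongside the Einsum, rather than burying those choices in the implementation.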

Agenda

The slide decks for all talks can be found here.

Organizers

Nandeeka Nayak is a Computer Science PhD student at the University of California, Berkeley, advised by Chris Fletcher. She works on understanding efficient implementations of domain-specific kernels, with a focus on building abstractions that unify a wide variety of kernels and accelerator designs into a small set of primitives.

Toluwanimi O. Odemuyiwa (Toluwa) is an Electrical and Computer Engineering PhD Candidate at UC Davis, advised by John Owens. Her work focuses on exploring tensor algebra-based abstractions for graph algorithms (and other domains) in order to succinctly describe and explore the algorithmic and implementation space.

Yingchen Wang is a Postdoc at the University of California, Berkeley, mentored by Chris Fletcher. During her PhD at UT Austin, she worked on microarchitectural side-channel attacks. She is now transitioning into domain-specific accelerators and exploring efficient mappings of different kernels onto them.

Joel S. Emer received B.S. (Hons.) and M.S. degrees in electrical engineering from Purdue University in 1974 and 1975, respectively, and a Ph.D. degree in electrical engineering from the University of Illinois at Urbana–Champaign in 1979. He is a Professor of the Practice in the Electrical Engineering and Computer Science Department at MIT and a Senior Distinguished Research Scientist at NVIDIA.

Michael Pellauer is a Principal Research Scientist at Nvidia's Architecture Research Group (ARG). His research focuses on domain-specific hardware accelerators and how lessons from them can be integrated into a programmable substrate such as a GPU. His current focus is sparse tensor algebra acceleration for deep learning. He has a PhD in Computer Science from MIT, a Master of Science from Chalmers University of Technology, and a double Bachelor's from Brown University in Computer Science and English. He previously worked in Intel Corporation's Versatile Systems and Simulation Advanced Development (VSSAD) group as a senior architect.

Christopher W. Fletcher (Chris) is an Associate Professor in Computer Science at the University of California, Berkeley. He has broad interests spanning Computer Architecture, Security, and High-Performance Computing, from theory to practice.

Resources

Tutorial Artifacts

Background Reading
