Po-An Tsai's personal website

Short bio:

I am a Principal Research Scientist at NVIDIA Research. My research focuses on computer systems and architecture for emerging applications. I received my S.M. and Ph.D. degrees in CS from MIT, where I was advised by Prof. Daniel Sanchez at MIT CSAIL. My current research projects focus on tensor accelerator design and modeling, memory hierarchy optimizations for domain-specific accelerators, and hardware acceleration for autonomous machines. During my Ph.D., I also worked on memory hierarchy design, resource management in multi-core systems, and software/hardware co-optimization for object-based programming models.

Last update: March 2026

CV and Resume

Here are my latest CV and resume.

Publications

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

Zhongkai Yu, Yue Guan, Zihao Yu, Chenyang Zhou, Zhengding Hu, Shuyi Pei, Yangwook Kang, Yufei Ding, Po-An Tsai
The 53rd Annual International Symposium on Computer Architecture (ISCA-53), June 2026.
[arxiv]

RocketKV: Accelerating Long-context LLM Inference via Two-stage KV Cache Compression

Payman Behnam, Yaosheng Fu, Ritchie Zhao, Po-An Tsai, Zhiding Yu, Alexey Tumanov
Proceedings of the 42nd International Conference on Machine Learning (ICML 2025), July 2025.
[arxiv]

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

Geonhwa Jeong, Po-An Tsai, Abhimanyu R. Bambhaniya, Stephen W. Keckler, Tushar Krishna
The 8th Annual Conference on Machine Learning and Systems (MLSys 2025), May 2025.
[arxiv]

Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse

Yunan Zhang, Po-An Tsai, Hung-Wei Tseng
The 57th IEEE/ACM International Symposium on Microarchitecture (MICRO-57), November 2024.
[paper]

Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms

Qijing (Jenny) Huang, Po-An Tsai, Joel S Emer, Angshuman Parashar
The 51st Annual International Symposium on Computer Architecture (ISCA-51), June 2024.
Best Paper Award Nominee.
[paper]

Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing

Michael Pellauer, Jason Clemons, Vignesh Balaji, Neal Crago, Aamer Jaleel, Donghyuk Lee, Mike O’Connor, Angshuman Parashar, Sean Treichler, Po-An Tsai, Stephen W. Keckler, Joel S. Emer
ACM Transactions on Computer Systems, Vol. 41, December 2023.
[paper]

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration

Guyue Huang, Zhengyang Wang, Po-An Tsai, Chen Zhang, Yufei Ding, Yuan Xie
The 56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), October 2023.
[paper]

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel Emer
The 56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), October 2023.
[paper] [arxiv] [MIT news]

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling

Toluwanimi O Odemuyiwa, Hadi Asghari-Moghaddam, Michael Pellauer, Kartik Hegde, Po-An Tsai, Neal C Crago, Aamer Jaleel, John D Owens, Edgar Solomonik, Joel S Emer, Christopher W Fletcher
The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-28), March 2023.
[paper]

Demystifying Map Space Exploration for NPUs

Sheng-Chun Kao, Angshuman Parashar, Po-An Tsai, Tushar Krishna
2022 IEEE International Symposium on Workload Characterization (IISWC 2022), November 2022.
[paper]

Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S Emer
The 55th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-55), October 2022.
[paper] [arxiv]

SIMD^2: A Generalized Matrix Instruction Set for Accelerating Tensor Computation beyond GEMM

Yunan Zhang, Po-An Tsai, Hung-Wei Tseng
The 49th Annual International Symposium on Computer Architecture (ISCA'22), June 2022.
[paper]

Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization

Mark Horeni, Pooria Taheri, Po-An Tsai, Angshuman Parashar, Joel Emer, Siddharth Joshi
2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2022), May 2022.
[paper]

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, and Tushar Krishna.
The 30th International Conference on Parallel Architectures and Compilation Techniques (PACT-30), September 2021.
[paper]

Leaking Secrets through Compressed Caches

Po-An Tsai, Andres Sanchez, Christopher W. Fletcher and Daniel Sanchez.
IEEE Micro's Top Picks from the Computer Architecture Conferences, May/June 2021.
[paper]

Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search

Kartik Hegde, Po-An Tsai, Sitao Huang, Vikas Chandra, Angshuman Parashar, and Christopher W. Fletcher.
The 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-26), April 2021.
[paper] [code]

Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators

Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer.
2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (poster)
[paper] [tutorial]

Hardware Abstractions for Targeting EDDO Architectures with the Polyhedral Model

Angshuman Parashar, Prasanth Chatarasi, and Po-An Tsai.
International Workshop on Polyhedral Compilation Techniques (IMPACT), January 2021.
[paper]

Safecracker: Leaking Secrets through Compressed Caches

Po-An Tsai, Andres Sanchez, Christopher W. Fletcher and Daniel Sanchez.
The 25th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-25), March 2020.
[paper] [talk] [video] [code]

Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy

Po-An Tsai and Daniel Sanchez.
The 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-24), April 2019.
[paper] [talk] [poster] [lightning] [MIT news]

Rethinking the Memory Hierarchy for Modern Languages

Po-An Tsai, Yee Ling Gan, and Daniel Sanchez.
The 51st IEEE/ACM International Symposium on Microarchitecture (MICRO-51), October 2018.
[paper] [talk] [poster] [lightning]

Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies

Po-An Tsai, Changping Chen, and Daniel Sanchez.
The 51st IEEE/ACM International Symposium on Microarchitecture (MICRO-51), October 2018.
[paper] [talk] [poster] [lightning]

KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores

Nosayba El-Sayed, Anurag Mukkara, Po-An Tsai, Harshad Kasture, Xiaosong Ma, and Daniel Sanchez.
The 24th International Symposium on High Performance Computer Architecture (HPCA-24), February 2018.
[paper] [talk] [code]

Nexus: A New Approach to Replication in Distributed Shared Caches

Po-An Tsai, Nathan Beckmann, and Daniel Sanchez.
The 26th International Conference on Parallel Architectures and Compilation Techniques (PACT-26), September 2017.
[paper] [talk]

Jenga: Software-Defined Cache Hierarchies

Po-An Tsai, Nathan Beckmann, and Daniel Sanchez.
The 44th International Symposium on Computer Architecture (ISCA-44), June 2017.
[paper] [talk] [tech-report] [MIT news]

Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling

Nathan Beckmann, Po-An Tsai, and Daniel Sanchez.
The 21st International Symposium on High Performance Computer Architecture (HPCA-21), February 2015.
*Nominated for best paper award
[paper] [talk] [MIT news]

Hybrid Path-Diversity-Aware Adaptive Routing with Latency Prediction Model in Network-on-Chip Systems

Po-An Tsai, Yu-Hsin Kuo, En-Jui Chang, and An-Yeu Wu.
2013 International Symposium on VLSI Design, Automation and Test (VLSI-DAT), March 2013.
[paper]

Path-Diversity-Aware Adaptive Routing in Network-on-Chip Systems

Yu-Hsin Kuo, Po-An Tsai, Hao-Ping Ho, En-Jui Chang, Hsien-Kai Hsin, and An-Yeu Wu.
The 6th International Symposium on Embedded Multicore SoCs (MCSoC), September 2012.
[paper]

Contact Me

Email: poant@nvidia.com

Press

MIT news about Zippads

Hacker News discussion on Zippads

The Morning Paper summary on Zippads

MIT news about Jenga

Hacker News discussion on Jenga

MIT news about CDCS

Techenablement article about CDCS

The Industry-Academia Partnership (IAP) post about my best poster award