Lancern's Treasure Chest
10:38 · Jul 4, 2024 · Thu
Beating NumPy matrix multiplication in 150 lines of C
https://salykova.github.io/matmul-cpu
salykova blog
Beating OpenBLAS and MKL in 150 lines of C Code: A Tutorial on High-Performance Matrix Multiplication
In this step-by-step tutorial we’ll implement high-performance multi-threaded matrix multiplication on the CPU from scratch and learn how to optimize and parallelize code in C. On a Ryzen 7700 our implementation is faster than NumPy with OpenBLAS and MKL backends…