Lancern's Treasure Chest
10:38 · Jul 4, 2024 · Thu
Beating NumPy matrix multiplication in 150 lines of C
https://salykova.github.io/matmul-cpu
salykova blog
Beating NumPy’s Matrix Multiplication in 150 lines of C code
In this step-by-step tutorial we’ll optimize matrix multiplication on the CPU in C, achieving over 1 TFLOPS on an 8-core Ryzen 7 7700. The final optimized implementation is just 150 LOC and outperforms both OpenBLAS and MKL on the same CPU. High-performance GEMM…