matmul benchmark

Benchmarks single-precision (fp32) matmul kernels on 4096x4096 matrices.
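
As a rough sketch of how a run at this size could be timed and reported, the snippet below computes the work and memory involved and wraps a kernel in a best-of-N timer. Everything in it is an assumption for illustration, not this benchmark's actual harness: the helper names (matmul_flops, matrix_bytes, gflops, best_time) are hypothetical, the 2*n^3 FLOP count is the common convention of one multiply and one add per inner-loop step, and naive_matmul is a pure-Python stand-in (which computes in double precision, not fp32) for whatever kernel is really under test.

```python
import time


def matmul_flops(n):
    """FLOPs for multiplying two n x n matrices (assumed convention: 2 * n^3)."""
    return 2 * n ** 3


def matrix_bytes(n, bytes_per_element=4):
    """Storage for one n x n matrix; 4 bytes per element corresponds to fp32."""
    return n * n * bytes_per_element


def gflops(n, seconds):
    """Achieved throughput in GFLOP/s for one n x n matmul that took `seconds`."""
    return matmul_flops(n) / seconds / 1e9


def naive_matmul(a, b):
    """Stand-in kernel on lists of lists (Python floats, so not fp32)."""
    b_columns = list(zip(*b))  # transpose b once so each column is a tuple
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def best_time(fn, *args, warmup=1, repeats=3):
    """Run `fn` a few times and return the fastest wall-clock time in seconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 64  # kept small so the pure-Python stand-in finishes quickly
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    seconds = best_time(naive_matmul, a, b)
    print(f"n={n}: {seconds * 1e3:.2f} ms, {gflops(n, seconds):.4f} GFLOP/s")
    print(f"n=4096: {matmul_flops(4096)} FLOPs per matmul, "
          f"{matrix_bytes(4096) // 2**20} MiB per fp32 matrix")
```

Under that convention, one 4096x4096 multiply is 2 * 4096^3 = 137,438,953,472 floating-point operations (about 137.4 GFLOP), and each fp32 operand occupies 64 MiB. Taking the fastest of several timed runs after a warm-up is one common way to reduce noise; a mean or median over runs is an equally reasonable choice.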