Matvec benchmark
Benchmarks fp32 matrix-vector kernels on a 4096×4096 matrix. Each timed run repeats the kernel 20× to amortize dispatch and readback overhead.
Each strategy gets one untimed warmup run, followed by one measured run.
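The protocol above can be sketched as a small timing harness. This is a minimal illustration, not the page's actual implementation. The names `matvec_cost`, `time_strategy`, and `summarize` are hypothetical. The sketch also assumes that the warmup executes the kernel once, and that memory traffic is counted as one read of the matrix, one read of the input vector, and one write of the output vector.

```python
import time

N = 4096            # matrix dimension, from the description above
REPEATS = 20        # kernel repetitions per timed run, from the description above
BYTES_PER_FP32 = 4


def matvec_cost(n=N):
    """Return (flops, bytes_moved) for one fp32 n-by-n matrix-vector product."""
    flops = 2 * n * n  # one multiply and one add per matrix entry
    # Assumption: matrix and input vector are read once, output written once.
    traffic = BYTES_PER_FP32 * (n * n + 2 * n)
    return flops, traffic


def time_strategy(run_once, repeats=REPEATS, warmup_calls=1, clock=time.perf_counter):
    """One untimed warmup run, then one measured run of `repeats` calls.

    `run_once` is a hypothetical callable that executes a single matvec.
    `warmup_calls=1` is an assumption; the source does not say how many
    kernel executions the warmup run contains.
    Returns the mean seconds per matvec over the measured run.
    """
    for _ in range(warmup_calls):
        run_once()  # untimed: absorbs one-off setup and compilation cost
    start = clock()
    for _ in range(repeats):
        run_once()
    elapsed = clock() - start
    return elapsed / repeats


def summarize(seconds_per_matvec, n=N):
    """Convert a per-matvec time into milliseconds, GFLOP/s, and GB/s."""
    flops, traffic = matvec_cost(n)
    return {
        "ms": seconds_per_matvec * 1e3,
        "gflops": flops / seconds_per_matvec / 1e9,
        "gb_per_s": traffic / seconds_per_matvec / 1e9,
    }


if __name__ == "__main__":
    # Stand-in CPU "strategy" on a small matrix so the demo finishes quickly.
    n = 64
    matrix = [[1.0] * n for _ in range(n)]
    vector = [1.0] * n

    def cpu_matvec():
        return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

    print(summarize(time_strategy(cpu_matvec), n=n))
```

At the stated size, one matvec costs 33,554,432 floating-point operations and, under the traffic assumption above, moves 67,141,632 bytes. Because the matrix dominates the traffic, a kernel of this shape is usually limited by memory bandwidth, so the GB/s figure tends to be the more informative of the two rates.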
No samples yet.
Run an individual strategy or start the full queue.