conv2d benchmark
Benchmarking fp32 conv2d kernels on 1x64x256x256 input with 128 filters of size 3x3.
- "naive" is a simple nested-loop WebGPU kernel
- "onnx" runs a Conv operator in onnxruntime-web
- "tfjs" runs tf.conv2d() with NHWC format
- "jax-js" runs jax.lax.convGeneralDilated()
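
For orientation, the computation that every variant above performs can be sketched as a plain CPU reference in TypeScript. This is not the benchmark's WebGPU kernel, only an illustration of the nested-loop structure that "naive" refers to. The names conv2dNaive and conv2dFlops are hypothetical, and the NCHW input layout, OIHW weight layout, stride 1, and absence of padding are all assumptions, since the description above does not state them.

```typescript
// Reference CPU convolution (illustrative sketch, not the benchmarked kernel).
// Assumptions: NCHW input, OIHW weights, stride 1, no padding, no bias.
function conv2dNaive(
  input: Float32Array, weight: Float32Array,
  n: number, cIn: number, h: number, w: number,
  cOut: number, kh: number, kw: number,
): Float32Array {
  const oh = h - kh + 1; // output height without padding
  const ow = w - kw + 1; // output width without padding
  const out = new Float32Array(n * cOut * oh * ow);
  for (let b = 0; b < n; b++) {
    for (let oc = 0; oc < cOut; oc++) {
      for (let y = 0; y < oh; y++) {
        for (let x = 0; x < ow; x++) {
          let acc = 0;
          // Accumulate over input channels and the kernel window.
          for (let ic = 0; ic < cIn; ic++) {
            for (let ky = 0; ky < kh; ky++) {
              for (let kx = 0; kx < kw; kx++) {
                const i = ((b * cIn + ic) * h + (y + ky)) * w + (x + kx);
                const k = ((oc * cIn + ic) * kh + ky) * kw + kx;
                acc += input[i] * weight[k];
              }
            }
          }
          out[((b * cOut + oc) * oh + y) * ow + x] = acc;
        }
      }
    }
  }
  return out;
}

// Floating-point operations for one convolution under the same assumptions,
// counting each multiply-add as 2 operations.
function conv2dFlops(
  n: number, cIn: number, h: number, w: number,
  cOut: number, kh: number, kw: number,
): number {
  const oh = h - kh + 1;
  const ow = w - kw + 1;
  return 2 * n * cOut * oh * ow * cIn * kh * kw;
}
```

Under these assumptions, conv2dFlops(1, 64, 256, 256, 128, 3, 3) comes to about 9.5 billion operations per run; with "same" padding the count would be slightly higher. Dividing that figure by a measured time gives a throughput number that can be compared across the four implementations.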