Running locally with jax-js + WebGPU
The WebGPU backend uses the fp16 weights as-is. The Wasm backend casts them to fp32 on load.
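To make the fp16-to-fp32 cast concrete, here is a minimal sketch of widening a buffer of half-precision bit patterns into a `Float32Array`. It is not the demo's actual loader; the function names `halfBitsToNumber` and `castFp16ToFp32` are hypothetical, and it assumes the weights arrive as raw IEEE 754 binary16 values in a `Uint16Array`.

```typescript
// Decode one IEEE 754 binary16 value (given as its 16-bit pattern) to a JS number.
function halfBitsToNumber(h: number): number {
  const sign = (h & 0x8000) !== 0 ? -1 : 1;
  const exponent = (h >> 10) & 0x1f;
  const fraction = h & 0x03ff;

  if (exponent === 0) {
    // Zero and subnormals: no implicit leading 1, fixed exponent of -14.
    return sign * (fraction / 1024) * 2 ** -14;
  }
  if (exponent === 0x1f) {
    // All-ones exponent encodes Infinity (fraction 0) or NaN (fraction nonzero).
    return fraction === 0 ? sign * Infinity : NaN;
  }
  // Normal numbers: implicit leading 1, exponent bias of 15.
  return sign * (1 + fraction / 1024) * 2 ** (exponent - 15);
}

// Widen a whole buffer of fp16 weights to fp32 (hypothetical helper, assumed layout).
function castFp16ToFp32(halfBits: Uint16Array): Float32Array {
  const out = new Float32Array(halfBits.length);
  for (let i = 0; i < halfBits.length; i++) {
    out[i] = halfBitsToNumber(halfBits[i]);
  }
  return out;
}
```

Every fp16 value is exactly representable in fp32, so the cast loses nothing, but each weight grows from 2 bytes to 4, which doubles the in-memory size of the weights on the Wasm path.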
The KV cache is allocated dynamically for the current chat.
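One way to read "allocated dynamically" is a cache that starts small and grows as the chat gets longer. The sketch below assumes that reading and uses capacity doubling on CPU-side typed arrays for a single attention head. The class `GrowableKVCache` and all of its members are hypothetical and do not come from jax-js; on the WebGPU path the storage would be GPU buffers instead.

```typescript
// Hypothetical growable KV cache for one attention head; not the demo's code.
class GrowableKVCache {
  private keys: Float32Array;
  private values: Float32Array;
  private capacity: number; // number of tokens the buffers can hold
  private readonly headDim: number;
  length = 0; // number of tokens stored so far

  constructor(headDim: number, initialCapacity: number = 16) {
    this.headDim = headDim;
    this.capacity = Math.max(1, initialCapacity);
    this.keys = new Float32Array(this.capacity * headDim);
    this.values = new Float32Array(this.capacity * headDim);
  }

  // Store the key and value vectors for one new token, growing if full.
  append(key: Float32Array, value: Float32Array): void {
    if (key.length !== this.headDim || value.length !== this.headDim) {
      throw new Error(`expected vectors of length ${this.headDim}`);
    }
    if (this.length === this.capacity) this.grow();
    this.keys.set(key, this.length * this.headDim);
    this.values.set(value, this.length * this.headDim);
    this.length += 1;
  }

  // Double the capacity and copy the existing entries across.
  private grow(): void {
    const newCapacity = this.capacity * 2;
    const newKeys = new Float32Array(newCapacity * this.headDim);
    const newValues = new Float32Array(newCapacity * this.headDim);
    newKeys.set(this.keys);
    newValues.set(this.values);
    this.keys = newKeys;
    this.values = newValues;
    this.capacity = newCapacity;
  }

  tokenCapacity(): number {
    return this.capacity;
  }

  // Views over only the filled part of each buffer (no copy).
  keysView(): Float32Array {
    return this.keys.subarray(0, this.length * this.headDim);
  }

  valuesView(): Float32Array {
    return this.values.subarray(0, this.length * this.headDim);
  }

  // Start a new chat: drop the contents but keep the allocated buffers.
  reset(): void {
    this.length = 0;
  }
}
```

Doubling keeps the amortized cost of each `append` constant, and memory use follows the length of the current chat rather than the model's maximum context length.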
Sending your first message downloads and caches a 536 MB fp16 checkpoint. Everything runs locally in your browser.