Results

No results yet. The benchmark sweep hasn't landed in docs/public/api/summary.json. Run the sweep on the target host and push to main — the CI will redeploy this page.

Reading the columns

gen tok/s — generation throughput from llama-bench (-n 128 -r 2, mean).
prompt tok/s — prompt-eval throughput from llama-bench (-p 256 -r 2, mean).
p50 / p95 ms — end-to-end wall-clock per BFCL turn, including network round-trip.
Tool overall — strict pass: valid JSON ∧ correct function ∧ all expected arguments present and equal.
Format pass — strict pass on JSON / tool-call shape only (regardless of correctness).

Raw JSON

Programmatic access at /api/summary.json and /api/cells/{cell_id}.json. See the API page for the schema.

Results ​

Reading the columns ​

Raw JSON ​

Results

Reading the columns

Raw JSON