Skip to content

Results

No results yet. The benchmark sweep hasn't landed in docs/public/api/summary.json. Run the sweep on the target host and push to main — the CI will redeploy this page.

Reading the columns

  • gen tok/s — generation throughput from llama-bench (-n 128 -r 2, mean).
  • prompt tok/s — prompt-eval throughput from llama-bench (-p 256 -r 2, mean).
  • p50 / p95 ms — end-to-end wall-clock per BFCL turn, including network round-trip.
  • Tool overall — strict pass: valid JSON ∧ correct function ∧ all expected arguments present and equal.
  • Format pass — strict pass on JSON / tool-call shape only (regardless of correctness).

Raw JSON

Programmatic access at /api/summary.json and /api/cells/{cell_id}.json. See the API page for the schema.

Benchmarks run on a single shared CPU host · Xeon E-2176G · CPU-only