
HTTP API

The per-cell data cited by the results table and the article is also available as plain JSON over HTTP, served by GitHub Pages alongside this site.

Endpoints

Path                        Description
/api/summary.json           Aggregated table of all cells
/api/results.json           Alias for summary.json
/api/cells/{cell_id}.json   Per-cell raw result with full BFCL trace

Standard cells: qwen3.5-4b_std, qwen3.5-4b_tbq3, gemma-4-e4b_std, gemma-4-e4b_tbq3, phi-4-mini_std, phi-4-mini_std_workaround, phi-4-mini_tbq3.

CORS: responses are served with Access-Control-Allow-Origin: *, so browser clients on any origin can read them.
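
As a minimal sketch of how the per-cell endpoint might be addressed from TypeScript: the helper names `cellUrl` and `fetchCell` are hypothetical, and the base URL is an assumption taken from the curl example on this page (the per-cell files are assumed to live under the same `/api` prefix as `summary.json`).

```typescript
// Base URL copied from the curl example on this page; assumed to host the
// per-cell files as well as summary.json.
const API_BASE = "https://deemwar-products.github.io/llama-cpu-benchmarks/api";

// The standard cell ids listed above.
const STANDARD_CELLS = [
  "qwen3.5-4b_std",
  "qwen3.5-4b_tbq3",
  "gemma-4-e4b_std",
  "gemma-4-e4b_tbq3",
  "phi-4-mini_std",
  "phi-4-mini_std_workaround",
  "phi-4-mini_tbq3",
] as const;

type StandardCellId = (typeof STANDARD_CELLS)[number];

// Hypothetical helper: build the URL of one per-cell document.
function cellUrl(cellId: StandardCellId): string {
  return `${API_BASE}/cells/${encodeURIComponent(cellId)}.json`;
}

// Hypothetical helper: fetch and parse one cell document, failing loudly on
// any non-2xx response (for example a 404 for a cell that was never published).
async function fetchCell(cellId: StandardCellId): Promise<unknown> {
  const url = cellUrl(cellId);
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Typing the id as a union of the listed cells means a typo such as `"phi4-mini_std"` is rejected at compile time rather than surfacing as a 404. The result is left as `unknown` here so the caller can decide how strictly to validate it against the schema.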

Example

bash
curl -s https://deemwar-products.github.io/llama-cpu-benchmarks/api/summary.json \
  | jq '.cells[] | {id: .cell_id, tps: .gen_eval_tps, tool: .overall_pass}'
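
The same projection can be sketched in TypeScript for callers that already hold the parsed summary. The row shape below is an assumption inferred only from the fields the jq filter reads (a top-level `cells` array with flat `cell_id`, `gen_eval_tps`, and `overall_pass`). `SummaryRow`, `Summary`, and `projectCells` are hypothetical names, and the sample values are made up for illustration, not real benchmark results.

```typescript
// Assumed row shape: only the three fields the jq filter above reads.
type SummaryRow = { cell_id: string; gen_eval_tps: number; overall_pass: number };
type Summary = { cells: SummaryRow[] };

type ProjectedRow = { id: string; tps: number; tool: number };

// Same projection as the jq filter: one output object per cell, with the
// fields renamed to id / tps / tool.
function projectCells(summary: Summary): ProjectedRow[] {
  return summary.cells.map((c) => ({
    id: c.cell_id,
    tps: c.gen_eval_tps,
    tool: c.overall_pass,
  }));
}

// Made-up sample input, for illustration only.
const sampleSummary: Summary = {
  cells: [
    { cell_id: "example_std", gen_eval_tps: 10, overall_pass: 0.5 },
    { cell_id: "example_tbq3", gen_eval_tps: 8, overall_pass: 0.25 },
  ],
};

console.log(projectCells(sampleSummary));
```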

Schema

Each cell document:

ts
type Cell = {
  cell_id: string
  model_id: string
  weight_quant: 'Q4_K_M'
  kv_quant: 'fp16' | 'tbq3_0' | 'tbq4_0'
  llamacpp_variant: string  // image tag or PR/fork SHA
  throughput: { prompt_eval_tps: number; gen_eval_tps: number }
  memory:    { peak_rss_str: string }
  latency_ms: { p50: number; p95: number; mean: number }
  tool_calling: {
    format_pass_rate: number
    function_accuracy: number
    argument_accuracy: number
    overall_pass: number
    n_cases: number
  }
  by_category: Record<'simple' | 'parallel' | 'multiple_function', { n: number; overall_pass: number }>
  started_at: string  // ISO-8601 UTC
  duration_sec: number
}
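
One way the schema might be used is to compare a baseline cell with its quantized-KV counterpart. The sketch below is illustrative only: `CellMetrics`, `KvDelta`, and `compareKvQuant` are hypothetical names, the type is a subset of `Cell` restricted to the fields the comparison reads, and the sample numbers are invented rather than measured.

```typescript
// Subset of the Cell type above: just the fields this comparison reads.
type CellMetrics = {
  cell_id: string;
  kv_quant: "fp16" | "tbq3_0" | "tbq4_0";
  throughput: { prompt_eval_tps: number; gen_eval_tps: number };
  tool_calling: { overall_pass: number; n_cases: number };
};

type KvDelta = {
  baseline: string;
  variant: string;
  genTpsRatio: number;      // variant gen_eval_tps divided by baseline gen_eval_tps
  overallPassDelta: number; // variant overall_pass minus baseline overall_pass
};

// Hypothetical helper: relative generation throughput and the absolute change
// in overall_pass between two cells, in whatever unit the documents use.
function compareKvQuant(baseline: CellMetrics, variant: CellMetrics): KvDelta {
  return {
    baseline: baseline.cell_id,
    variant: variant.cell_id,
    genTpsRatio: variant.throughput.gen_eval_tps / baseline.throughput.gen_eval_tps,
    overallPassDelta: variant.tool_calling.overall_pass - baseline.tool_calling.overall_pass,
  };
}

// Invented sample cells, for illustration only (not real benchmark results).
const baselineCell: CellMetrics = {
  cell_id: "example_std",
  kv_quant: "fp16",
  throughput: { prompt_eval_tps: 40, gen_eval_tps: 20 },
  tool_calling: { overall_pass: 0.75, n_cases: 100 },
};
const variantCell: CellMetrics = {
  cell_id: "example_tbq3",
  kv_quant: "tbq3_0",
  throughput: { prompt_eval_tps: 40, gen_eval_tps: 10 },
  tool_calling: { overall_pass: 0.5, n_cases: 100 },
};

console.log(compareKvQuant(baselineCell, variantCell));
```

Because both documents share one schema, the comparison needs no per-model special cases. Checking that `n_cases` matches between the two cells before trusting the delta would be a reasonable extra guard.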

Optional local mirror

If you check out the repo and run the harness yourself, the per-cell JSON files land in results/, and you can serve them with the stdlib endpoint/ service (docker build -t llamabench-endpoint endpoint/ && docker run …). This is useful for live re-runs whose results should update without redeploying the static site. Source: endpoint/.

Benchmarks run on a single shared CPU host · Xeon E-2176G · CPU-only