Examples
Task-first index of how-tos, ordered by the adoption funnel: try it in a terminal, then over HTTP, then wire it into your app. Each page is copy-paste-runnable. No tutorial content lives here — this page is just the router.
Run the demo app first
Every HTTP example below talks to the demo app on http://localhost:8080:
bash
./gradlew :app:bootRunThe port comes up immediately but the model loads asynchronously — wait for state: READY in the logs (or poll /actuator/health) before sending chat requests. Until then the endpoint returns 503 { "error": "model loading" }. The CLI examples need no server.
Try it (zero install)
- CLI —
npx @deemwario/mochallama chat -m qwen2.5-1.5b. Proves the local engine from a terminal with no Java install (ships a jlink JDK-22 image). 0.1.6 adds real multi-turn chat with persistent sessions:--resume <id>,mochallama sessions, in-REPL/reset/help/exit,--no-save.
Drive it over HTTP
- curl —
POST /v1/chat/completions(+ SSEstream:true) andGET /v1/modelsstraight from the shell. Proves the OpenAI-compatible wire API. - OpenAI SDK (Python) — the official
openaiclient withbase_urlpointed at the local endpoint. Proves drop-in compatibility — point existing OpenAI code at localhost with one line.
Embed it in your app
- Spring Boot — add the
mochallama-spring-boot-starter; autoconfig exposes/v1/chat/completions(+SSE),/v1/models, and Actuator health/metrics. The same page covers the Spring AI adapter (mochallama-spring-ai): add it alongside the starter and inject aChatClient/ChatModel. Proves in-process, tool-calling inference inside your own JVM process — no sidecar.
Tool calling & streaming
- Tools & Streaming — the full tool-calling round-trip plus SSE consumption. Proves that tools,
tool_choice, and streaming work together (mochallama is tool-calling-only; non-tool models are rejected at load withMODEL_NOT_TOOL_CAPABLE).
Reference
- Switching models — presets (
qwen2.5-1.5bdefault,qwen2.5-3b,qwen3-4b,phi-4-mini) and any tool-capable Hugging Face GGUF id: see Model Profiles. - Metrics & health — inference meters and the readiness indicator: see Metrics.
Coordinates
Maven Central io.github.deemwario:*:0.1.6 · npm @deemwario/mochallama · requires JDK 22+ (FFM GA).