"response truncated due to output length limit"
Hermes cut the response off at its output limit. The most common cause is that model.max_tokens in config.yaml is silently ignored (bug #4404); other causes are a low provider default (Ollama's default context window is 2048 tokens) or a full context window. Set HERMES_MAX_TOKENS in ~/.hermes/.env, raise num_ctx and num_predict for Ollama, or run /compress when the context is nearly full.
Likely cause
Hermes cut the response off at its output limit. The usual causes are that model.max_tokens in config.yaml is silently ignored (bug #4404), that a provider default is too low (Ollama's default context window is 2048 tokens), or that the context window is full.
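To see why a small context window truncates replies even when max_tokens is high, here is a simplified model of the output budget. It assumes the prompt and the reply share one context window (as with Ollama's num_ctx); output_budget is a hypothetical helper that only does the arithmetic, and the 1800-token prompt is an illustrative number, not something Hermes reports.

```shell
# Simplified output-budget arithmetic (hypothetical helper, not part of Hermes).
# The reply is capped by max_tokens AND by whatever the prompt leaves free
# in the context window. Arguments: context_window prompt_tokens max_tokens
output_budget() {
  local ctx="$1" prompt="$2" max="$3"
  local room=$((ctx - prompt))
  # A prompt that already fills the window leaves no room at all
  if [ "$room" -lt 0 ]; then
    room=0
  fi
  # Print the smaller of the two limits
  if [ "$max" -lt "$room" ]; then
    echo "$max"
  else
    echo "$room"
  fi
}

output_budget 2048 1800 8192    # prints 248: the 2048-token window is the real limit
output_budget 32768 1800 8192   # prints 8192: max_tokens is the real limit
```

When the result is far below the max_tokens you configured, raising max_tokens alone will not help; the context window (or the accumulated history, via /compress) is the constraint.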
The fix
1. Set HERMES_MAX_TOKENS in ~/.hermes/.env; the max_tokens value in config.yaml is ignored because of bug #4404.
2. For Ollama, raise num_ctx and num_predict in a Modelfile.
3. If the context is nearly full, run /compress or start a new session.
export HERMES_MAX_TOKENS=8192
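If you would rather not edit ~/.hermes/.env by hand, the sketch below sets the variable idempotently: it replaces an existing HERMES_MAX_TOKENS= line or appends one. It assumes the file holds plain KEY=VALUE lines; set_env_var is a hypothetical helper, and lines written as export KEY=... are not matched.

```shell
# Hypothetical helper: set KEY=VALUE in a dotenv-style file.
# Replaces a line starting with "KEY=" or appends one; creates the file if needed.
# Lines of the form "export KEY=..." are not recognized.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  mkdir -p "$(dirname "$file")"
  touch "$file"
  if grep -q "^${key}=" "$file"; then
    # Rewrite through a temp file (avoids sed -i, whose flags differ between GNU and BSD)
    tmp="$(mktemp)"
    awk -v k="$key" -v v="$value" \
      'index($0, k "=") == 1 { print k "=" v; next } { print }' "$file" > "$tmp"
    mv "$tmp" "$file"
  else
    # Make sure the file ends with a newline before appending
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage:
#   set_env_var "$HOME/.hermes/.env" HERMES_MAX_TOKENS 8192
```

Because the helper replaces rather than appends when the key already exists, running it twice leaves a single HERMES_MAX_TOKENS line instead of two conflicting ones.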
Frequently asked questions
I set max_tokens in config.yaml but nothing changed. Why?
That's confirmed bug #4404: the config.yaml value never reaches the API request. Use the environment variable HERMES_MAX_TOKENS in ~/.hermes/.env instead, which does take effect.
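A quick way to check which of the two settings you actually have is sketched below. It assumes config.yaml lives at ~/.hermes/config.yaml next to the .env file (an assumption; adjust the paths) and matches keys with simple line-based patterns rather than a YAML parser; diagnose_max_tokens is a hypothetical helper.

```shell
# Hypothetical helper: report where an output cap is configured.
# Per bug #4404, a max_tokens key in config.yaml does not reach the API request;
# HERMES_MAX_TOKENS in the .env file does. Line-based grep, not a YAML parser.
# Arguments: path to config.yaml, path to .env. Returns 1 if the .env override is missing.
diagnose_max_tokens() {
  local config="$1" envfile="$2"
  if [ -f "$config" ] && grep -Eq '^[[:space:]]*max_tokens:' "$config"; then
    echo "config.yaml sets max_tokens (ignored, bug #4404)"
  fi
  if [ -f "$envfile" ] && grep -Eq '^(export[[:space:]]+)?HERMES_MAX_TOKENS=' "$envfile"; then
    echo "HERMES_MAX_TOKENS is set in $envfile"
    return 0
  fi
  echo "HERMES_MAX_TOKENS is not set in $envfile"
  return 1
}

# Usage (paths are assumptions):
#   diagnose_max_tokens "$HOME/.hermes/config.yaml" "$HOME/.hermes/.env"
```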
Why does Ollama truncate so early?
Ollama's default context window is only 2048 tokens. Create a Modelfile that sets PARAMETER num_ctx and PARAMETER num_predict higher so there's room for a full response.
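The Modelfile change can be scripted as below. FROM, PARAMETER num_ctx and PARAMETER num_predict are standard Modelfile directives, and ollama create -f builds a model from the file; the helper name write_modelfile, the base model llama3.1, the new model name and the token values are placeholders. Check that your base model actually supports the context size you pick.

```shell
# Hypothetical helper: write a Modelfile that raises Ollama's context window
# (num_ctx) and output cap (num_predict).
# Arguments: base_model num_ctx num_predict output_path
write_modelfile() {
  local base="$1" num_ctx="$2" num_predict="$3" out="$4"
  # The reply shares the window with the prompt, so keep num_predict below num_ctx
  if [ "$num_predict" -ge "$num_ctx" ]; then
    echo "num_predict ($num_predict) should be smaller than num_ctx ($num_ctx)" >&2
    return 1
  fi
  cat > "$out" <<EOF
FROM $base
PARAMETER num_ctx $num_ctx
PARAMETER num_predict $num_predict
EOF
}

# Usage (requires Ollama; model names are placeholders):
#   write_modelfile llama3.1 32768 8192 ./Modelfile
#   ollama create llama3.1-long -f ./Modelfile
```

Point Hermes at the newly created model rather than the base one so that the raised limits apply.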
Auto-compression never fires on its own. Is that a bug?
If your model's context_length equals 64000, auto-compression never triggers due to a math bug (#14690). Set context_length above 64000 (e.g. 128000), or run /compress manually.
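To check for the exact value that triggers #14690, a line-based scan of the config is enough. This sketch assumes context_length appears as a plain "context_length: 64000" line in config.yaml (key placement and file path are assumptions); check_context_length is a hypothetical helper, and quoted values are not handled.

```shell
# Hypothetical helper: warn if a config file sets context_length to exactly 64000,
# the value at which auto-compression reportedly never triggers (#14690).
# Returns 1 when the problematic value is found, 0 otherwise.
check_context_length() {
  local file="$1"
  # Match "context_length: 64000" with optional indentation and trailing comment,
  # but not values such as 640000
  if grep -Eq '^[[:space:]]*context_length:[[:space:]]*64000[[:space:]]*(#.*)?$' "$file"; then
    echo "warning: context_length is 64000 in $file; auto-compression will not trigger (#14690)." >&2
    echo "Set it above 64000 (e.g. 128000) or run /compress manually." >&2
    return 1
  fi
  return 0
}

# Usage (path is an assumption):
#   check_context_length "$HOME/.hermes/config.yaml"
```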
