[{"data":1,"prerenderedAt":2198},["ShallowReactive",2],{"blog-post-hermes-agent-not-working":3,"related-posts-hermes-agent-not-working":810},{"id":4,"title":5,"author":6,"body":10,"category":787,"date":788,"description":789,"extension":790,"featured":791,"image":792,"imageHeight":793,"imageWidth":793,"meta":794,"navigation":795,"path":796,"readingTime":797,"seo":798,"seoTitle":799,"stem":800,"tags":801,"updatedDate":788,"__hash__":809},"blog/blog/hermes-agent-not-working.md","Hermes Agent Not Working? Every Common Error and How to Fix It",{"name":7,"role":8,"avatar":9},"Shabnam Katoch","Growth Head","/img/avatars/shabnam-profile.jpeg",{"type":11,"value":12,"toc":751},"minimark",[13,22,29,32,35,38,43,46,71,74,81,84,91,95,100,106,117,130,133,145,148,202,209,213,220,225,228,250,253,257,260,267,280,287,291,294,297,301,305,308,311,325,331,337,344,357,360,364,367,376,382,386,389,395,399,406,417,423,427,430,435,438,442,445,448,451,455,459,462,468,471,493,496,499,503,506,517,520,523,527,531,534,545,551,559,571,578,582,585,591,595,598,605,630,633,637,640,643,646,650,653,656,662,670,678,681,684,688,692,705,709,716,720,733,737,740,744,747],[14,15,16,17,21],"p",{},"I spent a Saturday afternoon setting up Hermes Agent. Install script ran clean. No errors. I typed ",[18,19,20],"code",{},"hermes"," into my terminal and got nothing. Literally nothing. \"Command not found.\"",[14,23,24,25,28],{},"Three hours later, after bouncing between GitHub issues, Discord messages, and half a dozen blog posts that each covered one specific error, I had it running. The fix for my problem was one line: ",[18,26,27],{},"source ~/.bashrc",".",[14,30,31],{},"One line. Three hours.",[14,33,34],{},"That's the thing about Hermes Agent not working. The errors themselves are usually simple. 
The pain is figuring out which error you're dealing with because the symptoms overlap and the documentation is scattered across multiple sources.",[14,36,37],{},"This is the consolidated guide I wish existed when I started. Every common Hermes error, organized by the layer where it actually breaks. Install first, then provider, then tools, then gateways. Debug in that order and you'll save hours.",[39,40,42],"h2",{"id":41},"start-here-hermes-doctor-fix","Start here: hermes doctor --fix",[14,44,45],{},"Before digging into individual errors, run this:",[47,48,53],"pre",{"className":49,"code":50,"language":51,"meta":52,"style":52},"language-bash shiki shiki-themes github-light","hermes doctor --fix\n","bash","",[18,54,55],{"__ignoreMap":52},[56,57,60,63,67],"span",{"class":58,"line":59},"line",1,[56,61,20],{"class":62},"s7eDp",[56,64,66],{"class":65},"sYBdl"," doctor",[56,68,70],{"class":69},"sYu0t"," --fix\n",[14,72,73],{},"According to Hermes's own documentation, this catches about 80% of common issues. It checks dependencies, validates your PATH, verifies API keys, confirms model availability, and auto-fixes what it can.",[14,75,76,77,80],{},"If ",[18,78,79],{},"hermes doctor --fix"," solves your problem, you're done. Close this tab.",[14,82,83],{},"If it doesn't, or if you can't even run that command, keep reading.",[14,85,86],{},[87,88],"img",{"alt":89,"src":90},"The Five Debug Layers for Hermes Agent, stacked top to bottom to debug in order without skipping: Layer 1 install plus PATH (command not found, start here), Layer 2 API keys plus provider (auth errors, HTTP 400s, empty responses), Layer 3 tool calling (tools registered but nothing happens), Layer 4 context plus memory (agent works then breaks, memory lost), and Layer 5 gateways (Telegram/Discord crashes, token bloat). 
Most failures look mysterious because two layers broke at once","/img/blog/hermes-agent-five-debug-layers.jpg",[39,92,94],{"id":93},"layer-1-install-and-path-errors-the-it-wont-even-start-problems","Layer 1: Install and PATH errors (the \"it won't even start\" problems)",[96,97,99],"h3",{"id":98},"command-not-found-after-install","\"command not found\" after install",[14,101,102,103,105],{},"The install script finishes without errors. You type ",[18,104,20],{},". Nothing.",[14,107,108,112,113,116],{},[109,110,111],"strong",{},"The fix:"," Your shell configuration wasn't reloaded after installation. The Hermes binary lives in ",[18,114,115],{},"~/.hermes/bin/"," but your current terminal session doesn't know about it yet.",[47,118,120],{"className":49,"code":119,"language":51,"meta":52,"style":52},"source ~/.bashrc\n",[18,121,122],{"__ignoreMap":52},[56,123,124,127],{"class":58,"line":59},[56,125,126],{"class":69},"source",[56,128,129],{"class":65}," ~/.bashrc\n",[14,131,132],{},"Or for Zsh users:",[47,134,136],{"className":49,"code":135,"language":51,"meta":52,"style":52},"source ~/.zshrc\n",[18,137,138],{"__ignoreMap":52},[56,139,140,142],{"class":58,"line":59},[56,141,126],{"class":69},[56,143,144],{"class":65}," ~/.zshrc\n",[14,146,147],{},"If that doesn't work, add the path manually:",[47,149,151],{"className":49,"code":150,"language":51,"meta":52,"style":52},"export PATH=\"$HOME/.hermes/bin:$PATH\"\necho 'export PATH=\"$HOME/.hermes/bin:$PATH\"' >> ~/.bashrc\nsource ~/.bashrc\n",[18,152,153,181,195],{"__ignoreMap":52},[56,154,155,159,163,166,169,172,175,178],{"class":58,"line":59},[56,156,158],{"class":157},"sD7c4","export",[56,160,162],{"class":161},"sgsFI"," 
PATH",[56,164,165],{"class":157},"=",[56,167,168],{"class":65},"\"",[56,170,171],{"class":161},"$HOME",[56,173,174],{"class":65},"/.hermes/bin:",[56,176,177],{"class":161},"$PATH",[56,179,180],{"class":65},"\"\n",[56,182,184,187,190,193],{"class":58,"line":183},2,[56,185,186],{"class":69},"echo",[56,188,189],{"class":65}," 'export PATH=\"$HOME/.hermes/bin:$PATH\"'",[56,191,192],{"class":157}," >>",[56,194,129],{"class":65},[56,196,198,200],{"class":58,"line":197},3,[56,199,126],{"class":69},[56,201,129],{"class":65},[14,203,204,205,208],{},"Then verify with ",[18,206,207],{},"hermes --version",". This is the single most common Hermes Agent error. It's not broken. Your shell just doesn't know where to find it.",[96,210,212],{"id":211},"installation-script-times-out-or-hangs","Installation script times out or hangs",[14,214,215,216,219],{},"The ",[18,217,218],{},"curl | bash"," install hangs or the script downloads but fails mid-install.",[14,221,222,224],{},[109,223,111],{}," Network restrictions are blocking GitHub access. This is common in corporate networks and certain regions. Two options:",[14,226,227],{},"Use a mirror proxy:",[47,229,231],{"className":49,"code":230,"language":51,"meta":52,"style":52},"git config --global url.\"https://mirror.ghproxy.com/https://github.com\".insteadOf \"https://github.com\"\n",[18,232,233],{"__ignoreMap":52},[56,234,235,238,241,244,247],{"class":58,"line":59},[56,236,237],{"class":62},"git",[56,239,240],{"class":65}," config",[56,242,243],{"class":69}," --global",[56,245,246],{"class":65}," url.\"https://mirror.ghproxy.com/https://github.com\".insteadOf",[56,248,249],{"class":65}," \"https://github.com\"\n",[14,251,252],{},"Or download the install script manually, inspect it, and run it locally.",[96,254,256],{"id":255},"python-version-conflict","Python version conflict",[14,258,259],{},"Hermes requires Python 3.11 or higher. 
If you're running 3.9 or 3.10, the agent won't start or will throw import errors.",[14,261,262,263,266],{},"The good news: Hermes uses ",[18,264,265],{},"uv"," for isolated environment management, so this rarely becomes a blocking issue. But if it does, check your version:",[47,268,270],{"className":49,"code":269,"language":51,"meta":52,"style":52},"python3 --version\n",[18,271,272],{"__ignoreMap":52},[56,273,274,277],{"class":58,"line":59},[56,275,276],{"class":62},"python3",[56,278,279],{"class":69}," --version\n",[14,281,282,283,286],{},"If you're below 3.11, install the correct version alongside your existing one. Don't replace your system Python. Use ",[18,284,285],{},"pyenv"," or your distro's package manager to install 3.11+ as a secondary version.",[96,288,290],{"id":289},"windows-use-wsl2","Windows? Use WSL2.",[14,292,293],{},"Native Windows is not supported. As of June 2026, official Hermes install paths support Linux, macOS, and WSL2 only. Android has a Termux path, but Windows PowerShell or CMD will not work.",[14,295,296],{},"If you're on Windows, install WSL2 first, then run the Hermes install script inside the WSL2 terminal. This adds 10-15 minutes to your setup but it's the only supported path.",[39,298,300],{"id":299},"layer-2-api-key-and-provider-errors","Layer 2: API key and provider errors",[96,302,304],{"id":303},"api-key-not-recognized","API key not recognized",[14,306,307],{},"You've set your key but Hermes returns authentication errors or empty responses.",[14,309,310],{},"Check three things:",[14,312,313,316,317,320,321,324],{},[109,314,315],{},"Wrong key format."," OpenAI keys start with ",[18,318,319],{},"sk-",". Anthropic keys start with ",[18,322,323],{},"sk-ant-",". 
If you copied the key wrong or have trailing whitespace, it won't work.",[14,326,327,330],{},[109,328,329],{},"Expired or revoked key."," Log into your provider dashboard and verify the key is still active.",[14,332,333,336],{},[109,334,335],{},"Exhausted quota."," A valid key with no remaining credits produces the same error as an invalid key. Check your billing page.",[14,338,339,340,343],{},"Your key lives in ",[18,341,342],{},"~/.hermes/.env",". Open it and verify:",[47,345,347],{"className":49,"code":346,"language":51,"meta":52,"style":52},"cat ~/.hermes/.env\n",[18,348,349],{"__ignoreMap":52},[56,350,351,354],{"class":58,"line":59},[56,352,353],{"class":62},"cat",[56,355,356],{"class":65}," ~/.hermes/.env\n",[14,358,359],{},"After fixing the key, restart Hermes. It doesn't hot-reload env changes.",[96,361,363],{"id":362},"http-400-errors-from-the-model-provider","HTTP 400 errors from the model provider",[14,365,366],{},"This usually means you're sending a malformed request to the API. Common causes: requesting a model that doesn't exist on your plan, sending tool schemas in a format the provider doesn't support, or exceeding the provider's context length.",[14,368,369,370,375],{},"For specific ",[371,372,374],"a",{"href":373},"/blog/hermes-agent-error-400","Hermes 400 error debugging",", we have a dedicated walkthrough that covers each provider's error codes.",[14,377,378],{},[87,379],{"alt":380,"src":381},"API Key Error, three causes and one fix flow. Starting from \"getting an API error?\" three branches: wrong format (check the key prefix, OpenAI sk-, Anthropic sk-ant-, no trailing spaces), expired or revoked (log into the provider dashboard and verify the key is active), and no credits (check the billing page, a valid key with zero credits gives the same error). 
All three lead to fixing the key in ~/.hermes/.env, then restarting Hermes since it doesn't hot-reload .env changes","/img/blog/hermes-agent-api-key-error-flow.jpg",[39,383,385],{"id":384},"layer-3-tool-calling-failures-the-subtle-ones","Layer 3: Tool-calling failures (the subtle ones)",[14,387,388],{},"This is where most people waste the most time. The agent starts, connects to the model, but tools don't work. No error messages. Just... nothing happens.",[14,390,391],{},[87,392],{"alt":393,"src":394},"Layer 3, the three tool-calling failures, each with a cause and a fix. Empty tool_calls every time: caused by apply_chat_template called without tools=tools so the model never sees the schemas, fixed by adding tools=tools to your template call. JSON parse errors cascading: bad JSON in turn 1 corrupts turns 2, 3 and 4 in a chain reaction, fixed by catching parse errors per turn and stripping and retrying independently. Web tools silently fail: local Ollama models fail web backend checks while cloud models pass, fixed by using a cloud model for web tools and local for offline reasoning. No error messages, just nothing happens","/img/blog/hermes-agent-three-tool-calling-failures.jpg",[96,396,398],{"id":397},"empty-tool_calls-every-time","Empty tool_calls every time",[14,400,401,402,405],{},"Your agent is supposed to call a Python function or use a tool, but the ",[18,403,404],{},"tool_calls"," array comes back empty. Every single time.",[14,407,408,409,412,413,416],{},"The most likely cause: ",[18,410,411],{},"apply_chat_template"," was called without ",[18,414,415],{},"tools=tools",". The model never sees the tool schema, so it can't emit a structured call. It's not refusing to use tools. It literally doesn't know they exist.",[14,418,419,420,422],{},"This is a code-level fix. Make sure your template application includes the tool definitions. 
If you're using someone else's wrapper or tutorial code, check their ",[18,421,411],{}," call specifically.",[96,424,426],{"id":425},"json-parse-errors-in-the-agentic-loop","JSON parse errors in the agentic loop",[14,428,429],{},"The model returns malformed JSON in one tool call, and the error cascades across subsequent turns. Turn 1 has a broken bracket. Turn 2 tries to parse Turn 1's output. Turn 3 is complete nonsense.",[14,431,432,434],{},[109,433,111],{}," Implement graceful error handling that catches JSON parse failures on each turn independently. Don't let one bad response corrupt the entire conversation history. Strip the malformed turn and retry.",[14,436,437],{},"Hard limit: Hermes 3 8B struggles with more than 3 parallel tool schemas. For complex multi-tool agents, use Hermes 3 70B or move to hosted inference.",[96,439,441],{"id":440},"local-models-cant-use-web-tools","Local models can't use web tools",[14,443,444],{},"Ollama is running. Your local model works fine for reasoning. But web browsing, web search, and other internet-dependent tools silently fail.",[14,446,447],{},"This is a known limitation. Web tools only get enabled when the configured web backend passes certain checks that local models often fail. Cloud models (Anthropic, OpenRouter) work fine because they satisfy those backend checks natively.",[14,449,450],{},"The workaround: keep a cloud model configured as your \"web access\" provider and use local models for offline reasoning tasks. You can switch between them per conversation.",[39,452,454],{"id":453},"layer-4-context-and-memory-problems","Layer 4: Context and memory problems",[96,456,458],{"id":457},"context-overflow-the-2048-token-trap","Context overflow (the 2,048 token trap)",[14,460,461],{},"This is the sneakiest Hermes Agent error. Hermes 3 8B defaults to 2,048 tokens in Ollama, which fills up in 2-3 tool turns. 
Your agent works perfectly for the first interaction, then suddenly stops making sense or returns empty responses.",[14,463,464],{},[87,465],{"alt":466,"src":467},"The 2,048 Token Trap, why your agent breaks after turn 2. The default Ollama context of 2,048 tokens fills with a 400-token system prompt, a 600-token tool output on turn 1 and a 700-token tool output on turn 2, then overflows and the agent breaks. After running ollama run hermes3 --ctx-size 8192, the fixed 8,192-token context holds the same system prompt and two tool outputs with room for 5-6 more turns. Default Ollama context kills agents in 2-3 tool turns","/img/blog/hermes-agent-2048-token-trap.jpg",[14,469,470],{},"The fix is increasing the context window in your Ollama configuration:",[47,472,474],{"className":49,"code":473,"language":51,"meta":52,"style":52},"ollama run hermes3 --ctx-size 8192\n",[18,475,476],{"__ignoreMap":52},[56,477,478,481,484,487,490],{"class":58,"line":59},[56,479,480],{"class":62},"ollama",[56,482,483],{"class":65}," run",[56,485,486],{"class":65}," hermes3",[56,488,489],{"class":69}," --ctx-size",[56,491,492],{"class":69}," 8192\n",[14,494,495],{},"Or set it in the Modelfile with PARAMETER num_ctx 8192, which is also the route to take if your Ollama version rejects the flag above. But be aware: larger context means more VRAM usage. On 8GB cards, you're limited. On 16GB+, 8192 tokens is comfortable for most agent workloads.",[14,497,498],{},"If you're running complex multi-tool chains, this single setting is probably why your agent \"randomly\" stops working after a few turns.",[96,500,502],{"id":501},"memory-not-persisting-between-sessions","Memory not persisting between sessions",[14,504,505],{},"You told your Hermes agent something yesterday. Today it has no idea what you're talking about.",[14,507,508,509,512,513,516],{},"This usually means memory checkpointing isn't configured or the working directory changed between sessions. Hermes stores memory files relative to where it starts. 
If it launched from ",[18,510,511],{},"~/projects/"," yesterday and ",[18,514,515],{},"~/"," today, it's looking at different memory stores.",[14,518,519],{},"Fix: Always start Hermes from the same directory, or configure an absolute path for memory storage. And enforce checkpoints: finish one sub-goal, summarize, then continue to the next stage.",[14,521,522],{},"This is one of the places where the gap between self-hosted agents and managed platforms becomes real. On BetterClaw, persistent memory uses hybrid vector plus keyword search with automatic context management. No token bloat, no lost memories between sessions, no directory dependency. If you're spending more time debugging memory persistence than actually using your agent, that's a signal worth paying attention to. Free plan available, $19/month for Pro. BYOK with zero markup.",[39,524,526],{"id":525},"layer-5-gateway-and-platform-errors","Layer 5: Gateway and platform errors",[96,528,530],{"id":529},"telegram-or-discord-gateway-crashes","Telegram or Discord gateway crashes",[14,532,533],{},"Your agent works fine in CLI mode. You connect it to Telegram or Discord and it immediately crashes or behaves erratically.",[14,535,536,537,540,541,544],{},"Root cause: The gateway process was spawning inside the ",[18,538,539],{},"hermes-agent"," source directory, which loads development files like ",[18,542,543],{},"AGENTS.md"," and other data that shouldn't be in the runtime context. This was adding garbage to the agent's context and inflating token usage by 2-3x.",[14,546,547],{},[87,548],{"alt":549,"src":550},"Gateway Crashes, the working directory fix. The wrong way: the gateway is launched from the ~/hermes-agent source directory, so the source folder loads dev files like AGENTS.md into the runtime context, causing crashes, 2-3x token bloat and garbage context. The right way: the gateway is launched from ~/ the home directory, giving a stable gateway with normal token usage and clean context. 
cd ~ before launching, always","/img/blog/hermes-agent-gateway-working-directory-fix.jpg",[14,552,553,555,556,558],{},[109,554,111],{}," Make sure your gateway starts from your home directory (",[18,557,171],{},"), not from the Hermes source folder. This was patched in a recent update, so update first:",[47,560,562],{"className":49,"code":561,"language":51,"meta":52,"style":52},"hermes update\n",[18,563,564],{"__ignoreMap":52},[56,565,566,568],{"class":58,"line":59},[56,567,20],{"class":62},[56,569,570],{"class":65}," update\n",[14,572,573,574,577],{},"If you're still seeing bloat after updating, check your working directory before launching the gateway. ",[18,575,576],{},"cd ~"," first.",[96,579,581],{"id":580},"token-cost-explosion-on-telegram","Token cost explosion on Telegram",[14,583,584],{},"You're watching your API costs and notice Telegram conversations are 2-3x more expensive than the same conversation in CLI mode.",[14,586,587,588,590],{},"Same root cause as above. Telegram's message format, combined with the gateway loading extra context, inflates every message. Update Hermes, ensure you're launching from ",[18,589,171],{},", and monitor token usage for a few conversations to confirm the fix.",[96,592,594],{"id":593},"tirith-security-module-blocking-everything","Tirith security module blocking everything",[14,596,597],{},"Hermes has a built-in security module called Tirith. It's supposed to catch risky commands and ask for approval before executing them. But some users report it blocking commands outright with no approval prompt.",[14,599,600,601,604],{},"For example, ",[18,602,603],{},"curl | sh"," patterns get hard-blocked. No option to proceed. 
The community has requested an interactive approval flow, but for now the workaround is splitting your commands:",[47,606,608],{"className":49,"code":607,"language":51,"meta":52,"style":52},"curl https://example.com/script.sh -o script.sh\nbash script.sh\n",[18,609,610,624],{"__ignoreMap":52},[56,611,612,615,618,621],{"class":58,"line":59},[56,613,614],{"class":62},"curl",[56,616,617],{"class":65}," https://example.com/script.sh",[56,619,620],{"class":69}," -o",[56,622,623],{"class":65}," script.sh\n",[56,625,626,628],{"class":58,"line":183},[56,627,51],{"class":62},[56,629,623],{"class":65},[14,631,632],{},"Run the download and execution as separate steps, or use the terminal directly for commands Tirith won't approve through the gateway.",[39,634,636],{"id":635},"the-cold-start-tax-and-when-to-care-about-it","The cold start tax (and when to care about it)",[14,638,639],{},"One more thing that isn't technically an \"error\" but catches people off guard: Hermes 3 8B on Ollama takes 3-5 seconds to first token on a cold start. If you're used to cloud API response times (sub-second), this feels broken.",[14,641,642],{},"It's not broken. It's loading the model into VRAM. After the first response, subsequent queries are fast. If the cold start matters (for real-time applications, customer-facing agents, or high-frequency scheduling), either keep the model warm with periodic pings or use a cloud provider for latency-sensitive workflows.",[14,644,645],{},"For comparison, Hermes 3 8B outperforms Llama 3 8B Instruct by roughly 17 percentage points on tool-call success in local testing. The model is good. The infrastructure around it just needs tuning.",[39,647,649],{"id":648},"the-honest-question-is-the-debugging-worth-your-time","The honest question: is the debugging worth your time?",[14,651,652],{},"Every error on this page is fixable. None of them are deal-breakers. But they add up.",[14,654,655],{},"PATH configuration. Python version management. Ollama context tuning. 
Gateway working directories. Memory persistence debugging. Token cost monitoring. WSL2 setup on Windows.",[14,657,658],{},[87,659],{"alt":660,"src":661},"The Debugging Tax, self-hosted Hermes versus a managed platform across each layer. Setup: PATH config, Python version and WSL2 versus nothing to install. API keys: manual .env file editing versus OAuth one-click. Context: manual --ctx-size tuning versus built-in auto-management. Memory: directory-dependent and manual versus persistent and auto-indexed. Gateways: working-dir config and token-bloat debugging versus one-click across 15+ platforms. Time to a working agent: 3-8 hours (or a Saturday) versus 60 seconds. Both are valid; know which one you're signing up for","/img/blog/hermes-agent-debugging-tax-self-hosted-vs-managed.jpg",[14,663,664,665,669],{},"If you're a developer who enjoys tinkering with infrastructure and wants full control over every layer of your agent stack, Hermes is genuinely good. Zero CVEs. 95,000+ GitHub stars in under two months. Strong self-learning capabilities. The framework itself is solid. We compared the two head to head in ",[371,666,668],{"href":667},"/blog/betterclaw-vs-hermes","BetterClaw vs Hermes"," if you want the full breakdown.",[14,671,672,673,677],{},"But if you're looking at this list and thinking ",[674,675,676],"em",{},"I just want my agent to work",", that's a valid reaction. It's why managed platforms exist.",[14,679,680],{},"BetterClaw handles all five debug layers out of the box. No PATH issues because there's nothing to install. No API key files because OAuth handles it. No context overflow because smart context management is built in. No gateway configuration because 15+ chat platforms connect with one click. Free plan with 1 agent and every feature. $19/month per agent for Pro. Deploy in 60 seconds. Your call.",[14,682,683],{},"The best tool isn't the one with the most GitHub stars. 
It's the one that gets out of your way and lets you build the thing you actually care about.",[39,685,687],{"id":686},"frequently-asked-questions","Frequently Asked Questions",[96,689,691],{"id":690},"what-does-hermes-agent-not-working-usually-mean","What does \"Hermes Agent not working\" usually mean?",[14,693,694,695,697,698,701,702,704],{},"The most common cause is a PATH configuration issue after installation. Your shell doesn't know where the Hermes binary is located. Running ",[18,696,27],{}," (or ",[18,699,700],{},"~/.zshrc"," for Zsh) fixes this in most cases. If that doesn't work, run ",[18,703,79],{}," which catches about 80% of common issues automatically.",[96,706,708],{"id":707},"how-does-hermes-agent-compare-to-openclaw-for-reliability","How does Hermes Agent compare to OpenClaw for reliability?",[14,710,711,712,28],{},"Hermes has zero reported CVEs as of June 2026 and ships with container hardening and namespace isolation enabled by default. OpenClaw has a larger ecosystem (230K+ stars, 5,700+ skills on ClawHub) but has had multiple security advisories including CVE-2026-25253 (CVSS 8.8). For stability, Hermes is currently the more conservative choice. For integrations and skill variety, OpenClaw has a wider range. We cover the tradeoffs in detail in ",[371,713,715],{"href":714},"/blog/openclaw-vs-hermes","OpenClaw vs Hermes",[96,717,719],{"id":718},"how-do-i-fix-hermes-agent-tool-calling-failures","How do I fix Hermes Agent tool-calling failures?",[14,721,722,723,725,726,728,729,732],{},"First, verify ",[18,724,411],{}," includes ",[18,727,415],{}," so the model sees the tool schemas. Second, check that you're not exceeding 3 parallel tool schemas on Hermes 3 8B (use 70B for complex multi-tool setups). Third, implement JSON parse error handling per turn to prevent cascading failures. 
Run ",[18,730,731],{},"hermes tools list"," to confirm your tools are properly registered.",[96,734,736],{"id":735},"is-hermes-agent-free-to-use","Is Hermes Agent free to use?",[14,738,739],{},"Hermes itself is free and open-source (MIT license). But running it requires your own hardware or a VPS ($5-50/month), plus API costs for your LLM provider. Total cost depends on usage but typically runs $20-100/month including infrastructure. BetterClaw offers a free plan at $0/month with BYOK, or $19/agent/month for Pro with managed infrastructure and no setup required.",[96,741,743],{"id":742},"can-hermes-agent-run-on-windows","Can Hermes Agent run on Windows?",[14,745,746],{},"Not natively. As of June 2026, Hermes officially supports Linux, macOS, and WSL2 only. Windows users need to install WSL2 (Windows Subsystem for Linux) first, then run the Hermes install script inside the WSL2 terminal. This adds about 10-15 minutes to initial setup but works reliably once configured. Android users can use Termux.",[748,749,750],"style",{},"html pre.shiki code .s7eDp, html code.shiki .s7eDp{--shiki-default:#6F42C1}html pre.shiki code .sYBdl, html code.shiki .sYBdl{--shiki-default:#032F62}html pre.shiki code .sYu0t, html code.shiki .sYu0t{--shiki-default:#005CC5}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html pre.shiki code .sD7c4, html code.shiki .sD7c4{--shiki-default:#D73A49}html pre.shiki code .sgsFI, html code.shiki 
.sgsFI{--shiki-default:#24292E}",{"title":52,"searchDepth":183,"depth":183,"links":752},[753,754,760,764,769,773,778,779,780],{"id":41,"depth":183,"text":42},{"id":93,"depth":183,"text":94,"children":755},[756,757,758,759],{"id":98,"depth":197,"text":99},{"id":211,"depth":197,"text":212},{"id":255,"depth":197,"text":256},{"id":289,"depth":197,"text":290},{"id":299,"depth":183,"text":300,"children":761},[762,763],{"id":303,"depth":197,"text":304},{"id":362,"depth":197,"text":363},{"id":384,"depth":183,"text":385,"children":765},[766,767,768],{"id":397,"depth":197,"text":398},{"id":425,"depth":197,"text":426},{"id":440,"depth":197,"text":441},{"id":453,"depth":183,"text":454,"children":770},[771,772],{"id":457,"depth":197,"text":458},{"id":501,"depth":197,"text":502},{"id":525,"depth":183,"text":526,"children":774},[775,776,777],{"id":529,"depth":197,"text":530},{"id":580,"depth":197,"text":581},{"id":593,"depth":197,"text":594},{"id":635,"depth":183,"text":636},{"id":648,"depth":183,"text":649},{"id":686,"depth":183,"text":687,"children":781},[782,783,784,785,786],{"id":690,"depth":197,"text":691},{"id":707,"depth":197,"text":708},{"id":718,"depth":197,"text":719},{"id":735,"depth":197,"text":736},{"id":742,"depth":197,"text":743},"Troubleshooting","2026-06-08","Hermes Agent not working after install? Fix PATH errors, API failures, tool-calling issues, context overflow, and gateway crashes. Full 2026 guide.","md",false,"/img/blog/hermes-agent-not-working.jpg",null,{},true,"/blog/hermes-agent-not-working","11 min read",{"title":5,"description":789},"Hermes Agent Not Working? 
Fix Every Common Error","blog/hermes-agent-not-working",[802,803,804,805,806,807,808],"hermes agent not working","hermes agent error","hermes agent fix","hermes agent setup problems","hermes agent troubleshooting","hermes agent installation failed","hermes agent 2026","gly-NjEHbvQvROwBdUGK9utV7ZidtZPL6waTKLutQm0",[811,1271,1767],{"id":812,"title":813,"author":814,"body":815,"category":787,"date":788,"description":1255,"extension":790,"featured":791,"image":1256,"imageHeight":793,"imageWidth":793,"meta":1257,"navigation":795,"path":1258,"readingTime":797,"seo":1259,"seoTitle":1260,"stem":1261,"tags":1262,"updatedDate":788,"__hash__":1270},"blog/blog/ai-agent-slow-latency-fix.md","Why Is My AI Agent So Slow? Diagnosing Latency Step by Step",{"name":7,"role":8,"avatar":9},{"type":11,"value":816,"toc":1227},[817,820,823,826,829,832,835,839,842,845,851,855,858,861,867,873,879,882,885,889,892,895,898,901,904,907,911,914,918,924,934,940,946,950,953,956,959,962,965,968,974,980,986,992,995,999,1002,1005,1012,1015,1018,1024,1030,1036,1042,1046,1049,1052,1055,1058,1061,1065,1068,1071,1074,1078,1081,1087,1093,1099,1105,1111,1117,1120,1123,1127,1130,1136,1147,1153,1164,1167,1170,1190,1192,1196,1199,1203,1206,1210,1213,1217,1220,1224],[14,818,819],{},"We had an agent handling email triage. It read new emails, classified urgency, drafted responses for low-priority items, and flagged high-priority ones for human review.",[14,821,822],{},"In testing, it was fast. Sub-two-second responses. We were thrilled.",[14,824,825],{},"In production, it took 15 seconds per email. Sometimes 20. Users started complaining within the first hour.",[14,827,828],{},"My first instinct: the model is too slow, let's switch to something faster. So we swapped Claude Sonnet for GPT-5 Nano. Barely any improvement. Maybe half a second shaved off.",[14,830,831],{},"That's when it hit me. The model wasn't the bottleneck. The model was doing its job in under a second. 
Everything else around it was eating the other 14 seconds.",[14,833,834],{},"If your AI agent is slow, the model is almost never the real problem. The real problem lives in one of five places. Here's how to find it.",[39,836,838],{"id":837},"the-five-latency-layers-debug-in-this-order","The five latency layers (debug in this order)",[14,840,841],{},"Most people start debugging agent latency by looking at model benchmarks. \"Maybe I need a faster LLM.\" That's like diagnosing a slow website by replacing the database when the real problem is unoptimized SQL queries hitting the database 47 times per page load.",[14,843,844],{},"Agent latency has five layers. They compound multiplicatively, not additively. A problem at layer 3 makes layer 4 worse, which makes layer 5 catastrophic.",[14,846,847],{},[87,848],{"alt":849,"src":850},"The Five Latency Layers of an AI Agent, stacked to debug top to bottom without skipping: Layer 1 model speed (TTFT plus tok/s, most people blame this but it's rarely the real issue), Layer 2 context window size (every token in context costs processing time on every request), Layer 3 tool execution (external API calls, unmeasured and often the biggest offender), Layer 4 network plus API routing (geography adds up across 8-12 API calls per task), and Layer 5 multi-step compounding (every step multiplies all previous layers, the real killer). A problem at Layer 3 makes Layer 4 worse, and Layer 4 makes Layer 5 catastrophic","/img/blog/ai-agent-latency-five-layers.jpg",[39,852,854],{"id":853},"layer-1-model-speed-the-one-everyone-checks-first","Layer 1: Model speed (the one everyone checks first)",[14,856,857],{},"Time to first token (TTFT) measures how long after you send a prompt the model starts generating. Throughput, in tokens per second (tok/s), measures how fast it generates once it starts.",[14,859,860],{},"Here's where things stand in 2026:",[14,862,863,866],{},[109,864,865],{},"Fastest TTFT:"," Claude Haiku 4.5 at around 597ms on medium prompts. 
Mistral Large and GPT-5.2 also hit sub-second consistently.",[14,868,869,872],{},[109,870,871],{},"Fastest throughput:"," Gemini 2.5 Flash at 146-173 tokens per second. Mercury 2 by Inception hits 789 tok/s but with quality tradeoffs. Gemini 3.5 Flash reaches 284 tok/s.",[14,874,875,878],{},[109,876,877],{},"Reasoning models are intentionally slow."," Models like o3, GPT-5, and Gemini Deep Think use chain-of-thought processing. They generate internal \"thinking\" tokens before the visible answer. TTFT can be 10-150 seconds. This isn't a bug. It's the architecture.",[14,880,881],{},"Here's the thing: for most agent workloads, model speed is not your bottleneck. A single LLM call takes about 800 milliseconds. If your agent is taking 15 seconds, the model used 800ms of that. You have 14.2 seconds of latency living somewhere else.",[14,883,884],{},"Before switching models, measure your actual model latency. Log the timestamp when you send the request and when the first token arrives. If it's under 2 seconds, your problem isn't the model.",[39,886,888],{"id":887},"layer-2-context-window-bloat-the-silent-killer","Layer 2: Context window bloat (the silent killer)",[14,890,891],{},"This is where most agent latency actually lives. And it's invisible unless you're counting tokens.",[14,893,894],{},"Every time your agent makes a request, it sends the entire conversation context to the model. System prompt. Conversation history. Tool definitions. Previous tool results. Memory context. All of it.",[14,896,897],{},"The numbers are wild. Research from Agenteer found that a single Jira integration adds roughly 17,000 tokens just for tool definitions. 
Across a typical agent setup with multiple integrations, 134,000 tokens (67% of a 200K context window) get consumed by definitions before the agent starts working.",[14,899,900],{},"That's the equivalent of walking into a restaurant and reading a 500-page menu before you can order water.",[14,902,903],{},"More tokens means more processing time. A 2,000-token prompt processes in under a second. A 100,000-token prompt with tool definitions, conversation history, and previous results takes significantly longer. The model has to read and attend to every token before generating a response.",[14,905,906],{},"Your context window is RAM, not storage. Everything in it costs processing time on every single request.",[96,908,910],{"id":909},"how-to-diagnose-it","How to diagnose it",[14,912,913],{},"Log your input token count for each request. If it's growing with every turn of the conversation, you have context bloat. If it starts high (above 20,000 tokens) even on the first turn, your tool definitions and system prompts are too heavy.",[96,915,917],{"id":916},"how-to-fix-it","How to fix it",[14,919,920,923],{},[109,921,922],{},"Don't load all tool definitions upfront."," Anthropic's own research showed that Opus 4's tool selection accuracy improved from 49% to 74% when the agent searched for relevant tools on demand instead of parsing all definitions at once. Fewer tools in context means faster processing and better accuracy.",[14,925,926,929,930,28],{},[109,927,928],{},"Summarize conversation history."," Instead of sending the full conversation, compress older turns into summaries. The Mem0 framework published 2026 benchmarks showing that a two-layer memory architecture (summarized context plus targeted retrieval) used 4x fewer tokens than full-context approaches while cutting latency by 91% and actually improving accuracy by 18.7 percentage points. 
This is the core of how ",[371,931,933],{"href":932},"/blog/how-ai-agent-memory-works","AI agent memory works",[14,935,936,939],{},[109,937,938],{},"Trim tool results aggressively."," A single MCP server call that returns a 50-field JSON blob when you only need 3 fields wastes thousands of tokens. Filter tool results before they enter context.",[14,941,942],{},[87,943],{"alt":944,"src":945},"Context Bloat, the before and after. Before optimization: 119K tokens and 17-second p95 latency, made up of a 2K system prompt, 17K tool definitions, 40K conversation history and 60K tool results. After optimization: 11K tokens and 1.4-second p95 latency, made up of a 2K system prompt, 3K active tools only, 2K summarized history and 4K filtered results. A 91% latency reduction from context management alone, per Mem0 2026","/img/blog/ai-agent-latency-context-bloat-before-after.jpg",[39,947,949],{"id":948},"layer-3-tool-execution-the-one-you-forgot-to-measure","Layer 3: Tool execution (the one you forgot to measure)",[14,951,952],{},"When your agent calls an external tool (send email, read CRM, query database, fetch webpage), the tool's execution time adds directly to the agent's response time. And most people never measure it.",[14,954,955],{},"A Gmail API call to fetch recent emails: 200-800ms. A HubSpot CRM lookup: 300-1,200ms. A web scraping call: 1-5 seconds. A database query on an unindexed table: could be anything from 50ms to 30 seconds.",[14,957,958],{},"If your agent makes 3 tool calls in sequence (which is common for multi-step tasks), and each takes an average of 1 second, that's 3 seconds of tool execution time before the model even starts thinking about the next step.",[96,960,910],{"id":961},"how-to-diagnose-it-1",[14,963,964],{},"Wrap every tool call with timing logs. 
You'll often find that one specific tool is responsible for 60-80% of total tool execution time.",[96,966,917],{"id":967},"how-to-fix-it-1",[14,969,970,973],{},[109,971,972],{},"Parallelize where possible."," If your agent needs data from Gmail and HubSpot, fetch both simultaneously instead of sequentially. For independent calls, total tool time drops to the duration of the slowest call, which is roughly half the time when the two calls take about equally long.",[14,975,976],{},[87,977],{"alt":978,"src":979},"Parallelize Independent Tool Calls. Run sequentially, a 600ms Gmail fetch, a 900ms HubSpot lookup and an 800ms DB query total 2,300ms. Run in parallel, the same three calls (Gmail 600ms, HubSpot 900ms, DB 800ms) all run at once and total 900ms because the slowest one wins, 2.5x faster. If your agent needs Gmail and HubSpot, fetch both at once, not one then the other","/img/blog/ai-agent-latency-parallelize-tool-calls.jpg",[14,981,982,985],{},[109,983,984],{},"Cache repeated lookups."," If your agent queries the same CRM record multiple times in one conversation, cache the first result. Semantic caching, as Redis LangCache demonstrated, can reduce redundant API calls dramatically.",[14,987,988,991],{},[109,989,990],{},"Set timeouts."," A tool call that takes 30 seconds because of an external API issue shouldn't freeze your entire agent. Set aggressive timeouts (3-5 seconds) and have fallback behavior.",[14,993,994],{},"This is one of the areas where managed agent platforms have an advantage over self-hosted setups. On BetterClaw, integrations are pre-optimized with connection pooling, caching, and timeout handling built in. When you self-host, you're building all of that yourself.",[39,996,998],{"id":997},"layer-4-network-and-api-routing","Layer 4: Network and API routing",[14,1000,1001],{},"If your agent is on a VPS in Frankfurt and your users are in San Francisco, every API round-trip adds 100-200ms of network latency. For a single request, that's barely noticeable. 
For an agent that makes 8-12 API calls per task (LLM calls plus tool calls plus memory lookups), it adds up to 1-2 seconds of pure network overhead.",[96,1003,910],{"id":1004},"how-to-diagnose-it-2",[14,1006,1007,1008,1011],{},"Run ",[18,1009,1010],{},"ping"," to your LLM provider's API endpoint from your agent's server. If it's over 100ms, geography is costing you.",[14,1013,1014],{},"Compare agent response times from the same machine the agent runs on versus from your actual user location. The difference is network overhead.",[96,1016,917],{"id":1017},"how-to-fix-it-2",[14,1019,1020,1023],{},[109,1021,1022],{},"Deploy your agent close to your LLM provider's data centers."," Most major providers (OpenAI, Anthropic, Google) have US and EU endpoints. Match your agent's region to the provider's closest endpoint.",[14,1025,1026,1029],{},[109,1027,1028],{},"Use streaming."," Instead of waiting for the full response, stream tokens to the user as they're generated. This doesn't reduce total latency, but it reduces perceived latency dramatically. The user sees the response building in real-time instead of staring at a loading spinner for 8 seconds.",[14,1031,1032,1035],{},[109,1033,1034],{},"Minimize round-trips."," Every time your agent \"thinks\" (LLM call), \"acts\" (tool call), and \"observes\" (processes result), that's at minimum three network round-trips per step. Reducing the number of steps reduces total round-trip overhead proportionally.",[14,1037,1038],{},[87,1039],{"alt":1040,"src":1041},"Every Step Multiplies Every Other Layer, a line chart of total latency against number of agent steps. An optimized agent at 1.5 seconds per step reaches 15 seconds at 10 steps; a typical agent at 3 seconds per step reaches 30 seconds; and a bloated-context agent at 5 seconds per step reaches 50 seconds, a full minute per task and unusable for users. Most tasks should end by step 5. 
The fastest agent solves the problem in fewer steps, not faster steps","/img/blog/ai-agent-latency-every-step-multiplies.jpg",[39,1043,1045],{"id":1044},"layer-5-multi-step-compounding-the-multiplier-nobody-talks-about","Layer 5: Multi-step compounding (the multiplier nobody talks about)",[14,1047,1048],{},"Here's where agent latency gets genuinely painful. Every additional step in your agent's workflow multiplies all the previous layers.",[14,1050,1051],{},"A single LLM call: ~800ms. Totally fine.",[14,1053,1054],{},"An orchestrator-worker flow with a reflexion loop: 10-30 seconds. Stevens Institute research identifies this as the primary engineering constraint for AI agents in 2026.",[14,1056,1057],{},"A 10-step agent task means 10 LLM calls, potentially 10 tool calls, context growing with every turn, and network overhead on every round-trip. If each step takes 1.5 seconds (fast!), your total task time is 15 seconds. If each step takes 3 seconds (normal), you're at 30 seconds. At 5 seconds per step (common with context bloat), you're looking at nearly a minute.",[14,1059,1060],{},"For user-facing applications such as customer support, a wait of 10-30 seconds is often unacceptable.",[96,1062,1064],{"id":1063},"the-honest-math","The honest math",[14,1066,1067],{},"Let's say your agent handles a support ticket. Steps: (1) read the ticket, (2) look up customer in CRM, (3) check order history, (4) check knowledge base, (5) draft response, (6) format and send.",[14,1069,1070],{},"Six steps. Each step involves at least one LLM call (800ms), one tool call (500ms average), and context processing that grows each turn. Conservative estimate: 2 seconds per step = 12 seconds total. Realistic with context bloat: 4 seconds per step = 24 seconds total.",[14,1072,1073],{},"This is why smart agent design keeps step count low. The fastest agent isn't the one with the fastest model. 
It's the one that solves the problem in 3 steps instead of 8.",[39,1075,1077],{"id":1076},"the-60-second-diagnostic-checklist","The 60-second diagnostic checklist",[14,1079,1080],{},"When your agent is slow, run through this in order:",[14,1082,1083],{},[87,1084],{"alt":1085,"src":1086},"The 60-Second Agent Latency Diagnostic, a five-step checklist: 1, check model TTFT by logging request-to-first-token time, and if it's under 2 seconds the model isn't the problem; 2, count input tokens, and over 30K means context bloat, so check tool defs and history; 3, time each tool call by wrapping every tool with timing logs to find the slowest; 4, check geography by pinging your LLM provider from your server, and over 100ms means move closer; 5, count your steps, and over 5 for a typical task means redesign the workflow. Most slowness is layers 2 and 5 working together, so fix those first","/img/blog/ai-agent-latency-60-second-diagnostic.jpg",[14,1088,1089,1092],{},[109,1090,1091],{},"Check model TTFT."," Log the time between request sent and first token received. If it's under 2 seconds, the model isn't your problem.",[14,1094,1095,1098],{},[109,1096,1097],{},"Count input tokens."," If your input exceeds 30,000 tokens per request, you have context bloat. Check tool definitions, conversation history, and tool results.",[14,1100,1101,1104],{},[109,1102,1103],{},"Time each tool call."," Find the slowest one. It's probably responsible for most of your tool execution latency.",[14,1106,1107,1110],{},[109,1108,1109],{},"Check geography."," Ping your LLM provider from your agent's server. If it's over 100ms, move closer.",[14,1112,1113,1116],{},[109,1114,1115],{},"Count your steps."," If your agent takes more than 5 steps for a typical task, redesign the workflow to reduce steps.",[14,1118,1119],{},"Most agent slowness is layers 2 and 5 working together. Bloated context makes each step slower. More steps means more bloated context. 
It's a feedback loop that gets worse with every conversation turn.",[14,1121,1122],{},"This is exactly why we built smart context management into BetterClaw from day one. Token bloat is the number one production agent killer, and most self-hosted frameworks leave you to solve it yourself. On BetterClaw, context is automatically managed per agent, tool results are filtered before entering the window, and persistent memory uses hybrid vector plus keyword retrieval so your agent doesn't drag around dead conversation weight. Free plan with every feature. $19/month per agent for Pro. BYOK with zero inference markup.",[39,1124,1126],{"id":1125},"when-switching-models-actually-helps-and-when-it-doesnt","When switching models actually helps (and when it doesn't)",[14,1128,1129],{},"After all that, there are specific cases where the model genuinely is the bottleneck:",[14,1131,1132],{},[87,1133],{"alt":1134,"src":1135},"Model Switch vs Infrastructure Fix, how to decide. Switch the model when you're using a reasoning model like o3 or GPT-5 for a simple classification task, you need faster streaming for customer-facing responses, you're running local inference where hardware is the real ceiling, or TTFT is over 2 seconds after measuring. Fix infrastructure first when context exceeds 30K tokens per request, tool calls are sequential and unparallelized, the agent takes more than 5 steps per task, or response times vary wildly between requests. The model is 800ms of a 15-second problem; fix the other 14.2 seconds first","/img/blog/ai-agent-latency-model-switch-vs-infrastructure.jpg",[14,1137,1138,1141,1142,1146],{},[109,1139,1140],{},"You're using a reasoning model for a classification task."," If your agent is classifying email urgency (simple task) using o3 or GPT-5 (reasoning model), you're paying 10-30 seconds of \"thinking\" latency for a task that Haiku or Flash can handle in 600ms. Match model size to task complexity. 
The framework for ",[371,1143,1145],{"href":1144},"/blog/how-to-choose-llm-for-your-task","choosing the right LLM per task"," covers this directly.",[14,1148,1149,1152],{},[109,1150,1151],{},"You need streaming for user-facing interactions."," Some providers stream faster than others. Gemini 2.5 Flash at 173 tok/s finishes a 1,100-token response in under 7 seconds. Slower models might take 20+ seconds for the same output. For customer-facing agents, streaming speed matters.",[14,1154,1155,1158,1159,1163],{},[109,1156,1157],{},"Your agent runs on local hardware."," If you're running local inference on a Mac Mini or mid-range GPU, the model is genuinely slow (3-5 seconds to first token, 30-40 tok/s). Cloud APIs are 5-10x faster for agent workloads. We break down the ",[371,1160,1162],{"href":1161},"/blog/apple-silicon-vs-nvidia-ai-agents","hardware speed gap"," in detail.",[14,1165,1166],{},"For everything else, fix layers 2-5 first. You'll get more speed improvement from trimming 50,000 tokens out of your context window than from switching to a model that's 200ms faster on TTFT.",[14,1168,1169],{},"The difference between a frustrating agent and a fast one usually isn't the model. It's whether someone bothered to measure where the latency actually lives.",[14,1171,1172,1173,1179,1180,1184,1185,1189],{},"If you'd rather skip the latency debugging entirely, ",[371,1174,1178],{"href":1175,"rel":1176},"https://app.betterclaw.io/sign-in",[1177],"nofollow","give BetterClaw a look",". Context management, tool optimization, caching, and infrastructure are all handled. ",[371,1181,1183],{"href":1182},"/free-plan","Free plan"," with 1 agent and every feature. ",[371,1186,1188],{"href":1187},"/pricing","$19/month per agent"," on Pro. Your agent deploys in 60 seconds. 
On infrastructure we've already optimized for speed.",[39,1191,687],{"id":686},[96,1193,1195],{"id":1194},"what-causes-ai-agent-latency","What causes AI agent latency?",[14,1197,1198],{},"AI agent latency comes from five layers: model speed (time to first token and generation rate), context window size (more tokens means more processing time per request), tool execution time (external API calls like CRM, email, or database lookups), network round-trips between your agent and API endpoints, and multi-step compounding where each workflow step multiplies all previous delays. In most cases, context bloat and step count cause more slowness than the model itself.",[96,1200,1202],{"id":1201},"how-does-llm-latency-differ-between-providers-in-2026","How does LLM latency differ between providers in 2026?",[14,1204,1205],{},"Claude Haiku 4.5 leads on time to first token at around 597ms. Gemini 2.5 Flash leads on throughput at 146-173 tokens per second. Reasoning models (o3, GPT-5, Gemini Deep Think) are intentionally slow, often 10-150 seconds to first token due to chain-of-thought processing. For agent workloads, the fastest practical choices are Gemini Flash variants for throughput and Claude Haiku for TTFT.",[96,1207,1209],{"id":1208},"how-do-i-reduce-my-ai-agents-response-time","How do I reduce my AI agent's response time?",[14,1211,1212],{},"Start by logging input token counts. If they exceed 30,000 tokens, compress conversation history into summaries, load tool definitions on demand instead of all at once, and filter tool results before they enter context. Mem0's 2026 benchmarks showed that optimized context management cut latency by 91% while improving accuracy. After fixing context, parallelize independent tool calls and reduce total workflow steps.",[96,1214,1216],{"id":1215},"does-switching-to-a-faster-llm-model-fix-agent-latency","Does switching to a faster LLM model fix agent latency?",[14,1218,1219],{},"Usually not. A single LLM call takes about 800ms. 
If your agent takes 15 seconds total, the model accounts for roughly 5% of the latency. The other 95% is context processing, tool execution, network overhead, and multi-step compounding. Switch models only when you're using a reasoning model for simple tasks, need faster streaming for user-facing responses, or are running local inference where hardware is the genuine bottleneck.",[96,1221,1223],{"id":1222},"is-managed-hosting-faster-than-self-hosted-ai-agents","Is managed hosting faster than self-hosted AI agents?",[14,1225,1226],{},"Generally yes, for three reasons: managed platforms pre-optimize tool integrations with connection pooling and caching, they handle context management automatically to prevent token bloat, and they deploy on infrastructure close to major LLM provider data centers. BetterClaw's managed infrastructure includes smart context management, optimized integrations, and zero setup overhead. Self-hosting gives you full control but requires you to solve every latency layer 
yourself.",{"title":52,"searchDepth":183,"depth":183,"links":1228},[1229,1230,1231,1235,1239,1243,1246,1247,1248],{"id":837,"depth":183,"text":838},{"id":853,"depth":183,"text":854},{"id":887,"depth":183,"text":888,"children":1232},[1233,1234],{"id":909,"depth":197,"text":910},{"id":916,"depth":197,"text":917},{"id":948,"depth":183,"text":949,"children":1236},[1237,1238],{"id":961,"depth":197,"text":910},{"id":967,"depth":197,"text":917},{"id":997,"depth":183,"text":998,"children":1240},[1241,1242],{"id":1004,"depth":197,"text":910},{"id":1017,"depth":197,"text":917},{"id":1044,"depth":183,"text":1045,"children":1244},[1245],{"id":1063,"depth":197,"text":1064},{"id":1076,"depth":183,"text":1077},{"id":1125,"depth":183,"text":1126},{"id":686,"depth":183,"text":687,"children":1249},[1250,1251,1252,1253,1254],{"id":1194,"depth":197,"text":1195},{"id":1201,"depth":197,"text":1202},{"id":1208,"depth":197,"text":1209},{"id":1215,"depth":197,"text":1216},{"id":1222,"depth":197,"text":1223},"AI agent taking 15+ seconds? The model isn't the bottleneck. Diagnose context bloat, tool lag, and step compounding with this 5-layer framework.","/img/blog/ai-agent-slow-latency-fix.jpg",{},"/blog/ai-agent-slow-latency-fix",{"title":813,"description":1255},"Why Is My AI Agent So Slow? Fix Latency Fast","blog/ai-agent-slow-latency-fix",[1263,1264,1265,1266,1267,1268,1269],"ai agent slow","llm latency","reduce ai agent latency","llm inference speed","ai agent performance","context window optimization","agent response time","6-FO2dQvPYr7_yc2UaJq_cL6Lr02qBls62Y9O1dFD4I",{"id":1272,"title":1273,"author":1274,"body":1275,"category":787,"date":1749,"description":1750,"extension":790,"featured":791,"image":1751,"imageHeight":793,"imageWidth":793,"meta":1752,"navigation":795,"path":1753,"readingTime":1754,"seo":1755,"seoTitle":1756,"stem":1757,"tags":1758,"updatedDate":1765,"__hash__":1766},"blog/blog/claude-cowork-not-working-windows.md","Claude Cowork Not Working on Windows? 
Every Known Bug and the Best Workaround in 2026",{"name":7,"role":8,"avatar":9},{"type":11,"value":1276,"toc":1724},[1277,1282,1285,1289,1296,1299,1310,1316,1325,1331,1335,1350,1354,1357,1361,1364,1368,1371,1375,1378,1382,1385,1389,1392,1398,1402,1405,1408,1411,1414,1417,1426,1430,1433,1436,1439,1446,1453,1457,1468,1471,1479,1483,1486,1489,1492,1495,1498,1501,1507,1511,1515,1518,1544,1547,1551,1558,1581,1584,1588,1591,1595,1598,1613,1619,1623,1626,1634,1638,1641,1653,1655,1660,1663,1668,1682,1687,1697,1702,1713,1718,1721],[14,1278,1279],{},[109,1280,1281],{},"Claude Cowork fails on Windows for five reasons: (1) the CoworkVMService stops after reboot or sleep, (2) the \"yukonSilver\" platform detection bug marks capable systems as unsupported, (3) Windows Home edition lacks the full Hyper-V stack Cowork needs, (4) network conflicts with VPNs or Docker on the 172.16.0.0/24 range, and (5) corrupted installs from the old Squirrel installer. Each has a different fix.",[14,1283,1284],{},"Cowork shipped on Windows on February 10, 2026, and went GA across all paying subscribers on April 9, 2026. The Claude Code GitHub repo has been collecting Windows-specific bugs since launch: cryptic \"yukonSilver not supported\" errors, missing Cowork tabs on fully capable machines, and a VM service that resists removal. We've tracked the major failure modes and what actually fixes each one. No fluff.",[39,1286,1288],{"id":1287},"try-this-first-restart-coworkvmservice","Try this first: restart CoworkVMService",[14,1290,1291,1292,1295],{},"Before anything else, check whether Cowork's background service is actually running. CoworkVMService ships with startup type ",[109,1293,1294],{},"Manual",", which means it stops after reboots, Windows updates, and sleep/wake cycles. Once it stops, Cowork hangs or fails to connect even though everything else looks fine. 
This is the most common Cowork issue on Windows and the fastest one to fix.",[14,1297,1298],{},"Open PowerShell as Administrator and run the following (type sc.exe in full, because in Windows PowerShell a bare sc is an alias for Set-Content):",[47,1300,1304],{"className":1301,"code":1302,"language":1303,"meta":52,"style":52},"language-powershell shiki shiki-themes github-light","sc.exe start CoworkVMService\n","powershell",[18,1305,1306],{"__ignoreMap":52},[56,1307,1308],{"class":58,"line":59},[56,1309,1302],{},[14,1311,1312,1313,1315],{},"If that fixes Cowork until the next reboot, make it stick by switching the service to automatic startup (mind the space after ",[18,1314,165],{},"):",[47,1317,1319],{"className":1301,"code":1318,"language":1303,"meta":52,"style":52},"sc.exe config CoworkVMService start= auto\n",[18,1320,1321],{"__ignoreMap":52},[56,1322,1323],{"class":58,"line":59},[56,1324,1318],{},[14,1326,76,1327,1330],{},[18,1328,1329],{},"sc.exe start"," returns \"service not found,\" skip to the install-related sections below, because your Cowork installation may be broken. Otherwise, restart Claude Desktop and check the Cowork tab.",[39,1332,1334],{"id":1333},"check-your-system-before-you-debug","Check your system before you debug",[14,1336,1337,1338,1341,1342,1345,1346,1349],{},"Anthropic ships a downloadable ",[109,1339,1340],{},"Cowork readiness checker"," linked from the \"Get started with Claude Cowork\" article in Anthropic's help center (separate utility, not part of Claude Desktop). Run it first: it reports whether your machine has the Hyper-V components Cowork needs. Caveat: on Windows 11 Home it can falsely report \"ready\" because Windows 11 internally still reports as ",[18,1343,1344],{},"10.0"," and the checker misidentifies the OS (GitHub #50621). 
If the checker says ready but Cowork won't load, run ",[18,1347,1348],{},"Get-Service vmms"," in PowerShell — if that service doesn't exist, you're on Home and Cowork won't work.",[39,1351,1353],{"id":1352},"the-five-ways-cowork-breaks-on-windows","The Five Ways Cowork Breaks on Windows",[14,1355,1356],{},"The problems aren't random. They fall into five distinct patterns, and knowing which one you're hitting is half the battle.",[96,1358,1360],{"id":1359},"_1-the-missing-tab-yukonsilver-bug","1. The Missing Tab (yukonSilver bug)",[14,1362,1363],{},"You install Claude Desktop, open it, and the Cowork tab simply isn't there. Only \"Chat\" shows up. This is the \"yukonSilver not supported\" bug, tracked in GitHub issues #25136, #32004, and #32837. Claude's internal platform detection incorrectly marks your system as incompatible, even when all virtualization features are enabled.",[96,1365,1367],{"id":1366},"_2-the-infinite-setup-spinner","2. The Infinite Setup Spinner",[14,1369,1370],{},"The Cowork tab appears, but clicking it shows \"Setting up Claude's workspace\" with a loading bar stuck at 80 to 90%. It never completes. Users have reported leaving it running for 12+ hours with no progress. No error message. Just spinning.",[96,1372,1374],{"id":1373},"_3-the-api-connection-failure","3. The API Connection Failure",[14,1376,1377],{},"The workspace starts but can't reach Claude's API. You get \"Cannot connect to Claude API from workspace\" or its Japanese equivalent. This was a day-one launch bug on Windows 11 Home and has resurfaced multiple times since.",[96,1379,1381],{"id":1380},"_4-the-network-conflict","4. The Network Conflict",[14,1383,1384],{},"Cowork uses a hardcoded network range (172.16.0.0/24) for its internal NAT. If your home network, corporate VPN, or another VM tool uses the same range, Cowork's VM can't reach the internet. Worse, it can break your WSL2 and Docker networking in the process.",[96,1386,1388],{"id":1387},"_5-the-update-regression","5. 
The Update Regression",[14,1390,1391],{},"Auto-updates have introduced Cowork-breaking regressions more than once. The most-reported example was v1.1.5749 on March 9, 2026, which broke working installs and required a patch release to recover. Anthropic has since shipped multiple updates; if you're stuck on a known-bad version, updating to the current Claude Desktop release is usually the fix.",[14,1393,1394],{},[87,1395],{"alt":1396,"src":1397},"The five ways Claude Cowork breaks on Windows: missing tab, infinite spinner, API failure, network conflict, and update regression","/img/blog/claude-cowork-not-working-windows-five-bugs.jpg",[39,1399,1401],{"id":1400},"the-windows-home-problem-that-anthropic-still-hasnt-documented","The Windows Home Problem That Anthropic Still Hasn't Documented",[14,1403,1404],{},"This is where it gets messy.",[14,1406,1407],{},"Claude Cowork runs inside a lightweight Hyper-V virtual machine on your Windows machine. That's how it creates its sandboxed environment for file access and code execution. The problem? Windows 11 Home doesn't include the full Hyper-V stack.",[14,1409,1410],{},"Home edition has Virtual Machine Platform and Windows Hypervisor Platform. But it's missing the vmms (Virtual Machine Management) service that Cowork's VM requires. Without it, the VM either fails silently or throws a cryptic \"Plan9 mount failed: bad address\" error.",[14,1412,1413],{},"At least seven separate GitHub issues have been filed by Windows Home users who spent hours troubleshooting before discovering that their Windows edition simply can't run Cowork. One user explicitly noted they \"subscribed to Max specifically to use this feature\" and only discovered the incompatibility after paying.",[14,1415,1416],{},"A documentation request (GitHub issue #27906) was filed in February asking Anthropic to add this information clearly. 
That issue is now closed and the help center's deployment article has been updated to be more explicit about edition requirements.",[14,1418,1419,1420,1422,1423,1425],{},"The quickest check is to open PowerShell and run ",[18,1421,1348],{},". If the service isn't found, Cowork won't work without upgrading to Windows 11 Pro or Enterprise. Don't rely on Anthropic's readiness checker alone — GitHub #50621 documents that it falsely reports \"ready\" on Windows 11 Home because Windows 11 internally still reports its version as ",[18,1424,1344],{}," and the checker misclassifies the OS. A handful of community reports describe partial functionality on Home, but the official position is that Cowork requires the Pro/Enterprise Hyper-V stack.",[39,1427,1429],{"id":1428},"the-yukonsilver-bug-and-why-your-pro-machine-still-fails","The \"yukonSilver\" Bug and Why Your Pro Machine Still Fails",[14,1431,1432],{},"Stay with me here, because this one is especially frustrating.",[14,1434,1435],{},"Even if you're running Windows 11 Pro with every virtualization feature enabled (Hyper-V, VMP, WHP, WSL2), you might still see the Cowork tab missing entirely. The logs will show \"yukonSilver not supported (status=unsupported)\" followed by the VM bundle cleanup routine running instead of the actual VM boot.",[14,1437,1438],{},"\"yukonSilver\" is Claude's internal codename for its VM configuration on Windows. The bug is in the platform detection logic: it incorrectly classifies fully capable x64 Windows 11 Pro systems as unsupported.",[14,1440,1441,1442,1445],{},"But that's not even the real problem. The installer also creates a Windows service called CoworkVMService, and this service sometimes becomes impossible to remove. 
Running ",[18,1443,1444],{},"sc.exe delete CoworkVMService"," as Administrator returns \"Access denied.\" The service blocks clean reinstalls and creates a circular failure where you can't fix the problem and you can't start fresh.",[14,1447,1448,1449,1452],{},"The documented workaround from community debugging: manually run ",[18,1450,1451],{},"Add-AppxPackage"," as the target user to install the MSIX package correctly for your account. It's a PowerShell command that most of Cowork's target audience (non-developers) would never discover on their own.",[96,1454,1456],{"id":1455},"squirrel-vs-msix-which-installer-do-you-have","Squirrel vs. MSIX: which installer do you have?",[14,1458,1459,1460,1463,1464,1467],{},"Anthropic switched Claude Desktop on Windows from a Squirrel ",[18,1461,1462],{},".exe"," installer to an MSIX/Microsoft Store package around February 10-13, 2026. If you installed Claude Desktop before that, you have the Squirrel build, and the in-app \"Reinstall\" button can silently fail (tracked in GitHub issues #25162, #25385, #26457; error code ",[18,1465,1466],{},"0x80073CFA"," in some logs). The fix is a manual uninstall via \"Add or remove programs,\" followed by downloading the fresh MSIX from the official Claude download page. 
Note that MSIX installs also require Windows \"Sideload apps\" / \"Trusted App Installs\" to be enabled — without it, the MSIX install fails before it starts.",[14,1469,1470],{},"As one developer debugging the issue put it: \"Cowork is marketed at the people least equipped to debug it when it breaks.\"",[14,1472,1473,1474,1478],{},"If you've been running into similar infrastructure headaches with AI agents and want something that works out of the box, our ",[371,1475,1477],{"href":1476},"/compare/self-hosted","comparison of self-hosted vs managed OpenClaw deployments"," covers why some teams are moving away from local setups entirely.",[39,1480,1482],{"id":1481},"the-network-bug-that-breaks-docker-too","The Network Bug That Breaks Docker Too",[14,1484,1485],{},"Here's what nobody tells you about Cowork's networking on Windows.",[14,1487,1488],{},"Cowork creates its own Hyper-V virtual switch and NAT network. It's separate from WSL2's networking and separate from Docker Desktop's networking. Three different tenants sharing the same hypervisor, each with their own plumbing.",[14,1490,1491],{},"The specific failure: Cowork creates an HNS (Host Network Service) network called \"cowork-vm-nat\" but sometimes fails to create the corresponding WinNAT rule. The HNS network exists, but there's no NAT translation. The VM boots, but it has no internet access.",[14,1493,1494],{},"And in a particularly fun bug, Cowork's virtual network has been reported to permanently break WSL2's internet connectivity until you manually find and delete the offending network configuration using PowerShell HNS diagnostic tools.",[14,1496,1497],{},"The fix, discovered by community members, involves stopping all Claude processes, killing the Cowork VM via hcsdiag, removing the broken HNS network, and recreating it on a non-conflicting subnet like 172.24.0.0/24 or 10.200.0.0/24.",[14,1499,1500],{},"This is three PowerShell commands for someone who knows what they're doing. 
For someone who just wanted to organize their Downloads folder with AI, it's a wall.",[14,1502,1503],{},[87,1504],{"alt":1505,"src":1506},"Cowork network conflict diagram showing Hyper-V NAT, WSL2, and Docker competing on the same subnet","/img/blog/claude-cowork-not-working-windows-network-conflict.jpg",[39,1508,1510],{"id":1509},"what-actually-fixes-each-bug-quick-reference","What Actually Fixes Each Bug (Quick Reference)",[96,1512,1514],{"id":1513},"missing-cowork-tab-yukonsilver-bug","Missing Cowork Tab (yukonSilver bug)",[14,1516,1517],{},"First, confirm you're not on Windows Home. If you're on Pro or Enterprise and still don't see the tab, fully uninstall Claude Desktop, remove the leftover service, and clear residual files before reinstalling:",[47,1519,1521],{"className":1301,"code":1520,"language":1303,"meta":52,"style":52},"sc.exe stop CoworkVMService\nsc.exe delete CoworkVMService\nRemove-Item -Recurse \"$env:APPDATA\\Claude\"\nRemove-Item -Recurse \"$env:LOCALAPPDATA\\Packages\\Claude_*\"\n",[18,1522,1523,1528,1533,1538],{"__ignoreMap":52},[56,1524,1525],{"class":58,"line":59},[56,1526,1527],{},"sc.exe stop CoworkVMService\n",[56,1529,1530],{"class":58,"line":183},[56,1531,1532],{},"sc.exe delete CoworkVMService\n",[56,1534,1535],{"class":58,"line":197},[56,1536,1537],{},"Remove-Item -Recurse \"$env:APPDATA\\Claude\"\n",[56,1539,1541],{"class":58,"line":1540},4,[56,1542,1543],{},"Remove-Item -Recurse \"$env:LOCALAPPDATA\\Packages\\Claude_*\"\n",[14,1545,1546],{},"Then reinstall fresh from the official Claude download page.",[96,1548,1550],{"id":1549},"infinite-setup-spinner","Infinite Setup Spinner",[14,1552,1553,1554,1557],{},"Two common causes here. First, the VM download itself. 
Look in ",[18,1555,1556],{},"%APPDATA%\\Claude\\vm_bundles\\"," — if the directory is empty or incomplete, your download was interrupted and a clean reinstall usually resolves it.",[14,1559,1560,1561,1564,1565,1568,1569,1572,1573,1576,1577,1580],{},"Second, the ",[109,1562,1563],{},"cross-drive storage path bug"," (GitHub #36642, #30584, #37754). Cowork writes ",[18,1566,1567],{},"rootfs.vhdx"," to ",[18,1570,1571],{},"C:\\Windows\\Temp"," first and then tries to rename it into its final location. If Windows \"Where new content is saved\" sends user data to a non-C: drive, that rename crosses devices and Node.js throws ",[18,1574,1575],{},"EXDEV: cross-device link not permitted",". Symptom: the spinner hangs forever with no visible error. Fix: open ",[109,1578,1579],{},"Settings → System → Storage → Advanced storage settings → Where new content is saved",", switch \"New apps\" back to the C: drive, and retry.",[14,1582,1583],{},"If the spinner persists on Windows Home, it's the Hyper-V incompatibility and there's no fix short of upgrading your edition.",[96,1585,1587],{"id":1586},"api-connection-failure","API Connection Failure",[14,1589,1590],{},"Disable your VPN temporarily (fully quit, don't just disconnect). Check whether your network uses the 172.16.0.0/24 range. If Chat works but Cowork doesn't, the problem is the VM's network stack, not your internet. 
Update to the latest Claude Desktop: v1.1.4328 or higher specifically addressed early API connection bugs.",[96,1592,1594],{"id":1593},"network-conflict","Network Conflict",[14,1596,1597],{},"Check whether Cowork's HNS network exists but the NAT rule doesn't:",[47,1599,1601],{"className":1301,"code":1600,"language":1303,"meta":52,"style":52},"Get-NetNat\nGet-HnsNetwork | Where-Object {$_.Name -eq \"cowork-vm-nat\"}\n",[18,1602,1603,1608],{"__ignoreMap":52},[56,1604,1605],{"class":58,"line":59},[56,1606,1607],{},"Get-NetNat\n",[56,1609,1610],{"class":58,"line":183},[56,1611,1612],{},"Get-HnsNetwork | Where-Object {$_.Name -eq \"cowork-vm-nat\"}\n",[14,1614,76,1615,1618],{},[18,1616,1617],{},"Get-NetNat"," is empty but the HNS query returns a result, you're in the \"missing NAT rule\" state. Remove the broken network and recreate it on a non-conflicting subnet like 172.24.0.0/24 or 10.200.0.0/24.",[96,1620,1622],{"id":1621},"update-regression-v115749","Update Regression (v1.1.5749)",[14,1624,1625],{},"If Cowork broke after the March 9 update, there's no user-side workaround for that build. The fix is to update to the latest Claude Desktop release; Anthropic has shipped multiple patches since.",[14,1627,1628,1629,1633],{},"If all of this sounds like a lot of infrastructure debugging for a tool that's supposed to \"just work,\" that's because it is. This is the kind of operational friction we built ",[371,1630,1632],{"href":1631},"/","BetterClaw"," to eliminate. Your OpenClaw agent runs on managed infrastructure: no local VMs, no Hyper-V, no NAT conflicts. $19/month, BYOK, first deploy in ~60 seconds.",[39,1635,1637],{"id":1636},"cowork-vs-a-managed-agent-pick-what-matches-your-job","Cowork vs. a managed agent: pick what matches your job",[14,1639,1640],{},"Cowork is a desktop co-pilot. It runs a local Hyper-V VM, which is why every Windows edition quirk, network conflict, and update regression becomes a potential failure point. 
If you need a co-pilot you sit beside, that trade-off makes sense.",[14,1642,1643,1644,1648,1649,28],{},"If you need an always-on agent that handles tasks across messaging platforms while your computer is asleep, the architecture has to be different. ",[371,1645,1647],{"href":1646},"/openclaw-hosting","Managed OpenClaw hosting"," runs your agent on cloud infrastructure with Slack, Discord, WhatsApp, and 15+ channels. No local VM, no Hyper-V, no PowerShell on a Tuesday night. $19/agent/month, BYOK, first deploy in ~60 seconds. ",[371,1650,1652],{"href":1175,"rel":1651},[1177],"Start free",[39,1654,687],{"id":686},[14,1656,1657],{},[109,1658,1659],{},"Why is Claude Cowork not working on my Windows machine?",[14,1661,1662],{},"Top causes: CoworkVMService stopped after reboot or sleep, Windows Home edition missing the full Hyper-V stack, the \"yukonSilver\" platform detection bug, network conflicts with VPNs or Docker on 172.16.0.0/24, or a corrupted install from the old Squirrel package. Check your Windows edition, then your VM service state, then the Claude Code GitHub issues for your exact error.",[14,1664,1665],{},[109,1666,1667],{},"How do I restart CoworkVMService on Windows?",[14,1669,1670,1671,1674,1675,1678,1679,1681],{},"Open PowerShell as Administrator and run ",[18,1672,1673],{},"sc.exe start CoworkVMService"," to start it for the current session (use sc.exe rather than plain sc, which Windows PowerShell treats as an alias for Set-Content). To make it survive reboots, run ",[18,1676,1677],{},"sc.exe config CoworkVMService start= auto"," (mind the space after ",[18,1680,165],{},"). The service ships with startup type Manual, so it stops after reboots, Windows updates, and sleep/wake cycles. This is the single most common Cowork fix on Windows.",[14,1683,1684],{},[109,1685,1686],{},"Does Claude Cowork work on Windows 11 Home?",[14,1688,1689,1690,1693,1694,1696],{},"No, officially. Cowork requires the Hyper-V ",[18,1691,1692],{},"vmms"," service, which Home editions lack. 
Anthropic's readiness checker can falsely report Home as \"ready\" (it misidentifies the OS), so don't trust it alone — run ",[18,1695,1348],{}," in PowerShell. If it's missing, upgrade to Windows 11 Pro or Enterprise.",[14,1698,1699],{},[109,1700,1701],{},"How do I fix the \"yukonSilver not supported\" error in Claude Cowork?",[14,1703,1704,1705,1708,1709,1712],{},"This is a platform detection bug on Claude's side, still open as of May 2026. The workaround: fully uninstall Claude Desktop, stop and delete CoworkVMService via elevated PowerShell, clear ",[18,1706,1707],{},"%APPDATA%\\Claude"," and the ",[18,1710,1711],{},"%LOCALAPPDATA%\\Packages\\Claude_*"," folder, then reinstall fresh from the official download.",[14,1714,1715],{},[109,1716,1717],{},"Is Claude Cowork on Windows stable enough for daily use in 2026?",[14,1719,1720],{},"Cowork went GA in April 2026, but Windows is still the rougher platform. The yukonSilver bug remains open, the CoworkVMService Manual-startup behavior catches users after every reboot, and update regressions appear periodically. Fine for desktop tasks if your system is compatible. 
For workloads where downtime means lost work, a managed agent is more reliable.",[748,1722,1723],{},"html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}",{"title":52,"searchDepth":183,"depth":183,"links":1725},[1726,1727,1728,1735,1736,1739,1740,1747,1748],{"id":1287,"depth":183,"text":1288},{"id":1333,"depth":183,"text":1334},{"id":1352,"depth":183,"text":1353,"children":1729},[1730,1731,1732,1733,1734],{"id":1359,"depth":197,"text":1360},{"id":1366,"depth":197,"text":1367},{"id":1373,"depth":197,"text":1374},{"id":1380,"depth":197,"text":1381},{"id":1387,"depth":197,"text":1388},{"id":1400,"depth":183,"text":1401},{"id":1428,"depth":183,"text":1429,"children":1737},[1738],{"id":1455,"depth":197,"text":1456},{"id":1481,"depth":183,"text":1482},{"id":1509,"depth":183,"text":1510,"children":1741},[1742,1743,1744,1745,1746],{"id":1513,"depth":197,"text":1514},{"id":1549,"depth":197,"text":1550},{"id":1586,"depth":197,"text":1587},{"id":1593,"depth":197,"text":1594},{"id":1621,"depth":197,"text":1622},{"id":1636,"depth":183,"text":1637},{"id":686,"depth":183,"text":687},"2026-03-27","Claude Cowork broken on Windows? Covers 5 failure modes: yukonSilver bug, Windows Home limits, VM service stops, network conflicts, and update regressions. Step-by-step fixes.","/img/blog/claude-cowork-not-working-windows.jpg",{},"/blog/claude-cowork-not-working-windows","14 min read",{"title":1273,"description":1750},"Claude Cowork Not Working on Windows? 
5 Fixes (2026)","blog/claude-cowork-not-working-windows",[1759,1760,1761,1762,1763,1764],"Claude Cowork not working Windows","Cowork Windows bugs","yukonSilver error","Claude Cowork Windows fix","Cowork Hyper-V","Cowork Windows Home","2026-05-19","Sq06kygerdAX6Y2Mxv3OPy2yjFOapLcXpUWGeKLwStI",{"id":1768,"title":1769,"author":1770,"body":1771,"category":787,"date":1765,"description":2181,"extension":790,"featured":791,"image":2182,"imageHeight":793,"imageWidth":793,"meta":2183,"navigation":795,"path":2184,"readingTime":2185,"seo":2186,"seoTitle":2187,"stem":2188,"tags":2189,"updatedDate":1765,"__hash__":2197},"blog/blog/hermes-response-truncated-fix.md","Hermes Agent \"Response Truncated Due to Output Length Limit\": 5 Causes and Fixes",{"name":7,"role":8,"avatar":9},{"type":11,"value":1772,"toc":2166},[1773,1776,1779,1790,1793,1796,1800,1806,1812,1825,1842,1858,1871,1875,1881,1887,1895,1905,1913,1917,1920,1923,1929,1945,1951,1955,1961,1967,1977,1995,1998,2002,2008,2014,2023,2028,2032,2038,2049,2059,2071,2082,2085,2093,2095,2099,2105,2109,2131,2135,2149,2153,2159,2163],[14,1774,1775],{},"The agent starts generating. Mid-sentence, it stops. \"Response truncated due to output length limit.\" The output is useless. The conversation is broken. Here are five causes from real GitHub issues and the fix for each.",[14,1777,1778],{},"A Chinese user posted on X during Labor Day weekend: \"My Hermes keeps throwing 'Response truncated due to output length limit.' I've given up on it. Let it starve.\"",[14,1780,1781,1782,1785,1786,1789],{},"That's the vibe. The error is maddening because it looks like a simple limit you should be able to increase. But ",[18,1783,1784],{},"max_tokens"," in ",[18,1787,1788],{},"config.yaml"," had no effect until recently. The compression system has a math bug that prevents it from firing. 
And the hardcoded output limits can drain your OpenRouter credits before the model generates a single token.",[14,1791,1792],{},"GitHub issue #7237 documents the core complaint: \"This truncates the output mid-stream, breaks the conversation flow, and prevents users from receiving complete, usable answers.\"",[14,1794,1795],{},"Here are five causes, ranked by how often they're the actual problem.",[39,1797,1799],{"id":1798},"cause-1-max_tokens-from-configyaml-never-reaches-the-api-confirmed-bug","Cause 1: max_tokens from config.yaml never reaches the API (confirmed bug)",[14,1801,1802],{},[87,1803],{"alt":1804,"src":1805},"Hermes max_tokens config bug: value set in config.yaml but never reaches API request","/img/blog/hermes-response-truncated-fix-config-bug.jpg",[14,1807,1808,1811],{},[109,1809,1810],{},"GitHub issue #4404:"," \"model.max_tokens in config.yaml has no effect. The setting is never passed to AIAgent.\"",[14,1813,1814,1817,1818,1821,1822,1824],{},[109,1815,1816],{},"What happens:"," You set ",[18,1819,1820],{},"model.max_tokens: 8192"," in your config. The agent ignores it. The API request goes out without ",[18,1823,1784],{},". The provider uses its default (often 2,048 or 4,096). Your response gets truncated at a limit you didn't set.",[14,1826,1827,1830,1831,1834,1835,1838,1839,1841],{},[109,1828,1829],{},"The bug confirmed:"," A developer patched the code to log what ",[18,1832,1833],{},"_build_api_kwargs()"," actually sends. Result: ",[18,1836,1837],{},"self.max_tokens=None",". The config value exists but the code path from ",[18,1840,1788],{}," → AIAgent → API request is broken.",[14,1843,1844,1846,1847,1850,1851,1854,1855,1857],{},[109,1845,111],{}," A community PR exists on ",[18,1848,1849],{},"fix/model-max-tokens-config"," branch. If you're on v0.13.0+, check if the fix is merged. If not, the workaround: set ",[18,1852,1853],{},"HERMES_MAX_TOKENS=8192"," as an environment variable in ",[18,1856,342],{},". 
The env var path works even when the config path doesn't.",[14,1859,1860,1863,1864,1867,1868,1870],{},[109,1861,1862],{},"The frustrating truth:"," The most common cause of truncation is a config value that looks correct but is silently ignored. Always verify with ",[18,1865,1866],{},"hermes chat -q \"write a 500-word essay\" --verbose"," and check the API request logs for the actual ",[18,1869,1784],{}," value sent.",[39,1872,1874],{"id":1873},"cause-2-provider-default-output-limit-is-too-low-especially-ollama","Cause 2: Provider default output limit is too low (especially Ollama)",[14,1876,1877,1880],{},[109,1878,1879],{},"What happens with Ollama:"," The default context window is 2,048 tokens. Not 8,192. Not the model's maximum. Two thousand and forty-eight. A 500-word response easily exceeds this.",[14,1882,1883,1886],{},[109,1884,1885],{},"The fix for Ollama:"," Create a Modelfile that sets the correct limits:",[47,1888,1893],{"className":1889,"code":1891,"language":1892},[1890],"language-text","FROM hermes3:8b\nPARAMETER num_ctx 8192\nPARAMETER num_predict 1024\n","text",[18,1894,1891],{"__ignoreMap":52},[14,1896,1897,1900,1901,1904],{},[18,1898,1899],{},"num_ctx"," is the total context window. ",[18,1902,1903],{},"num_predict"," is the max output tokens. Without these, Ollama defaults to 2,048 total, leaving almost no room for output after the system prompt and conversation history consume their share.",[14,1906,1907,1908,1912],{},"For the complete model configuration guide, our ",[371,1909,1911],{"href":1910},"/blog/openclaw-best-practices","best practices post"," covers per-model output limit configuration.",[39,1914,1916],{"id":1915},"cause-3-context-window-full-no-room-left-for-output","Cause 3: Context window full (no room left for output)",[14,1918,1919],{},"Here's where most people get it wrong.",[14,1921,1922],{},"The context window is shared between input and output. 
If your model has a 64K context window and your conversation history + system prompt + tool definitions consume 60K tokens, only 4K tokens remain for the response. The model starts generating, hits the ceiling at 4K, and truncates.",[14,1924,1925],{},[87,1926],{"alt":1927,"src":1928},"Context window math: input plus output share the same budget, leaving little room when conversation history is large","/img/blog/hermes-response-truncated-fix-context-math.jpg",[14,1930,1931,1933,1934,1937,1938,1941,1942,28],{},[109,1932,111],{}," Run ",[18,1935,1936],{},"/usage"," in your chat to see current context consumption. If you're above 70%, run ",[18,1939,1940],{},"/compress"," to summarize the conversation history and free space. Or start a new session with ",[18,1943,1944],{},"hermes chat --new",[14,1946,1947,1948,1950],{},"The official FAQ confirms: \"Use ",[18,1949,1940],{}," regularly during long sessions. It summarizes the conversation history and reduces token usage significantly while preserving context.\"",[39,1952,1954],{"id":1953},"cause-4-compression-never-triggers-the-math-bug","Cause 4: Compression never triggers (the math bug)",[14,1956,1957,1960],{},[109,1958,1959],{},"GitHub issue #14690 (P1, open):"," \"Context auto-compression never triggers when context_length equals MINIMUM_CONTEXT_LENGTH (64000).\"",[14,1962,1963,1966],{},[109,1964,1965],{},"The bug:"," The compression threshold calculation uses 64,000 as an absolute floor. If your model's context is 64K (common for local models with parallel slots), the threshold becomes 100% of the context window. Compression can't trigger because the API errors out before the threshold is reached.",[14,1968,1969,1972,1973,1976],{},[109,1970,1971],{},"The math:"," ",[18,1974,1975],{},"threshold_tokens = max(int(64000 * 0.7), 64000) = max(44800, 64000) = 64000",". Threshold = 100%. Compression never fires. Context grows until the API rejects the request. 
Response gets truncated.",[14,1978,1979,1981,1982,1785,1985,1987,1988,1991,1992,1994],{},[109,1980,111],{}," Set ",[18,1983,1984],{},"model.context_length",[18,1986,1788],{}," to a value above 64,000 (e.g., ",[18,1989,1990],{},"128000",") so the threshold calculation produces a meaningful percentage. Or manually run ",[18,1993,1940],{}," before the context fills.",[14,1996,1997],{},"If debugging config values that are silently ignored, Ollama defaults that nobody documents, context window math bugs, and compression thresholds that never trigger sounds like more framework internals than agent building, BetterClaw's smart context management handles all of this at the platform level. No max_tokens configuration. No compression commands. No context math. The platform manages output limits and context automatically. Free tier with 1 agent and BYOK. $19/month per agent for Pro.",[39,1999,2001],{"id":2000},"cause-5-hardcoded-max_tokens-reserves-too-many-credits-on-openrouter","Cause 5: Hardcoded max_tokens reserves too many credits on OpenRouter",[14,2003,2004],{},[87,2005],{"alt":2006,"src":2007},"OpenRouter credit reservation: Hermes requests max_tokens=64000, OpenRouter reserves full amount as collateral, balance insufficient","/img/blog/hermes-response-truncated-fix-openrouter-credits.jpg",[14,2009,2010,2013],{},[109,2011,2012],{},"GitHub issue #22879:"," \"Hermes hardcodes max_tokens to each model's maximum output (e.g., 64000 for Claude Sonnet/Haiku 4.5). OpenRouter reserves the full max_tokens as collateral before allowing the call.\"",[14,2015,2016,2018,2019,2022],{},[109,2017,1816],{}," You have $10 in OpenRouter credits. Hermes requests ",[18,2020,2021],{},"max_tokens=64000",". OpenRouter reserves $10+ worth of credits upfront. Your balance can't cover the reservation. OpenRouter returns HTTP 402 (\"requires more credits\"). The actual response would be 50-500 tokens. 
The reservation is 64,000.",[14,2024,2025,2027],{},[109,2026,111],{}," This is tagged as a feature request (#22879). Until it's resolved, add more credits to your OpenRouter account (enough to cover the worst-case reservation), or switch to a direct provider (Anthropic direct doesn't pre-reserve credits).",[39,2029,2031],{"id":2030},"the-diagnostic-checklist","The diagnostic checklist",[14,2033,2034],{},[87,2035],{"alt":2036,"src":2037},"Four-step Hermes truncation diagnostic checklist: verbose log check, usage command, Ollama Modelfile, context_length config","/img/blog/hermes-response-truncated-fix-checklist.jpg",[14,2039,2040,1972,2043,2045,2046,2048],{},[109,2041,2042],{},"Step 1:",[18,2044,1866],{}," ... Check the API request for actual ",[18,2047,1784],{}," value.",[14,2050,2051,1972,2054,2056,2057,28],{},[109,2052,2053],{},"Step 2:",[18,2055,1936],{}," in an active chat ... How much context is consumed? If above 70%, run ",[18,2058,1940],{},[14,2060,2061,2064,2065,2067,2068,2070],{},[109,2062,2063],{},"Step 3:"," Check your Ollama Modelfile ... Is ",[18,2066,1899],{}," set? Is ",[18,2069,1903],{}," set? Defaults are too low.",[14,2072,2073,2076,2077,1785,2079,2081],{},[109,2074,2075],{},"Step 4:"," Check ",[18,2078,1984],{},[18,2080,1788],{}," ... Is it 64,000? If so, the compression bug (#14690) applies.",[14,2083,2084],{},"The truncation error is Hermes's way of saying \"the model ran out of room.\" But \"ran out of room\" has five different causes, and the error message doesn't tell you which one. Config values that are ignored. Provider defaults that are undocumented. Context that fills silently. Compression that can't fire. Credit reservations that block the call. Same error. Five different problems.",[14,2086,2087,2088,2092],{},"If you want an agent where context management is handled automatically and you never see \"response truncated,\" ",[371,2089,2091],{"href":1175,"rel":2090},[1177],"give BetterClaw a try",". Free tier with 1 agent and BYOK. 
$19/month per agent for Pro. Smart context management. No truncation. No compression commands. The agent speaks. The response completes.",[39,2094,687],{"id":686},[96,2096,2098],{"id":2097},"what-does-response-truncated-due-to-output-length-limit-mean-in-hermes","What does \"Response truncated due to output length limit\" mean in Hermes?",[14,2100,2101,2102,2104],{},"It means the model's response was cut off before completion. The five causes: ",[18,2103,1784],{}," config not being sent to the API (bug #4404), provider default output limit too low (Ollama defaults to 2,048), context window full with no room for output, compression math bug preventing auto-compression (#14690), or OpenRouter credit reservation blocking the call (#22879).",[96,2106,2108],{"id":2107},"how-do-i-increase-the-output-length-in-hermes-agent","How do I increase the output length in Hermes Agent?",[14,2110,2111,2112,1785,2114,2116,2117,2120,2121,2124,2125,2127,2128,2130],{},"Set ",[18,2113,1853],{},[18,2115,342],{}," (the config.yaml path may not work due to bug #4404). For Ollama, create a Modelfile with ",[18,2118,2119],{},"PARAMETER num_ctx 8192"," and ",[18,2122,2123],{},"PARAMETER num_predict 1024",". For cloud providers, verify the ",[18,2126,1784],{}," value in verbose logs. Use ",[18,2129,1940],{}," regularly to free context space during long sessions.",[96,2132,2134],{"id":2133},"why-does-compress-not-work-in-hermes","Why does /compress not work in Hermes?",[14,2136,2137,2138,2141,2142,1785,2144,1987,2146,2148],{},"If your model's ",[18,2139,2140],{},"context_length"," equals 64,000 (the MINIMUM_CONTEXT_LENGTH constant), auto-compression never triggers due to a math bug (#14690, P1, open). The threshold calculation produces 100% of the context window, which is unreachable. 
Fix: set ",[18,2143,1984],{},[18,2145,1788],{},[18,2147,1990],{},") so the threshold calculation works correctly.",[96,2150,2152],{"id":2151},"why-does-hermes-use-all-my-openrouter-credits-on-one-message","Why does Hermes use all my OpenRouter credits on one message?",[14,2154,2155,2156,2158],{},"Hermes hardcodes ",[18,2157,1784],{}," to each model's maximum output (64,000 for Claude models). OpenRouter pre-reserves the full amount as credit collateral before allowing the call. Your $10 balance can't cover a 64K-token reservation even though the actual response would be 50 tokens. Fix: add more credits, or switch to a direct provider that doesn't pre-reserve. Issue #22879 tracks making this configurable.",[96,2160,2162],{"id":2161},"does-betterclaw-have-the-same-truncation-problems","Does BetterClaw have the same truncation problems?",[14,2164,2165],{},"No. BetterClaw's smart context management handles output limits, context compression, and provider-specific configurations at the platform level. There's no max_tokens to configure, no compression threshold bug, and no credit reservation mismatch. The platform manages the context window so responses complete without truncation. Free tier with 1 agent and BYOK. $19/month per agent for Pro.",{"title":52,"searchDepth":183,"depth":183,"links":2167},[2168,2169,2170,2171,2172,2173,2174],{"id":1798,"depth":183,"text":1799},{"id":1873,"depth":183,"text":1874},{"id":1915,"depth":183,"text":1916},{"id":1953,"depth":183,"text":1954},{"id":2000,"depth":183,"text":2001},{"id":2030,"depth":183,"text":2031},{"id":686,"depth":183,"text":687,"children":2175},[2176,2177,2178,2179,2180],{"id":2097,"depth":197,"text":2098},{"id":2107,"depth":197,"text":2108},{"id":2133,"depth":197,"text":2134},{"id":2151,"depth":197,"text":2152},{"id":2161,"depth":197,"text":2162},"Hermes \"Response truncated due to output length limit\" has 5 causes. Config bug, Ollama defaults, full context, compression math, OpenRouter credits. 
Fixes here.","/img/blog/hermes-response-truncated-fix.jpg",{},"/blog/hermes-response-truncated-fix","9 min read",{"title":1769,"description":2181},"Hermes \"Response Truncated\": 5 Causes Fixed (2026)","blog/hermes-response-truncated-fix",[2190,2191,2192,2193,2194,2195,2196],"Hermes response truncated","Hermes output length limit","Hermes Agent truncated fix","Hermes max_tokens","Hermes context window","Hermes compression bug","Hermes OpenRouter credits","9bJbGY_pISq93_jHnNmziIrOD9ipRxMD3d6ux8djbtY",1781005193688]