29 models claim to be "free." Seven actually work for agent tasks. Here's the ranked list of free models that work with OpenClaw, with daily limits, quality notes, and the catch for each.
Heads-up (May 19, 2026): Anthropic's April 4, 2026 policy change stopped Claude Pro/Max subscriptions from covering third-party tools like OpenClaw (pay-as-you-go required). That ban was reversed on May 13, 2026. Claude usage in OpenClaw resumes via the new "Agent SDK credit" system on June 15, 2026 — monthly non-rollover credits ($20-$200) billed at API rates. Until then, the free picks below are the working path; after June 15, you'll have Claude back as an option.
The community answers in the Discord were a mess. People recommended models that require a credit card (Gemini is "free" — with a payment method on file). People recommended local models without mentioning the $800 in hardware. People recommended OpenRouter free tiers without mentioning the 200 requests/day cap. Not all free is the same kind of free.
Here are the seven free models that genuinely work with OpenClaw, ranked by quality, daily capacity, and how "free" they actually are.
At a glance
| # | Model | Provider | Daily limit | Context | Credit card? | Best for |
|---|---|---|---|---|---|---|
| 1 | Gemini 2.5 Flash | Google AI Studio | 1,500 req/day | 1M tokens | No | Primary, highest volume |
| 2 | DeepSeek V4 Flash :free | OpenRouter | ~200 req/day | 1M tokens | No | Best free quality |
| 3 | Llama 3.3 70B | Groq | 1,000 req/day | 128K | No | Fastest inference |
| 4 | Qwen3 32B | Groq | 14,400 req/day | 32K | No | Heartbeats + high-volume routing |
| 5 | DeepSeek V4 Flash (direct) | DeepSeek | 5M one-time tokens | 1M tokens | No | Few months of light use |
| 6 | Qwen3 (local) | Ollama | Unlimited | 32K | No (hardware required) | Privacy + offline |
| 7 | Gemma 3 27B :free | OpenRouter | ~200 req/day | 128K | No | Fallback / structured tasks |
The post body below covers each entry in detail. The "recommended stack" section near the end shows how to combine 1, 2, and 4 for a resilient 1,700+ req/day setup at $0/month.
1. Google Gemini 2.5 Flash (the Undisputed Free Champion)
Daily capacity: 1,500 requests/day. 15 requests/minute. 1 million tokens per minute.
How to get it: Sign up at ai.google.dev. No credit card. API key is instant.
Why it's #1: Nothing else comes close on volume. 1,500 requests/day covers a moderate-use personal agent entirely. The quality competes with GPT-5.4 Mini on most tasks. The 1M token context window is the largest free context available. Multimodal support (images, audio, video) included.
The catch: Google's terms allow using free-tier prompts for model training. If data privacy matters, this is a real trade-off. Quality is adequate for routine tasks but noticeably below Claude and GPT-5.5 for complex reasoning.
For OpenClaw: Set your provider to Google AI and model to gemini-2.5-flash. Works out of the box. For the complete model configuration guide, our model comparison covers how to set up each provider.

2. DeepSeek V4 Flash via OpenRouter ( Endpoint)
Daily capacity: Approximately 200 requests/day. 20 requests/minute.
How to get it: Sign up at openrouter.ai. No credit card. Use model ID deepseek/deepseek-v4-flash:free.
Why it's #2: DeepSeek V4 Flash is genuinely good. 284B params (13B active), 1M context, competitive with Claude Sonnet on routine tasks. Through OpenRouter's free tier, you get it at zero cost.
The catch: 200 requests/day is enough for light personal use only. Free requests are deprioritized during peak traffic, so latency can spike unpredictably. The :free tier could change without notice.

3. Llama 3.3 70B via Groq (Fastest Free Inference)
Daily capacity: 1,000 requests/day. 30 requests/minute.
How to get it: Sign up at console.groq.com. No credit card. Instant API key.
Why it's #3: Speed. Groq's LPU hardware delivers 300+ tokens per second. The agent responds before you finish reading the previous message. Llama 3.3 70B is a strong open-weight model with good instruction following.
The catch: 6,000 tokens per minute limit (total across all requests). This is tight for agents that send long system prompts. You'll hit the TPM limit before the RPM limit on most OpenClaw configurations. Keep your SOUL.md short.
The top 3 rule: Gemini 2.5 Flash for volume (1,500/day). DeepSeek V4 Flash for quality (best model available free). Groq Llama for speed (300+ t/s). Stack all three as primary, fallback, and heartbeat model for the most resilient free setup.

4. Qwen3 32B via Groq (Highest Daily Capacity)
Daily capacity: 14,400 requests/day. 60 requests/minute.
How to get it: Same Groq account. Use model qwen3-32b.
Why it's #4: 14,400 requests/day is the highest free capacity of any model. Good for high-volume heartbeats and simple tasks. Qwen3 handles FAQ, classification, and routing well.
The catch: Quality is below the top 3 on complex reasoning. The 32B model is smaller than Llama 70B. Best used for heartbeat routing (48/day at zero cost) and simple tasks, not as primary conversational model.

5. DeepSeek V4 Flash (5M Token Grant, Direct API)
Capacity: 5 million tokens total (one-time grant on signup).
How to get it: Sign up at platform.deepseek.com. No credit card.
Why it's #5: Same excellent V4 Flash model as #2, but through DeepSeek's direct API with better reliability (no OpenRouter deprioritization). 5M tokens covers 2-11 months of light use depending on message volume.
The catch: One-time grant, not renewable. When it runs out, V4 Flash costs $0.14/$0.28 per million tokens (still nearly free, but not zero). For the complete guide to running a $0/month agent, our free agent setup post covers how to stretch the grant.
If configuring multiple free providers, managing fallback chains, and debugging rate limits across Gemini, Groq, OpenRouter, and DeepSeek sounds like more API juggling than you want, BetterClaw supports all of them from a dropdown. Paste one API key. Select the model. The platform handles routing and fallback. Free tier with 1 agent and BYOK. $19/month per agent for Pro.

6. Qwen3 via Ollama (Unlimited, Local, Hardware-Dependent)
Capacity: Unlimited. Runs on your hardware.
How to get it: Install Ollama (ollama.com). Run ollama pull qwen3. No API key. No account. No cost beyond electricity.
Why it's #6: Completely private. No data leaves your machine. No rate limits. No daily caps. Runs whatever model your hardware can fit.
Hardware requirements at a glance:
| Model | Realistic min hardware | Typical speed |
|---|---|---|
qwen3:8b (Q4) | 8GB RAM (tight) or 16GB comfortable; or 8GB GPU | ~5-15 tok/s CPU, ~40 tok/s on RTX 4060 |
qwen3:14b (Q4) | 16GB RAM minimum | ~3-8 tok/s CPU, faster with GPU offload |
qwen3:32b (Q4) | 32GB RAM or 24GB VRAM | Usable only with GPU acceleration |
qwen3-coder:30b (Q4) | 24GB+ VRAM ideal; ~250GB at full precision | GPU-required for usable speed |
The catch: Speed depends entirely on your hardware. Without a GPU, anything past 8B is too slow for conversational agents. You also need a machine running 24/7 (your laptop, a Mac mini, or a VPS) for the agent to be always-on — a VPS that fits these models starts around $20-40/month, which sometimes erases the "free" savings.

7. Gemma 3 27B via OpenRouter ( Endpoint)
Daily capacity: Approximately 200 requests/day. 20 requests/minute.
How to get it: Same OpenRouter account as #2. Use model google/gemma-3-27b:free.
Why it's #7: Google's open-weight model. Good for classification, extraction, and structured tasks. Smaller than Llama 70B but faster on the free tier.
The catch: Same OpenRouter free tier limitations (deprioritized, variable latency, 200/day). Quality is below the top 3 for conversational tasks. Best as a fallback model, not a primary.

Models we tested and rejected
The "29 models tested" claim isn't rhetorical. Here's a representative sample of options that didn't make the cut and why:
| Model / provider | Why it failed |
|---|---|
| Mistral Small (La Plateforme free) | Tool calling unreliable on multi-step OpenClaw agent flows; quota burns fast on free credit |
| Cohere Command-R (trial) | Credit card required to enable API access despite "free trial" framing |
| Together.ai free credits | $1 free credit equivalent runs out in under a day on Llama 3.x routing |
| Hugging Face Inference free | RPM too tight (often single-digit) for agent loops |
| Cloudflare Workers AI free | Tool-call format support varies by model; many free options error on OpenClaw skill calls |
| OpenAI free tier | Discontinued mid-2025; no permanent free tier remains in May 2026 |
| GLM-5.1 / GLM-5-Turbo (Z.ai trial) | Models work, but community reports thinking-loop and gibberish-output behavior on agent workflows |
| Kimi K2 free tier | 1,000 req/day cap, but token-based billing (input + output + cached) drains quota faster than expected on long-context agent sessions |
The seven that made the list all share three properties: no credit card to sign up, working tool-call format on OpenClaw, and a daily cap that survives a normal personal agent's workload.
Data privacy: what each free provider does with your prompts
The cost of "free" sometimes shows up in the privacy column. Quick reference:
| Provider | Data handling on free tier |
|---|---|
| Google AI Studio (Gemini) | Free-tier prompts may be used to improve Google's models (training). Paid Vertex AI does not. |
OpenRouter :free | Routes to underlying providers; data handling varies per upstream. The free endpoint specifically can log requests for moderation. |
| Groq | Inputs are not used to train Groq's hosted models, but Groq serves third-party open-weights — check the model's own license/terms. |
| DeepSeek (direct API) | Stored on DeepSeek's infrastructure in China. Data residency is the relevant concern for compliance-sensitive workloads, not training. |
| Ollama (local) | Nothing leaves your machine. The clearest privacy story of the seven. |
If you're processing proprietary code, customer PII, or anything subject to compliance, default to Ollama or a paid (non-training) tier for those workloads, and reserve free API tiers for low-sensitivity work.
The free model strategy that actually works
Don't pick one. Stack three.
The recommended free stack
- Primary — Gemini 2.5 Flash (1,500 req/day, good quality)
- Fallback — DeepSeek V4 Flash via OpenRouter
:free(~200 req/day, kicks in when Gemini hits its cap)- Heartbeat — Qwen3 32B via Groq (48 heartbeats/day out of a 14,400/day budget)
Total daily capacity: 1,700+ requests across three providers Monthly cost: $0Credit card: none required on any of the three
The quality trade-off is real. Claude Opus 4.7 ($5/$25 per million tokens) and GPT-5.5 ($5/$30 per million tokens) are measurably better on complex multi-step reasoning. But for personal agents handling Q&A, email drafts, scheduling, and FAQ, the free stack lands somewhere around 80-85% of Claude quality on the 80% of tasks that don't need Claude. That's the community consensus since the Anthropic ban.
How to set up the recommended stack in OpenClaw
A minimal models.providers config that wires up all three:
models:
providers:
google:
apiKey: "${env:GOOGLE_API_KEY}"
models:
- id: gemini-2.5-flash
contextWindow: 1000000
openrouter:
apiKey: "${env:OPENROUTER_API_KEY}"
models:
- id: deepseek/deepseek-v4-flash:free
contextWindow: 1000000
groq:
apiKey: "${env:GROQ_API_KEY}"
models:
- id: qwen3-32b
contextWindow: 32768
agent:
model:
primary: google/gemini-2.5-flash
fallback: openrouter/deepseek/deepseek-v4-flash:free
heartbeat: groq/qwen3-32b
Then export the three keys (GOOGLE_API_KEY, OPENROUTER_API_KEY, GROQ_API_KEY — get them at ai.google.dev, openrouter.ai, and console.groq.com respectively, all without a credit card), restart the gateway, and run openclaw doctor --deep to verify all three providers respond.
What happens when you hit the limit mid-conversation
The behaviour depends on whether fallback is configured. With the stack above:
- Primary hits its cap — the gateway logs a 429 from Google, automatically routes the next request to the OpenRouter
:freeDeepSeek fallback, and continues the conversation. The user usually doesn't notice. - Both primary and fallback are exhausted (rare on this stack) — OpenClaw returns the 429 to the chat surface and pauses. Wait for the per-minute window to reset (60-90 seconds typically), or upgrade one provider to paid.
- Heartbeat-only provider is exhausted — heartbeats stop firing until the next reset; conversations still flow through the primary/fallback path.
Without a fallback defined, the agent just errors out when the primary hits its cap. That's why stacking three providers matters even though Gemini's 1,500/day looks like more than enough.
If you'd rather not juggle three API keys and a YAML config, BetterClaw supports all of them from a dropdown. Paste one key per provider, pick the model, the platform handles routing and fallback. Free plan with 1 agent and BYOK — use any of these seven models at $0. $19/month per agent for Pro when you need more. Start free.
Frequently Asked Questions
What is the best free model for OpenClaw in 2026?
Google Gemini 2.5 Flash is the best overall free model for OpenClaw. It offers 1,500 requests/day with no credit card, a 1M token context window, and quality competitive with GPT-5.4 Mini. For higher quality at lower daily volume, DeepSeek V4 Flash via OpenRouter's free tier (:free endpoint) provides 200 requests/day with better reasoning capability.
Can I run OpenClaw for free without a credit card?
Yes. Three providers offer free API access with no credit card: Google AI Studio (Gemini 2.5 Flash, 1,500/day), Groq (Llama 3.3 70B, 1,000/day), and OpenRouter (29+ free models, ~200/day). DeepSeek also gives 5M free tokens on signup without a credit card. Combined with BetterClaw's free tier (1 agent, hosting included, BYOK), you can run a complete agent at $0/month.
How many messages can a free OpenClaw agent handle per day?
With stacked free tiers: 1,700+ messages/day (Gemini 1,500 + OpenRouter 200). With a single provider: 200-1,500/day depending on which free tier you use. Groq's Qwen3 32B offers 14,400/day but with lower quality. For comparison, a typical personal agent processes 20-50 messages/day, well within any single free tier.
Are free models good enough for real agent tasks?
For routine tasks (Q&A, FAQ, email drafts, scheduling): yes. Free models deliver 80-85% of Claude quality on predictable, well-defined tasks. For complex reasoning, creative writing, and multi-step research: no. Claude Opus 4.7 ($5/$25/M) and GPT-5.5 ($5/$30/M) are measurably better. Most personal agents handle routine tasks 80%+ of the time.
What's the catch with free AI models?
Three catches: daily rate limits (200-1,500 requests/day), data privacy (Google AI Studio may use your prompts for training), and latency (OpenRouter free tiers are deprioritized during peak hours). Local models via Ollama avoid all three catches but require $400-2,000+ in hardware. The cheapest paid option after free tiers is DeepSeek V4 Flash at $0.14/$0.28/M tokens.
How do I set up multiple free models in OpenClaw?
In your models.providers config, define one block per provider (Google, OpenRouter, Groq) each with its own API key and model list. Then in agent.model, set primary to the Gemini model, fallback to the OpenRouter :free model, and heartbeat to the Groq Qwen3-32B model. See the "How to set up the recommended stack" section above for a working YAML block. The three keys come from ai.google.dev, openrouter.ai, and console.groq.com respectively — none requires a credit card. Run openclaw doctor --deep after restart to verify all three respond.
Can I use free models on BetterClaw without a credit card?
Yes. BetterClaw's free plan includes 1 agent with BYOK. Bring your Gemini, DeepSeek, Groq, or OpenRouter key (any of the seven providers above) and the platform handles routing and fallback from a dropdown — no YAML to maintain, no credit card required for the free plan. Pro at $19/month per agent adds multiple agents and managed extras.




