# Pick a provider
Anthropic / OpenAI / OpenRouter / xAI / Ollama / Claude Code CLI — which one for what.
BYOK (Bring Your Own Key) — your key lives in the OS keychain (Windows Credentials / macOS Keychain / libsecret on Linux). Hypex never proxies your requests and never sees your key. Hypex Managed is the exception: sign in to your Hypex account and Claude/GPT calls go through our proxy with your credits — no API key needed.
## Which to pick
| Provider | Best for | Cost | Privacy |
|---|---|---|---|
| Ollama | Fully local, offline, unlimited | $0 | 100% local |
| Hypex Managed | Sign-in and chat with Claude — zero-config | $ (metered, deducted from your Hypex credits) | Proxied through our worker |
| Anthropic | Best reasoning, agent work (BYOK) | $$$ ($3/$15 per Mtok Sonnet) | Cloud |
| xAI (Grok) | Fast reasoning, 95% prompt-cache hit | $$ (grok-4-fast $0.2/$0.5 per Mtok) | Cloud |
| OpenAI | GPT-5, multimodal | $$$ | Cloud |
| OpenRouter | 300+ models through one key, price routing | Varies | Cloud (via OpenRouter) |
| Claude Code CLI | Use Claude Max subscription | Included in Max | Cloud (Anthropic) |
## Hypex Managed — no API key flow
Select Hypex Managed (Claude) in Ctrl+Shift+P → `Hypex: Select Model Provider`. The first call checks your credit balance (visible on /account); if you have none, you'll get a `402 no_credits` response — top up via Stripe or ask an admin.
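A minimal client-side sketch of branching on that response. Only the 402 status and the `no_credits` code come from the docs; the exact shape of the error body is an assumption:

```python
# Sketch: handling the 402 no_credits response described above.
# The JSON body shape ({"error": "no_credits"}) is an assumption; only the
# status code and the error code name come from the docs.

def handle_managed_response(status: int, body: dict) -> str:
    """Return a user-facing hint for a Hypex Managed API response."""
    if status == 402 and body.get("error") == "no_credits":
        return "Out of credits: top up via Stripe on /account or ask an admin."
    if status == 200:
        return "ok"
    return f"Unexpected response: HTTP {status}"

print(handle_managed_response(402, {"error": "no_credits"}))
```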
Cost is deducted per turn at the actual token count (input + output + cache read/write), using the real Anthropic price table. No model-specific markup at the moment — you pay the API rate.
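The per-turn deduction can be sketched as below. The $3/$15 per Mtok Sonnet rates come from the table above; the cache multipliers (1.25x input for writes, 0.1x for reads) are standard Anthropic pricing but should be treated as assumptions and checked against the live price table:

```python
# Per-turn cost sketch at Claude Sonnet rates (from the provider table).
# Cache write/read multipliers are assumed standard Anthropic pricing.

PRICE_PER_MTOK = {
    "input": 3.00,        # $ per million uncached input tokens
    "output": 15.00,      # $ per million output tokens
    "cache_write": 3.75,  # assumed: 1.25x the input rate
    "cache_read": 0.30,   # assumed: 0.1x the input rate
}

def turn_cost(tokens: dict) -> float:
    """Dollar cost of one turn given per-category token counts."""
    return sum(tokens.get(k, 0) / 1_000_000 * rate
               for k, rate in PRICE_PER_MTOK.items())

# Example turn: 3021 uncached input, 45 output, 154 tokens read from cache.
cost = turn_cost({"input": 3021, "output": 45, "cache_read": 154})
print(f"${cost:.4f}")  # → $0.0098
```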
## Recommended models per provider
### Anthropic
- `claude-opus-4-7` — largest, best for architecture / hard debugging
- `claude-sonnet-4-6` — balanced default
- `claude-haiku-4-5` — fast + cheap, good for ghost completions
### xAI (Grok)
- `grok-4-fast-reasoning` — recommended, 95% cache hit rate
- `grok-4` — flagship
### OpenAI
- `gpt-5` — frontier
- `gpt-5-mini` — fast, good for inline completions
- `gpt-4o` — multimodal
### OpenRouter
- `qwen/qwen3-coder` — Alibaba's best code model
- `deepseek/deepseek-chat-v3.1` — strong reasoning, cheap
- `anthropic/claude-sonnet-4-6` — Claude via OpenRouter
- `z-ai/glm-4.6` — Zhipu GLM
- `moonshotai/kimi-k2` — Kimi 2 (1M context)
### Ollama (local, unlimited)
Run `Hypex: Diagnose Hardware & Recommend Local Model` to let Hypex pick one for your machine automatically. Manual picks:
| Model | Size on disk | Min RAM / VRAM |
|---|---|---|
| `gemma3n:2b` | ~1.3 GB | 4 GB / — |
| `gemma3n:4b` | ~2.6 GB | 8 GB / 4 GB |
| `gemma5:4b` (newest) | ~2.8 GB | 16 GB / 8 GB |
| `gemma5:12b` (newest) | ~7 GB | 32 GB / 16 GB |
| `qwen3-coder:7b` | ~4.5 GB | 16 GB / 8 GB |
| `phi4-mini:latest` | ~2.3 GB | 8 GB / 4 GB |
| `gemma4:31b` | ~19 GB | 64 GB / 24 GB |
Gemma 5 (April 2026) is Google's newest open model and beats Gemma 4 31B on code at under half the VRAM. Gemma 3n is the edge-optimized line for laptops.
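As a rough illustration of how the table's size column relates to its RAM column, the sketch below picks the largest model that fits in a given amount of RAM. The 3x headroom factor (covering KV cache, runtime, and OS) is an assumption that roughly reproduces the table; it is not Hypex's actual diagnosis logic:

```python
# Illustrative only: pick the largest model from the table whose on-disk
# size times a headroom factor fits in available RAM. The 3x headroom is
# an assumption, not Hypex's real heuristic.

MODELS = [  # (name, approx size on disk in GB), from the table above
    ("gemma3n:2b", 1.3),
    ("gemma3n:4b", 2.6),
    ("gemma5:4b", 2.8),
    ("phi4-mini:latest", 2.3),
    ("qwen3-coder:7b", 4.5),
    ("gemma5:12b", 7.0),
    ("gemma4:31b", 19.0),
]

def recommend(ram_gb: float, headroom: float = 3.0) -> str:
    """Largest model (by disk size) that fits within ram_gb."""
    fits = [(size, name) for name, size in MODELS if size * headroom <= ram_gb]
    return max(fits)[1] if fits else "none"

print(recommend(16))  # → qwen3-coder:7b
```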
### Claude Code CLI
- No API key needed — uses your Claude Max CLI auth.
- Requires the `claude` binary on PATH.
- Model is whatever the CLI picks.
## Switching providers
`Hypex: Select Model Provider` → pick. Hypex auto-picks a sensible default model for the new provider, so you don't hit "unknown model" errors after a swap.
## Prompt caching
Prompt caching is enabled for every provider that supports it:
- Anthropic — ephemeral cache breakpoints on system + tools + last user message
- xAI — `prompt_tokens_details.cached_tokens`
- OpenRouter — same field
- OpenAI — implicit caching when the server supports it
The per-turn cost pill in the chat shows cache hits inline: `$0.0006 · 3021 in / 45 out (154 cached)`.
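A sketch of how such a pill string could be assembled from an OpenAI-style usage object. The field names follow the `prompt_tokens_details.cached_tokens` convention used by xAI and OpenRouter; treating this as exactly what Hypex reads is an assumption:

```python
# Sketch: formatting a per-turn cost pill from an OpenAI-style usage payload.
# Assumes the usage dict carries prompt_tokens, completion_tokens, and the
# prompt_tokens_details.cached_tokens field mentioned above.

def cost_pill(usage: dict, cost_usd: float) -> str:
    cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    pill = (f"${cost_usd:.4f} · {usage['prompt_tokens']} in"
            f" / {usage['completion_tokens']} out")
    if cached:
        pill += f" ({cached} cached)"
    return pill

usage = {
    "prompt_tokens": 3021,
    "completion_tokens": 45,
    "prompt_tokens_details": {"cached_tokens": 154},
}
print(cost_pill(usage, 0.0006))  # → $0.0006 · 3021 in / 45 out (154 cached)
```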
## Base URL overrides
For air-gapped / proxy setups:
- `hypex.openrouterBaseUrl`
- `hypex.xaiBaseUrl`
- `hypex.ollamaBaseUrl` — accepts root or `/v1` suffix, auto-normalizes
Point `hypex.ollamaBaseUrl` at a remote GPU box and the IDE works exactly the same — anything OpenAI-compatible (llama.cpp, LM Studio, vLLM) works.
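A sketch of what the overrides might look like in a VS Code-style settings.json. The setting keys come from the list above; the file location, the remote hostname, and the specific URLs shown are assumptions for illustration:

```jsonc
{
  // Point the Ollama provider at a remote GPU box (hostname is an example);
  // root or /v1 both work, Hypex auto-normalizes the suffix.
  "hypex.ollamaBaseUrl": "http://gpu-box.local:11434",

  // Proxy / air-gapped overrides for the cloud providers (example values).
  "hypex.openrouterBaseUrl": "https://openrouter.ai/api/v1",
  "hypex.xaiBaseUrl": "https://api.x.ai/v1"
}
```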