
Pick a provider

Anthropic / OpenAI / OpenRouter / xAI / Ollama / Claude Code CLI — which one for what.

Pick a provider — walkthrough
Video walkthrough coming soon.

BYOK (Bring Your Own Key) — your key lives in the OS keychain (Windows Credentials / macOS Keychain / libsecret on Linux). Hypex never proxies your requests and never sees your key. Hypex Managed is the exception: sign in to your Hypex account and Claude/GPT calls go through our proxy with your credits — no API key needed.

Which to pick

| Provider | Best for | Cost | Privacy |
| --- | --- | --- | --- |
| Ollama | Fully local, offline, unlimited | $0 | 100 % local |
| Hypex Managed | Sign in and chat with Claude — zero-config | $ (metered, deducted from your Hypex credits) | Proxied through our worker |
| Anthropic | Best reasoning, agent work (BYOK) | $$$ ($3/$15 per Mtok, Sonnet) | Cloud |
| xAI (Grok) | Fast reasoning, 95 % prompt-cache hit rate | $$ (grok-4-fast $0.2/$0.5 per Mtok) | Cloud |
| OpenAI | GPT-5, multimodal | $$$ | Cloud |
| OpenRouter | 300+ models through one key, price routing | Varies | Cloud (via OpenRouter) |
| Claude Code CLI | Use your Claude Max subscription | Included in Max | Cloud (Anthropic) |

Hypex Managed — no API key flow

Select Hypex Managed (Claude) via Ctrl+Shift+P → Hypex: Select Model Provider. The first call checks your credit balance (visible on /account); if you have none, you'll get a 402 no_credits response — top up via Stripe or ask an admin.

Cost is deducted per turn at the actual token count (input + output + cache read/write), using the real Anthropic price table. No model-specific markup at the moment — you pay the API rate.
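As a sketch of how a per-turn charge works out under this metering, assuming Sonnet's listed $3/$15 per Mtok input/output rates from the table above (the cache read/write rates here are illustrative assumptions, not Hypex's actual price table):

```python
# Hypothetical per-turn cost calculation mirroring the metering described
# above. Rates are dollars per million tokens; input/output come from the
# provider table, cache rates are assumed for illustration.
RATES = {
    "input": 3.00,        # $/Mtok (Sonnet, from the table above)
    "output": 15.00,      # $/Mtok
    "cache_read": 0.30,   # assumed: a typical fraction of the input rate
    "cache_write": 3.75,  # assumed: a typical premium over the input rate
}

def turn_cost(usage: dict) -> float:
    """Sum the cost across all token categories for one turn."""
    return sum(usage.get(kind, 0) * rate / 1_000_000
               for kind, rate in RATES.items())

cost = turn_cost({"input": 2_000, "output": 500, "cache_read": 10_000})
print(f"${cost:.4f}")  # input + output + cache-read charges combined
```

A turn that reuses a large cached prefix pays mostly the (much cheaper) cache-read rate, which is why the cost pill numbers stay small on long conversations.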

Recommended models per provider

Anthropic

  • claude-opus-4-7 — largest, best for architecture / hard debugging
  • claude-sonnet-4-6 — balanced default
  • claude-haiku-4-5 — fast + cheap, good for ghost completions

xAI (Grok)

  • grok-4-fast-reasoning — recommended, 95 % cache hit rate
  • grok-4 — flagship

OpenAI

  • gpt-5 — frontier
  • gpt-5-mini — fast, good for inline completions
  • gpt-4o — multimodal

OpenRouter

  • qwen/qwen3-coder — Alibaba's best code model
  • deepseek/deepseek-chat-v3.1 — strong reasoning, cheap
  • anthropic/claude-sonnet-4-6 — Claude via OpenRouter
  • z-ai/glm-4.6 — Zhipu GLM
  • moonshotai/kimi-k2 — Kimi 2 (1M context)

Ollama (local, unlimited)

Run Hypex: Diagnose Hardware & Recommend Local Model to let Hypex pick one for your machine automatically. Manual picks:

| Model | Size on disk | Min RAM / VRAM |
| --- | --- | --- |
| gemma3n:2b | ~1.3 GB | 4 GB / — |
| gemma3n:4b | ~2.6 GB | 8 GB / 4 GB |
| gemma5:4b (newest) | ~2.8 GB | 16 GB / 8 GB |
| gemma5:12b (newest) | ~7 GB | 32 GB / 16 GB |
| qwen3-coder:7b | ~4.5 GB | 16 GB / 8 GB |
| phi4-mini:latest | ~2.3 GB | 8 GB / 4 GB |
| gemma4:31b | ~19 GB | 64 GB / 24 GB |

Gemma 5 (April 2026) is Google's newest open model and beats Gemma 4 31B on code at under half the VRAM. Gemma 3n is the edge-optimized line for laptops.
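The hardware-diagnose command's actual logic isn't documented here, but a minimal sketch of the idea — pick the largest model from the table above whose RAM/VRAM requirements fit the machine — could look like this (the selection rule is an assumption):

```python
# Hypothetical model picker. The (model, min_ram_gb, min_vram_gb) entries
# mirror the table above, ordered from smallest to largest requirements;
# "largest model that fits" is an assumed selection rule.
MODELS = [
    ("gemma3n:2b", 4, 0),
    ("gemma3n:4b", 8, 4),
    ("phi4-mini:latest", 8, 4),
    ("gemma5:4b", 16, 8),
    ("qwen3-coder:7b", 16, 8),
    ("gemma5:12b", 32, 16),
    ("gemma4:31b", 64, 24),
]

def recommend(ram_gb: int, vram_gb: int = 0) -> str:
    """Return the last (most capable) model that fits, or 'none'."""
    fitting = [name for name, ram, vram in MODELS
               if ram_gb >= ram and vram_gb >= vram]
    return fitting[-1] if fitting else "none"

print(recommend(16, 8))  # a 16 GB RAM / 8 GB VRAM laptop
```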

Claude Code CLI

  • No API key needed — uses your Claude Max CLI auth.
  • Requires claude binary on PATH.
  • Model is whatever the CLI picks.

Switching providers

Hypex: Select Model Provider → pick. Hypex auto-picks a sensible default model for the new provider, so you don't hit "unknown model" errors after a swap.

Prompt caching

Prompt caching is enabled on every provider that supports it:

  • Anthropic — ephemeral cache breakpoints on system + tools + last user message
  • xAI — prompt_tokens_details.cached_tokens
  • OpenRouter — same field
  • OpenAI — implicit caching when server supports it
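For the OpenAI-compatible providers (xAI, OpenRouter), the cache-hit count lives in the usage object of each response. A minimal sketch of reading it defensively — the field names match that response format, but the example payload is made up from the cost-pill numbers below:

```python
# Extract the prompt-cache hit count from an OpenAI-compatible usage
# payload. Older servers may omit prompt_tokens_details entirely, so
# fall back to 0 rather than raising.
def cached_tokens(usage: dict) -> int:
    """Return prompt_tokens_details.cached_tokens, or 0 if absent."""
    return (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)

usage = {
    "prompt_tokens": 3021,
    "completion_tokens": 45,
    "prompt_tokens_details": {"cached_tokens": 154},
}
print(cached_tokens(usage))
```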

The per-turn cost pill in chat shows cache hits inline: $0.0006 · 3021 in / 45 out (154 cached).

Base URL overrides

For air-gapped / proxy setups:

  • hypex.openrouterBaseUrl
  • hypex.xaiBaseUrl
  • hypex.ollamaBaseUrl — accepts root or /v1 suffix, auto-normalizes

Point ollamaBaseUrl at a remote GPU box and the IDE behaves exactly the same — any OpenAI-compatible server (llama.cpp, LM Studio, vLLM) works.
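The "root or /v1 suffix" normalization could be as simple as the sketch below — the exact rule Hypex applies is an assumption, and the gpu-box URL is a made-up example:

```python
# Hypothetical normalization for hypex.ollamaBaseUrl: accept either the
# server root or a URL already ending in /v1, and always return the
# OpenAI-compatible /v1 endpoint with no trailing slash.
def normalize_base_url(url: str) -> str:
    url = url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"

print(normalize_base_url("http://gpu-box:11434"))      # root form
print(normalize_base_url("http://gpu-box:11434/v1/"))  # already /v1
```

Both calls yield the same endpoint, which is why either form works in settings.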