
Pick a provider

Anthropic / OpenAI / OpenRouter / xAI / Ollama / Claude Code CLI — which one for what.

Pick a provider — walkthrough
Video walkthrough coming soon.

BYOK (Bring Your Own Key) — your key lives in the OS keychain (Windows Credentials / macOS Keychain / libsecret on Linux). Hypex never proxies your requests and never sees your key. Hypex Managed is the exception: sign in to your Hypex account and Claude/GPT calls go through our proxy with your credits — no API key needed.

Which to pick

| Provider | Best for | Cost | Privacy |
| --- | --- | --- | --- |
| Ollama | Fully local, offline, unlimited | $0 | 100 % local |
| Hypex Managed | Sign in and chat with Claude — zero-config | $ (metered, deducted from your Hypex credits) | Proxied through our worker |
| Anthropic | Best reasoning, agent work (BYOK) | $$$ ($3/$15 per Mtok, Sonnet) | Cloud |
| xAI (Grok) | Fast reasoning, 95 % prompt-cache hit rate | $$ (grok-4-fast $0.2/$0.5 per Mtok) | Cloud |
| OpenAI | GPT-5, multimodal | $$$ | Cloud |
| OpenRouter | 300+ models through one key, price routing | Varies | Cloud (via OpenRouter) |
| Claude Code CLI | Use your Claude Max subscription | Included in Max | Cloud (Anthropic) |

Hypex Managed — no API key flow

Select Hypex Managed (Claude) via Ctrl+Shift+P → Hypex: Select Model Provider. The first call checks your credit balance (visible on /account); if you have none, you'll get a 402 no_credits response — top up via Stripe or ask an admin.

Cost is deducted per turn at the actual token count (input + output + cache read/write), using the real Anthropic price table. No model-specific markup at the moment — you pay the API rate.
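As a sketch of how a per-turn charge works out under this metering, assuming Sonnet's listed $3/$15 per Mtok input/output rates from the table above (the cache read/write rates here are illustrative assumptions, not Hypex's actual price table):

```python
# Hypothetical per-turn cost calculation mirroring the metering described
# above. Rates are dollars per million tokens; input/output come from the
# provider table, cache rates are assumed for illustration.
RATES = {
    "input": 3.00,        # $/Mtok (Sonnet, from the table above)
    "output": 15.00,      # $/Mtok
    "cache_read": 0.30,   # assumed: a typical fraction of the input rate
    "cache_write": 3.75,  # assumed: a typical premium over the input rate
}

def turn_cost(usage: dict) -> float:
    """Sum the cost across all token categories for one turn."""
    return sum(usage.get(kind, 0) * rate / 1_000_000
               for kind, rate in RATES.items())

cost = turn_cost({"input": 2_000, "output": 500, "cache_read": 10_000})
print(f"${cost:.4f}")  # input + output + cache-read charges combined
```

A turn that reuses a large cached prefix pays mostly the (much cheaper) cache-read rate, which is why the cost pill numbers stay small on long conversations.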

Recommended models per provider

Anthropic

  • claude-opus-4-7 — largest, best for architecture / hard debugging
  • claude-sonnet-4-6 — balanced default
  • claude-haiku-4-5 — fast + cheap, good for ghost completions

xAI (Grok)

  • grok-4-fast-reasoning — recommended, 95 % cache hit rate
  • grok-4 — flagship

OpenAI

  • gpt-5 — frontier
  • gpt-5-mini — fast, good for inline completions
  • gpt-4o — multimodal

OpenRouter

  • qwen/qwen3-coder — Alibaba's best code model
  • deepseek/deepseek-chat-v3.1 — strong reasoning, cheap
  • anthropic/claude-sonnet-4-6 — Claude via OpenRouter
  • z-ai/glm-4.6 — Zhipu GLM
  • moonshotai/kimi-k2 — Kimi 2 (1M context)

Ollama (local, unlimited)

Run Hypex: Diagnose Hardware & Recommend Local Model to let Hypex pick one for your machine automatically. Manual picks:

| Model | Size on disk | Min RAM / VRAM |
| --- | --- | --- |
| gemma3n:2b | ~1.3 GB | 4 GB / — |
| gemma3n:4b | ~2.6 GB | 8 GB / 4 GB |
| gemma5:4b (newest) | ~2.8 GB | 16 GB / 8 GB |
| gemma5:12b (newest) | ~7 GB | 32 GB / 16 GB |
| qwen3-coder:7b | ~4.5 GB | 16 GB / 8 GB |
| phi4-mini:latest | ~2.3 GB | 8 GB / 4 GB |
| gemma4:31b | ~19 GB | 64 GB / 24 GB |

Gemma 5 (April 2026) is Google's newest open model and beats Gemma 4 31B on code at under half the VRAM. Gemma 3n is the edge-optimized line for laptops.
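The hardware-diagnose command's actual logic isn't documented here, but a minimal sketch of the idea — pick the largest model from the table above whose RAM/VRAM requirements fit the machine — could look like this (the selection rule is an assumption):

```python
# Hypothetical model picker. The (model, min_ram_gb, min_vram_gb) entries
# mirror the table above, ordered from smallest to largest requirements;
# "largest model that fits" is an assumed selection rule.
MODELS = [
    ("gemma3n:2b", 4, 0),
    ("gemma3n:4b", 8, 4),
    ("phi4-mini:latest", 8, 4),
    ("gemma5:4b", 16, 8),
    ("qwen3-coder:7b", 16, 8),
    ("gemma5:12b", 32, 16),
    ("gemma4:31b", 64, 24),
]

def recommend(ram_gb: int, vram_gb: int = 0) -> str:
    """Return the last (most capable) model that fits, or 'none'."""
    fitting = [name for name, ram, vram in MODELS
               if ram_gb >= ram and vram_gb >= vram]
    return fitting[-1] if fitting else "none"

print(recommend(16, 8))  # a 16 GB RAM / 8 GB VRAM laptop
```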

Claude Code CLI

  • No API key needed — uses your Claude Max CLI auth.
  • Requires claude binary on PATH.
  • Model is whatever the CLI picks.

Switching providers

Hypex: Select Model Provider → pick. Hypex auto-picks a sensible default model for the new provider, so you don't hit "unknown model" errors after a swap.

Prompt caching

Prompt caching is enabled on every provider that supports it:

  • Anthropic — ephemeral cache breakpoints on system + tools + last user message
  • xAI — prompt_tokens_details.cached_tokens
  • OpenRouter — same field
  • OpenAI — implicit caching when server supports it
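For the OpenAI-compatible providers (xAI, OpenRouter), the cache-hit count lives in the usage object of each response. A minimal sketch of reading it defensively — the field names match that response format, but the example payload is made up from the cost-pill numbers below:

```python
# Extract the prompt-cache hit count from an OpenAI-compatible usage
# payload. Older servers may omit prompt_tokens_details entirely, so
# fall back to 0 rather than raising.
def cached_tokens(usage: dict) -> int:
    """Return prompt_tokens_details.cached_tokens, or 0 if absent."""
    return (usage.get("prompt_tokens_details") or {}).get("cached_tokens", 0)

usage = {
    "prompt_tokens": 3021,
    "completion_tokens": 45,
    "prompt_tokens_details": {"cached_tokens": 154},
}
print(cached_tokens(usage))
```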

The per-turn cost pill in chat shows cache hits inline: $0.0006 · 3021 in / 45 out (154 cached).

Base URL overrides

For air-gapped / proxy setups:

  • hypex.openrouterBaseUrl
  • hypex.xaiBaseUrl
  • hypex.ollamaBaseUrl — accepts root or /v1 suffix, auto-normalizes

Point ollamaBaseUrl at a remote GPU box and the IDE behaves exactly the same — any OpenAI-compatible server (llama.cpp, LM Studio, vLLM) works.
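The "root or /v1 suffix" normalization could be as simple as the sketch below — the exact rule Hypex applies is an assumption, and the gpu-box URL is a made-up example:

```python
# Hypothetical normalization for hypex.ollamaBaseUrl: accept either the
# server root or a URL already ending in /v1, and always return the
# OpenAI-compatible /v1 endpoint with no trailing slash.
def normalize_base_url(url: str) -> str:
    url = url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"

print(normalize_base_url("http://gpu-box:11434"))      # root form
print(normalize_base_url("http://gpu-box:11434/v1/"))  # already /v1
```

Both calls yield the same endpoint, which is why either form works in settings.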