Top-tier open-weight model in its active-parameter band for agentic code and repo-scale reasoning. Native function-calling and tool-use, optimised for instruction-following without thinking mode. Pairs with the Vision Specialist when a query mixes screenshots of code with text. Leads its size class on SWE-bench Verified.
A nested mixture of mixtures.
One AI model, itself a Mixture-of-Experts, directing five other AI models, four of which are also Mixture-of-Experts. The lead model decides which specialists should answer each query, weighs its confidence in each, and synthesises their outputs into a single coherent response. One MoE on top, four MoEs and a dense fast path underneath, one answer at the bottom.
Mechanically: a Qwen3-class sparse MoE Conductor with 128 internal experts dispatches every query across five specialist models, four of them MoE and one dense. Each MoE specialist then runs its own internal token-level gating across its own experts. Their outputs are combined, weighted, and synthesised by the Conductor. A mixture of experts, directing a mixture of mixtures of experts.
Two orchestration tiers of gating, one inherited.
The framework's job is coarse-grained routing across four specialist clusters (L0) plus an always-on fast path. Every selected MoE model then performs its own learned token-level gating (L2) — inherited automatically from choosing sparse-activation experts. This is hierarchical MoE lifted from the layer level to the system level.
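The inherited token-level gating can be pictured with a minimal sketch. This is the generic top-k gate from the MoE literature, not the code of any specific checkpoint; the function name and the toy logits are assumptions made for illustration.

```python
import math


def top_k_gate(logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    Hypothetical helper: real checkpoints implement this inside the model.
    """
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest logits (ties keep their original order).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


# Toy example: five experts, two active per token.
gate = top_k_gate([0.1, 2.0, -1.0, 2.0, 0.5], k=2)
```

Only the selected experts run for that token, which is why sparse models activate a small share of their weights per step.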
The top-level gate.
The Conductor is the only model that reads every query. It is itself a sparse Mixture-of-Experts with 128 internal experts and top-8 routing, so even the routing decision is computed by an internal expert gate, not a dense forward pass. It was chosen specifically because its ~3.3B active parameters make it cheap enough to call on every request without dominating latency. It emits a structured softmax distribution across the specialist clusters, never a hard pick.
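A soft routing decision of this kind might look like the sketch below. The cluster names come from this page; the raw scores, the `route` function, and the `min_weight` threshold are assumptions for illustration, not the Conductor's actual output schema.

```python
import math

CLUSTERS = ("code", "reasoning", "vision", "long-ctx")


def route(scores, min_weight=0.15):
    """Turn raw cluster scores into a soft routing distribution.

    Returns (distribution, selected). The distribution covers every cluster;
    selected lists the clusters whose weight reaches min_weight.
    min_weight is a hypothetical cut-off, not a documented setting.
    """
    peak = max(scores[c] for c in CLUSTERS)
    exps = {c: math.exp(scores[c] - peak) for c in CLUSTERS}
    total = sum(exps.values())
    dist = {c: exps[c] / total for c in CLUSTERS}
    selected = [c for c in CLUSTERS if dist[c] >= min_weight]
    return dist, selected


# Hypothetical scores for a query that is mostly code with some reasoning.
dist, selected = route(
    {"code": 2.0, "reasoning": 1.5, "vision": -1.0, "long-ctx": 0.0}
)
```

Keeping the full distribution, rather than only the winner, is what lets the synthesis step weight each specialist's output by the Conductor's confidence in it.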
Five specialists, all open-weight.
Every specialist except the fast-path is itself a Mixture-of-Experts. The pool is drawn from the current open-weight ecosystem (Qwen3 family in the current rotation) and rotated as better options ship. What is stable is the shape of the pool: role, architecture class, and the behaviour each role must guarantee. Specific checkpoints are an implementation detail that evolves with the frontier.
Hybrid Gated DeltaNet + Gated Attention MoE in a 3:1 layer ratio, tuned for chain-of-thought reasoning. Despite 80B total parameters only ~3B activate per token, making it cheaper per request than its size implies. Runs an explicit thinking pass before committing to an answer — the reasoning trace stays internal unless a user asks to see it.
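The gap between total and active parameters is easy to put a number on. A rough worked example, using the approximate figures quoted above (80B total, ~3B active); the function name is an illustrative assumption.

```python
def active_fraction(active_params_b, total_params_b):
    """Share of the model's weights that participate in one token's forward pass."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Approximate figures from the description above, in billions of parameters.
fraction = active_fraction(3.0, 80.0)
print(f"{fraction:.2%} of weights active per token")
```

Per-token compute tracks the active share, while the memory needed to hold the model still tracks the total, so the saving is in compute rather than in footprint.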
Dedicated vision-language specialist with the vision encoder kept intact at full precision. Handles images, OCR, charts, screenshots, and visual Q&A natively — no external OCR pipeline sits between the image and the response. A dedicated role means vision queries never compete with text queries for the same weights.
Built for RAG payloads, full codebases, long transcripts, multi-document summaries. Native 256K context extended to 1M via YaRN scaling. Shares its weight topology with the Conductor but serves a completely different role — retrieval and long-doc synthesis, not classification.
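The size of the extension can be expressed as a single scaling factor. A small sketch, assuming "256K" and "1M" are binary units (256 × 1024 and 1024 × 1024 tokens); the function and constant names are illustrative and not tied to any library's configuration schema.

```python
def yarn_scale_factor(native_ctx, target_ctx):
    """Ratio by which the positional range must be stretched."""
    if native_ctx <= 0 or target_ctx < native_ctx:
        raise ValueError("target_ctx must be at least native_ctx, and both positive")
    return target_ctx / native_ctx


# Assumption: binary units for the context sizes quoted above.
NATIVE_CTX = 256 * 1024
TARGET_CTX = 1024 * 1024

factor = yarn_scale_factor(NATIVE_CTX, TARGET_CTX)
```

Under that assumption the extension is a 4× stretch of the native window.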
Deliberately dense, not MoE — dense models have lower first-token latency, which matters for the parallel shadow role. Fires on every request alongside L0. For trivial queries the Conductor routes directly to this path and skips synthesis. For complex queries its output is attached to the synthesis prompt as additional context, never used as a blind fallback.
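The parallel shadow behaviour can be sketched with `asyncio`. Every function here is a stand-in: the model calls are stubbed, and the word-count test for "trivial" is a placeholder assumption, since the real decision is made by the Conductor.

```python
import asyncio


async def fast_path(query):
    # Stand-in for the dense always-on model.
    await asyncio.sleep(0)
    return f"fast draft for: {query}"


async def conductor(query):
    # Stand-in for L0 routing. The word-count rule is a placeholder only.
    await asyncio.sleep(0)
    trivial = len(query.split()) <= 3
    return trivial, ([] if trivial else ["code"])


async def specialist(cluster, query):
    # Stand-in for a routed specialist call.
    await asyncio.sleep(0)
    return f"{cluster} answer for: {query}"


def synthesise(prompt):
    # Stand-in for the Conductor's synthesis step.
    joined = " | ".join(prompt["specialists"])
    return f"{joined} (context: {prompt['fast_context']})"


async def handle(query):
    # The fast path and the routing decision start together.
    draft, (trivial, clusters) = await asyncio.gather(
        fast_path(query), conductor(query)
    )
    if trivial:
        # Trivial query: return the fast-path output and skip synthesis.
        return {"answer": draft, "synthesised": False}
    outputs = await asyncio.gather(*(specialist(c, query) for c in clusters))
    # Complex query: the draft is extra context for synthesis, not a fallback.
    prompt = {"query": query, "specialists": list(outputs), "fast_context": draft}
    return {"answer": synthesise(prompt), "synthesised": True}


result = asyncio.run(handle("refactor this module to use async io"))
```

Because the draft is already in flight when routing finishes, the trivial case pays no extra latency for having consulted the Conductor.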
Four routable clusters, one always-on.
The five specialists are grouped into five clusters. Four are L0-routable (code, reasoning, vision, long-ctx). The fifth — fast — is never routed to; it simply fires on every request, outside the capacity budget. This keeps cost accounting clean: capacity is bounded on the routable side, and the shadow path is a constant.
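The cost accounting described above can be illustrated as follows. The `budget` of two routable clusters per request and the `plan_calls` helper are hypothetical; the page does not state the actual capacity limit.

```python
ROUTABLE = ("code", "reasoning", "vision", "long-ctx")
ALWAYS_ON = "fast"


def plan_calls(weights, budget=2):
    """Pick at most `budget` routable clusters, then add the always-on path.

    weights maps routable cluster names to routing weights.
    budget is a hypothetical cap; the real limit is not documented here.
    """
    unknown = set(weights) - set(ROUTABLE)
    if unknown:
        # The fast path is never a routing target, so it is rejected here too.
        raise ValueError(f"not routable: {sorted(unknown)}")
    ranked = sorted(weights, key=weights.get, reverse=True)
    # The always-on path is appended outside the budget.
    return ranked[:budget] + [ALWAYS_ON]


plan = plan_calls(
    {"code": 0.6, "reasoning": 0.3, "vision": 0.02, "long-ctx": 0.08}
)
```

The per-request cost is then at most `budget` specialist calls plus one constant fast-path call, whatever the query looks like.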
Six MoE techniques, lifted to orchestration.
Six well-known techniques from Mixture-of-Experts serving literature, adapted honestly to system-level orchestration. Each maps a genuine parallel from the training / inference literature onto a concrete system behavior.
Private. Open-weight. User-funded.
Kryven is funded directly by its users. Subscription revenue pays for the three things that make the product better: stronger models, tighter latency, and a cleaner product experience. No ads, no data sales, no upstream vendor deciding what you can or can't ask.