Pillar 02 — Rails

The plumbing for serious AI work.

Frontier models are extraordinary. The systems they live inside are usually duct-taped together. We build the durable, well-documented infrastructure that makes agentic software act like real software.

Boring is good. Boring stays up at 3 a.m.

The Model Context Protocol, command-line tooling, integration adapters, and gateway layers are the boring, load-bearing parts of an AI stack. They are not glamorous, and they are not where most of the attention is going. They are also where most production AI systems fall over.

The next wave of useful AI software will be defined less by the model at the center and more by the quality of the rails around it. The rails are currently underbuilt. We are filling that gap.

What we work on

Eight focus areas in the rails layer.

01

Project memory.

Persistent context that survives sessions, models, and team turnover. The rails remember the why behind decisions, not just the what.

02

MCP servers and registries.

Robust, well-documented MCP servers and the discovery layer around them. Built to be operated, not just demoed.

03

CLI tools for AI workflows.

Command-line interfaces that let developers compose agentic operations the way they compose any other software.

04

Integration adapters.

Connecting agents to the rest of an organization's software without the usual brittleness. Adapters that survive deploys.

05

Observability and evaluation.

Production-grade visibility into what agents are actually doing. Evaluation that runs in CI, not just on a deck.

06

Tool registries and discovery.

The catalog layer for the growing universe of agent tools. Discoverable, versioned, and trustworthy by default.

07

Agent operations.

Observability, debugging, and replay for agentic systems. The boring tooling that makes production-grade work possible.

08

Multi-agent coordination.

The traffic control that keeps multiple agents from stepping on each other. Less collision, more flow.

Featured incubations

A glimpse at the workbench.

[Figure 04. Cost Control: "Token budget · this week." A spend-monitoring dashboard showing $4,218 of $7,000 spent (60.3% used, 3.2 days left, projected $6,140), a soft cap at 80%, and a per-model breakdown of tokens, spend, and share. Policy: audited per call, stop at cap, re-route to Haiku above 80%.]

Infrastructure

Cost Control

Per-call cost auditing and budgeting that keep AI expenses in check while letting your agents run at full tilt.

In incubation
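
One way to read the policy named in the Cost Control dashboard (stop at the cap, re-route above the 80% soft cap) is as a small decision function. The sketch below is an illustration under stated assumptions, not the studio's implementation: `BudgetPolicy`, `decide`, and the three verdict strings are hypothetical names, and the per-call cost estimate is assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class BudgetPolicy:
    """Hypothetical weekly budget policy: a hard cap plus a soft cap ratio."""

    cap_usd: float                 # hard cap for the budget window
    soft_cap_ratio: float = 0.80   # fraction of the cap where re-routing starts

    def decide(self, spent_usd: float, call_cost_usd: float) -> str:
        """Return "allow", "reroute", or "stop" for one proposed call."""
        projected = spent_usd + call_cost_usd
        if projected > self.cap_usd:
            return "stop"      # the call would push spend past the hard cap
        if projected > self.cap_usd * self.soft_cap_ratio:
            return "reroute"   # above the soft cap: send to a cheaper model
        return "allow"


policy = BudgetPolicy(cap_usd=7000.0)
print(policy.decide(spent_usd=4218.0, call_cost_usd=1.0))
```

With the dashboard's numbers ($4,218 spent against a $7,000 cap), a one-dollar call is allowed; the same call at $5,700 spent would be re-routed, and one that would cross $7,000 would be stopped.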

[Figure 02. MCP Gateway: "One entrance. Many servers. Strict rules." A systems diagram in which three clients (a desktop agent, the agentctl CLI, an internal ops app) route through a single gateway to six MCP servers. The gateway applies four checks to every call: auth and identity, policy and RBAC, rate and budget, audit and trace.]

Infrastructure

MCP Gateway

A secure, scalable MCP gateway: one entry point to many MCP servers, with authentication, per-tool policy, rate limiting, and audit logging built in.

In incubation
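
The MCP Gateway diagram describes every call as authenticated, authorised, rate-limited, and audited. A minimal sketch of that ordering follows. It is an assumption-laden illustration, not the gateway's actual design: the `Gateway` class, its fields, the verdict strings, and the sliding 60-second window are all hypothetical, and no real MCP library is used.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Hypothetical in-memory gateway: auth, policy, rate limit, then audit."""

    tokens: dict                    # token -> agent id
    allow: dict                     # agent id -> set of permitted tool names
    limit_per_minute: int = 60
    audit_log: list = field(default_factory=list)
    calls: dict = field(default_factory=dict)   # agent id -> call timestamps

    def handle(self, token: str, tool: str, now: float = None) -> str:
        now = time.time() if now is None else now
        agent = self.tokens.get(token)
        if agent is None:
            verdict = "rejected:auth"        # unknown token
        elif tool not in self.allow.get(agent, set()):
            verdict = "rejected:policy"      # tool not on the agent's allow-list
        else:
            # Keep only this agent's calls from the last 60 seconds.
            recent = [t for t in self.calls.get(agent, []) if now - t < 60]
            if len(recent) >= self.limit_per_minute:
                verdict = "rejected:rate"
            else:
                recent.append(now)
                verdict = "forwarded"        # a real gateway would call the server here
            self.calls[agent] = recent
        # Every call is logged, whether it was forwarded or rejected.
        self.audit_log.append((now, agent, tool, verdict))
        return verdict
```

Writing the audit entry after the verdict is decided, on every path, is what lets a single ledger cover rejected calls as well as forwarded ones.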

[Figure. Agent Ops: "Agent runs · today." An agent operations dashboard listing runs by agent (researcher, drafter, reviewer, orchestrator) with task, start time, duration, status (done, running, retry, queued, scheduled), and a replay control. Window: last 12 hours, 14 runs, 1.8K steps.]

Infrastructure

Agent Ops

Observability and replay for production agentic systems.

In incubation

[Figure. agentctl: a command line for agent runs. A terminal session in which `agentctl run drafter --task "Q3 release brief"` loads drafter@1.4.2 with 12 docs and 4 prior runs as context, plans 5 steps, and reports step 3 of 5 (section_2) running, followed by `agentctl tail --follow`.]

Infrastructure

agentctl

A command-line interface for composing agent operations.

In incubation

[Figure 03. Eval Harness: "Six benchmarks. Two weeks. One thread." A benchmark dashboard with two 14-day line charts (A, faithfulness: 0.872, up 4.2%; B, time to useful answer: 1.84 s, down 31%) and a per-benchmark table for run 0a8c2f (summarization 0.91, tool-use 0.84, citation 0.79, refusals 0.98, all passing).]

Infrastructure

Eval Harness

Evaluation that runs in CI and surfaces regressions before they ship.

In incubation
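
As a rough illustration of what "evaluation that runs in CI" can mean for the Eval Harness, the sketch below compares a run's benchmark scores with a stored baseline and returns a non-zero exit code when any score drops. The function names, the 0.01 tolerance, and the dictionary-of-scores format are assumptions made for this example, not details of the harness itself.

```python
def find_regressions(baseline: dict, current: dict, tolerance: float = 0.01) -> dict:
    """Return benchmarks that are missing or dropped by more than `tolerance`."""
    regressions = {}
    for name, old in baseline.items():
        new = current.get(name)
        if new is None or old - new > tolerance:
            regressions[name] = (old, new)
    return regressions


def ci_exit_code(baseline: dict, current: dict, tolerance: float = 0.01) -> int:
    """Return 1 if any benchmark regressed, else 0, for use as a CI job status."""
    regressions = find_regressions(baseline, current, tolerance)
    for name, (old, new) in sorted(regressions.items()):
        print(f"REGRESSION {name}: {old} -> {new}")
    return 1 if regressions else 0
```

A CI job would load both score sets from wherever they are stored and pass the result of `ci_exit_code` to `sys.exit`, so a regression fails the build before it ships.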

Building infrastructure?

If you are building the rails for AI — or running into the limits of someone else's — we would like to compare notes.

Contact the studio