OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, Kimi, GLM, Llama, Mistral — plus image gen, video gen and voice. Routed per task, not per vendor relationship. We pick the model that wins the step. You see one approval queue.
Mia is our AI employee. Email her — she’ll book your 15-minute call. That’s the demo.
Your AI employee routes across the closed-source frontier — OpenAI (ChatGPT, GPT-5, Codex, DALL·E, Whisper, Sora), Anthropic (Claude, Claude Code), Google DeepMind (Gemini, Veo, Imagen), xAI (Grok), Cohere — and the open-weight frontier — Meta (Llama), Mistral, DeepSeek, Alibaba (Qwen), Moonshot (Kimi), Zhipu (GLM).
For media, it routes through Midjourney, Black Forest Labs (Flux), Ideogram and Stable Diffusion for image; Runway, Sora, Pika, Luma and Kling for video; ElevenLabs, Cartesia, Deepgram and AssemblyAI for voice.
The point is not which logo we use. The point is that the choice is made per step, by us, and changes the day the leaderboard does — without touching your workflow.
Six layers: closed-source frontier labs, open-weight model labs, image generation, video generation, voice and speech, and the developer surfaces our team and your AI employee actually run in. Every layer rotates. Your workflow doesn’t.
Background: what “managed AI” means in practice.
The closed-source frontier. We mix and match because no single provider wins every task. Reasoning, coding, long-context recall, tool use and multilingual nuance each have a different leader, and that leaderboard moves every quarter.
OpenAI: GPT-class models for general reasoning, function calling and structured output. Default for most agent loops where speed and tool reliability matter.
Anthropic: Claude models for long-context document work, careful instruction-following and code review. Strong fit for review-before-send approval gates.
Google DeepMind: Gemini for very long context windows (1M+ tokens), Workspace-native tasks and multimodal pipelines. Veo for video, Imagen for image, Gemma open-weight for self-host.
xAI: Grok models for real-time-feed-aware tasks where current-day information from public posts changes the answer.
Cohere: Best-in-class embeddings and rerankers for retrieval. The quiet half of every accurate RAG pipeline (see the sketch below).
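For the technically curious, "embeddings and rerankers" means retrieval runs in two stages: cheap vector recall over everything, then a reranker re-scoring the shortlist. A minimal sketch of that shape, where embed() and rerank() are hypothetical stand-ins for a provider's endpoints (Cohere's, in our routing), not a real client API:

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    # Stand-in for an embeddings endpoint: real vectors come from the provider.
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    return rng.normal(size=(len(texts), 8))

def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    # Stand-in for a cross-encoder rerank endpoint. Crude token overlap here;
    # a real reranker reads query and document jointly, which is where the
    # accuracy in an "accurate RAG pipeline" actually comes from.
    q = set(query.lower().split())
    return sorted(candidates, key=lambda d: -len(q & set(d.lower().split())))[:top_n]

def retrieve(query: str, docs: list[str], k: int = 20, top_n: int = 3) -> list[str]:
    # Stage 1: cheap vector recall over the whole corpus.
    q_vec, d_vecs = embed([query]), embed(docs)
    sims = (q_vec @ d_vecs.T)[0] / (
        np.linalg.norm(q_vec) * np.linalg.norm(d_vecs, axis=1)
    )
    shortlist = [docs[i] for i in np.argsort(-sims)[:k]]
    # Stage 2: the reranker re-orders the shortlist before it reaches the model.
    return rerank(query, shortlist, top_n)
```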
Open-weight models we self-host (or rent on serverless inference) when a workflow needs zero data egress, EU sovereignty, GPU economics or a fallback path that isn’t a US API call. The Chinese labs are now genuinely competitive on reasoning and code.
Meta: Llama is the open-weight workhorse — biggest community, deepest fine-tuning ecosystem, easiest to run on customer infrastructure.
Mistral: European frontier lab. Strong open-weight options for customers who need EU data residency or want a sovereign-cloud fallback path.
DeepSeek: Open-weight reasoning models that hit GPT-class quality at a fraction of the unit cost. Used for high-volume background jobs where latency forgives a few extra steps.
Alibaba (Qwen): Top-tier open-weight family for multilingual reasoning, code and vision-language. Often the strongest non-English option.
Moonshot (Kimi): Extreme long-context windows and competitive reasoning. Useful when the workflow needs to reason over hundreds of pages in one shot.
Zhipu (GLM): Strong bilingual reasoning and code; popular for cost-sensitive Asian-market deployments.
Marketing assets, product mockups, hero imagery and on-the-fly visual drafts. We route by style: photoreal vs. illustration vs. typography-heavy vs. brand-consistent.
Midjourney: Aesthetic ceiling for moodboards and brand-led imagery. Default for hero shots and concept work.
Black Forest Labs: Flux models for fast photoreal generation with reliable prompt adherence. Strong API economics for batch jobs.
Ideogram: Best-in-class typography rendering — actually spells the words right. Used for posters, labels and any image with copy in it.
Stability AI: Stable Diffusion family. Open-weight option for customers who need image generation inside their own VPC.
Short-form video, product demos, talking-head explainers and B-roll. Generative video is now reliable enough for ad creative and onboarding loops — picked per use case.
Runway: The most-used end-to-end video tool. Gen-3 for generation, plus the timeline and editor primitives to actually finish a cut.
Sora: OpenAI's text-to-video model. Strongest physics + temporal coherence on cinematic clips.
Pika: Fast iteration on social-format clips and effects. Good fit for short-form ad creative pipelines.
Luma: Dream Machine and Ray for camera-aware generation with controllable motion paths.
Kling: Chinese frontier video lab. Strong character consistency and motion realism on 10-second clips.
Speech-to-text for meeting transcripts and call analytics, text-to-speech for outbound voicebots and IVR replacement. Latency and naturalness vary widely by vendor — picked per call type.
ElevenLabs: Default text-to-speech for natural English narration, voice cloning and multilingual delivery. Studio-grade output.
Cartesia: Sub-100ms text-to-speech for real-time voice agents. The latency leader when a customer is on the other end of a phone call.
Deepgram: Streaming speech-to-text with low word error rates on noisy phone audio. Default for live transcription.
AssemblyAI: Async transcription with speaker diarisation, sentiment and entity tagging. Used for call-recording analysis pipelines.
The surfaces our team and our agents actually run in. Picked for tool-calling reliability, edit precision and audit trails — not for chat aesthetics.
OpenAI’s consumer + API surface. Used inside the loop for fast drafts, JSON-mode tool calls, and operator-friendly transcripts.
Anthropic’s assistant surface. Default for nuanced writing, contract review and any task where “did it actually follow the brief” matters.
Anthropic’s CLI coding agent. We use it daily to ship the platform itself — including the page you’re reading.
Google’s assistant. Wins on Workspace context — Gmail, Drive, Calendar — without a brittle integration layer.
Perplexity: Cited-answer search surface. Used for up-to-date research steps where the agent needs to footnote its sources before drafting.
Cursor: Editor-side coding agent. Pairs well with Claude Code for refactor-heavy work where AST awareness beats raw chat speed.
GitHub: Where the platform code lives. Copilot, Actions and PR review are all part of the daily loop.
Hugging Face: Source of truth for open-weight model evaluations and the registry we pull from when self-hosting Llama, Qwen, DeepSeek, Kimi or GLM.
The leaderboard moves every quarter.
When a new model lands and beats the incumbent on your workflow, we route to it. You don’t re-procure, re-contract or re-train your team. That’s the job.
Multi-model
Routing is per-step, not per-customer. Reasoning steps go to Claude, GPT-5 or Kimi. High-volume classification goes to a smaller, faster open-weight model. Long-context recall goes to Gemini. Embeddings go to Cohere. Image gen goes to Flux or Midjourney. Voice goes to Cartesia or ElevenLabs. Your agent sees one interface; the cost sheet sees the cheapest answer that meets the bar.
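In code terms, the router is little more than a table that we, not you, keep current. A minimal sketch, with hypothetical step types and model names taken from the paragraph above; the real table also weighs cost, latency and per-workflow eval scores before picking:

```python
# Per-step routing table: keys are step types, values are ranked candidates.
ROUTES: dict[str, list[str]] = {
    "reasoning":      ["claude", "gpt-5", "kimi"],
    "classification": ["small-open-weight"],
    "long_context":   ["gemini"],
    "embedding":      ["cohere-embed"],
    "image":          ["flux", "midjourney"],
    "voice":          ["cartesia", "elevenlabs"],
}

def route(step_type: str) -> str:
    # First entry is this quarter's leader; the rest are fallbacks.
    candidates = ROUTES.get(step_type)
    if not candidates:
        raise ValueError(f"no route for step type {step_type!r}")
    return candidates[0]
```

Swapping a leaderboard winner is an edit to the table, which is exactly why the workflow never notices.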
Lab-independent
A workflow built on a single vendor breaks the day that vendor’s next release regresses on your task. We abstract the model behind the workflow so we can swap providers in a week, not a quarter, when the leaderboard shifts. Open-weight options are a real fallback, not a slide.
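What "abstract the model behind the workflow" means concretely, as a minimal sketch (the adapter names are hypothetical, not our production API):

```python
from typing import Protocol

class ModelAdapter(Protocol):
    # Every provider is reduced to the same narrow surface.
    def complete(self, prompt: str) -> str: ...

class ClaudeAdapter:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("would call Anthropic's API here")

class SelfHostedLlamaAdapter:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("would call a Llama endpoint in your VPC here")

# Swapping providers is an edit to this registry, not a workflow rewrite.
ADAPTERS: dict[str, ModelAdapter] = {
    "claude": ClaudeAdapter(),
    "llama-self-hosted": SelfHostedLlamaAdapter(),
}

def run_step(model: str, prompt: str) -> str:
    return ADAPTERS[model].complete(prompt)
```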
Approval-gated
The model is never the last step. A reviewer sees the draft, the source citations and the diff before anything sends. The same gate works whether the underlying call hit Claude, GPT, Gemini, DeepSeek, Qwen or a self-hosted Llama.
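The gate itself is structurally simple; the discipline is that nothing sends without approval. A minimal sketch, with hypothetical field and function names:

```python
from dataclasses import dataclass

@dataclass
class PendingAction:
    draft: str            # what the model wants to send
    citations: list[str]  # where every claim came from
    diff: str             # what changes if this goes out
    approved: bool = False

approval_queue: list[PendingAction] = []

def propose(draft: str, citations: list[str], diff: str) -> PendingAction:
    action = PendingAction(draft, citations, diff)
    approval_queue.append(action)  # the reviewer sees it here, not the recipient
    return action

def send(action: PendingAction) -> None:
    # Identical gate whether the draft came from Claude, GPT, Gemini,
    # DeepSeek, Qwen or a self-hosted Llama.
    if not action.approved:
        raise PermissionError("the model is never the last step")
    ...  # only now does anything actually leave
```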
Tell us the workflow. We will tell you which models we would route it through this quarter, and what we would swap if next quarter looks different.
Map my workflow

Which models does the AI employee actually use?
Your AI employee routes across the closed-source frontier (OpenAI GPT-5, Anthropic Claude, Google Gemini, xAI Grok, Cohere) and the open-weight frontier (Meta Llama, Mistral, DeepSeek, Qwen, Moonshot Kimi, Zhipu GLM). For media, it routes to Midjourney, Flux, Ideogram or Stable Diffusion for image; Runway, Sora, Pika, Luma or Kling for video; ElevenLabs, Cartesia, Deepgram or AssemblyAI for voice. The choice is per step.
Why not standardise on a single vendor?
No single lab leads on every task. Reasoning, code editing, long-context recall, retrieval reranking, image generation and voice latency each have a different best-in-class option, and the leaderboard shifts every quarter. Routing per-step is cheaper and more accurate than picking one vendor and hoping.
What's the difference between OpenAI, Anthropic and Google models?
They are best at different things. OpenAI's GPT-5 leads on tool-calling and structured output. Anthropic's Claude leads on long-form writing, code review and instruction-following. Google's Gemini leads on long-context recall and Workspace-native tasks. A production AI employee uses all three on different steps of the same workflow.
Do you use open-weight models like DeepSeek and Qwen?
Yes — heavily. DeepSeek-V3 and DeepSeek-R1 are part of the routing menu for high-volume background jobs. Qwen3 is the default for multilingual workflows. Kimi K2 handles very long context. GLM is an option for cost-sensitive Asian-market deployments. Llama and Mistral are options when the customer requires self-hosted inference inside their own VPC. We pull weights from Hugging Face and host on your cloud or ours.
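The self-hosted path uses standard open tooling. A minimal sketch with the Hugging Face transformers library; it assumes transformers and accelerate are installed, a GPU is available, and you've accepted the model's license on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# One public Llama repo, shown as an example; the same two calls work for
# Qwen, DeepSeek, Kimi or GLM checkpoints pulled from the same registry.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Classify this support ticket: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```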
Does the AI employee also handle image, video and voice?
All of the above. Image generation routes through Midjourney, Black Forest Labs (Flux), Ideogram or Stable Diffusion. Video routes through Runway, Sora, Pika, Luma or Kling. Voice routes through ElevenLabs (TTS), Cartesia (low-latency real-time TTS), Deepgram and AssemblyAI (STT).
Can we use our own provider contracts and API keys?
Yes. If you already have OpenAI, Anthropic, Google, Azure OpenAI or Hugging Face contracts you want the workflow to run under, we wire them in directly. Otherwise the work runs through pooled provider accounts and you pay for the workflow, not the tokens.
How is this different from a ChatGPT wrapper?
A wrapper exposes one model behind a UI. A managed AI employee wraps a workflow — model selection per step, retrieval, memory, tool calls into your stack, an approval queue, and weekly tuning by Rebotify operators. The model is the cheapest part of the stack to swap, and it should stay that way.
See where these models plug in: Integrations directory.
See the operating model: What managed AI means.
Dev-tools comparison: Claude Code vs Cursor vs Codex.
Latest model news: DeepSeek V4 — when to route to it.
See the comparison: ChatGPT vs an AI employee.
Mia is our AI employee. Email her — she’ll book your 15-minute call. That’s the demo.