OpenAI, Anthropic, Google Gemini, DeepSeek, Qwen, Kimi, GLM, Llama, Mistral — plus image gen, video gen and voice. Routed per task, not per vendor relationship. We pick the model that wins the step. You see one approval queue.
Mia is our AI employee. Email her — she’ll book your 15-minute call. That’s the demo.
Your AI employee routes across the closed-source frontier — OpenAI (ChatGPT, GPT-5, Codex, DALL·E, Whisper, Sora), Anthropic (Claude, Claude Code), Google DeepMind (Gemini, Veo, Imagen), xAI (Grok), Cohere — and the open-weight frontier — Meta (Llama), Mistral, DeepSeek, Alibaba (Qwen), Moonshot (Kimi), Zhipu (GLM).
For media, it routes through Midjourney, Black Forest Labs (Flux), Ideogram and Stable Diffusion for image; Runway, Sora, Pika, Luma and Kling for video; ElevenLabs, Cartesia, Deepgram and AssemblyAI for voice.
The point is not which logo we use. The point is that the choice is made per step, by us, and changes the day the leaderboard does — without touching your workflow.
Six layers: closed-source frontier labs, open-weight model labs, image generation, video generation, voice and speech, and the developer surfaces our team and your AI employee actually run in. Every layer rotates. Your workflow doesn’t.
Background: what “managed AI” means in practice.
The closed-source frontier. We mix and match because no single provider wins every task. Reasoning, coding, long-context recall, tool use and multilingual nuance each have a different leader, and that leaderboard moves every quarter.
OpenAI: GPT-class models for general reasoning, function calling and structured output. Default for most agent loops where speed and tool reliability matter.
Anthropic: Claude models for long-context document work, careful instruction-following and code review. Strong fit for review-before-send approval gates.
Google DeepMind: Gemini for very long context windows (1M+ tokens), Workspace-native tasks and multimodal pipelines. Veo for video, Imagen for image, Gemma open-weight for self-host.
xAI: Grok models for real-time-feed-aware tasks where current-day information from public posts changes the answer.
Cohere: Best-in-class embeddings and rerankers for retrieval. The quiet half of every accurate RAG pipeline (see the sketch below).
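For the technically curious, "embeddings and rerankers" means retrieval runs in two stages: cheap vector recall over everything, then a reranker re-scoring the shortlist. A minimal sketch of that shape, where embed() and rerank() are hypothetical stand-ins for a provider's endpoints (Cohere's, in our routing), not a real client API:

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    # Stand-in for an embeddings endpoint: real vectors come from the provider.
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    return rng.normal(size=(len(texts), 8))

def rerank(query: str, candidates: list[str], top_n: int) -> list[str]:
    # Stand-in for a cross-encoder rerank endpoint. Crude token overlap here;
    # a real reranker reads query and document jointly, which is where the
    # accuracy in an "accurate RAG pipeline" actually comes from.
    q = set(query.lower().split())
    return sorted(candidates, key=lambda d: -len(q & set(d.lower().split())))[:top_n]

def retrieve(query: str, docs: list[str], k: int = 20, top_n: int = 3) -> list[str]:
    # Stage 1: cheap vector recall over the whole corpus.
    q_vec, d_vecs = embed([query]), embed(docs)
    sims = (q_vec @ d_vecs.T)[0] / (
        np.linalg.norm(q_vec) * np.linalg.norm(d_vecs, axis=1)
    )
    shortlist = [docs[i] for i in np.argsort(-sims)[:k]]
    # Stage 2: the reranker re-orders the shortlist before it reaches the model.
    return rerank(query, shortlist, top_n)
```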
Open-weight models we self-host (or rent on serverless inference) when a workflow needs zero data egress, EU sovereignty, GPU economics or a fallback path that isn’t a US API call. The Chinese labs are now genuinely competitive on reasoning and code.
Meta: Llama is the open-weight workhorse — biggest community, deepest fine-tuning ecosystem, easiest to run on customer infrastructure.
Mistral: European frontier lab. Strong open-weight options for customers who need EU data residency or want a sovereign-cloud fallback path.
DeepSeek: Open-weight reasoning models that hit GPT-class quality at a fraction of the unit cost. Used for high-volume background jobs where latency forgives a few extra steps.
Alibaba (Qwen): Top-tier open-weight family for multilingual reasoning, code and vision-language. Often the strongest non-English option.
Moonshot (Kimi): Extreme long-context windows and competitive reasoning. Useful when the workflow needs to reason over hundreds of pages in one shot.
Zhipu (GLM): Strong bilingual reasoning and code; popular for cost-sensitive Asian-market deployments.
Marketing assets, product mockups, hero imagery and on-the-fly visual drafts. We route by style: photoreal vs. illustration vs. typography-heavy vs. brand-consistent.
Midjourney: Aesthetic ceiling for moodboards and brand-led imagery. Default for hero shots and concept work.
Black Forest Labs: Flux models for fast photoreal generation with reliable prompt adherence. Strong API economics for batch jobs.
Ideogram: Best-in-class typography rendering — actually spells the words right. Used for posters, labels and any image with copy in it.
Stability AI: Stable Diffusion family. Open-weight option for customers who need image generation inside their own VPC.
Short-form video, product demos, talking-head explainers and B-roll. Generative video is now reliable enough for ad creative and onboarding loops — picked per use case.
Runway: The most-used end-to-end video tool. Gen-3 for generation, plus the timeline and editor primitives to actually finish a cut.
Sora: OpenAI's text-to-video model. Strongest physics + temporal coherence on cinematic clips.
Pika: Fast iteration on social-format clips and effects. Good fit for short-form ad creative pipelines.
Luma: Dream Machine and Ray for camera-aware generation with controllable motion paths.
Kling: Chinese frontier video lab. Strong character consistency and motion realism on 10-second clips.
Speech-to-text for meeting transcripts and call analytics, text-to-speech for outbound voicebots and IVR replacement. Latency and naturalness vary widely by vendor — picked per call type.
ElevenLabs: Default text-to-speech for natural English narration, voice cloning and multilingual delivery. Studio-grade output.
Cartesia: Sub-100ms text-to-speech for real-time voice agents. The latency leader when a customer is on the other end of a phone call.
Deepgram: Streaming speech-to-text with low word error rates on noisy phone audio. Default for live transcription.
AssemblyAI: Async transcription with speaker diarisation, sentiment and entity tagging. Used for call-recording analysis pipelines.
The surfaces our team and our agents actually run in. Picked for tool-calling reliability, edit precision and audit trails — not for chat aesthetics.
OpenAI’s consumer + API surface. Used inside the loop for fast drafts, JSON-mode tool calls, and operator-friendly transcripts.
Anthropic’s assistant surface. Default for nuanced writing, contract review and any task where “did it actually follow the brief” matters.
Anthropic’s CLI coding agent. We use it daily to ship the platform itself — including the page you’re reading.
Google’s assistant. Wins on Workspace context — Gmail, Drive, Calendar — without a brittle integration layer.
Perplexity: Cited-answer search surface. Used for up-to-date research steps where the agent needs to footnote its sources before drafting.
Cursor: Editor-side coding agent. Pairs well with Claude Code for refactor-heavy work where AST awareness beats raw chat speed.
GitHub: Where the platform code lives. Copilot, Actions and PR review are all part of the daily loop.
Hugging Face: Source of truth for open-weight model evaluations and the registry we pull from when self-hosting Llama, Qwen, DeepSeek, Kimi or GLM.
The leaderboard moves every quarter.
When a new model lands and beats the incumbent on your workflow, we route to it. You don’t re-procure, re-contract or re-train your team. That’s the job.
Multi-model
Routing is per-step, not per-customer. Reasoning steps go to Claude, GPT-5 or Kimi. High-volume classification goes to a smaller, faster open-weight model. Long-context recall goes to Gemini. Embeddings go to Cohere. Image gen goes to Flux or Midjourney. Voice goes to Cartesia or ElevenLabs. Your agent sees one interface; the cost sheet sees the cheapest answer that meets the bar.
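In code terms, the router is little more than a table that we, not you, keep current. A minimal sketch, with hypothetical step types and model names taken from the paragraph above; the real table also weighs cost, latency and per-workflow eval scores before picking:

```python
# Per-step routing table: keys are step types, values are ranked candidates.
ROUTES: dict[str, list[str]] = {
    "reasoning":      ["claude", "gpt-5", "kimi"],
    "classification": ["small-open-weight"],
    "long_context":   ["gemini"],
    "embedding":      ["cohere-embed"],
    "image":          ["flux", "midjourney"],
    "voice":          ["cartesia", "elevenlabs"],
}

def route(step_type: str) -> str:
    # First entry is this quarter's leader; the rest are fallbacks.
    candidates = ROUTES.get(step_type)
    if not candidates:
        raise ValueError(f"no route for step type {step_type!r}")
    return candidates[0]
```

Swapping a leaderboard winner is an edit to the table, which is exactly why the workflow never notices.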
Lab-independent
A workflow built on a single vendor breaks the day that vendor’s next release regresses on your task. We abstract the model behind the workflow so we can swap providers in a week, not a quarter, when the leaderboard shifts. Open-weight options are a real fallback, not a slide.
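What "abstract the model behind the workflow" means concretely, as a minimal sketch (the adapter names are hypothetical, not our production API):

```python
from typing import Protocol

class ModelAdapter(Protocol):
    # Every provider is reduced to the same narrow surface.
    def complete(self, prompt: str) -> str: ...

class ClaudeAdapter:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("would call Anthropic's API here")

class SelfHostedLlamaAdapter:
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("would call a Llama endpoint in your VPC here")

# Swapping providers is an edit to this registry, not a workflow rewrite.
ADAPTERS: dict[str, ModelAdapter] = {
    "claude": ClaudeAdapter(),
    "llama-self-hosted": SelfHostedLlamaAdapter(),
}

def run_step(model: str, prompt: str) -> str:
    return ADAPTERS[model].complete(prompt)
```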
Approval-gated
The model is never the last step. A reviewer sees the draft, the source citations and the diff before anything sends. The same gate works whether the underlying call hit Claude, GPT, Gemini, DeepSeek, Qwen or a self-hosted Llama.
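The gate itself is structurally simple; the discipline is that nothing sends without approval. A minimal sketch, with hypothetical field and function names:

```python
from dataclasses import dataclass

@dataclass
class PendingAction:
    draft: str            # what the model wants to send
    citations: list[str]  # where every claim came from
    diff: str             # what changes if this goes out
    approved: bool = False

approval_queue: list[PendingAction] = []

def propose(draft: str, citations: list[str], diff: str) -> PendingAction:
    action = PendingAction(draft, citations, diff)
    approval_queue.append(action)  # the reviewer sees it here, not the recipient
    return action

def send(action: PendingAction) -> None:
    # Identical gate whether the draft came from Claude, GPT, Gemini,
    # DeepSeek, Qwen or a self-hosted Llama.
    if not action.approved:
        raise PermissionError("the model is never the last step")
    ...  # only now does anything actually leave
```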
Tell us the workflow. We will tell you which models we would route it through this quarter, and what we would swap if next quarter looks different.
Map my workflow

Which models does the AI employee actually use?
Your AI employee routes across the closed-source frontier (OpenAI GPT-5, Anthropic Claude, Google Gemini, xAI Grok, Cohere) and the open-weight frontier (Meta Llama, Mistral, DeepSeek, Qwen, Moonshot Kimi, Zhipu GLM). For media, it routes to Midjourney, Flux, Ideogram or Stable Diffusion for image; Runway, Sora, Pika, Luma or Kling for video; ElevenLabs, Cartesia, Deepgram or AssemblyAI for voice. The choice is per step.
Why not standardise on a single vendor?
No single lab leads on every task. Reasoning, code editing, long-context recall, retrieval reranking, image generation and voice latency each have a different best-in-class option, and the leaderboard shifts every quarter. Routing per-step is cheaper and more accurate than picking one vendor and hoping.
What's the difference between OpenAI, Anthropic and Google models?
They are best at different things. OpenAI's GPT-5 leads on tool-calling and structured output. Anthropic's Claude leads on long-form writing, code review and instruction-following. Google's Gemini leads on long-context recall and Workspace-native tasks. A production AI employee uses all three on different steps of the same workflow.
Do you use open-weight models like DeepSeek and Qwen?
Yes — heavily. DeepSeek-V3 and DeepSeek-R1 are part of the routing menu for high-volume background jobs. Qwen3 is the default for multilingual workflows. Kimi K2 handles very long context. GLM is an option for cost-sensitive Asian-market deployments. Llama and Mistral are options when the customer requires self-hosted inference inside their own VPC. We pull weights from Hugging Face and host on your cloud or ours.
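The self-hosted path uses standard open tooling. A minimal sketch with the Hugging Face transformers library; it assumes transformers and accelerate are installed, a GPU is available, and you've accepted the model's license on the Hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# One public Llama repo, shown as an example; the same two calls work for
# Qwen, DeepSeek, Kimi or GLM checkpoints pulled from the same registry.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Classify this support ticket: ...", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```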
Does the AI employee also handle image, video and voice?
All of the above. Image generation routes through Midjourney, Black Forest Labs (Flux), Ideogram or Stable Diffusion. Video routes through Runway, Sora, Pika, Luma or Kling. Voice routes through ElevenLabs (TTS), Cartesia (low-latency real-time TTS), Deepgram and AssemblyAI (STT).
Can we use our own provider contracts and API keys?
Yes. If you already have OpenAI, Anthropic, Google, Azure OpenAI or Hugging Face contracts you want the workflow to run under, we wire them in directly. Otherwise the work runs through pooled provider accounts and you pay for the workflow, not the tokens.
How is this different from a ChatGPT wrapper?
A wrapper exposes one model behind a UI. A managed AI employee wraps a workflow — model selection per step, retrieval, memory, tool calls into your stack, an approval queue, and weekly tuning by Rebotify operators. The model is the cheapest part of the stack to swap, and it should stay that way.
See where these models plug in: Integrations directory.
See the operating model: What managed AI means.
Dev-tools comparison: Claude Code vs Cursor vs Codex.
Latest model news: DeepSeek V4 — when to route to it.
See the comparison: ChatGPT vs an AI employee.
Mia is our AI employee. Email her — she’ll book your 15-minute call. That’s the demo.