Zhipu Coding Plan × Oh My OpenCode: Multi-Model Orchestration Setup Guide
Why Bother
When it comes to writing code with AI, the gap between single-model and multi-model approaches keeps widening. No matter how strong a single model is, it can’t compete with a team of specialized models working in parallel.
Oh My OpenCode (OmO for short) is a multi-model orchestration plugin in the OpenCode ecosystem, with 11 Agents each having distinct responsibilities and 48 Hooks spanning the entire lifecycle. Zhipu’s Coding Plan provides access to the full GLM model series. Combining the two allows you to assign different models by role — strong coders for coding, strong reasoners for reasoning, free models for busywork.
This article documents my complete configuration process.
GLM Model Family
The coding-related models currently available via Zhipu Coding Plan:
| Model | What It’s For |
|---|---|
| GLM-5 | Open-source flagship, 744B MoE, 200K context, SWE-bench 77.8% |
| GLM-5-turbo | Closed-source, optimized specifically for Agent workflows on top of GLM-5, tool call error rate reduced from 2-6% to 0.67%, ~36% faster |
| GLM-5.1 | Post-training optimized version, coding score increased from 35.4 to 45.3 (+28%), equivalent to ~94.6% of Claude Opus 4.6 |
| GLM-4.7 | Solid reasoning quality, max variant supports extended thinking |
| GLM-4.7-flash | Speed-prioritized 4.7 variant |
| GLM-5v-turbo | Multimodal, capable of image understanding |
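These three “5-series” models are easy to confuse. GLM-5 is the open-source base model; GLM-5-turbo is a closed-source variant built on top of GLM-5 and tuned for Agent workflows; GLM-5.1 is the post-training optimized version with the higher coding score.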
Selection is straightforward: pick 5.1 for coding, 5-turbo for stable Agent operations, 4.7 for reasoning, 5v-turbo for image understanding, and 4.7-flash for speed.
OmO’s Agent Architecture
OmO’s approach is straightforward: each Agent gets its own system prompt, tool permissions, and model. No one-size-fits-all.
Agents
| Agent | What It Does | What Model It Needs |
|---|---|---|
| Sisyphus | Main orchestrator, task decomposition and scheduling | Best coding |
| Prometheus | Planner, requirement clarification and plan creation | Long-chain stability, reliable tool calling |
| Oracle | Architecture advisor, read-only analysis | Deep reasoning |
| Librarian | Document and API retrieval | Comprehension ability |
| Explore | Codebase search | Fast |
| Metis | Pre-planning consultation, finding blind spots | Deep reasoning |
| Momus | Plan review | Deep reasoning |
| Atlas | Todo management | Lightweight is fine |
| Multimodal-Looker | Screenshots, PDF analysis | Must handle images |
| Sisyphus-Junior | Concrete implementation | Routed by task type |
Categories
When Sisyphus delegates tasks, it doesn’t specify a model directly — it specifies a Category, and the Category automatically routes to the corresponding model:
| Category | When It’s Used |
|---|---|
| visual-engineering | Frontend UI, CSS |
| ultrabrain | Difficult logic, architecture decisions |
| deep | Autonomous research + end-to-end implementation |
| artistry | Creative solutions |
| quick | Small changes |
| unspecified-low | Low-complexity busywork |
| unspecified-high | High-complexity busywork |
| writing | Documentation |
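The delegation step can be pictured as a plain lookup: the task carries a Category, and the Category decides the model. The following is a minimal Python sketch, not OmO's actual routing code. The `resolve_model` helper, the short model labels, and the fallback to a default model are all assumptions made for illustration.

```python
# Hypothetical Category -> model table; the labels are placeholders, not real model IDs.
CATEGORY_MODELS = {
    "quick": "small-fast-model",
    "ultrabrain": "deep-reasoning-model",
    "visual-engineering": "multimodal-model",
}


def resolve_model(category: str, default: str = "small-fast-model") -> str:
    """Return the model for a Category; unknown Categories fall back to `default`."""
    return CATEGORY_MODELS.get(category, default)


# The orchestrator names only the Category, never the model itself.
task = {"description": "rename a variable", "category": "quick"}
model = resolve_model(task["category"])
print(f"{task['description']} -> {model}")
```

Keeping the model out of the task description means a model can be swapped in one place without touching the orchestrator's prompts.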
Configuration File
The final oh-my-openagent.json assigns one model to each Agent and each Category, as explained in the next section.
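As a rough illustration of what such a file might contain, the Python sketch below builds a config from the assignments explained in the next section and prints it as JSON. It is not the author's actual file. The top-level `agents` and `categories` keys, the lowercase entry names, the exact model ID strings, and the choice of the `zai-coding-plan` prefix for every GLM model are all assumptions.

```python
import json

# Prefixes come from the article; which GLM entries use which Zhipu prefix is assumed.
ZAI = "zai-coding-plan"
FREE = "opencode"

# Assumed layout: each entry has a "model" and an optional "variant".
# Every model ID below is an illustrative placeholder.
config = {
    "agents": {
        "sisyphus": {"model": f"{ZAI}/glm-5.1"},
        "prometheus": {"model": f"{ZAI}/glm-5-turbo"},
        "oracle": {"model": f"{ZAI}/glm-4.7"},
        "librarian": {"model": f"{ZAI}/glm-4.7"},
        "metis": {"model": f"{ZAI}/glm-4.7", "variant": "max"},
        "momus": {"model": f"{ZAI}/glm-4.7", "variant": "max"},
        "explore": {"model": f"{FREE}/qwen3.6-plus"},
        "atlas": {"model": f"{FREE}/minimax-m2.5"},
        "multimodal-looker": {"model": f"{ZAI}/glm-5v-turbo"},
        "sisyphus-junior": {"model": f"{FREE}/gpt-5-nano"},
    },
    "categories": {
        "visual-engineering": {"model": f"{ZAI}/glm-5v-turbo"},
        "ultrabrain": {"model": f"{ZAI}/glm-4.7", "variant": "max"},
        "deep": {"model": f"{FREE}/nemotron-3-super"},
        "artistry": {"model": f"{ZAI}/glm-5.1"},
        "quick": {"model": f"{FREE}/gpt-5-nano"},
        "unspecified-low": {"model": f"{FREE}/gpt-5-nano"},
        "unspecified-high": {"model": f"{FREE}/minimax-m2.5"},
        "writing": {"model": f"{ZAI}/glm-4.7-flash"},
    },
}

print(json.dumps(config, indent=2))
```

Check the real key names and model IDs against the OmO and OpenCode documentation before copying anything from this sketch.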
Why This Configuration
Sisyphus → GLM-5.1
The main orchestrator is the entry point for everything — coding ability can’t be compromised. GLM-5.1 scores 45.3 on coding, a 28% improvement over GLM-5’s 35.4. The gap is significant.
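The 28% figure follows directly from the two scores quoted above. A short Python check, using only those two numbers:

```python
glm5_score = 35.4   # GLM-5 coding score quoted in this article
glm51_score = 45.3  # GLM-5.1 coding score quoted in this article

# Relative improvement of GLM-5.1 over GLM-5.
relative_gain = (glm51_score - glm5_score) / glm5_score
print(f"relative improvement: {relative_gain:.1%}")
```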
Prometheus → GLM-5-turbo
The planner’s job is long-chain task decomposition, and the key requirement is reliable tool calling. GLM-5 occasionally gets stuck in thought loops; GLM-5-turbo specifically addresses this, reducing tool call error rates from 2-6% to 0.67%. The worst thing for a planner is the Agent getting stuck mid-execution.
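To see why a small per-call error rate matters so much for long chains, consider the chance that every tool call in a chain succeeds. The Python sketch below rests on two assumptions that do not come from the article: each call fails independently, and the chain is 50 calls long. Only the error rates are the quoted figures.

```python
def chain_success_probability(error_rate: float, calls: int) -> float:
    """Probability that `calls` independent tool calls all succeed."""
    return (1.0 - error_rate) ** calls


CALLS = 50  # hypothetical chain length, chosen only for illustration

# Error rates quoted in the article: 2-6% before, 0.67% with GLM-5-turbo.
quoted_rates = {
    "before, low end (2%)": 0.02,
    "before, high end (6%)": 0.06,
    "GLM-5-turbo (0.67%)": 0.0067,
}

results = {
    label: chain_success_probability(rate, CALLS)
    for label, rate in quoted_rates.items()
}
for label, probability in results.items():
    print(f"{label}: {probability:.1%} chance of a clean {CALLS}-call chain")
```

Under these assumptions, a clean 50-call run is unlikely at 2% and rare at 6%, while at 0.67% it succeeds most of the time.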
Oracle / Librarian → GLM-4.7
Oracle is a read-only architecture advisor; Librarian searches docs and looks up APIs. These positions need comprehension rather than coding ability — GLM-4.7 is sufficient.
Metis / Momus → GLM-4.7 (max)
Pre-planning consultation and plan review both require deep reasoning. Setting variant: "max" enables extended thinking mode, allowing the model to think more deeply.
Explore → Qwen3.6 Plus (free)
Code search is a high-frequency operation that can run dozens of times a day — a free model is sufficient. Qwen3.6 has decent code understanding ability.
Atlas → Minimax M2.5 (free)
Todo management — nothing special, free is fine.
Multimodal-Looker → GLM-5v-turbo
The only Agent that needs to process images — no other choice.
Sisyphus-Junior → GPT-5-nano (free)
It doesn’t make decisions itself; it routes to specific models via Categories, so a lightweight base model is sufficient.
Categories
- visual-engineering uses 5v-turbo because frontend tasks often involve looking at screenshots and design mockups
- ultrabrain uses 4.7 max: difficult logic needs extended thinking
- deep uses Nemotron-3 Super (NVIDIA’s free model): autonomous research consumes a lot of tokens, and the free model can handle it
- artistry uses 5.1: creative problems also need strong coding ability
- quick and unspecified-low use GPT-5-nano: minor tasks don’t warrant a strong model
- unspecified-high uses Minimax M2.5
- writing uses 4.7-flash: documentation writing just needs speed
On Prefixes
The config uses three types of prefixes:
- zai-coding-plan/*: provided by the Zhipu Coding Plan subscription
- zhipuai-coding-plan/*: Zhipu direct API connection
- opencode/*: OpenCode platform free models
For the two Zhipu prefixes, the underlying models are the same and the difference is in billing. Coding Plan has quotas but a lower per-unit cost; the direct API has no limits but charges per token. I try to use free models for high-frequency Agents and Coding Plan for core tasks.
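Since the prefix alone determines how a call is billed, a config can be audited by splitting each model ID at the first slash. The Python sketch below shows one way to do that. The `billing_for` helper and the example model IDs are hypothetical; only the three prefixes and their meanings come from this article.

```python
# Prefix -> billing channel, as described in this article.
BILLING_BY_PREFIX = {
    "zai-coding-plan": "Coding Plan subscription (quota)",
    "zhipuai-coding-plan": "direct API (per token)",
    "opencode": "free",
}


def billing_for(model_id: str) -> str:
    """Classify a 'prefix/model' ID by its prefix; unknown prefixes return 'unknown'."""
    prefix, separator, _model = model_id.partition("/")
    if not separator:
        raise ValueError(f"expected 'prefix/model', got {model_id!r}")
    return BILLING_BY_PREFIX.get(prefix, "unknown")


# Example IDs are placeholders, not verified model names.
for model_id in ["zai-coding-plan/glm-5.1", "opencode/gpt-5-nano"]:
    print(f"{model_id}: {billing_for(model_id)}")
```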
In Practice
For daily coding, I type ultrawork — Sisyphus (5.1) takes over, decomposes the task, dispatches Explore (Qwen for code search) and Librarian (4.7 for docs) in parallel, aggregates the results, and hands off to Junior for implementation. In everyday refactoring and bug fixing, 5.1 feels comparable to Opus. Mixed Chinese-English codebases are handled without issues.
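The fan-out and hand-off pattern in this workflow can be sketched with Python's asyncio. This is an illustration of the pattern only, not how OmO is implemented. Every function here is a hypothetical stand-in that returns a canned string instead of calling a model.

```python
import asyncio


async def explore(task: str) -> str:
    """Stand-in for the Explore Agent (codebase search)."""
    await asyncio.sleep(0)  # placeholder for a real model call
    return f"code hits for: {task}"


async def librarian(task: str) -> str:
    """Stand-in for the Librarian Agent (docs and API lookup)."""
    await asyncio.sleep(0)  # placeholder for a real model call
    return f"docs for: {task}"


async def junior(task: str, context: list[str]) -> str:
    """Stand-in for Sisyphus-Junior (implementation)."""
    await asyncio.sleep(0)  # placeholder for a real model call
    return f"implemented '{task}' using {len(context)} context items"


async def sisyphus(task: str) -> str:
    # Run both lookups concurrently, then pass the combined results to Junior.
    context = await asyncio.gather(explore(task), librarian(task))
    return await junior(task, list(context))


result = asyncio.run(sisyphus("fix login bug"))
print(result)
```

`asyncio.gather` returns results in the order the coroutines were passed, so the aggregation step sees the search results first and the docs second, regardless of which finishes first.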
For complex planning, I use /start-work to trigger Prometheus (5-turbo). It asks a few questions to clarify requirements, then produces a structured plan. Previously, running GLM-5 for planning would occasionally result in thought loops; since switching to 5-turbo, this hasn’t happened.
Architecture reviews don’t need manual triggering — when facing difficult decisions or two consecutive failed fixes, Sisyphus automatically consults Oracle (4.7). Oracle is read-only, providing analysis and recommendations without modifying code. 4.7’s reasoning depth on module partitioning and interface design is sufficient.
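The escalation rule described here, a difficult decision or two consecutive failed fixes, can be written as a small counter. The Python sketch below is a hypothetical illustration. The `EscalationPolicy` class and its method names are invented for this example and are not part of OmO.

```python
class EscalationPolicy:
    """Decide when the orchestrator should consult the read-only advisor."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold  # consecutive failed fixes before escalating
        self.consecutive_failures = 0

    def record_fix(self, succeeded: bool) -> None:
        # A successful fix resets the streak; a failure extends it.
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1

    def should_consult_oracle(self, hard_decision: bool = False) -> bool:
        return hard_decision or self.consecutive_failures >= self.threshold


policy = EscalationPolicy()
policy.record_fix(succeeded=False)
print(policy.should_consult_oracle())  # one failure: below the threshold
policy.record_fix(succeeded=False)
print(policy.should_consult_oracle())  # two in a row: threshold reached
```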
For multimodal scenarios, Multimodal-Looker (5v-turbo) handles screenshots and PDFs. Combined with the visual-engineering category, frontend tasks can generate code directly from design mockups.
A Few Things to Note
GLM-5.1’s 94.6% Opus equivalent score is Z.ai’s own measurement; third-party verification is still in progress. However, GLM-5’s SWE-bench 77.8% has been externally validated, so the number isn’t unreasonable.
GLM-5-turbo is closed-source, unlike GLM-5 which uses the MIT license. Z.ai says improvements will eventually be merged into the open-source version.
Coding Plan has request quotas. For high-frequency usage, route core tasks through the subscription and busywork through free models.
Summary
The core idea is one sentence: choose models by role.
- Code writers use 5.1
- Agent workflow runners use 5-turbo
- Reasoners and reviewers use 4.7 (max)
- Image processors use 5v-turbo
- Busywork uses free models
Not every role needs the strongest model — just match them correctly.