Zhipu Coding Plan × Oh My OpenCode: Multi-Model Orchestration Setup Guide

April 5, 2026 · AI Tools · AI Programming, Multi-Model Orchestration


Why Bother

When it comes to writing code with AI, the gap between single-model and multi-model approaches keeps widening. No matter how strong a single model is, it can’t compete with a team of specialized models working in parallel.

Oh My OpenCode (OmO for short) is a multi-model orchestration plugin in the OpenCode ecosystem, with 11 Agents each having distinct responsibilities and 48 Hooks spanning the entire lifecycle. Zhipu’s Coding Plan provides access to the full GLM model series. Combining the two allows you to assign different models by role — strong coders for coding, strong reasoners for reasoning, free models for busywork.

This article documents my complete configuration process.

GLM Model Family

The coding-related models currently available via Zhipu Coding Plan:

Model	What It’s For
GLM-5	Open-source flagship, 744B MoE, 200K context, SWE-bench 77.8%
GLM-5-turbo	Closed-source, optimized specifically for Agent workflows on top of GLM-5, tool call error rate reduced from 2-6% to 0.67%, ~36% faster
GLM-5.1	Post-training optimized version, coding score increased from 35.4 to 45.3 (+28%), equivalent to ~94.6% of Claude Opus 4.6
GLM-4.7	Solid reasoning quality, `max` variant supports extended thinking
GLM-4.7-flash	Speed-prioritized 4.7 variant
GLM-5v-turbo	Multimodal, capable of image understanding

The three text-oriented 5-series models (GLM-5, GLM-5-turbo, GLM-5.1) are easy to confuse. Here’s the relationship:

text
GLM-5          ← Open-source base, can do everything but occasionally erratic
  ├── 5-turbo  ← Optimized for Agents: fast, stable, reliable tool calling
  └── 5.1      ← Enhanced for coding: code quality improved by 28%

Selection is straightforward: pick 5.1 for coding, 5-turbo for stable Agent operations, 4.7 for reasoning, 5v-turbo for image understanding, and 4.7-flash for speed.
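
The selection rule can be written down as a small lookup table. This is only a sketch in Python: the `need` keys and the `pick_model` helper are names made up for illustration, and only the model names come from the table above.

```python
# Hypothetical lookup table: the keys are illustrative labels, not OmO settings.
MODEL_FOR_NEED = {
    "coding": "glm-5.1",
    "agent": "glm-5-turbo",
    "reasoning": "glm-4.7",
    "vision": "glm-5v-turbo",
    "speed": "glm-4.7-flash",
}


def pick_model(need: str) -> str:
    """Return the GLM model this article recommends for a given need."""
    try:
        return MODEL_FOR_NEED[need]
    except KeyError:
        raise ValueError(f"unknown need: {need!r}") from None


print(pick_model("coding"))
```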

OmO’s Agent Architecture

OmO’s approach is straightforward: each Agent gets its own system prompt, tool permissions, and model. No one-size-fits-all.

Agents

Agent	What It Does	What Model It Needs
Sisyphus	Main orchestrator, task decomposition and scheduling	Best coding
Prometheus	Planner, requirement clarification and plan creation	Long-chain stability, reliable tool calling
Oracle	Architecture advisor, read-only analysis	Deep reasoning
Librarian	Document and API retrieval	Comprehension ability
Explore	Codebase search	Fast
Metis	Pre-planning consultation, finding blind spots	Deep reasoning
Momus	Plan review	Deep reasoning
Atlas	Todo management	Lightweight is fine
Multimodal-Looker	Screenshots, PDF analysis	Must handle images
Sisyphus-Junior	Concrete implementation	Routed by task type

Categories

Category	When It’s Used
`visual-engineering`	Frontend UI, CSS
`ultrabrain`	Difficult logic, architecture decisions
`deep`	Autonomous research + end-to-end implementation
`artistry`	Creative solutions
`quick`	Small changes
`unspecified-low`	Low-complexity busywork
`unspecified-high`	High-complexity busywork
`writing`	Documentation

Configuration File

The final oh-my-openagent.json:

json
{
  "$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-openagent/dev/assets/oh-my-opencode.schema.json",
  "agents": {
    "sisyphus": {
      "model": "zhipuai-coding-plan/glm-5.1"
    },
    "oracle": {
      "model": "zhipuai-coding-plan/glm-4.7"
    },
    "librarian": {
      "model": "zhipuai-coding-plan/glm-4.7"
    },
    "explore": {
      "model": "opencode/qwen3.6-plus-free"
    },
    "multimodal-looker": {
      "model": "zhipuai-coding-plan/glm-5v-turbo"
    },
    "prometheus": {
      "model": "zhipuai-coding-plan/glm-5-turbo"
    },
    "metis": {
      "model": "zhipuai-coding-plan/glm-4.7",
      "variant": "max"
    },
    "momus": {
      "model": "zhipuai-coding-plan/glm-4.7",
      "variant": "max"
    },
    "atlas": {
      "model": "opencode/minimax-m2.5-free"
    },
    "sisyphus-junior": {
      "model": "opencode/gpt-5-nano"
    }
  },
  "categories": {
    "visual-engineering": {
      "model": "zhipuai-coding-plan/glm-5v-turbo"
    },
    "ultrabrain": {
      "model": "zhipuai-coding-plan/glm-4.7",
      "variant": "max"
    },
    "deep": {
      "model": "opencode/nemotron-3-super-free"
    },
    "artistry": {
      "model": "zhipuai-coding-plan/glm-5.1"
    },
    "quick": {
      "model": "opencode/gpt-5-nano"
    },
    "unspecified-low": {
      "model": "opencode/gpt-5-nano"
    },
    "unspecified-high": {
      "model": "opencode/minimax-m2.5-free"
    },
    "writing": {
      "model": "zhipuai-coding-plan/glm-4.7-flash"
    }
  }
}
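
Before dropping a file like this into place, it can help to sanity-check it. The sketch below is not part of OmO. `check_config` is a hypothetical helper that assumes only what the file above shows: top-level `agents` and `categories` objects whose entries carry a `model` string of the form `provider/model-name`, plus an optional `variant`.

```python
import json


def check_config(config: dict) -> list[str]:
    """Return a list of problems found in an OmO-style config dict."""
    problems = []
    for section in ("agents", "categories"):
        for name, entry in config.get(section, {}).items():
            model = entry.get("model")
            if not isinstance(model, str):
                problems.append(f"{section}.{name}: missing 'model'")
                continue
            # A usable model ID needs a provider prefix, a slash, and a model name.
            provider, sep, model_name = model.partition("/")
            if not (provider and sep and model_name):
                problems.append(
                    f"{section}.{name}: expected 'provider/model', got {model!r}"
                )
    return problems


# A trimmed copy of the config above, used only to exercise the check.
sample = json.loads("""
{
  "agents": {
    "sisyphus": {"model": "zhipuai-coding-plan/glm-5.1"},
    "metis": {"model": "zhipuai-coding-plan/glm-4.7", "variant": "max"}
  },
  "categories": {
    "quick": {"model": "opencode/gpt-5-nano"}
  }
}
""")
print(check_config(sample))
```

An empty list means every entry has a well-formed model ID. The check says nothing about whether the provider or model actually exists.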

Why This Configuration

Sisyphus → GLM-5.1

The main orchestrator is the entry point for everything — coding ability can’t be compromised. GLM-5.1 scores 45.3 on coding, a 28% improvement over GLM-5’s 35.4. The gap is significant.

Prometheus → GLM-5-turbo

The planner’s job is long-chain task decomposition, and the key requirement is reliable tool calling. GLM-5 occasionally gets stuck in thought loops; GLM-5-turbo specifically addresses this, reducing tool call error rates from 2-6% to 0.67%. The worst thing for a planner is the Agent getting stuck mid-execution.
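
A rough back-of-the-envelope calculation shows why the per-call error rate matters so much for long chains. The sketch assumes every tool call fails independently at a fixed rate and that a single failure derails the plan. Both are simplifying assumptions, and the 50-call chain length is made up for illustration.

```python
def chain_success_probability(error_rate: float, calls: int) -> float:
    """Probability that `calls` independent tool calls all succeed."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return (1.0 - error_rate) ** calls


CALLS = 50  # illustrative chain length, not a measured figure
RATES = [
    ("GLM-5, low end", 0.02),
    ("GLM-5, high end", 0.06),
    ("GLM-5-turbo", 0.0067),
]
for label, rate in RATES:
    p = chain_success_probability(rate, CALLS)
    print(f"{label}: {p:.0%} chance of a clean run")
```

Under these assumptions, a 50-call plan finishes cleanly roughly 36% of the time at a 2% error rate and roughly 71% of the time at 0.67%.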

Oracle / Librarian → GLM-4.7

Oracle is a read-only architecture advisor; Librarian searches docs and looks up APIs. These roles need comprehension rather than coding ability, so GLM-4.7 is sufficient.

Metis / Momus → GLM-4.7 (max)

Pre-planning consultation and plan review both require deep reasoning. variant: "max" enables extended thinking mode, allowing the model to think more deeply.

Explore → Qwen3.6 Plus (free)

Code search is a high-frequency operation that can run dozens of times a day — a free model is sufficient. Qwen3.6 has decent code understanding ability.

Atlas → Minimax M2.5 (free)

Todo management — nothing special, free is fine.

Multimodal-Looker → GLM-5v-turbo

The only Agent that needs to process images — no other choice.

Sisyphus-Junior → GPT-5-nano (free)

It doesn’t make decisions itself; it routes to specific models via Categories, so a lightweight base model is sufficient.
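
The routing idea can be sketched as follows. This is not OmO's actual resolution logic: `resolve_model` is a hypothetical function, and falling back to the agent's own model when a category has no entry is an assumption made for the example.

```python
# A trimmed copy of the config above.
CONFIG = {
    "agents": {"sisyphus-junior": {"model": "opencode/gpt-5-nano"}},
    "categories": {
        "visual-engineering": {"model": "zhipuai-coding-plan/glm-5v-turbo"},
        "ultrabrain": {"model": "zhipuai-coding-plan/glm-4.7", "variant": "max"},
        "quick": {"model": "opencode/gpt-5-nano"},
    },
}


def resolve_model(config: dict, category: str, agent: str = "sisyphus-junior") -> dict:
    """Pick the model entry for a task category, else the agent's base model."""
    entry = config.get("categories", {}).get(category)
    if entry is None:
        # Assumed fallback: use the agent's own base model.
        entry = config["agents"][agent]
    return {"model": entry["model"], "variant": entry.get("variant")}


print(resolve_model(CONFIG, "ultrabrain"))
print(resolve_model(CONFIG, "not-configured"))
```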

On Prefixes

Three prefixes come up here, although the config above uses only zhipuai-coding-plan/* and opencode/*:

zai-coding-plan/* — Provided by Zhipu Coding Plan subscription
zhipuai-coding-plan/* — Zhipu direct API connection
opencode/* — OpenCode platform free models

The underlying models are the same; the difference is in billing. Coding Plan has quotas but a lower per-unit cost; the direct API has no limits but charges per token. I route high-frequency Agents to free models and keep Coding Plan for core tasks.
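
To see how a config splits across billing sources, the model IDs can be grouped by prefix. A minimal sketch: `count_by_prefix` is a hypothetical helper, and the sample dict copies a few entries from the config above.

```python
from collections import Counter


def count_by_prefix(config: dict) -> Counter:
    """Count how many agents and categories use each provider prefix."""
    counts = Counter()
    for section in ("agents", "categories"):
        for entry in config.get(section, {}).values():
            prefix, _, _ = entry["model"].partition("/")
            counts[prefix] += 1
    return counts


sample = {
    "agents": {
        "sisyphus": {"model": "zhipuai-coding-plan/glm-5.1"},
        "explore": {"model": "opencode/qwen3.6-plus-free"},
        "atlas": {"model": "opencode/minimax-m2.5-free"},
    },
    "categories": {
        "writing": {"model": "zhipuai-coding-plan/glm-4.7-flash"},
        "quick": {"model": "opencode/gpt-5-nano"},
    },
}
print(count_by_prefix(sample))
```

Run against the full config above, the same count gives 11 entries on zhipuai-coding-plan and 7 on opencode.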

In Practice

For daily coding, I type ultrawork — Sisyphus (5.1) takes over, decomposes the task, dispatches Explore (Qwen for code search) and Librarian (4.7 for docs) in parallel, aggregates the results, and hands off to Junior for implementation. In everyday refactoring and bug fixing, 5.1 feels comparable to Opus. Mixed Chinese-English codebases are handled without issues.
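
The fan-out and fan-in pattern in that workflow looks roughly like this. The sketch uses Python's `asyncio` with stub coroutines; `explore`, `librarian`, and `implement` are placeholders invented for illustration and do not call OmO or any model.

```python
import asyncio


async def explore(task: str) -> str:
    """Stub for the Explore agent (codebase search)."""
    await asyncio.sleep(0)  # stands in for a model call
    return f"code hits for {task!r}"


async def librarian(task: str) -> str:
    """Stub for the Librarian agent (docs lookup)."""
    await asyncio.sleep(0)
    return f"doc notes for {task!r}"


async def implement(task: str, context: list[str]) -> str:
    """Stub for Sisyphus-Junior (implementation)."""
    await asyncio.sleep(0)
    return f"patch for {task!r} using {len(context)} context items"


async def orchestrate(task: str) -> str:
    # Fan out: run both lookups concurrently and collect their results.
    context = await asyncio.gather(explore(task), librarian(task))
    # Fan in: hand the combined context to the implementer.
    return await implement(task, list(context))


print(asyncio.run(orchestrate("rename config loader")))
```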

For complex planning, I use /start-work to trigger Prometheus (5-turbo). It asks a few questions to clarify requirements, then produces a structured plan. Previously, running GLM-5 for planning would occasionally result in thought loops; since switching to 5-turbo, this hasn’t happened.

Architecture reviews don’t need manual triggering — when facing difficult decisions or two consecutive failed fixes, Sisyphus automatically consults Oracle (4.7). Oracle is read-only, providing analysis and recommendations without modifying code. 4.7’s reasoning depth on module partitioning and interface design is sufficient.
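
The trigger on two consecutive failed fixes can be pictured as a small counter. This is an illustrative sketch, not OmO's implementation; the `FixTracker` class and its `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FixTracker:
    """Tracks consecutive failed fixes and signals when to escalate."""

    threshold: int = 2
    consecutive_failures: int = 0

    def record(self, succeeded: bool) -> bool:
        """Record a fix attempt; return True if a review should be requested."""
        if succeeded:
            # Any success resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


tracker = FixTracker()
print([tracker.record(ok) for ok in (False, True, False, False)])
```

A success in between resets the streak, so only back-to-back failures request a review.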

For multimodal scenarios, Multimodal-Looker (5v-turbo) handles screenshots and PDFs. Combined with the visual-engineering category, frontend tasks can generate code directly from design mockups.

A Few Things to Note

The claim that GLM-5.1 reaches 94.6% of Claude Opus 4.6 is Z.ai’s own measurement; third-party verification is still in progress. GLM-5’s SWE-bench score of 77.8% has been externally validated, though, so the figure isn’t implausible.

GLM-5-turbo is closed-source, unlike GLM-5 which uses the MIT license. Z.ai says improvements will eventually be merged into the open-source version.

Coding Plan has request quotas. For high-frequency usage, route core tasks through the subscription and busywork through free models.

Summary

The core idea is one sentence: choose models by role.

Code writers use 5.1
Agent workflow runners use 5-turbo
Reasoners and reviewers use 4.7 (max)
Image processors use 5v-turbo
Busywork uses free models

Not every role needs the strongest model — just match them correctly.