Domestic LLM Resource and Cost Comparison: GLM-5 / Kimi K2.5 / MiniMax M2.7
Overview
This article compares the hardware requirements and usage costs of three major Chinese (domestic) LLMs to help developers choose the right option for their scenario.
| Model | Vendor | Architecture | Minimum VRAM for Local Deployment | API Available |
|---|---|---|---|---|
| GLM-5 | Zhipu AI | Dense (multiple versions) | 24GB (8B) | ✅ |
| Kimi K2.5 | Moonshot AI | MoE (parameters partially undisclosed) | 24GB (Lightweight, 1.8-bit quantized) | ✅ |
| MiniMax M2.7 | MiniMax | MoE 230B | Not yet open-sourced | ✅ |
GLM-5 (Zhipu AI)
Versions & Hardware Requirements
GLM-5 comes in four parameter sizes (8B, 40B, 120B, and 700B), the broadest size range among current domestic LLMs.
GLM-5-8B — Ideal for small to medium scenarios
Minimum: 16-core CPU, 32GB RAM, RTX 3090 (24GB). Recommended: 32-core CPU, 64GB RAM, RTX 4090 or A10 (24GB). With 4-bit quantization, 16GB of VRAM is sufficient. Context: 128K, text-only.
GLM-5-40B — Enterprise workhorse
Minimum: Single A100 (80GB); Recommended: H100 (80GB) or 2×A100 (80GB); Context 128K, supports text/multimodal.
GLM-5-120B — Large-scale inference
Minimum/Recommended: 4×A100 or 4×H100 (80GB×4); Context 256K, supports text/multimodal.
GLM-5-700B — Ultra-large scale (megacorp only)
Minimum: 8×H100 (80GB); Recommended: 16×H100 (80GB); Context 512K+, supports text/multimodal.
Software environment: Linux (Ubuntu 20.04+ or CentOS 7+) with CUDA 11.8+, Python 3.8+, and PyTorch 2.0+. Only the 8B version also supports Windows.
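The version thresholds above can be checked before downloading any weights. The following is a minimal sketch: the names `MINIMUMS`, `parse_version`, and `meets_minimum` are illustrative, and only the thresholds themselves come from this article. Versions are compared numerically rather than as strings, so `11.10` ranks above `11.8`.

```python
import platform
import re

# Minimum versions taken from the software environment line above.
MINIMUMS = {"cuda": "11.8", "python": "3.8", "pytorch": "2.0"}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); build suffixes after '+' are ignored."""
    numbers = [int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3]]
    # Pad to three components so '2' and '2.0.0' compare as equal.
    return tuple(numbers + [0] * (3 - len(numbers)))


def meets_minimum(installed):
    """Return {component: bool} for every component listed in MINIMUMS."""
    return {
        name: parse_version(installed.get(name, "0")) >= parse_version(minimum)
        for name, minimum in MINIMUMS.items()
    }


# Example values only; the CUDA and PyTorch strings here are placeholders.
installed = {
    "python": platform.python_version(),
    "cuda": "11.10",
    "pytorch": "2.1.0+cu118",
}
print(meets_minimum(installed))
```

On a machine with PyTorch installed, `torch.__version__` and `torch.version.cuda` can supply the real PyTorch and CUDA strings in place of the placeholders.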
Cost
| Cost Item | 8B | 40B | 120B | 700B |
|---|---|---|---|---|
| Hardware Purchase | $2,800-5,500 | $14,000-21,000 | $55,000-83,000 | $280,000-415,000 |
| Annual Maintenance | ~$280 | $1,400-2,800 | $7,000-11,000 | $41,000-69,000 |
| Cloud Rental | $0.40-0.70/h | $2.80-4.20/h | $11-17/h | $69-111/h |
| API Input | $0.0014-0.0028/1K tokens | $0.008-0.017/1K tokens | $0.028-0.055/1K tokens | Undisclosed |
| API Output | $0.004-0.008/1K tokens | $0.025-0.050/1K tokens | $0.083-0.166/1K tokens | Undisclosed |
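The purchase, maintenance, and cloud rental rows can be combined into a rough break-even estimate: the number of rental hours that would cost as much as owning the hardware. The sketch below uses the GLM-5-8B figures from the table. The function name is illustrative, and the model is deliberately simplified: it assumes a three-year ownership period and ignores electricity, resale value, and staffing.

```python
def break_even_hours(purchase_usd, annual_maintenance_usd, rental_usd_per_hour, years=3):
    """Hours of cloud rental that cost as much as owning the hardware for `years`."""
    total_ownership = purchase_usd + annual_maintenance_usd * years
    return total_ownership / rental_usd_per_hour


# GLM-5-8B: low end and high end of each range in the table above.
low = break_even_hours(2800, 280, 0.40)
high = break_even_hours(5500, 280, 0.70)

hours_in_period = 3 * 365 * 24  # 26,280 hours in three years
print(f"break-even: {low:.0f} h to {high:.0f} h")
print(f"utilization: {low / hours_in_period:.0%} to {high / hours_in_period:.0%}")
```

Under these assumptions, buying the 8B hardware pays off only if the GPU is busy for roughly 9,100 hours over three years, about a third of the time. Below that utilization, renting is cheaper.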
Kimi K2.5 (Moonshot AI)
Versions & Hardware Requirements
Kimi K2.5 uses an MoE architecture whose parameters are only partially disclosed, and is currently available in two versions.
Lightweight — Locally deployable
Minimum: RTX 3090/4090 (24GB, 1.8-bit quantized) + 64GB RAM + 240GB disk; Recommended: B200 or higher + 256GB RAM + 375GB disk; Context 256K, supports text/image.
Standard — API only
Not yet open-sourced, available only via API; Context 256K, supports text/image.
Cost
| Cost Item | Lightweight | Standard |
|---|---|---|
| Hardware Purchase | $2,800-4,200 (4090+256GB RAM) | Not yet open-sourced |
| Annual Maintenance | ~$415 | Not yet open-sourced |
| Cloud Rental | $0.70-1.10/h (4090 instance) | Not yet open-sourced |
| API Official Input | $0.10/1K tokens | $0.10/1K tokens |
| API Official Output | $0.55/1K tokens | $0.55/1K tokens |
| API Third-party Input | $0.033/1K tokens | $0.033/1K tokens |
| API Third-party Output | $0.22/1K tokens | $0.22/1K tokens |
MiniMax M2.7 (MiniMax)
Versions & Hardware Requirements
MiniMax M2.7 uses an MoE architecture with 230B total parameters (10B activated) and is currently available only via API.
Basic — Text-only, 200K context
Advanced — Text-only, 200K context
Neither version is open-sourced, so neither can be deployed locally.
Cost
| Cost Item | Basic | Advanced |
|---|---|---|
| API Input | $0.0005/1K tokens | $0.0014/1K tokens |
| API Output | $0.0017/1K tokens | $0.004/1K tokens |
Comprehensive Comparison
Monthly Cost at 1M Input + 1M Output Tokens
| Model | Official API | Discounted/Third-party API | Monthly Local Deployment (3-yr depreciation) |
|---|---|---|---|
| GLM-5-8B | $5.50-11 | — | ~$14-28 |
| GLM-5-40B | $33-66 | — | ~$415-690 |
| Kimi K2.5 | $650 | $255 | ~$28-42 |
| MiniMax M2.7-Basic | $2.20 | — | — |
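The API columns above are consistent with a monthly workload of 1M input tokens plus 1M output tokens, priced at the per-1K rates from the earlier cost tables. The sketch below reproduces that arithmetic. The function and dictionary names are illustrative, and the workload split is an inference from the numbers, not something the table states.

```python
def monthly_api_cost(input_per_1k, output_per_1k,
                     input_tokens=1_000_000, output_tokens=1_000_000):
    """Monthly API cost in USD from per-1K-token prices and monthly token volumes."""
    return input_tokens / 1000 * input_per_1k + output_tokens / 1000 * output_per_1k


# (input, output) prices in USD per 1K tokens, copied from the cost tables.
PRICES = {
    "GLM-5-8B (low)": (0.0014, 0.004),
    "GLM-5-8B (high)": (0.0028, 0.008),
    "GLM-5-40B (low)": (0.008, 0.025),
    "GLM-5-40B (high)": (0.017, 0.050),
    "Kimi K2.5 official": (0.10, 0.55),
    "Kimi K2.5 third-party": (0.033, 0.22),
    "MiniMax M2.7-Basic": (0.0005, 0.0017),
}

for name, (inp, out) in PRICES.items():
    print(f"{name}: ${monthly_api_cost(inp, out):,.2f}")
```

This yields $5.40 to $10.80 for GLM-5-8B, $33 to $67 for GLM-5-40B, $650 and $253 for Kimi K2.5 (official and third-party), and $2.20 for MiniMax M2.7-Basic, matching the table to within rounding. Passing different `input_tokens` and `output_tokens` values adapts the estimate to a workload with a different input-to-output ratio.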
Recommendations
Personal or small-team lightweight apps: MiniMax M2.7-Basic. Its API pricing is extremely low, about $2.20/month for 1M input plus 1M output tokens, and it suits text-only scenarios.
Multimodal apps (image recognition, etc.): Kimi K2.5-Lightweight. It can be deployed locally (24GB of VRAM is sufficient with 1.8-bit quantization), its long context comes at no extra cost, and the third-party API offers good value.
Enterprise complex reasoning: GLM-5-40B or MiniMax M2.7-Advanced. GLM-5 supports customized training, while the MiniMax API offers excellent value.
Ultra-large-scale customization: GLM-5-120B or 700B. These allow full-pipeline customization but are only suitable for enterprises with ample compute.
Summary
Best value: MiniMax M2.7-Basic, with API pricing at roughly 1/15 to 1/30 of GLM-5-40B ($2.20 versus $33-66 in the monthly comparison).
Best multimodal choice: Kimi K2.5 Lightweight, supporting local deployment and image input.
Full scenario coverage: GLM-5, whose versions range from 8B to 700B and cover deployments from a single 24GB GPU to 16×H100.
If you do not need customization, prefer the API: it is pay-as-you-go, with no hardware or maintenance costs.