VictoriaMetrics Stream Aggregation: Three-Year Review and Current Status (2026)
Introduction
It has been exactly three years since the previous article, Applying VictoriaMetrics Stream Aggregation for Metrics, was published in March 2023. The VictoriaMetrics ecosystem has changed considerably since then. This post revisits the issues raised in that article, reviews what the official project has resolved, and assesses where our stream-metrics-route project stands today.
I. Problems We Encountered Three Years Ago
Let’s quickly recap the core issue list from the 2023 blog post:
| # | Problem | 2023 Status |
|---|---|---|
| P1 | Collection gap issue | Network jitter or performance issues cause time gaps, which inflate the differences computed by stream aggregation |
| P2 | Single-point compute limits for massive data | Stream aggregation keeps no historical state and performs well, but a single instance is still a bottleneck |
| P3 | Distributed task allocation | Which compute node should data be assigned to? |
| P4 | Out-of-order discarding for same-dimension metrics | When multiple nodes compute same-dimension metrics over different time windows, later values are discarded |
| P5 | Resource balancing | Resource balancing in distributed computing |
| P6 | Task ID dimension explosion | Stream aggregation inserts node IDs into each aggregated time series, so the number of label values grows with horizontal scaling |
To address these issues, we developed stream-metrics-route, a Go-based distributed stream aggregation gateway.
II. Three Years Later, How Has the Official Project Done?
I reviewed VictoriaMetrics changelogs from v1.86 to v1.138.0 and the official documentation. Let’s take stock of the official project’s efforts over these three years:
2.1 Perfectly Resolved ✅
Issues P3, P5: Distributed Task Allocation & Resource Balancing
Official solution: vmagent now natively supports sharding via -remoteWrite.shardByURL, with consistent hashing.
Native support for shardByURL has been available since v1.86. v1.138.0 (2026-03) went further and upgraded the data distribution algorithm from round-robin to consistent hashing, which significantly reduces the share of data redistributed when nodes are added or removed (see the Changelog).
vmagent’s hash sharding architecture evolution:
flowchart LR
subgraph Collect["Collection Layer"]
A1@{ shape: doc, label: "Prometheus Agent 1" }
A2@{ shape: doc, label: "Prometheus Agent 2" }
end
subgraph VMAgent["vmagent Cluster"]
direction TB
B1(vmagent-0)
B2(vmagent-1)
B3(vmagent-2)
end
subgraph Shard["Sharding Logic"]
C@{ shape: diam, label: "Consistent Hashing" }
end
subgraph Storage["Storage Layer"]
D1@{ shape: cyl, label: "vmstorage-0" }
D2@{ shape: cyl, label: "vmstorage-1" }
D3@{ shape: cyl, label: "vmstorage-2" }
end
A1 -->|remote write| B1
A2 -->|remote write| B2
B1 --> C
B2 --> C
B3 --> C
C -->|shard 0| D1
C -->|shard 1| D2
C -->|shard 2| D3
classDef storage fill:#e8f5e9,stroke:#4caf50
class D1,D2,D3 storage
The VictoriaMetrics blog provides the specific algorithm implementation and sharding deployment recommendations. Combined with VictoriaMetrics Operator, it also supports managing shards via shardCount.
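To make the benefit of consistent hashing concrete, here is a minimal sketch in Go. It is not vmagent's implementation: the ring with 100 virtual nodes per target, the SHA-256 based `hashKey`, and the synthetic series names are all assumptions chosen for illustration, and plain hash-mod sharding is used as the baseline because it is the simplest scheme that reshuffles most keys when a node is added.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// hashKey maps a string to a uint64 using the first 8 bytes of its SHA-256 digest.
func hashKey(s string) uint64 {
	sum := sha256.Sum256([]byte(s))
	return binary.BigEndian.Uint64(sum[:8])
}

// modShard is plain hash-mod sharding, used here only as a baseline.
func modShard(key string, n int) int {
	return int(hashKey(key) % uint64(n))
}

// ring is a minimal consistent-hash ring with virtual nodes (hypothetical type).
type ring struct {
	points []uint64          // sorted virtual-node positions
	owner  map[uint64]string // position -> node name
}

func newRing(nodes []string, vnodes int) *ring {
	r := &ring{owner: make(map[uint64]string)}
	for _, n := range nodes {
		for v := 0; v < vnodes; v++ {
			p := hashKey(fmt.Sprintf("%s#%d", n, v))
			r.points = append(r.points, p)
			r.owner[p] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// lookup returns the node owning the first ring point at or after the key's hash.
func (r *ring) lookup(key string) string {
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around to the start of the ring
	}
	return r.owner[r.points[i]]
}

// movedKeys counts how many of numKeys synthetic series change shard
// when the cluster grows from 3 to 4 nodes, for both schemes.
func movedKeys(numKeys int) (byMod, byRing int) {
	oldNodes := []string{"vmstorage-0", "vmstorage-1", "vmstorage-2"}
	newNodes := append(append([]string{}, oldNodes...), "vmstorage-3")
	before, after := newRing(oldNodes, 100), newRing(newNodes, 100)
	for i := 0; i < numKeys; i++ {
		key := fmt.Sprintf(`http_requests_total{instance="host-%d"}`, i)
		if modShard(key, 3) != modShard(key, 4) {
			byMod++
		}
		if before.lookup(key) != after.lookup(key) {
			byRing++
		}
	}
	return byMod, byRing
}

func main() {
	byMod, byRing := movedKeys(10000)
	fmt.Printf("hash-mod moved %d keys, consistent hashing moved %d keys\n", byMod, byRing)
}
```

With hash-mod, a key stays put only when its hash gives the same remainder for both node counts, so most keys move. On the ring, a key moves only if the new node's virtual points now sit between it and its old owner, so roughly the new node's share of keys moves.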
Issue P2: Single-Node Compute Scaling
vmagent now supports horizontal scaling (sharding) with replicas + shardCount, with HA support. See Issue #5573 discussion.
Out-of-Order / Delayed Data Accuracy (P1 Partial Mitigation)
v1.112.0 (2025-02) was a key release: it added Aggregation Windows. This provides dual-window buffering for histogram and rate calculations. Flushes are not immediate but are delayed by a samples_lag interval, which significantly improves accuracy for delayed data at the cost of doubled memory (two aggregation windows are maintained simultaneously).
How Aggregation Windows Work:
sequenceDiagram
autonumber
participant C as Collector
participant V as vmagent
participant S as VictoriaMetrics
rect rgba(76,175,80,0.1)
Note over C,V: Data collection phase
C->>V: sample1 @T0
V->>V: Write to window A (current)
end
rect rgba(255,152,0,0.1)
Note over C,V: Delayed data arrives
C->>V: sample2 @T1 (delayed)
V->>V: Write to window B (previous)
end
Note over V: Dual-window parallel buffering
rect rgba(33,150,243,0.1)
Note over V,S: Aggregation output
V->>S: Aggregation result A @T2
V->>S: Aggregation result B @T3
end
Official docs: Streaming aggregation - Aggregation windows
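The dual-window idea can be sketched in a few lines of Go. This is an illustrative model, not vmagent's code: the `dualWindow` type, the sum aggregation, and the second-based timestamps are assumptions, and the samples_lag-based flush delay is not modeled (the caller decides when to call `flush`).

```go
package main

import "fmt"

// window accumulates a sum of samples for one aggregation interval.
type window struct {
	start int64 // inclusive window start, unix seconds
	sum   float64
	count int
}

// dualWindow keeps the current window plus the previous one, so that
// delayed samples can still be added to the interval they belong to.
type dualWindow struct {
	interval  int64
	prev, cur window
}

func newDualWindow(start, interval int64) *dualWindow {
	return &dualWindow{
		interval: interval,
		prev:     window{start: start - interval},
		cur:      window{start: start},
	}
}

// add routes a sample to a window by its timestamp. It returns false
// when the sample falls outside both buffered windows.
func (d *dualWindow) add(ts int64, v float64) bool {
	switch {
	case ts >= d.cur.start && ts < d.cur.start+d.interval:
		d.cur.sum += v
		d.cur.count++
	case ts >= d.prev.start && ts < d.cur.start:
		d.prev.sum += v
		d.prev.count++
	default:
		return false
	}
	return true
}

// flush emits the previous window and rotates: the current window
// becomes the previous one and a fresh current window is opened.
func (d *dualWindow) flush() window {
	out := d.prev
	d.prev = d.cur
	d.cur = window{start: d.cur.start + d.interval}
	return out
}

func main() {
	d := newDualWindow(60, 60) // prev = [0,60), cur = [60,120)
	d.add(65, 1)               // on-time sample, goes to the current window
	d.add(58, 2)               // delayed sample, still lands in the previous window
	fmt.Printf("%+v\n", d.flush())
	fmt.Printf("%+v\n", d.flush())
}
```

Keeping two windows alive at once is where the doubled memory comes from: every aggregation state exists twice until the older window is flushed.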
2.2 Still Unresolved ❌
True Distributed Stream Aggregation Coordination
vmagent’s stream aggregation runs per instance, with no coordination mechanism between instances: if the same metric is aggregated by two vmagent instances, duplicate or conflicting data is produced. The official recommendation is to divide responsibilities between instances using without/by labels rather than to provide cross-instance distributed coordination.
Task ID Dimension Explosion (P6)
Official vmagent still inserts internal labels (such as _aggr related labels) into aggregated time series, but lacks a stream_task_id pre-marking + dimension control design.
III. stream-metrics-route: Current Status and Value
stream-metrics-route Core Code Review:
| File | Role |
|---|---|
| router.go | Routing core, filters metrics based on relabel rules |
| remotecluster.go | Dual hashmod scheduling core! |
| remotewrite.go | remote write HTTP client |
| kafka.go | Kafka producer |
Core Algorithm (remotecluster.go):
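A minimal sketch of what dual hashmod scheduling could look like in Go. This is not the code from remotecluster.go: the function names, the FNV hash, the fixed task count, and the `taskID % numNodes` second step are assumptions made for illustration. The first hashmod maps a label set to a stream_task_id; the second maps that task to a compute node.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// seriesHash hashes a label set in sorted-name order, so equal label
// sets always produce the same hash regardless of map iteration order.
func seriesHash(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)
	h := fnv.New64a()
	for _, n := range names {
		h.Write([]byte(n))
		h.Write([]byte{0xff}) // separator to avoid ambiguous concatenations
		h.Write([]byte(labels[n]))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

// assign applies the two hashmod steps (assumed design):
//  1. series hash mod numTasks -> stream_task_id
//  2. stream_task_id mod numNodes -> index of the compute node
func assign(labels map[string]string, numTasks, numNodes int) (taskID, node int) {
	taskID = int(seriesHash(labels) % uint64(numTasks))
	node = taskID % numNodes
	return taskID, node
}

// tag returns a copy of the labels with stream_task_id injected,
// leaving the input map untouched.
func tag(labels map[string]string, taskID int) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out["stream_task_id"] = strconv.Itoa(taskID)
	return out
}

func main() {
	series := map[string]string{
		"__name__": "http_requests_total",
		"job":      "api",
		"instance": "10.0.0.1:9100",
	}
	taskID, node := assign(series, 64, 3)
	fmt.Println(tag(series, taskID), "-> node", node)
}
```

Under these assumptions the number of distinct stream_task_id values is capped by the task count, not by the number of nodes, so scaling the compute layer out does not add new label values.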
stream-metrics-route Irreplaceability Analysis
Conclusion: stream-metrics-route is still needed in 2026! But its positioning should shift from “full stream aggregation gateway” to “metric distribution routing gateway + Kafka integration layer.” Core differentiated value:
- Dual hashmod scheduling + stream_task_id pre-injection: Tags metrics with stream_task_id at the gateway layer; all subsequent nodes route consistently by this ID. This solves dimension control at the data entry point, earlier than the official approach.
- Multi-backend async distribution: Supports async distribution to Kafka and remote write, solving the “synchronous forwarding blocking the time window” issue mentioned in the blog.
- Native Prometheus relabeling integration
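The multi-backend async distribution point can be illustrated with a small Go sketch. It is a hypothetical design, not the project's code: the `dispatcher` type, the per-backend buffered queue, and the drop-when-full policy are assumptions, and the `send` callback stands in for a real Kafka producer or remote write client.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a hypothetical sink with its own buffered queue.
type backend struct {
	name    string
	queue   chan string
	dropped int // only touched by the goroutine calling enqueue
}

// dispatcher fans each payload out to all backends without blocking.
type dispatcher struct {
	backends []*backend
	wg       sync.WaitGroup
}

func newDispatcher(queueSize int, send func(name, payload string), names ...string) *dispatcher {
	d := &dispatcher{}
	for _, n := range names {
		b := &backend{name: n, queue: make(chan string, queueSize)}
		d.backends = append(d.backends, b)
		d.wg.Add(1)
		go func() { // one worker per backend drains its queue
			defer d.wg.Done()
			for p := range b.queue {
				send(b.name, p)
			}
		}()
	}
	return d
}

// enqueue never blocks the caller: if a backend's queue is full, the
// payload is dropped for that backend only and counted.
func (d *dispatcher) enqueue(p string) {
	for _, b := range d.backends {
		select {
		case b.queue <- p:
		default:
			b.dropped++
		}
	}
}

// close stops accepting payloads and waits for the workers to drain.
func (d *dispatcher) close() {
	for _, b := range d.backends {
		close(b.queue)
	}
	d.wg.Wait()
}

// demo pushes n payloads to two backends and reports what was sent and dropped.
func demo(n, queueSize int) (map[string]int, int) {
	var mu sync.Mutex
	sent := map[string]int{}
	d := newDispatcher(queueSize, func(name, payload string) {
		mu.Lock()
		sent[name]++
		mu.Unlock()
	}, "kafka", "remote-write")
	for i := 0; i < n; i++ {
		d.enqueue(fmt.Sprintf("batch-%d", i))
	}
	d.close()
	dropped := 0
	for _, b := range d.backends {
		dropped += b.dropped
	}
	return sent, dropped
}

func main() {
	sent, dropped := demo(10, 16)
	fmt.Println(sent, "dropped:", dropped)
}
```

Because each backend has its own queue and worker, a slow Kafka broker cannot delay remote write (or the reverse), which is what keeps forwarding out of the aggregation time window. Dropping on a full queue is only one possible policy; retrying or spilling to disk are alternatives with different trade-offs.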
IV. Recommended Hybrid Architecture 2026
Recommended architecture:
flowchart LR
subgraph Collect["Collection Layer"]
Prometheus@{ shape: doc, label: "Prometheus Agent Cluster" }
KafkaProducer@{ shape: doc, label: "Business System Metrics" }
end
subgraph Route["Routing Layer"]
SMR@{ shape: hex, label: "stream-metrics-route" }
end
subgraph Aggregate["Aggregation Layer"]
vmagent0(vmagent-0)
vmagent1(vmagent-1)
vmagent2(vmagent-2)
end
subgraph Storage["Storage Layer"]
Victoria@{ shape: cyl, label: "VictoriaMetrics" }
end
subgraph Consume["Consumption Layer"]
vmalert(vmalert)
Grafana(Grafana)
end
Prometheus --> SMR
KafkaProducer --> SMR
SMR -->|task_id=0| vmagent0
SMR -->|task_id=1| vmagent1
SMR -->|task_id=2| vmagent2
SMR --> Kafka@{ shape: cyl, label: "Kafka Topic" }
vmagent0 --> Victoria
vmagent1 --> Victoria
vmagent2 --> Victoria
Victoria --> vmalert
Victoria --> Grafana
classDef route fill:#fff3e0,stroke:#ff9800
classDef storage fill:#e8f5e9,stroke:#4caf50
classDef consume fill:#e3f2fd,stroke:#2196f3
class SMR route
class Victoria,Kafka storage
class vmalert,Grafana consume
Key Configuration Recommendations
vmagent version requirement: >= v1.112.0, enable aggregation windows:
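A minimal sketch of such a configuration, assuming the rules file is passed to vmagent with -remoteWrite.streamAggr.config. The match selector, interval, grouping, and output are illustrative placeholders; enable_windows is the setting that matters here.

```yaml
# Hypothetical stream aggregation rule; adjust match/interval/outputs to your metrics.
- match: '{__name__=~".+_bucket"}'
  interval: 1m
  without: [instance]
  outputs: [rate_sum]
  # Buffer two aggregation windows so delayed samples land in the right interval.
  enable_windows: true
```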
Deployment: See deploy.yaml example
V. Evolution Recommendations
5.1 Short-term Recommendations
| Action | Description |
|---|---|
| Upgrade vmagent to >= v1.112.0 | Set enable_windows: true to improve histogram aggregation accuracy |
| Evaluate whether stream-metrics-route is still needed | If there’s no Kafka requirement or high-cardinality stream_task_id control requirement, consider migrating away |
5.2 Medium-term Recommendations
| Action | Description |
|---|---|
| stream-metrics-route as front-end routing layer only | Retain hashmod task allocation + Kafka distribution |
| Disable raw metric persistence | Only write stream-aggregated results to storage, reducing storage volume |
| Add metadata management module | The ruler-handle-process mentioned in the blog (dynamic Record Rule by dimension) is worth building in-house or contributing upstream |
5.3 Long-term Recommendations
| Action | Description |
|---|---|
| Contribute stream_task_id dimension control mechanism upstream | If this design is proven in production |
| Improve monitoring metrics | Add stream-aggregation-related business metrics (queue depth per routing rule, distribution latency) |
Summary
| Dimension | Conclusion |
|---|---|
| Blog problem resolution rate | ~50% (P2, P3, and P5 resolved through official upgrades, P1 partially mitigated; cross-instance coordination and P6 still need self-developed solutions or maintaining the status quo) |
| Is stream-metrics-route still needed? | Still needed, positioning adjusted to “metric distribution routing gateway + Kafka integration layer” |
| Recommended architecture | Prometheus → stream-metrics-route → vmagent v1.112.0+ → VictoriaMetrics Storage |
References
- VictoriaMetrics vmagent Documentation
- Streaming Aggregation Official Documentation
- Changelog 2024-2026
- VictoriaMetrics Blog - How vmagent Works
- VictoriaMetrics Operator - VMAgent
- Issue #5573 - HA vmagent Deployment Recommendations
- stream-metrics-route GitHub Repository
Three years have passed, and the VictoriaMetrics ecosystem has matured significantly, but we still need to keep moving forward!