Architecture & Design

Hybrid Cloud Cross-Region Monitoring System Governance: Autonomous + Unified Dual-Core Architecture Practice

January 10, 2022

As businesses expand globally and deploy hybrid cloud at scale, governing monitoring across IDCs, national borders, and heterogeneous clouds has become a core challenge for stability assurance. Traditional monitoring solutions either depend on costly dedicated-line upgrades that intrude on the business architecture, or fail to balance per-node autonomy with global unification. At the same time, because monitoring is infrastructure that generates no revenue, it must keep resource usage tightly controlled without losing capability. This article walks through a cross-region monitoring governance solution used at a real-world internet company, explaining how the monitoring system achieves elastic scaling, cross-border coverage, node autonomy, and unified data without modifying the business architecture and without adding cross-domain costs for the business.

Continue reading →

From Bottleneck Breakthrough to Platform Governance — The Full Evolution of an Internet Company's Monitoring Platform Architecture

January 10, 2022

With rapid business expansion, multi-cloud deployment, and exponential growth in the number of monitored assets, the monitoring platform has become critical infrastructure for service stability. This article reviews how a major internet company’s monitoring platform evolved from 2019 to 2021: first resolving performance bottlenecks in the legacy monitoring system, then implementing cross-cloud distributed monitoring, and finally moving to cloud-native platform governance. Together these stages trace the system’s full transformation from an initial 0-to-1 build, through large-scale expansion, to platform governance.

Continue reading →

Black-box Probing Monitoring System Architecture Design and Practice for Internet Companies

August 31, 2021

In end-to-end monitoring of internet services, white-box monitoring focuses on proactively uncovering latent issues and predicting risks, while black-box monitoring is fault-oriented: it quickly detects problems that are already affecting production. Together, the two close the monitoring loop. Most internet companies have long had a blind spot around public-network services and the user-side last mile, so user-facing faults are often investigated only after users report them. The black-box probing monitoring system was designed to close exactly this gap.
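
To make the black-box idea concrete, here is a minimal sketch of a single HTTP probe. It is an illustration only, not the system described in the article: the names `probe_http` and `ProbeResult` are hypothetical, and a real probing system would run many such checks from distributed vantage points and feed the results into alerting.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    target: str
    success: bool                 # did the target answer with the expected status?
    status_code: Optional[int]    # None when no HTTP response was received at all
    latency_seconds: float        # wall-clock time as seen from the probing node
    error: Optional[str] = None   # transport-level failure description, if any


def probe_http(target: str, timeout: float = 5.0, expected_status: int = 200) -> ProbeResult:
    """Check an endpoint from the outside, with no knowledge of its internals (hypothetical helper)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(target, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        status = exc.code
    except OSError as exc:
        # URLError is a subclass of OSError: DNS failure, refused connection, or timeout,
        # i.e. the target is unreachable from this vantage point.
        return ProbeResult(target, False, None, time.monotonic() - start, str(exc))
    latency = time.monotonic() - start
    return ProbeResult(target, status == expected_status, status, latency)
```

The probe only observes what a user would observe (reachability, status, latency), which is what makes black-box monitoring fault-oriented rather than predictive.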

Continue reading →

Monitoring System Enterprise Architecture Evolution — Probing Monitoring

December 12, 2020

Recap: In “Monitoring System Enterprise Architecture Evolution — Cross-Region Hybrid Cloud”, the monitoring system had gradually matured toward enterprise-level capabilities. This chapter briefly describes how the probing capability was built during this period. Below is the development history of this system. As the monitoring platform was built out, collecting internal metrics alone could not meet the needs of the business, so before APM development was planned, remote black-box probing was added as a subsystem.

Continue reading →

Monitoring System Enterprise Architecture Evolution — Cross-Region Hybrid Cloud

October 12, 2020

Recap: In “Monitoring System Enterprise Architecture Evolution — First Steps with Prometheus”, the monitoring system had been upgraded from a single-node architecture to a distributed architecture within a single IDC. The approach in that article applies to both VM-based and container-based deployments. Prometheus is a product of the cloud-native era and is commonly paired with Kubernetes, but it can also replace traditional monitoring solutions such as Zabbix in non-Kubernetes environments. In this article, we begin deploying on Kubernetes to upgrade the entire monitoring architecture, making it more flexible for cross-region hybrid cloud scenarios.

Continue reading →

Large Enterprise Email System Architecture Design and Full Mail Flow Analysis

June 10, 2020

As enterprise digitalization scales up, large organizations place stringent demands on their email systems: independent deployment, high availability, global interoperability, security protection, and load balancing. This article breaks down the architecture of a dedicated email system built for a large enterprise, covering the overall design, physical and logical deployment, core service systems, and the complete send and receive mail flows, as a reference for implementing enterprise-grade email.

I. Overall System Architecture Design

Large enterprise email systems adopt a layered architecture of “frontend gateway layer + load balancing layer + core service layer + backend independent mail system”, which balances security isolation, traffic scheduling, and business independence. The overall architecture is as follows:

Continue reading →

Monitoring System Enterprise Architecture Evolution — First Steps with Prometheus

December 12, 2019

Prometheus is an open-source monitoring system and time series database that has been widely adopted in recent years. The official architecture diagram is shown below:

Continue reading →