Enterprise Monitoring Architecture Evolution Series

9 posts
1
Monitoring Collection Notes
· 2 min read

MySQL Monitoring

MySQL Privilege Best Practices

Privilege control exists primarily for security, so follow these best practices:

  1. Grant only the minimum privileges each user requires, so that no user can do harm. For example, if a user only needs to run queries, grant SELECT only, not UPDATE, INSERT, or DELETE.
  2. Restrict the login host when creating users, typically to a specific IP or internal network IP range.
  3. Delete users without passwords after initializing the database. The installation automatically creates some users with no passwords by default.
  4. Set passwords that meet complexity requirements for each user.
  5. Periodically clean up unnecessary users by revoking their privileges or deleting them.
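
As a rough illustration of practices 1 and 2, the sketch below builds the `CREATE USER` and `GRANT` statements for a host-restricted, read-only account. The function name, the account `report_ro`, the database `appdb`, and the `10.0.0.%` network pattern are all hypothetical, and the password is left as a placeholder to be filled in according to your own complexity policy. It only generates SQL text and does not connect to a server.

```python
import re

# Hypothetical helper: only the generated SQL follows standard MySQL syntax;
# the function and argument names are illustrative assumptions.
_IDENT = re.compile(r"^[A-Za-z0-9_]+$")
_HOST = re.compile(r"^[A-Za-z0-9_.%:-]+$")
_ALLOWED = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def least_privilege_statements(user, host, database, privileges=("SELECT",)):
    """Return CREATE USER and GRANT statements for a host-restricted account."""
    if not _IDENT.match(user) or not _IDENT.match(database):
        raise ValueError("user and database must be plain identifiers")
    # Practice 2: refuse the catch-all host so the login source stays restricted.
    if not _HOST.match(host) or host == "%":
        raise ValueError("host must be a specific IP or network pattern, not '%'")
    privs = [p.upper() for p in privileges]
    # Practice 1: accept only an explicit, small set of data privileges.
    if not privs or not set(privs) <= _ALLOWED:
        raise ValueError("privileges must be a non-empty subset of " + str(sorted(_ALLOWED)))
    account = f"'{user}'@'{host}'"
    return [
        # Placeholder password: replace it before running the statement.
        f"CREATE USER {account} IDENTIFIED BY '<strong-password>';",
        f"GRANT {', '.join(privs)} ON `{database}`.* TO {account};",
    ]


if __name__ == "__main__":
    for stmt in least_privilege_statements("report_ro", "10.0.0.%", "appdb"):
        print(stmt)
```

Rejecting the bare `%` host and defaulting to `SELECT` keeps the helper biased toward the restrictive choice; widening either one has to be done explicitly by the caller.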

Example:

2
Monitoring System Enterprise Architecture Evolution — First Steps with Prometheus
· 8 min read

Prometheus is an open-source monitoring system and time-series database that has gained widespread adoption in recent years. The official architecture diagram is shown below:

3
Eyes On You: From SRE Principles to Prometheus Monitoring System Implementation
· 6 min read

In the context of distributed internet services, high concurrency, and multi-cloud deployment, SRE (Site Reliability Engineering) has become a core role in ensuring service availability, and the monitoring system serves as SRE’s “eyes.” This article starts from core SRE principles and walks through the pain points of modern monitoring systems, technology stack selection, Prometheus core principles, and alerting best practices, presenting a practical methodology for building an enterprise-grade monitoring system.

SRE Core Principles: Stability is the #1 Metric

At its core, SRE ensures continuous service stability through engineering practices, focusing on capacity planning, cluster maintenance, fault tolerance, load balancing, and monitoring system construction. There are only 3 core measurement metrics:

4
Monitoring System Enterprise Architecture Evolution — Cross-Region Hybrid Cloud
· 5 min read

Recap

In “Monitoring System Enterprise Architecture Evolution — First Steps with Prometheus”, the monitoring system was upgraded from a single-node architecture to a distributed architecture within a single IDC. The content of that article applies to both VM-based and container-based deployments. Prometheus is a product of the cloud-native era and is commonly used alongside Kubernetes, but it can also replace traditional monitoring solutions such as Zabbix in non-Kubernetes environments. In this article, we start deploying on Kubernetes to upgrade the entire monitoring system architecture, making it more flexible for cross-region hybrid cloud business scenarios.

5
Monitoring System Enterprise Architecture Evolution — Probing Monitoring
· 3 min read

Recap

In “Monitoring System Enterprise Architecture Evolution — Cross-Region Hybrid Cloud”, the monitoring system had gradually matured and evolved toward enterprise-level capabilities. This chapter briefly describes the construction of the probing capability during this period. Below is the development history of this system. During the construction of the monitoring platform, internal monitoring collection alone was insufficient to meet enterprise business needs. Before planning APM development, remote probing with black-box monitoring was also incorporated as a subsystem.

6
Black-box Probing Monitoring System Architecture Design and Practice for Internet Companies
· 5 min read

In the full-link monitoring system of internet services, white-box monitoring focuses on proactively uncovering potential issues and predicting risks, while black-box monitoring is fault-oriented, rapidly detecting problems that have already occurred online. The two work together to form a complete monitoring closed loop. Most internet companies have long had a monitoring blind spot for public network services and the user-side last mile. User-side faults often only trigger investigation after users report issues. The black-box probing monitoring system was designed precisely to solve this industry pain point.

7
From Bottleneck Breakthrough to Platform Governance — The Full Evolution of an Internet Company's Monitoring Platform Architecture
· 6 min read

In the context of rapid internet business expansion, multi-cloud deployment, and exponential asset growth, the monitoring platform is critical infrastructure for ensuring service stability. This article reviews a major internet company’s monitoring platform evolution from 2019 to 2021: solving legacy monitoring performance bottlenecks, implementing cross-cloud distributed monitoring, and moving to cloud-native platform governance. It presents the full transformation of the monitoring system, from 0-to-1 build → large-scale expansion → platform governance.

8
Hybrid Cloud Cross-Region Monitoring System Governance: Autonomous + Unified Dual-Core Architecture Practice
· 6 min read

In the context of global business expansion and large-scale hybrid cloud deployment, governing monitoring across IDCs, borders, and heterogeneous clouds has become a core challenge for stability assurance. Traditional monitoring solutions either rely on expensive dedicated-line upgrades that intrude on business architecture, or cannot balance node autonomy with global unification. Meanwhile, as non-revenue infrastructure, the monitoring system must strictly control its resource usage without letting its capabilities degrade.

This article breaks down a practical cross-region monitoring system governance solution from a real internet company, explaining how to achieve elastic scaling, cross-border coverage, node autonomy, and data unification for the monitoring system without modifying business architecture or incurring business cross-domain costs.

9
Eyes On You: The 2022 Productization Journey of a Multi-Cloud Heterogeneous Monitoring Platform
· 6 min read

In the context of multi-cloud deployment, global networking, and exponential growth in service scale within internet businesses, monitoring platforms have long since outgrown the basic “metric collection + alert notification” positioning and become core infrastructure for ensuring end-to-end stability. Based on the real evolution of a large-scale internet enterprise’s monitoring platform, this article dissects the complete planning and implementation approach for upgrading the platform from large-scale coverage to productization, usability, and intelligence in 2022.