Observability

Eyes On You: From SRE Principles to Prometheus Monitoring System Implementation

With distributed internet services, high concurrency, and multi-cloud deployments now the norm, SRE (Site Reliability Engineering) has become a core discipline for keeping services available, and the monitoring system serves as SRE’s “eyes.” This article starts from SRE’s core principles, then examines the pain points of modern monitoring systems, technology stack selection, how Prometheus works, and alerting best practices, and presents a practical methodology for building an enterprise-grade monitoring system.

SRE Core Principles: Stability Is the #1 Metric

At its core, SRE keeps services continuously stable through engineering practice, focusing on capacity planning, cluster maintenance, fault tolerance, load balancing, and building the monitoring system. There are only three core metrics:

Monitoring Collection Notes

MySQL Monitoring

MySQL Privilege Best Practices

Privilege control exists primarily for security, so follow these best practices:

- Grant only the minimum privileges a user needs, so the account cannot do unintended harm. For example, if a user only needs to run queries, grant SELECT, not UPDATE, INSERT, or DELETE.
- Restrict the login host when creating users, typically to a specific IP address or an internal network range.
- After initializing the database, delete users that have no password. The installation can create some passwordless users by default.
- Give every user a password that meets complexity requirements.
- Periodically clean up unnecessary users by revoking their privileges or deleting them.

Example:
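
As a hedged illustration, not the article's own example, the minimal Python sketch below generates MySQL statements that apply several of these practices: a SELECT-only grant on a single database, a login host restricted to an internal range, a password complexity check, a query that finds passwordless accounts, and a DROP USER statement for cleanup. The helper names (`least_privilege_sql`, `drop_user_sql`, `is_complex`, `sql_string`), the complexity policy, and the sample user, host, and database names are all hypothetical. The `mysql.user.authentication_string` query assumes MySQL 5.7 or later, and the string quoting assumes the default `sql_mode`.

```python
import re
import string

# Assumed complexity policy (hypothetical, loosely modeled on MySQL's
# validate_password MEDIUM level): at least 8 characters with an uppercase
# letter, a lowercase letter, a digit, and a special character.
MIN_PASSWORD_LENGTH = 8

# Identifiers and privilege names are restricted to safe character sets so
# they can be embedded in SQL text without further escaping.
SAFE_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")
SAFE_PRIVILEGE = re.compile(r"[A-Z]+( [A-Z]+)*")

# Lists accounts with an empty password (MySQL 5.7+ stores credentials in
# mysql.user.authentication_string).
FIND_PASSWORDLESS_USERS = (
    "SELECT user, host FROM mysql.user WHERE authentication_string = '';"
)


def is_complex(password: str) -> bool:
    """Return True if the password satisfies the assumed complexity policy."""
    return (
        len(password) >= MIN_PASSWORD_LENGTH
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


def sql_string(value: str) -> str:
    """Quote a MySQL string literal (assumes the default sql_mode)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def least_privilege_sql(user, host, password, database, privileges=("SELECT",)):
    """Build CREATE USER and GRANT statements for a narrowly scoped account."""
    if not SAFE_IDENTIFIER.fullmatch(user) or not SAFE_IDENTIFIER.fullmatch(database):
        raise ValueError("user and database must be plain identifiers")
    if host == "%":
        raise ValueError("restrict the login host instead of allowing any host")
    if not is_complex(password):
        raise ValueError("password does not meet the complexity policy")
    for privilege in privileges:
        if not SAFE_PRIVILEGE.fullmatch(privilege):
            raise ValueError(f"unexpected privilege: {privilege!r}")
    account = f"{sql_string(user)}@{sql_string(host)}"
    return [
        f"CREATE USER {account} IDENTIFIED BY {sql_string(password)};",
        f"GRANT {', '.join(privileges)} ON `{database}`.* TO {account};",
    ]


def drop_user_sql(user, host):
    """Build a DROP USER statement for an account that is no longer needed."""
    if not SAFE_IDENTIFIER.fullmatch(user):
        raise ValueError("user must be a plain identifier")
    return f"DROP USER {sql_string(user)}@{sql_string(host)};"


if __name__ == "__main__":
    for statement in least_privilege_sql("report_ro", "10.0.0.%", "S3cure!Pass", "shop"):
        print(statement)
    print(FIND_PASSWORDLESS_USERS)
    print(drop_user_sql("old_app", "192.168.1.20"))
```

Running the script prints the CREATE USER and GRANT statements for `report_ro`, the passwordless-account query, and a DROP USER statement. Generating the SQL as text keeps it easy to review before execution; in a real deployment, password policy is usually enforced server-side by MySQL's validate_password plugin (a component in MySQL 8.0) rather than in client code.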
