<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>SRE on Mi&amp;Bee Blog</title><link>/en/tags/sre/</link><description>Recent content in SRE on Mi&amp;Bee Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>蓝宝石的傻话</managingEditor><lastBuildDate>Sat, 20 Jun 2020 00:00:00 +0000</lastBuildDate><atom:link href="/en/tags/sre/rss.xml" rel="self" type="application/rss+xml"/><item><title>Eyes On You: From SRE Principles to Prometheus Monitoring System Implementation</title><link>/en/posts/telemetry/prometheus-first/</link><pubDate>Sat, 20 Jun 2020 00:00:00 +0000</pubDate><guid>/en/posts/telemetry/prometheus-first/</guid><description>&lt;p&gt;With internet services now distributed, highly concurrent, and deployed across multiple clouds, &lt;strong&gt;SRE (Site Reliability Engineering)&lt;/strong&gt; has become a core role in ensuring service availability, and the &lt;strong&gt;monitoring system&lt;/strong&gt; serves as SRE&amp;rsquo;s &amp;ldquo;eyes.&amp;rdquo; Starting from SRE core principles, this article covers the pain points of modern monitoring systems, technology stack selection, Prometheus core principles, and alerting best practices, and presents a practical methodology for building an enterprise-grade monitoring system.&lt;/p&gt;
&lt;h2 id="sre-core-principles-stability-is-the-1-metric"&gt;SRE Core Principles: Stability is the #1 Metric&lt;/h2&gt;
&lt;p&gt;At its core, SRE is about &lt;strong&gt;ensuring continuous service stability through engineering practices&lt;/strong&gt;, with a focus on capacity planning, cluster maintenance, fault tolerance, load balancing, and building the monitoring system. Only three core metrics are used to measure it:&lt;/p&gt;</description></item></channel></rss>