security-collector-exporter: Security Audit Monitoring for Linux

May 14, 2026 · Observability · Prometheus, Linux, Security Monitoring, Go, Exporter

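Why I wrote this

Anyone who runs servers has probably been through this: a security audit comes up, and you SSH into machines one by one to check whether the SSH configuration is correct, whether SELinux is enabled, whether the firewall is running, whether there are expired accounts, and whether the password policy meets the standard. With a handful of machines that is tolerable; with dozens or hundreds it is pure manual labor.

Worse, none of this is continuously monitored. Everything passes today, someone changes a setting tomorrow, and you have no way of knowing.

The Prometheus ecosystem has node_exporter for basic system metrics (CPU, memory, disk), but security configuration state has remained a gap. security-collector-exporter fills it: it turns Linux security configuration and state into Prometheus metrics that plug into your existing monitoring stack for continuous tracking and automated alerting.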

What it collects

It covers 15 categories of security metrics, from accounts to kernel parameters. The table lists twelve of them:

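Category	Metric	Description
System info	`linux_security_os_version_info`	OS version, package count, last patch time
Account management	`linux_security_account_info`	passwd information, sudo privilege detection
Password policy	`linux_security_password_*`	6 separate metrics covering the fields of the shadow file
SSH configuration	`linux_security_sshd_config_info`	Key sshd_config settings
Firewall	`linux_security_firewall_enabled`	Supports firewalld/ufw/iptables/nftables
Port monitoring	`linux_security_ports_use_info`	Includes process name, version, application name
Service status	`linux_security_services_info`	Enabled/disabled and running state of systemd services
SELinux	`linux_security_selinux_config`	Configured and runtime mode
Kernel parameters	`linux_security_sysctl_*`	Checks of security-related sysctl parameters
Scheduled tasks	`linux_security_crontab_info`	System and user crontab entries
Audit service	`linux_security_auditd_info`	auditd status and rule count
Login policy	`linux_security_login_defs_info`	login.defs settings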

One diagram covers the whole collection pipeline:

mermaid
flowchart TB

    subgraph fs["📁 Linux filesystem"]
        a1["👤 Accounts and password policy<br/>passwd / shadow / login.defs"]
        a2["🔑 SSH configuration<br/>sshd_config"]
        a3["🛡️ Mandatory access control<br/>selinux/config / apparmor"]
        a4["📄 Network access control<br/>hosts.allow / hosts.deny"]
        a5["⏰ Scheduled tasks<br/>crontab"]
    end

    subgraph proc["⚡ /proc runtime data"]
        b1["🌐 Network connections<br/>/proc/net/tcp / udp"]
        b2["🔎 Process info<br/>/proc/pid/cmdline / exe / fd"]
        b3["📦 Container detection<br/>/proc/pid/cgroup / environ"]
    end

    subgraph svc["⚙️ System services and commands"]
        c1["🧱 Firewall<br/>firewalld / ufw / nftables"]
        c2["📟 Service status<br/>systemctl list-units"]
        c3["📦 Package management<br/>rpm / dpkg / pacman"]
        c4["🔍 Audit daemon<br/>auditd status and rules"]
    end

    subgraph exp["🔧 security-collector-exporter"]
        detect["Version detection engine<br/>HTTP API / JAR MANIFEST / command line"]
        collect["Collector: metric aggregation"]
    end

    fs --> collect
    proc --> collect
    proc --> detect
    svc --> collect
    detect --> collect
    collect -->|"exposes /metrics :9102"| prom["📊 Prometheus"]

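The diagram has three layers. At the top are the three kinds of data sources on a Linux system: static configuration in the filesystem, runtime data under /proc, and the output of system commands. In the middle are the exporter's internal Collector and version detection engine. At the bottom is the endpoint that Prometheus scrapes.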
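To make the /proc data source concrete, here is a minimal sketch of reading one row of /proc/net/tcp. It is not the exporter's actual code: the function name `parseTCPLine` and the `tcpStates` table are made up for illustration. It relies only on the kernel's documented row format, where the second column is the local address as hex IP and hex port, and the fourth column is the connection state (`0A` for LISTEN, `01` for ESTABLISHED).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tcpStates maps the hex state codes used in /proc/net/tcp to names.
// Only the two states mentioned in this article are listed.
var tcpStates = map[string]string{
	"01": "ESTABLISHED",
	"0A": "LISTEN",
}

// parseTCPLine (hypothetical helper) extracts the local port and state name
// from one data row of /proc/net/tcp. ok is false for the header row or
// for malformed input.
func parseTCPLine(line string) (port int, state string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return 0, "", false
	}
	// fields[1] is local_address in the form HEXIP:HEXPORT.
	idx := strings.LastIndex(fields[1], ":")
	if idx < 0 {
		return 0, "", false
	}
	p, err := strconv.ParseUint(fields[1][idx+1:], 16, 16)
	if err != nil {
		return 0, "", false
	}
	name, known := tcpStates[fields[3]]
	if !known {
		name = "OTHER"
	}
	return int(p), name, true
}

func main() {
	// A made-up row: a socket listening on 0.0.0.0:22.
	sample := "   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0"
	port, state, ok := parseTCPLine(sample)
	fmt.Println(port, state, ok) // prints: 22 LISTEN true
}
```

A real collector would loop over every row of the file and keep only the states it was asked to report.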

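A few interesting design choices

Version detection in the port metrics

The port metrics record more than a port number and a process name. For common services (MySQL, Nginx, Redis, and so on), the exporter tries to detect the version. For Java applications (Elasticsearch, Kafka, Tomcat, Jenkins, and so on), it identifies the real application name and version through several methods, falling back from one to the next: HTTP API calls, parsing the JAR's MANIFEST.MF, extracting command-line arguments, and reading container image tags.

This feature took the most effort: process_info.go alone runs to 1347 lines. The reason is that a Java application's process name is just java, which tells you nothing about whether it is running Elasticsearch or Kafka.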
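The fallback idea can be sketched in a few lines. This is an illustration under my own assumptions, not an excerpt from process_info.go: the names `detector`, `fromJarFlag`, `fromMainClass`, and `identify` are hypothetical, and only the command-line strategy is shown. Each strategy either returns a name or reports that it has no answer, and the chain stops at the first hit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// splitCmdline splits the NUL-separated contents of /proc/<pid>/cmdline.
func splitCmdline(raw []byte) []string {
	return strings.Split(strings.TrimRight(string(raw), "\x00"), "\x00")
}

// detector is one identification strategy (hypothetical type);
// ok is false when the strategy has no answer.
type detector func(args []string) (name string, ok bool)

// fromJarFlag returns the jar file name that follows "-jar", without ".jar".
func fromJarFlag(args []string) (string, bool) {
	for i, a := range args {
		if a == "-jar" && i+1 < len(args) {
			return strings.TrimSuffix(filepath.Base(args[i+1]), ".jar"), true
		}
	}
	return "", false
}

// fromMainClass returns the first argument that is not a JVM option,
// skipping the value that follows -cp / -classpath.
func fromMainClass(args []string) (string, bool) {
	for i := 1; i < len(args); i++ {
		a := args[i]
		if a == "-cp" || a == "-classpath" {
			i++ // skip the classpath value
			continue
		}
		if strings.HasPrefix(a, "-") {
			continue
		}
		return a, true
	}
	return "", false
}

// identify tries each detector in order and returns the first hit.
func identify(args []string, chain ...detector) string {
	for _, d := range chain {
		if name, ok := d(args); ok {
			return name
		}
	}
	return "java" // nothing matched: fall back to the bare process name
}

func main() {
	// Illustrative command line, not captured from a real host.
	raw := []byte("java\x00-Xms1g\x00-cp\x00/usr/share/elasticsearch/lib/*\x00org.elasticsearch.bootstrap.Elasticsearch\x00")
	fmt.Println(identify(splitCmdline(raw), fromJarFlag, fromMainClass))
}
```

Adding another strategy, such as a MANIFEST.MF reader, would mean writing one more function with the same signature and placing it in the chain.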

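Shadow file split into separate metrics

The fields of each /etc/shadow entry (last change time, maximum age, minimum age, warning days, inactivity days, account expiration) are not merged into one large metric. They are split into 6 separate gauges, which makes threshold checks in PromQL natural:

promql
# Accounts whose maximum password age exceeds 90 days
linux_security_password_max_days > 90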
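For reference, this is roughly what pulling those six fields out of one shadow entry looks like. It is a sketch, not the project's parser: the `ShadowAging` struct and `parseShadowLine` function are names I made up, and the choice of -1 for an empty field is an assumption. The field order follows the standard shadow(5) layout of nine colon-separated fields.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ShadowAging (hypothetical type) holds the six numeric aging fields of one
// /etc/shadow entry. A value of -1 means the field was empty in the file.
type ShadowAging struct {
	User       string
	LastChange int // days since the epoch of the last password change
	MinDays    int
	MaxDays    int
	WarnDays   int
	Inactive   int
	Expire     int // account expiration, in days since the epoch
}

// parseShadowLine splits one entry of the form
// name:hash:lastchg:min:max:warn:inactive:expire:reserved
func parseShadowLine(line string) (ShadowAging, error) {
	f := strings.Split(line, ":")
	if len(f) < 9 {
		return ShadowAging{}, fmt.Errorf("expected 9 fields, got %d", len(f))
	}
	num := func(s string) int {
		n, err := strconv.Atoi(s)
		if err != nil {
			return -1 // empty or non-numeric field
		}
		return n
	}
	return ShadowAging{
		User:       f[0],
		LastChange: num(f[2]),
		MinDays:    num(f[3]),
		MaxDays:    num(f[4]),
		WarnDays:   num(f[5]),
		Inactive:   num(f[6]),
		Expire:     num(f[7]),
	}, nil
}

func main() {
	// Made-up entry with a locked password and default aging values.
	a, err := parseShadowLine("alice:!:19000:0:99999:7:::")
	if err != nil {
		panic(err)
	}
	// Each numeric field would back its own gauge, labeled by user.
	fmt.Println(a.User, a.MaxDays, a.MaxDays > 90) // prints: alice 99999 true
}
```

With one gauge per field, a rule like the `> 90` check above needs no label parsing or string matching.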

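Multi-layer firewall state detection

It does more than run systemctl is-active firewalld and call it done. Each firewall type has its own detection logic: checking the state of the systemd service file, checking whether the process is running, checking ufw's special state file (/var/lib/ufw/ufw-not-booted), and checking the iptables rules file path. In real environments, a firewall that is "configured but not running" is all too common.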
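One way to keep "configured" and "running" separate is to model each backend as a pair of checks. The sketch below is my own simplification, not the exporter's code: the `probe` and `status` types and the `detect` function are hypothetical, and the checks are injected as functions so the example stays self-contained. The result fields mirror the `firewall_type` and `is_running` labels that appear on `linux_security_firewall_enabled`.

```go
package main

import "fmt"

// probe (hypothetical type) describes one firewall backend. Both checks are
// injected; a real probe would inspect unit files, processes, or rule files.
type probe struct {
	name       string
	configured func() bool
	running    func() bool
}

// status mirrors the firewall_type and is_running labels.
type status struct {
	firewallType string
	isRunning    bool
}

// detect returns the first configured backend in probe order,
// or type "none" when no backend is configured.
func detect(probes []probe) status {
	for _, p := range probes {
		if p.configured() {
			return status{firewallType: p.name, isRunning: p.running()}
		}
	}
	return status{firewallType: "none"}
}

func main() {
	yes := func() bool { return true }
	no := func() bool { return false }

	// Simulated host: firewalld absent, ufw configured but not running.
	probes := []probe{
		{name: "firewalld", configured: no, running: no},
		{name: "ufw", configured: yes, running: no},
	}
	s := detect(probes)
	fmt.Printf("firewall_type=%s is_running=%t\n", s.firewallType, s.isRunning)
	// prints: firewall_type=ufw is_running=false
}
```

The "configured but not running" case falls out naturally: it is any result where the type is not "none" and `isRunning` is false.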

Deployment and running

Docker is the least-effort option:

bash
docker run -d \
  --name security-exporter \
  --privileged \
  -p 9102:9102 \
  ghcr.io/mickeyzzc/security-collector-exporter:0.1.0

--privileged is needed because the exporter reads system files such as /etc/shadow and /proc.

A few useful startup flags:

bash
# Collect only ports in the LISTEN state (default)
./security-exporter --collector.port-states="LISTEN"

# Also collect ESTABLISHED connections
./security-exporter --collector.port-states="LISTEN,ESTABLISHED"

# Collect only enabled services (default behavior)
./security-exporter --collector.services-enabled=true

# Combine both filters: collect only services that are enabled and running
./security-exporter --collector.services-enabled=true --collector.services-running=true

# Turn on debug logging for troubleshooting
./security-exporter --log.level=debug

On the Prometheus side, add a scrape config:

yaml
scrape_configs:
  - job_name: 'security-exporter'
    static_configs:
      - targets: ['localhost:9102']

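Example alerting rules

The project ships a full set of security compliance alerting rules covering SSH, SELinux, the firewall, password policy, and service management. A few typical ones: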

yaml
# Root SSH login is not disabled (critical)
- alert: RootSSHLoginEnabled
  expr: linux_security_sshd_config_info{info_key="PermitRootLogin", info_value="yes"}
  labels:
    severity: critical

# SELinux is not in enforcing mode
- alert: SELinuxNotEnforcing
  expr: linux_security_selinux_config{info_key="SELINUX", info_value=~"permissive|disabled"}
  labels:
    severity: warning

# Firewall is configured but not running
- alert: FirewallNotRunning
  expr: linux_security_firewall_enabled{firewall_type!="none", is_running="false"} == 1
  labels:
    severity: warning

# Maximum password age exceeds 90 days
- alert: PasswordMaxDaysTooLong
  expr: linux_security_login_defs_info{info_key="PASS_MAX_DAYS", info_value="num"} > 90
  labels:
    severity: warning

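You can even compute a security compliance score by weighting and summing the individual checks. It is intended as a 100-point scale, though the weights listed here add up to 95: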

promql
(
  (linux_security_sshd_config_info{info_key="PermitRootLogin", info_value="no"} or vector(0)) * 20 +
  (linux_security_selinux_config{info_key="SELINUX", info_value="enforcing"} or vector(0)) * 15 +
  (linux_security_firewall_enabled{firewall_type!="none"} == 1) * 10 +
  (linux_security_firewall_enabled{firewall_type!="none", is_running="true"} == 1) * 5 +
  ((linux_security_login_defs_info{info_key="PASS_MIN_LEN", info_value="num"} >= 10) or vector(0)) * 10 +
  ((linux_security_login_defs_info{info_key="PASS_MAX_DAYS", info_value="num"} <= 90) or vector(0)) * 10 +
  (linux_security_services_info{service_name="xwindow", is_running="false"} or vector(0)) * 5 +
  (count(linux_security_services_info{service_name=~"nfs|cups|bluetooth|avahi-daemon|rpcbind|postfix", is_running="true"}) == 0) * 10 +
  (linux_security_hosts_options_info{file="hosts.deny", service="ALL", host="ALL", action="deny"} or vector(0)) * 5 +
  (linux_security_last_patch_time{package_type!="unknown"} or vector(0)) * 5
)

Put it on a Grafana panel and you can see at a glance which machines are out of compliance.
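The arithmetic behind the weighted score is easy to check by hand. The sketch below restates the weights from the PromQL expression as a table and sums them; the `check` type, the `score` function, and the short check names are hypothetical labels I chose, while the weights themselves are copied from the expression.

```go
package main

import "fmt"

// check (hypothetical type) is one weighted compliance item.
type check struct {
	name   string
	weight int
	passed bool
}

// checks lists the ten terms of the scoring expression with their weights.
// Every check is marked as passed here to show the maximum score.
var checks = []check{
	{"root SSH login disabled", 20, true},
	{"SELinux enforcing", 15, true},
	{"firewall configured", 10, true},
	{"firewall running", 5, true},
	{"PASS_MIN_LEN >= 10", 10, true},
	{"PASS_MAX_DAYS <= 90", 10, true},
	{"xwindow not running", 5, true},
	{"no risky services running", 10, true},
	{"hosts.deny default deny", 5, true},
	{"patch time known", 5, true},
}

// score returns the points earned and the maximum attainable points.
func score(items []check) (earned, total int) {
	for _, c := range items {
		total += c.weight
		if c.passed {
			earned += c.weight
		}
	}
	return earned, total
}

func main() {
	earned, total := score(checks)
	fmt.Printf("%d of %d\n", earned, total) // prints: 95 of 95
}
```

Keeping the weights in one table like this makes it simple to rebalance them or to confirm what the maximum score actually is.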

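Technical implementation

Pure Go; the only third-party dependency is prometheus/client_golang. Rather than stitching shell commands together, it reads security data from files under /proc and /etc wherever possible, which reduces reliance on external commands.

The layout is straightforward: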

text
cmd/security-exporter/main.go     # Entry point, HTTP server
internal/collector/                # Prometheus Collector implementation
internal/system/                   # Security check modules
  ├── account_info.go              # Accounts
  ├── network_info.go              # Network
  ├── service_info.go              # Services
  ├── process_info.go              # Process version detection (the largest file)
  ├── selinux_detail.go            # SELinux
  └── ...
pkg/config/                        # Configuration management
pkg/logger/                        # Logging

Each system module is independent, so an error in one module does not affect collection in the others.
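A common way to get that kind of isolation in Go is to run each module behind a small wrapper that turns both returned errors and panics into a recorded failure. The sketch below shows the pattern under my own assumptions; the `module` type and the `runAll` and `safeCollect` functions are hypothetical and are not taken from the project's source.

```go
package main

import (
	"errors"
	"fmt"
)

// module (hypothetical type) is one independent check that yields samples.
type module struct {
	name    string
	collect func() (map[string]float64, error)
}

// safeCollect runs one module and converts a panic into an ordinary error.
func safeCollect(m module) (out map[string]float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("module %s panicked: %v", m.name, r)
		}
	}()
	return m.collect()
}

// runAll runs every module, keeps results from the ones that succeed,
// and records the failures without stopping the loop.
func runAll(mods []module) (results map[string]map[string]float64, failed map[string]error) {
	results = make(map[string]map[string]float64)
	failed = make(map[string]error)
	for _, m := range mods {
		out, err := safeCollect(m)
		if err != nil {
			failed[m.name] = err
			continue
		}
		results[m.name] = out
	}
	return results, failed
}

func main() {
	mods := []module{
		{"accounts", func() (map[string]float64, error) {
			return map[string]float64{"users": 3}, nil
		}},
		{"sshd", func() (map[string]float64, error) {
			return nil, errors.New("sshd_config not readable")
		}},
		{"ports", func() (map[string]float64, error) {
			panic("unexpected /proc format")
		}},
	}
	results, failed := runAll(mods)
	fmt.Printf("ok=%d failed=%d\n", len(results), len(failed)) // prints: ok=1 failed=2
}
```

The scrape still returns whatever the healthy modules produced, and the failures can be logged or exposed as their own metric.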

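Relationship to node_exporter

They complement each other rather than compete. node_exporter handles basic OS metrics (CPU, memory, disk I/O); security-collector-exporter handles security configuration state. Run both, and together the dashboards give a complete view of system health plus security compliance.

Project link

The code is here: github.com/mickeyzzc/security-collector-exporter

v0.1.0 is the first stable release. It supports Linux AMD64 and ARM64, and the Docker image is published to GHCR. Future iterations will follow real-world feedback; if you are interested, open an issue or a PR.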