security-collector-exporter: Monitoring Linux Security Auditing

Why This Was Built

Anyone who manages servers has probably been through this: a compliance audit arrives, and you SSH into machines one by one to check whether the SSH config is correct, SELinux is enabled, the firewall is running, any accounts have expired, and the password policies are compliant. With a few machines that is fine; with dozens or hundreds it becomes pure manual grunt work.

And the more painful part: none of this has continuous monitoring. You check compliance today, someone changes a config tomorrow, and you’d never know.

The Prometheus ecosystem has node_exporter for basic system metrics (CPU, memory, disk), but security configuration state has always been a gap. security-collector-exporter fills this gap: it turns Linux security-related configuration and state into Prometheus metrics, so they plug into an existing monitoring system for continuous tracking and automatic alerting.

What It Collects

The exporter covers 15 categories of security metrics, from accounts to kernel parameters, including:

| Category | Metrics | Description |
| --- | --- | --- |
| System Info | linux_security_os_version_info | OS version, package count, last patch time |
| Account Management | linux_security_account_info | passwd info, sudo permission detection |
| Password Policy | linux_security_password_* | 6 independent metrics covering shadow file fields |
| SSH Config | linux_security_sshd_config_info | sshd_config key configuration items |
| Firewall | linux_security_firewall_enabled | Supports firewalld/ufw/iptables/nftables |
| Port Monitoring | linux_security_ports_use_info | Includes process name, version, application name |
| Service Status | linux_security_services_info | systemd service start/stop and running status |
| SELinux | linux_security_selinux_config | Configuration and enforcement mode |
| Kernel Parameters | linux_security_sysctl_* | Security-related sysctl parameter validation |
| Scheduled Tasks | linux_security_crontab_info | System/user crontab entries |
| Audit Service | linux_security_auditd_info | auditd status and rule count |
| Login Policy | linux_security_login_defs_info | login.defs configuration items |
Login Policylinux_security_login_defs_infologin.defs configuration items

A diagram showing the entire collection pipeline:

mermaid
---
config:
  theme: base
  themeVariables:
    fontSize: 15px
    fontFamily: "system-ui, sans-serif"
---
flowchart TB

    subgraph fs["📁 Linux File System"]
        a1["👤 Accounts & Password Policy<br/>passwd / shadow / login.defs"]
        a2["🔑 SSH Configuration<br/>sshd_config"]
        a3["🛡️ Mandatory Access Control<br/>selinux/config / apparmor"]
        a4["📄 Network Access Control<br/>hosts.allow / hosts.deny"]
        a5["⏰ Scheduled Tasks<br/>crontab"]
    end

    subgraph proc["⚡ /proc Runtime Data"]
        b1["🌐 Network Connections<br/>/proc/net/tcp / udp"]
        b2["🔎 Process Information<br/>/proc/pid/cmdline / exe / fd"]
        b3["📦 Container Identification<br/>/proc/pid/cgroup / environ"]
    end

    subgraph svc["⚙️ System Services & Commands"]
        c1["🧱 Firewall<br/>firewalld / ufw / nftables"]
        c2["📟 Service Status<br/>systemctl list-units"]
        c3["📦 Package Management<br/>rpm / dpkg / pacman"]
        c4["🔍 Audit Daemon<br/>auditd status & rules"]
    end

    subgraph exp["🔧 security-collector-exporter"]
        detect["Version Detection Engine<br/>HTTP API / JAR MANIFEST / Command Line"]
        collect["Collector Metric Aggregation"]
    end

    fs --> collect
    proc --> collect
    proc --> detect
    svc --> collect
    detect --> collect
    collect -->|"Expose /metrics :9102"| prom["📊 Prometheus"]

The diagram has three layers: Linux system data sources at top (filesystem static config, /proc runtime data, system command output), the exporter’s internal Collector and Version Detection Engine in the middle, and the Prometheus collection endpoint at bottom.

Interesting Design Decisions

Version Detection in Port Metrics

Port metrics record more than port numbers and process names. For common services (MySQL, Nginx, Redis, etc.), the exporter attempts to detect the version number. For Java applications (Elasticsearch, Kafka, Tomcat, Jenkins, etc.), it identifies the real application name and version by trying several methods and falling back layer by layer: HTTP API calls, JAR MANIFEST.MF parsing, command-line argument extraction, and container image label reading.

This feature took the most effort; process_info.go alone is 1347 lines. It exists because Java applications show only java as the process name, so without it you’d never know whether a port belongs to Elasticsearch or Kafka.
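
To make the layer-by-layer fallback concrete, here is a minimal sketch of the pattern, not the code from process_info.go. The `AppInfo` and `detector` types, the `detect` function, and the stand-in strategies are all hypothetical names chosen for illustration; only the command-line strategy is actually implemented, and it recognizes just two well-known main classes.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// AppInfo is a hypothetical result type: the real application behind a process.
type AppInfo struct {
	Name   string
	Source string // which strategy produced the result
}

// detector is one detection strategy. Returning an error means "could not
// identify", which lets the caller fall through to the next strategy.
type detector struct {
	name string
	run  func(cmdline []string) (AppInfo, error)
}

var errNoMatch = errors.New("no match")

// fromCmdline is a simplified last-resort strategy: look for a well-known
// main class among the java command-line arguments.
func fromCmdline(cmdline []string) (AppInfo, error) {
	for _, arg := range cmdline {
		if strings.HasPrefix(arg, "org.elasticsearch.") {
			return AppInfo{Name: "elasticsearch"}, nil
		}
		if arg == "kafka.Kafka" {
			return AppInfo{Name: "kafka"}, nil
		}
	}
	return AppInfo{}, errNoMatch
}

// detect tries each strategy in order and returns the first success.
func detect(cmdline []string, chain []detector) (AppInfo, error) {
	for _, d := range chain {
		info, err := d.run(cmdline)
		if err == nil {
			info.Source = d.name
			return info, nil
		}
	}
	return AppInfo{}, errNoMatch
}

func main() {
	unavailable := func([]string) (AppInfo, error) { return AppInfo{}, errNoMatch }
	chain := []detector{
		// Stand-ins for the HTTP API and MANIFEST.MF strategies; they always fail here.
		{"http-api", unavailable},
		{"jar-manifest", unavailable},
		{"cmdline", fromCmdline},
	}
	info, err := detect([]string{"java", "-Xmx1g", "kafka.Kafka", "server.properties"}, chain)
	fmt.Println(info.Name, info.Source, err)
}
```

Ordering the chain from most to least precise means the cheap command-line guess is only used when the richer sources fail.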

Shadow File as Independent Metrics

Each field in /etc/shadow (last change date, max validity, min validity, warning days, inactive days, account expiration) isn’t combined into one large metric but split into 6 independent gauges. This makes PromQL threshold evaluations natural:

promql
# Find accounts with password validity exceeding 90 days
linux_security_password_max_days > 90
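
For readers curious how one shadow line becomes six separate values, the following is a rough sketch under a few assumptions: the function name `parseShadowLine` and the field names in `fieldNames` are invented here and are not the exporter's actual metric names. It relies only on the standard /etc/shadow layout, where fields 3 to 8 are the aging fields and an empty field means the setting is not configured.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// fieldNames labels shadow fields 3 to 8 in file order. The names are
// illustrative, not the exporter's real metric suffixes.
var fieldNames = []string{"last_change", "min_days", "max_days", "warn_days", "inactive_days", "expire"}

// parseShadowLine splits one /etc/shadow line and returns the user name plus
// every aging field that is set. Empty fields are simply left out of the map.
func parseShadowLine(line string) (string, map[string]int64, error) {
	parts := strings.Split(line, ":")
	if len(parts) < 8 {
		return "", nil, fmt.Errorf("expected at least 8 fields, got %d", len(parts))
	}
	fields := make(map[string]int64)
	for i, name := range fieldNames {
		raw := parts[2+i]
		if raw == "" {
			continue // not configured for this account
		}
		v, err := strconv.ParseInt(raw, 10, 64)
		if err != nil {
			return "", nil, fmt.Errorf("field %s: %w", name, err)
		}
		fields[name] = v
	}
	return parts[0], fields, nil
}

func main() {
	user, fields, err := parseShadowLine("alice:$6$hash:19000:0:99999:7:::")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// One gauge-like sample per field that is present.
	for _, name := range fieldNames {
		if v, ok := fields[name]; ok {
			fmt.Printf("password_%s{user=%q} %d\n", name, user, v)
		}
	}
}
```

Keeping each field as its own numeric sample is what lets a query compare it against a threshold directly.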

Multi-Layer Firewall State Detection

It doesn’t simply run systemctl is-active firewalld and call it done. Each firewall type has independent detection logic: checking systemd service file status, checking if the process is running, checking ufw’s special state file (/var/lib/ufw/ufw-not-booted), and checking iptables rules file paths. This matters because in real environments a firewall that is “configured but not running” is all too common.
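
The idea of keeping “configured” and “running” as separate answers can be sketched roughly like this. Everything here is an assumption for illustration: the `host` abstraction, the function names, and the specific paths checked (a common firewalld unit file location and two common iptables rules file locations). The exporter's real checks are more thorough, and ufw and nftables are left out.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// FirewallState keeps "configured" and "running" as separate facts.
type FirewallState struct {
	Type       string
	Configured bool
	Running    bool
}

// host abstracts the two lookups the checks need, so they can be faked in tests.
type host struct {
	fileExists  func(path string) bool
	procRunning func(name string) bool
}

func detectFirewalld(h host) FirewallState {
	return FirewallState{
		Type:       "firewalld",
		Configured: h.fileExists("/usr/lib/systemd/system/firewalld.service"),
		Running:    h.procRunning("firewalld"),
	}
}

func detectIptables(h host) FirewallState {
	return FirewallState{
		Type:       "iptables",
		Configured: h.fileExists("/etc/sysconfig/iptables") || h.fileExists("/etc/iptables/rules.v4"),
		// A rules file alone does not prove rules are loaded, so Running
		// stays false in this sketch.
	}
}

// detectFirewall returns the first firewall that is configured, or type "none".
func detectFirewall(h host) FirewallState {
	for _, d := range []func(host) FirewallState{detectFirewalld, detectIptables} {
		if s := d(h); s.Configured {
			return s
		}
	}
	return FirewallState{Type: "none"}
}

// realHost wires the checks to the local filesystem and /proc.
func realHost() host {
	return host{
		fileExists: func(p string) bool { _, err := os.Stat(p); return err == nil },
		procRunning: func(name string) bool {
			comms, _ := filepath.Glob("/proc/[0-9]*/comm")
			for _, c := range comms {
				if b, err := os.ReadFile(c); err == nil && strings.TrimSpace(string(b)) == name {
					return true
				}
			}
			return false
		},
	}
}

func main() {
	fmt.Printf("%+v\n", detectFirewall(realHost()))
}
```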

Deployment and Running

Docker is the simplest way:

bash
docker run -d \
  --name security-exporter \
  --privileged \
  -p 9102:9102 \
  ghcr.io/mickeyzzc/security-collector-exporter:0.1.0

--privileged is needed to read system files like /etc/shadow, /proc, etc.

Some useful startup parameters:

bash
# Only collect LISTEN state ports (default)
./security-exporter --collector.port-states="LISTEN"

# Also collect ESTABLISHED connections
./security-exporter --collector.port-states="LISTEN,ESTABLISHED"

# Only collect enabled services (default behavior)
./security-exporter --collector.services-enabled=true

# Combined filter: collect only services that are both enabled and running
./security-exporter --collector.services-enabled=true --collector.services-running=true

# Enable debug logs for troubleshooting
./security-exporter --log.level=debug
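
As a rough picture of what a port-state filter has to do with /proc/net/tcp, here is a small sketch. It is not the exporter's code: `parseStates`, `parseProcNetTCP`, and the `Socket` type are names made up for this example, and only the two states mentioned above are mapped (in the kernel's table, hex `0A` is LISTEN and `01` is ESTABLISHED). The embedded sample stands in for the real file.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// tcpStates maps the hex "st" column of /proc/net/tcp to a state name.
var tcpStates = map[string]string{"01": "ESTABLISHED", "0A": "LISTEN"}

type Socket struct {
	Port  uint16
	State string
}

// parseStates turns a flag value such as "LISTEN,ESTABLISHED" into a set.
func parseStates(flag string) map[string]bool {
	want := make(map[string]bool)
	for _, s := range strings.Split(flag, ",") {
		if s = strings.ToUpper(strings.TrimSpace(s)); s != "" {
			want[s] = true
		}
	}
	return want
}

// parseProcNetTCP keeps only the sockets whose state is in want.
func parseProcNetTCP(content string, want map[string]bool) ([]Socket, error) {
	var out []Socket
	sc := bufio.NewScanner(strings.NewReader(content))
	sc.Scan() // skip the header line
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 {
			continue
		}
		state, ok := tcpStates[f[3]]
		if !ok || !want[state] {
			continue
		}
		// local_address looks like HEXIP:HEXPORT
		idx := strings.LastIndex(f[1], ":")
		if idx < 0 {
			continue
		}
		port, err := strconv.ParseUint(f[1][idx+1:], 16, 16)
		if err != nil {
			return nil, fmt.Errorf("bad port in %q: %w", f[1], err)
		}
		out = append(out, Socket{Port: uint16(port), State: state})
	}
	return out, sc.Err()
}

const sample = `  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0100007F:8F3C 0100007F:0CEA 01 00000000:00000000 00:00000000 00000000  1000        0 23456 1 0000000000000000 20 4 30 10 -1
`

func main() {
	socks, err := parseProcNetTCP(sample, parseStates("LISTEN,ESTABLISHED"))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range socks {
		fmt.Printf("port=%d state=%s\n", s.Port, s.State)
	}
}
```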

Add a scrape config on the Prometheus side:

yaml
scrape_configs:
  - job_name: 'security-exporter'
    static_configs:
      - targets: ['localhost:9102']

Example Alert Rules

The project includes a complete set of security compliance alert rules covering SSH, SELinux, firewall, password policy, and service management. Here are a few typical examples:

yaml
# Root SSH login not disabled — critical
- alert: RootSSHLoginEnabled
  expr: linux_security_sshd_config_info{info_key="PermitRootLogin", info_value="yes"}
  labels:
    severity: critical

# SELinux not in enforcing mode
- alert: SELinuxNotEnforcing
  expr: linux_security_selinux_config{info_key="SELINUX", info_value=~"permissive|disabled"}
  labels:
    severity: warning

# Firewall configured but not running
- alert: FirewallNotRunning
  expr: linux_security_firewall_enabled{firewall_type!="none", is_running="false"} == 1
  labels:
    severity: warning

# Password validity exceeds 90 days
- alert: PasswordMaxDaysTooLong
  expr: linux_security_login_defs_info{info_key="PASS_MAX_DAYS", info_value="num"} > 90
  labels:
    severity: warning

You can even calculate a security compliance score (out of 100), weighting and aggregating all checks:

promql
(
  (linux_security_sshd_config_info{info_key="PermitRootLogin", info_value="no"} or vector(0)) * 20 +
  (linux_security_selinux_config{info_key="SELINUX", info_value="enforcing"} or vector(0)) * 15 +
  (linux_security_firewall_enabled{firewall_type!="none"} == 1) * 10 +
  (linux_security_firewall_enabled{firewall_type!="none", is_running="true"} == 1) * 5 +
  ((linux_security_login_defs_info{info_key="PASS_MIN_LEN", info_value="num"} >= 10) or vector(0)) * 10 +
  ((linux_security_login_defs_info{info_key="PASS_MAX_DAYS", info_value="num"} <= 90) or vector(0)) * 10 +
  (linux_security_services_info{service_name="xwindow", is_running="false"} or vector(0)) * 5 +
  (count(linux_security_services_info{service_name=~"nfs|cups|bluetooth|avahi-daemon|rpcbind|postfix", is_running="true"}) == 0) * 10 +
  (linux_security_hosts_options_info{file="hosts.deny", service="ALL", host="ALL", action="deny"} or vector(0)) * 5 +
  (linux_security_last_patch_time{package_type!="unknown"} or vector(0)) * 5
)

Turn it into a Grafana dashboard panel for a quick view of which machines are non-compliant.
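
The arithmetic behind such a score is just a weighted sum, which is easy to check outside PromQL. The sketch below uses a hypothetical `Check` type and made-up check names and results. One thing worth noting: the ten weights in the query above add up to 95, so this version divides by the total weight to keep the result on a 0 to 100 scale.

```go
package main

import "fmt"

// Check is one compliance test with its weight in the overall score.
type Check struct {
	Name   string
	Weight int
	Passed bool
}

// Score returns the passed weight as a percentage of the total weight.
func Score(checks []Check) float64 {
	total, passed := 0, 0
	for _, c := range checks {
		total += c.Weight
		if c.Passed {
			passed += c.Weight
		}
	}
	if total == 0 {
		return 0
	}
	return 100 * float64(passed) / float64(total)
}

func main() {
	// Weights borrowed from the query; the pass/fail values are invented.
	checks := []Check{
		{"root_ssh_login_disabled", 20, true},
		{"selinux_enforcing", 15, false},
		{"firewall_enabled", 10, true},
		{"firewall_running", 5, true},
	}
	fmt.Printf("score: %.1f\n", Score(checks))
}
```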

Technical Implementation

The exporter is written in pure Go, with prometheus/client_golang as the only third-party dependency. It does not stitch shell commands together; wherever possible, security data is read directly from files under /proc and /etc, which reduces dependencies on external commands.
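
As an example of the read-the-file approach, here is a minimal sketch of pulling key/value pairs out of sshd_config content. The function name `parseSSHDConfig` is hypothetical and the handling is deliberately simplified: it treats keywords as case-insensitive and keeps the first value seen for a keyword (which matches how sshd treats most keywords), stops at the first Match block, and ignores Include directives and the `Key=Value` form.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseSSHDConfig returns the global keyword/value pairs from sshd_config
// content, with keywords lowercased. The first value for a keyword is kept.
func parseSSHDConfig(content string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		key := strings.ToLower(fields[0])
		if key == "match" {
			break // settings after Match are conditional; skipped in this sketch
		}
		if _, seen := out[key]; !seen {
			out[key] = strings.Join(fields[1:], " ")
		}
	}
	return out
}

func main() {
	cfg := parseSSHDConfig("# hardening\nPermitRootLogin no\nPasswordAuthentication yes\n")
	// Each pair could become one info-style sample with key and value labels.
	for _, key := range []string{"permitrootlogin", "passwordauthentication"} {
		fmt.Printf("info_key=%q info_value=%q\n", key, cfg[key])
	}
}
```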

The architecture is straightforward:

text
cmd/security-exporter/main.go     # Entry point, HTTP Server
internal/collector/                # Prometheus Collector implementation
internal/system/                   # Security check modules
  ├── account_info.go              # Accounts
  ├── network_info.go              # Network
  ├── service_info.go              # Services
  ├── process_info.go              # Process version detection (largest file)
  ├── selinux_detail.go            # SELinux
  └── ...
pkg/config/                        # Configuration management
pkg/logger/                        # Logging

Each system module is independent; an error in one module doesn’t affect collection in others.
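
One common way to get this kind of isolation in Go is to run each module behind its own error and panic boundary. The sketch below shows that pattern using only the standard library; the `module` type, `collectAll`, and `safeCollect` are hypothetical names, and the real collectors are built on prometheus/client_golang rather than plain maps.

```go
package main

import (
	"errors"
	"fmt"
)

// module is a hypothetical stand-in for one security check module.
type module struct {
	name    string
	collect func() (map[string]float64, error)
}

var errUnavailable = errors.New("data source unavailable")

// safeCollect runs one module and converts a panic into an ordinary error.
func safeCollect(m module) (samples map[string]float64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return m.collect()
}

// collectAll runs every module in turn. A failing module contributes an
// error but does not stop the modules after it.
func collectAll(mods []module) (map[string]float64, []error) {
	merged := make(map[string]float64)
	var errs []error
	for _, m := range mods {
		samples, err := safeCollect(m)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", m.name, err))
			continue
		}
		for k, v := range samples {
			merged[k] = v
		}
	}
	return merged, errs
}

func main() {
	mods := []module{
		{"selinux", func() (map[string]float64, error) { return nil, errUnavailable }},
		{"accounts", func() (map[string]float64, error) {
			return map[string]float64{"account_count": 42}, nil
		}},
	}
	merged, errs := collectAll(mods)
	fmt.Println(merged, errs)
}
```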

Relationship with node_exporter

The two are complementary, not competing. node_exporter handles basic OS metrics (CPU, memory, disk IO), while security-collector-exporter handles security configuration state. Running both together gives you a complete system health and security compliance view in your monitoring dashboards.

Project Repository

Code here: github.com/mickeyzzc/security-collector-exporter

v0.1.0 is the first stable version. It supports Linux AMD64 and ARM64, and Docker images are published to GHCR. Future iterations will continue based on usage feedback. Feel free to file issues or submit PRs.