The first step is to define alert rules for Prometheus. The rules live in a separate alert.rules file, which is referenced from the main Prometheus configuration (prometheus.yml):
# Load alerting rules from this file (path is relative to prometheus.yml)
rule_files:
  - "alert.rules"
Then create the alert.rules file and fill it with the following content:
sudo nano /etc/prometheus/alert.rules
# Prometheus alerting rules for hosts scraped via node_exporter.
# NOTE: indentation restored — in the original paste every line was at
# column 0, which makes `rules:` a top-level key and the file invalid
# as a Prometheus rule file.
groups:
  - name: NodeExporter
    rules:
      # Fires when less than 10% of memory is available for 2 minutes.
      - alert: VMOutOfMemory
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "VM out of memory (instance {{ $labels.instance }})"
          description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

      # CPU usage = 100 - idle%. In PromQL `*` binds tighter than `-`,
      # so 100 - avg(...) * 100 is 100 - (idle fraction * 100).
      # Fires when average CPU usage sits in (70%, 75%] for 2 minutes.
      - alert: CPUMediumUsage70to75
        expr: (100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 70 and (100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) <= 75
        for: 2m
        labels:
          severity: medium
        annotations:
          summary: "CPU usage between 70% and 75% (instance {{ $labels.instance }})"
          description: "CPU usage is between 70% and 75%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

      # Fires when average CPU usage sits in (75%, 80%] for 2 minutes.
      - alert: CPUMediumUsage75to80
        expr: (100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 75 and (100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) <= 80
        for: 2m
        labels:
          severity: medium
        annotations:
          summary: "CPU usage between 75% and 80% (instance {{ $labels.instance }})"
          description: "CPU usage is between 75% and 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"

      # Fires when average CPU usage exceeds 80% for 2 minutes.
      - alert: CPUCriticalUsage
        expr: (100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "CPU usage above 80% (instance {{ $labels.instance }})"
          description: "CPU usage is above 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
Now restart the Prometheus service:
sudo systemctl restart prometheus
You can then check the alerts at http(s)://<prometheus-url>:9090/alerts
alerts.yml
of Prometheus), you'd need to create three different alerts. Not 100% sure, but I think same is true for Grafana.