
I am using Prometheus Alertmanager to monitor dozens of hosts and hundreds of containers on these hosts. I need to receive notifications when any container goes down. I understand from the documentation that it’s possible to configure a rule for each container. However, this approach seems impractical given the large number of containers.

Could anyone guide me on how to efficiently configure such a rule? Any help would be greatly appreciated. Thank you.

I attempted to set up individual rules for each container as suggested in the Alertmanager documentation. However, given the large number of containers, this approach proved to be highly inefficient and difficult to manage.

1 Answer


The first step is to define the alert rule in a rule file and reference that file from prometheus.yml.

Add container.rules to the rule_files section of the main Prometheus config:

rule_files:
 - 'container.rules' 
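
The rule further down filters on job="cadvisor", so it assumes prometheus.yml already scrapes cAdvisor on every host under a job with that name. A minimal sketch of such a scrape job follows; the host names are placeholders and 8080 is assumed to be the port cAdvisor listens on:

```yaml
# prometheus.yml (fragment). Host names are hypothetical placeholders.
scrape_configs:
  - job_name: 'cadvisor'                  # must match job="cadvisor" in the alert expression
    static_configs:
      - targets:
          - 'host-01.example.com:8080'    # assumed cAdvisor port
          - 'host-02.example.com:8080'
```

Adding a host then only means adding a target here; the alert rule itself does not change.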

Then open container.rules and fill it with the following content:

sudo nano /etc/prometheus/container.rules

groups:
- name: cadvisor_alerts
  rules:
  - alert: ContainerDown
    expr: container_last_seen{job="cadvisor"} < time() - 300
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Container down on {{ $labels.instance }}"
      description: "Container {{ $labels.container_name }} on {{ $labels.instance }} has not been seen for more than 5 minutes."

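This single rule covers every container on every host: the expression is evaluated against each container_last_seen series scraped by the cadvisor job, and an alert is raised per series, so no per-container rule is needed. It matches containers whose last-seen timestamp is more than 5 minutes (300 seconds) old, indicating that the container might be down. Because of for: 5m, the condition must then hold for a further 5 minutes before the alert fires. Depending on the cAdvisor version, the container name may be exposed under a different label (for example name instead of container_name); adjust the annotation to match.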
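
One gap worth covering: if cAdvisor itself (or the whole host) becomes unreachable, its series go stale and the ContainerDown expression returns nothing for that host, so the alert will not fire. A possible companion rule, sketched here as an assumption rather than as part of the original setup, alerts on the scrape target instead. The name CadvisorDown and the 2m window are illustrative choices:

```yaml
# Extra entry to append under the rules: list of the cadvisor_alerts group
# (indent it to match the existing ContainerDown entry).
- alert: CadvisorDown              # hypothetical name
  expr: up{job="cadvisor"} == 0    # `up` is recorded by Prometheus for every scrape target
  for: 2m                          # illustrative; tune to your scrape interval
  labels:
    severity: critical
  annotations:
    summary: "cAdvisor unreachable on {{ $labels.instance }}"
    description: "Prometheus has not been able to scrape cAdvisor on {{ $labels.instance }} for 2 minutes, so container alerts for that host cannot be evaluated."
```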

Now restart the Prometheus service so that it loads the rule file:

sudo systemctl restart prometheus

You can then check the alert state at http://<prometheus-url>:9090/alerts (or https, depending on your setup).
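
The alerts page only shows alert state. For notifications to be sent, Prometheus also has to be told where Alertmanager is. A minimal sketch for prometheus.yml, assuming Alertmanager runs on the same machine on its default port 9093:

```yaml
# prometheus.yml (fragment). The target address is an assumption; point it at your Alertmanager.
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 'localhost:9093'
```

The receivers themselves (email, Slack, and so on) are configured in Alertmanager's own configuration file, not in Prometheus.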
