Alerts

If your application is serving metrics, you can create alerts to notify your team when something happens. Notifications are sent to the Slack channel specified for your team in NAIS Teams (https://teams.<tenant>.cloud.nais.io).

You can define alerts either as Kubernetes resources (PrometheusRule) or directly in Grafana (GUI-based).

Each environment has a separate Alertmanager, available at https://alertmanager.<environment>.<tenant name>.cloud.nais.io/

Kubernetes resources

We use native Prometheus alert rules, and let Alertmanager handle the notifications.

You can define alerts by creating a PrometheusRule resource in your team's namespace.

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-alert
  namespace: <team namespace>
spec:
  groups:
  - name: my-alert
    rules:
    - alert: InstanceDown
      expr: count(up) == 0
      for: 5m
      annotations:
        consequence: Application is unavailable
        action: "`kubectl describe pod <podname>` -> `kubectl logs <podname>`"
        summary: |-
          This is a multi-line summary with
          linebreaks and everything. Here you can give a more detailed
          summary of what this alert is about
      labels:
        namespace: <team namespace> # required
        severity: critical

Apply this resource to your team's namespace by saving the content above (with your own values) to a file and running kubectl apply -f <path to file>

This file should be added to your application repository alongside nais.yaml, and deployed by your CI/CD pipeline using the nais deploy action.

You can see the alerts in the Alertmanager at https://alertmanager.<environment>.<tenant name>.cloud.nais.io/ and the defined rules in Prometheus at https://prometheus.<environment>.<tenant name>.cloud.nais.io/rules

Deployment

To automatically deploy your alerts to the cluster, you can use the nais deploy action. This action will deploy the alerts to the cluster when you change the alerts.yaml file in your repository.

name: Deploy alerts to NAIS
on:
  push:
    branches:
      - main
    paths:
      # Run only when the alert rules or this workflow change
      - 'alerts.yaml'
      - '.github/workflows/alerts.yaml'
jobs:
  apply-alerts:
    name: Apply alerts to cluster
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
      - name: deploy to dev
        uses: nais/deploy/actions/deploy@v1
        env:
          APIKEY: ${{ secrets.NAIS_DEPLOY_APIKEY }} # deploy API key stored as a GitHub secret
          CLUSTER: dev
          RESOURCE: /path/to/alerts.yaml # path to the alerts file in your repository
      - name: deploy to prod
        uses: nais/deploy/actions/deploy@v1
        env:
          APIKEY: ${{ secrets.NAIS_DEPLOY_APIKEY }}
          CLUSTER: prod
          RESOURCE: /path/to/alerts.yaml

How to write a good alert

Writing the expr

To minimize the feedback loop, we suggest experimenting on the Prometheus server to find the right metric for your alert and the notification threshold. The Prometheus server can be found in each cluster, at https://prometheus.{env}.{tenant-name}.cloud.nais.io (e.g. https://prometheus.dev.nav.cloud.nais.io).
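
As a minimal sketch, a rule that fires when an application returns too many HTTP 5xx responses could look like the entry below (it slots into the rules list of a PrometheusRule). The metric name http_server_requests_seconds_count, the app label, and the 1% threshold are assumptions for illustration; use the metrics your application actually exposes.

    - alert: HighErrorRate
      # Assumed metric and labels; replace with what your application exposes
      expr: |-
        sum(rate(http_server_requests_seconds_count{app="myapp", status=~"5.."}[5m]))
          /
        sum(rate(http_server_requests_seconds_count{app="myapp"}[5m])) > 0.01
      for: 5m
      annotations:
        summary: More than 1% of requests to myapp have failed over the last 5 minutes
      labels:
        namespace: myteam # required
        severity: warning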

You can also visit the Alertmanager at https://alertmanager.{env}.{tenant-name}.cloud.nais.io (e.g. https://alertmanager.dev.nav.cloud.nais.io) to see which alerts are currently firing (and to silence alerts that have already been triggered).

for

How long the expr must evaluate to true before the alert fires.

When the expr first evaluates to true, the alert will be in the pending state for the specified duration before it starts firing.

Example values: 30s, 5m, 1h.
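
As a sketch of that lifecycle, using the InstanceDown rule from the example above:

    # With for: 5m
    #   t=0   expr becomes true     -> the alert is "pending"
    #   t=5m  expr has stayed true  -> the alert becomes "firing" and is sent to Alertmanager
    #   If expr turns false before t=5m, the alert returns to "inactive" without notifying anyone
    - alert: InstanceDown
      expr: count(up) == 0
      for: 5m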

Severity

This will affect what color the notification gets. Possible values are critical (red), warning (yellow) and notice (green).

Consequence

Optionally describe, for the person receiving the alert, what happens in the world when this alert fires.

Action

Optionally describe, for the person receiving the alert, the best course of action to resolve the issue.

Summary

Optional longer description of the alert.

Customizing

Each team namespace has a default AlertmanagerConfig that picks up alerts labeled namespace: <team namespace>. If you want to change anything about alerting for your team, e.g. the formatting of alerts, the webhook used, and so on, you can create a similar AlertmanagerConfig configured to match different labels:

route:
  matchers:
    - name: team
      value: myteam
      matchType: =
    - name: app
      value: myapp
      matchType: =

Remember that these matchers will match against every alert in the cluster, so be sure to use values that are unique to your team.
In your PrometheusRule, also include the label alert_type: custom to make sure the default configuration doesn't pick up your alert.
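
For example, the labels of such a rule could look like this (the team and app values are hypothetical and must match the matchers in your custom AlertmanagerConfig):

      labels:
        namespace: myteam # required
        severity: warning
        team: myteam # matched by the custom AlertmanagerConfig
        app: myapp # matched by the custom AlertmanagerConfig
        alert_type: custom # keeps the default AlertmanagerConfig from picking up this alert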

For more information about slackConfigs and other possibilities, see the Prometheus alerting documentation.
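
As a minimal sketch, assuming a Secret named slack-webhook in your namespace that holds the Slack webhook URL under the key apiUrl (both names are assumptions, as are the receiver name, channel, and templates), a complete AlertmanagerConfig with a Slack receiver could look like this:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: myteam-custom-alerts
  namespace: <team namespace>
spec:
  receivers:
    - name: myteam-slack
      slackConfigs:
        - apiURL:
            name: slack-webhook # assumed Secret in the same namespace
            key: apiUrl # assumed key holding the webhook URL
          channel: '#myteam-alerts'
          sendResolved: true
          title: '{{ .CommonLabels.alertname }}'
          text: '{{ .CommonAnnotations.summary }}'
  route:
    receiver: myteam-slack
    groupBy:
      - alertname
    matchers:
      - name: team
        value: myteam
        matchType: =
      - name: app
        value: myapp
        matchType: =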

