How to configure recording and alerting rules in Loki

I am trying to configure a recording rule, but the documentation does not make it clear how to set it up.

I created a rules.yml file in the /loki/rules directory. Following the "Recording rules" documentation, I wrote my own rule:

name: MyRules
interval: 1m
rules:
  - record: generator:requests:rate2m
    expr: |
      sum(
        rate({service="generator_generator"}[2m])
      )
    labels:
      cluster: "something"

At first, this did nothing: no log message in Loki about a wrong format and no metrics in Prometheus (via remote write). I then also copied the file to the rules-temp directory and to /loki/rules/fake/, based on the "Ruler storage" documentation. The documentation left me unsure where the file should be located, so I copied it everywhere. The result was the same: no logs in Loki and nothing in Prometheus.

After a day off, I started Loki and found this log line:

2022-11-03T08:24:24.062210590Z level=error ts=2022-11-03T08:24:24.061854756Z caller=ruler.go:497 msg="unable to list rules" err="failed to list rule groups for user fake: failed to list rule group for user fake and namespace rules.yml: error parsing /loki/rules/fake/rules.yml: /loki/rules/fake/rules.yml: yaml: unmarshal errors:\n  line 1: field name not found in type rulefmt.RuleGroups\n  line 2: field interval not found in type rulefmt.RuleGroups\n  line 3: field rules not found in type rulefmt.RuleGroups"

This message was not there before, and it does not reappear when I restart Loki; I do not understand why. I assume Loki cannot parse my rules file. I found cortextool, which can validate Loki rules. After a few runs, I ended up with this new rules.yml file:

namespace: rules
groups:
    - name: MyRules
      interval: 1m
      rules:
        - record: generator:requests:rate1m
          expr: |-
            sum(rate({service="generator_generator"}[2m]))
          labels:
            cluster: something

It is quite different from the one in the docs, but it looks like it's OK:

$ cortextool rules lint --backend=loki rules.yml
INFO[0000] SUCCESS: 1 rules found, 0 linted expressions

After this small success I ran Loki again, but there was still nothing in the Loki logs or in Prometheus. I even tried setting a wrong Prometheus remote-write address, but Loki does not log anything about that error either.

My current Loki ruler configuration:

ruler:
  alertmanager_url: http://localhost:9093
  remote_write:
    enabled: true
    client:
      url: http://prometheus:9090/api/v1/write

Prometheus runs in default configuration.

Versions: Loki 2.6.1, Prometheus v2.39.1

Questions:

  1. Where should the rule file be located, and what is the difference between /rules, /rules-temp and /rules/<tenant-id>?
  2. What is the format of rules and rule files? Can there be multiple files?
  3. Why do errors about the rules (wrong Prometheus URL, wrong rules.yml format) not appear in the Loki logs?
  4. How do I properly configure rules (both recording and alerting) in Loki? The documentation seems very unclear.
  5. How do I debug this configuration and setup? Basically, I do not know where to look when something is wrong and there are no logs or other information about it.

Thanks for any tips.

Asked by Presumption, 5/11, 2022 at 16:37

Q: Where should the rule file be located, and what is the difference between /rules, /rules-temp and /rules/<tenant-id>?

A: That depends on your ruler's storage backend. You set the path in:

ruler:
  storage:
    type: local
    local:
      directory: <rules-path>

If you have a multi-tenant cluster, the rule files should be stored in a subfolder, <rules-path>/<tenant-id>. If you do not use multi-tenancy, they should be under <rules-path>/fake.
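A minimal sketch of a local-storage ruler configuration for Loki 2.6, assuming the /loki/rules and /loki/rules-temp paths from the question and no multi-tenancy (tenant ID `fake`). As far as I know, `rule_path` is the directory where the ruler writes temporary copies of the rule files it evaluates, so your own files belong under `storage.local.directory`, not there. The directory layout in the comments, including the second file name, is illustrative.

```yaml
ruler:
  storage:
    type: local
    local:
      # Root of the rule files you maintain yourself.
      directory: /loki/rules
  # Scratch directory managed by the ruler (temporary rule files);
  # do not put your own rule files here.
  rule_path: /loki/rules-temp
  # Expected layout under storage.local.directory (assumption: single tenant "fake"):
  #   /loki/rules/
  #   `-- fake/            <- one subdirectory per tenant ID
  #       |-- rules.yml    <- each file is one rule namespace
  #       `-- other.yml    <- hypothetical second file
```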

Q: What is the format of rules and rule files?

A: The format is the same as in Prometheus, but the expressions are in Loki's LogQL.
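For illustration, here is the recording rule from the question rewritten in the Prometheus rule-file format. The top-level key is `groups`, which is what the `field name not found in type rulefmt.RuleGroups` error is pointing at: `name`, `interval` and `rules` must sit inside a group. With local storage the namespace comes from the file name (the error message refers to "namespace rules.yml"), so the `namespace:` key that cortextool uses is left out here; whether Loki 2.6 would reject that key in a local file is an assumption I have not verified.

```yaml
# /loki/rules/fake/rules.yml
# The namespace is the file name ("rules.yml"), so there is no "namespace:" key.
groups:
  - name: MyRules
    interval: 1m
    rules:
      # Recording rule: the LogQL result is stored as a new series and
      # sent to Prometheus through the ruler's remote_write client.
      - record: generator:requests:rate2m
        expr: |
          sum(rate({service="generator_generator"}[2m]))
        labels:
          cluster: something
```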

Q: Can there be multiple files?

A: Yes. However, I have not worked with the local filesystem as the ruler's storage, so I cannot give more details. My Loki cluster is multi-tenant and uses an S3 storage backend for the ruler. In that setup each tenant has a separate folder for its rules in the S3 bucket, and tenants upload their rules via the Loki ruler API, which can also be done with cortextool. When uploading rules via the API there is a restriction: one rule group per request.

Q: Why do errors about the rules (wrong Prometheus URL, wrong rules.yml format) not appear in the Loki logs?

A: If you set log_level: debug on your rulers, you should see messages related to recording/alerting rule processing in the logs.
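A minimal sketch, assuming Loki is configured through its YAML file rather than command-line flags: the log level lives in the `server` block (the equivalent flag is `-log.level=debug`).

```yaml
server:
  # Default is "info"; "debug" produces much more verbose output,
  # including messages from the ruler.
  log_level: debug
```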

Q: How do I properly configure rules (both recording and alerting) in Loki?

A: As mentioned above, it is the same format as in Prometheus, but with a different query language (LogQL).
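An alerting rule uses the same group structure and can live in the same file or in another one in the tenant folder. The sketch below is hypothetical: the alert name, the `|= "error"` line filter, the threshold and the labels are assumptions for illustration. Firing alerts are sent to the `alertmanager_url` from the ruler configuration, whereas recording-rule results go through `remote_write`.

```yaml
groups:
  - name: MyAlerts
    rules:
      # Hypothetical alert: fires when generator_generator logs more than
      # 10 lines containing "error" per second, sustained for 5 minutes.
      - alert: GeneratorHighErrorRate
        expr: |
          sum(rate({service="generator_generator"} |= "error" [1m])) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: generator_generator is logging many errors
```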

Q: How do I debug this configuration and setup? Basically, I do not know where to look when something is wrong and there are no logs or other information about it.

A: Here are a few ideas:

  • Test your rule expressions in Grafana > Explore against your Loki data source.
  • Enable the debug log level in Loki and in the receiving Prometheus server.
  • Enable the ruler API on Loki and check how your rules are set up by sending a GET request to http://<loki-ruler>:<loki-port>/loki/api/v1/rules.
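To query the rules endpoint from the last point, the ruler API has to be switched on. A minimal sketch, reusing the ruler block from the question and only adding `enable_api`:

```yaml
ruler:
  # Exposes the ruler HTTP API, e.g. GET /loki/api/v1/rules
  enable_api: true
  alertmanager_url: http://localhost:9093
  remote_write:
    enabled: true
    client:
      url: http://prometheus:9090/api/v1/write
```

On the receiving side, note that Prometheus (v2.33 and later) only accepts remote-write requests on /api/v1/write when it is started with `--web.enable-remote-write-receiver`; a default configuration does not enable this, so it is worth checking that flag as well.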
Egidius answered 12/12, 2022 at 20:4

© 2022 - 2024 — McMap. All rights reserved.