Context Deadline Exceeded - Prometheus

I have a Prometheus configuration with many jobs where I am scraping metrics over HTTP, but I have one job where I need to scrape the metrics over HTTPS.

When I access:

https://ip-address:port/metrics

I can see the metrics. The job that I have added in the prometheus.yml configuration is:

- job_name: 'test-jvm-metrics'
  scheme: https
  static_configs:
    - targets: ['ip:port']

When I restart Prometheus, I can see an error on my target that says:

context deadline exceeded

I have read that the scrape_timeout might be the problem, but I have set it to 50 seconds and the problem is still the same.

What can cause this problem, and how can I fix it? Thank you!

Demark answered 13/4, 2018 at 12:55 Comment(0)
12

I had the same problem in the past. In my case the problem was with the certificates, and I fixed it by adding:

tls_config:
  insecure_skip_verify: true

You can try it; maybe it will work for you.
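
For reference, a minimal sketch of how the whole job from the question could look with this option added (ip:port is the placeholder target from the question; a proper ca_file is the better long-term fix than skipping verification):

- job_name: 'test-jvm-metrics'
  scheme: https
  tls_config:
    insecure_skip_verify: true   # disables certificate verification; prefer ca_file in production
  static_configs:
    - targets: ['ip:port']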

Trichite answered 18/4, 2018 at 7:12 Comment(2)
It's not working for me. I have tried adding the tls_config tag; however, the problem is still the same :(Portal
My problem was the exact opposite: insecure_skip_verify was causing issues in the Redis plugin, although there it was a top-level config option, not a child under tls_config.Lustrate
42

Probably the default scrape_timeout value is too short for you:

[ scrape_timeout: <duration> | default = 10s ]

Set a larger value for scrape_timeout, for example:

scrape_configs:
  - job_name: 'prometheus'

    scrape_interval: 5m
    scrape_timeout: 1m

Take a look here: https://github.com/prometheus/prometheus/issues/1438
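
If you would rather raise the timeout for every job at once, it can also be set in the global section (a sketch; the values are only illustrative, and scrape_timeout must not exceed scrape_interval):

global:
  scrape_interval: 1m    # how often targets are scraped
  scrape_timeout: 50s    # per-scrape deadline; must be <= scrape_interval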

Brigand answered 10/12, 2019 at 11:10 Comment(2)
That's so weird! The parameter scrape_timeout didn't exist in my target job, so I added it; now Prometheus fails to start with the message: prometheus.service: Main process exited, code=exited, status=2/INVALIDARGUMENT.Overdo
I also tried adding that parameter to the global section, same error.Overdo
7

I had a similar problem, so I tried to extend my scrape_timeout, but it didn't do anything. Using promtool, however, explained the problem.

My problematic job looked like this:

- job_name: 'slow_fella'
  scrape_interval: 10s
  scrape_timeout: 90s
  static_configs:
  - targets: ['192.168.1.152:9100']
    labels:
      alias: sloooow    

Check your config in the /etc/prometheus directory by typing:

promtool check config prometheus.yml

The result explains the problem and indicates how to solve it:

Checking prometheus.yml
  FAILED: parsing YAML file prometheus.yml: scrape timeout greater than scrape interval for scrape config with job name "slow_fella"

Just ensure that your scrape_interval is at least as long as your scrape_timeout; the timeout must not be greater than the interval.
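
As a sketch, the job above can be fixed by bringing the two values back in line (the exact numbers are illustrative):

- job_name: 'slow_fella'
  scrape_interval: 2m    # must be >= scrape_timeout
  scrape_timeout: 90s
  static_configs:
  - targets: ['192.168.1.152:9100']
    labels:
      alias: sloooow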

Jacquelinejacquelyn answered 4/5, 2020 at 17:39 Comment(1)
I did this with scrape_interval: 5m and scrape_timeout: 1m, but the problem is the same. promtool check config says SUCCESS: prometheus.yml is valid prometheus config file syntax. The strange thing is that the metrics are visible with curl (ip:port/metrics).Magician
2

This can happen when the Prometheus server can't reach the scrape endpoints, for example because of firewall deny rules. Just try hitting the URL in a browser with <url>:9100 (here 9100 is the port the node_exporter service runs on) and check whether you can still access it.
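
A quick reachability check from the Prometheus host itself (a sketch; 9100 assumes the node_exporter default port and <target-ip> is a placeholder):

# run on the Prometheus server; a hang or timeout here points at a firewall/network problem
curl -v --max-time 10 http://<target-ip>:9100/metrics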

Ferous answered 26/7, 2020 at 20:45 Comment(0)
1

I was facing this issue because the database's maximum number of connections had been reached. I increased the max_connections parameter in the database and released some connections; then Prometheus was able to scrape metrics again.

Lakeshialakey answered 16/7, 2021 at 6:16 Comment(0)
1

I added scrape_interval: 100s and scrape_timeout: 90s; my issue is fixed and it is working fine. My prometheus.yml file is:

  - job_name: "localhost"
  scrape_interval: 100s
  scrape_timeout: 90s
  strong textstatic_configs:
  - targets: ["192.168.0.2:9104"]

Note: make sure the scrape timeout is greater than the scrape interval

Dishonest answered 21/3, 2024 at 12:15 Comment(1)
Do you mean the other way round? scrape_interval needs to be greater than the scrape_timeout?Mustachio
0

In my case it was an issue with IPv6. I had blocked IPv6 with ip6tables, but that also blocked Prometheus traffic. Correcting the IPv6 settings solved the issue for me.
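
One possible shape for that fix (a sketch, not the answerer's exact rules; the port and source address are placeholders): insert an ACCEPT rule for the scrape traffic ahead of the general IPv6 drop rule.

# allow the exporter port over IPv6 from the Prometheus server before any DROP rule
ip6tables -I INPUT -p tcp --dport 9100 -s <prometheus-ipv6-address> -j ACCEPT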

Trisomic answered 30/6, 2019 at 9:43 Comment(1)
Can you elaborate on this? How did you check and fix it?Haydenhaydn
0

In my case, I had accidentally put a different port in my Kubernetes Deployment manifest than the one defined in the Service associated with it and in the Prometheus target.
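
As an illustration (a sketch with made-up names and ports, not the answerer's manifests), the containerPort in the Deployment, the targetPort in the Service, and the port Prometheus scrapes all have to line up:

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 8080         # the port Prometheus targets
      targetPort: 8080   # must match the containerPort below
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest   # placeholder image
          ports:
            - containerPort: 8080   # the port the application actually listens on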

Hallock answered 22/10, 2019 at 1:40 Comment(0)
0

Increasing the timeout to 1m helped me fix a similar issue.

Unwritten answered 28/7, 2020 at 8:14 Comment(0)
0

We started facing a similar issue when we re-configured the istio-system namespace and its Istio components. We also had Prometheus installed via prometheus-operator in the monitoring namespace, where istio-injection was enabled.

Restarting the Prometheus components in the monitoring (istio-injection enabled) namespace resolved the issue.
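
For example (a sketch; the actual resource name depends on how prometheus-operator named your Prometheus StatefulSet):

# restart the Prometheus pods managed by prometheus-operator in the monitoring namespace
kubectl -n monitoring rollout restart statefulset prometheus-<your-prometheus-name>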

Oneupmanship answered 25/6, 2021 at 13:10 Comment(0)
0

On AWS, opening the port (for Prometheus) in the security group worked for me.
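
For example (a sketch; the security group IDs and port 9100 are placeholders), the exporter's security group needs an inbound rule that lets the Prometheus server in:

# allow the exporter port from the Prometheus server's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123exporter \
  --protocol tcp \
  --port 9100 \
  --source-group sg-0456prometheus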

Iq answered 22/8, 2022 at 7:46 Comment(0)
0

For me the problem was that I was running the exporter inside an EC2 instance and forgot to allow TCP connections to the listen port in the security group (also check the routing of your subnets), so the Prometheus container could not connect to the listen port on my exporter's machine.

Inside the Prometheus container you can run wget exporterIp:listenPort; if it does not return anything or does not connect, there may be a network issue.
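
A sketch of that check, assuming the container is named prometheus (the official image is busybox-based, so wget is available but curl usually is not):

# fetch the exporter's metrics page from inside the Prometheus container
docker exec -it prometheus wget -qO- http://<exporter-ip>:<listen-port>/metrics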

Ryder answered 26/8, 2023 at 10:30 Comment(0)
0

I had this problem trying to collect metrics from a container in a pod in Kubernetes. In my case, the problem was that a deny-all network policy was preventing connections to any container.

I solved that by creating another network policy that opens that specific port on that specific container. Something like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
  labels:
    app: my-app
  name: allow-metrics-port-in-app
  namespace: my-namespace
spec:
  policyTypes:
  - Ingress
  ingress:
  - ports:
    - port: 5000
      protocol: TCP
  podSelector:
    matchLabels:
      app: my-app
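
If you want to open the port only to Prometheus rather than to everything in the cluster, the ingress rule can (as a sketch, assuming the monitoring namespace carries a name: monitoring label) be narrowed with a from clause:

  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: monitoring   # assumes this label exists on the monitoring namespace
    ports:
    - port: 5000
      protocol: TCP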

Cringe answered 8/2, 2024 at 20:0 Comment(0)
0

I had a similar issue with the firewall, like @Jananath.

I'm collecting data from servers that are hosted at different cloud providers, and one cloud provider introduced additional firewalls that I didn't know about at first:

  • In their cloud panel
  • One server additionally had Plesk installed, which introduces another firewall (make sure you're logged in as root/admin to access that)

One additional recommendation: create the firewall exception ONLY for the machine that is collecting the data (the Prometheus server), so that you don't allow that traffic from the whole internet. Like so, if you're using ufw:

# do this on the machines where you installed the node exporter
sudo ufw allow from COLLECTING_MACHINE to any port 9100
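
To confirm afterwards that the rule is in place:

# list the active rules and check for the new allow entry
sudo ufw status numbered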
Tiein answered 10/2, 2024 at 8:16 Comment(0)
