POST /loki/api/v1/push (500) Response: empty ring and context canceled
I have built a test EKS environment to run Loki for a few weeks. So far I have deployed tens of Loki pods in monolithic mode, and each Loki pod receives logs from multiple EC2 instances.

However, I have found that Promtail keeps returning the error messages below:

error sending batch, will retry" status=500 error="server returned HTTP status 500 Internal Server Error (500): empty ring" 

or

error sending batch, will retry" status=-1 error="Post \"loki:port/loki/api/v1/push\": context deadline exceeded"

When I check the logs from Loki, I find similar error messages:

level=warn ts=2022-11-07T08:59:39.648738164Z caller=logging.go:86 traceID=414f3905fdec9c5b orgID=fake msg="POST /loki/api/v1/push (500) 5.863871ms Response: \"empty ring\\n\" ws: false; Content-Length: 267464; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; X-Amzn-Trace-Id: Root=id; X-Forwarded-For: IP; X-Forwarded-Port: PORT; X-Forwarded-Proto: http; "

level=warn ts=2022-11-07T09:23:23.193157476Z caller=logging.go:86 traceID=0a15f11b53b377ce orgID=fake msg="POST /loki/api/v1/push (500) 9.998004113s Response: \"context canceled\\n\" ws: false; Content-Length: 235297; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; X-Amzn-Trace-Id: Root=id; X-Forwarded-For: IP; X-Forwarded-Port: PORT; X-Forwarded-Proto: http; "

After this issue started, Grafana failed to query Loki on and off: a Loki pod would be reachable for a short period, busily collect logs, and then fail again.

I am not sure why this happens in only some of my Loki pods, as their configs are nearly identical.

Does anyone know how to solve this problem? What does “empty ring” mean?

Thank you!

My config:

promtail.yaml

server:
  http_listen_port: 9080
clients:
  - url: http://ip_where_Loki_run:3100/loki/api/v1/push
positions:
  filename: /usr/local/promtail/positions.yaml
scrape_configs:
  - job_name: server_log
    static_configs:
      - targets:
          - localhost
        labels:
          job: server_log
          hostname: ab
          __path__: /var/log/server.log

loki.yaml

auth_enabled: false 

server: 
  http_listen_port: 3100 
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
  http_server_idle_timeout: 300s

ingester: 
  wal: 
    enabled: true 
    dir: /loki/wal 
  lifecycler: 
    ring: 
      kvstore: 
        store: inmemory
      replication_factor: 1 
    final_sleep: 0s 
  chunk_idle_period: 3m       
  chunk_retain_period: 30s
  chunk_encoding: lz4     
  max_transfer_retries: 0     
  chunk_target_size: 1048576  
  max_chunk_age: 1h           

schema_config: 
  configs: 
    - from: 2022-10-05
      store: boltdb-shipper 
      object_store: aws 
      schema: v12 
      index: 
        prefix: index_ 
        period: 24h 

storage_config: 
  boltdb_shipper: 
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3 

  aws:
    bucketnames:  bucketnames
    endpoint: s3.us-west-2.amazonaws.com
    region: us-west-2
    access_key_id: access_key_id
    secret_access_key: secret_access_key
    sse_encryption: true

compactor: 
  working_directory: /loki/compactor 
  shared_store: s3 
  compaction_interval: 5m
  retention_enabled: true

limits_config: 
  reject_old_samples: true 
  reject_old_samples_max_age: 720h
  retention_period: 720h
  per_stream_rate_limit: 15MB
  per_stream_rate_limit_burst: 30MB
  ingestion_rate_mb: 15
  ingestion_burst_size_mb: 30

chunk_store_config: 
  max_look_back_period: 0s 

querier:
  query_ingesters_within: 0
  engine:
    max_look_back_period: 3m

query_scheduler:
  max_outstanding_requests_per_tenant: 2048

query_range:
  parallelise_shardable_queries: false
  split_queries_by_interval: 0

frontend:
  max_outstanding_per_tenant: 10240

ingester_client:
  remote_timeout: 30s
Senegal answered 7/11/2022 at 10:06 Comment(2)
Having the same error as you: context deadline exceeded, but I see no corresponding messages in the Loki logs. Did you manage to address the issue? – Marden
@Marden you may try increasing the timeout. Ref: github.com/grafana/loki/issues/5963 – Senegal
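@Marden as a rough sketch (illustrative values, not a verified fix), raising the Promtail client timeout and backoff would look roughly like this in promtail.yaml:

clients:
  - url: http://ip_where_Loki_run:3100/loki/api/v1/push
    timeout: 30s              # Promtail's default push timeout is 10s
    batchwait: 1s
    batchsize: 1048576
    backoff_config:
      min_period: 500ms
      max_period: 5m
      max_retries: 10

The ~10 s request duration in the second Loki log line above lines up with that 10 s default, which is why bumping it is worth trying. – Senegal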
Finally, I deleted the current EFS and attached my Fargate pods to a new EFS, and they run as usual again. I am not sure whether this actually solved the underlying issue.
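For the “empty ring” part specifically, one tweak I have seen suggested for single-binary setups (I have not verified it myself) is to pin the advertised ring address in the ingester lifecycler, so the instance always registers itself under a reachable address, roughly like this:

ingester:
  lifecycler:
    address: 127.0.0.1        # advertise a fixed address in the ring for a single-binary instance
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

As far as I know, Loki's /ready endpoint only reports ready once the ingester has joined the ring, so a readiness probe on it can also help keep traffic away from pods that would otherwise answer with “empty ring”.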

Senegal answered 10/11/2022 at 10:02 Comment(1)
Can you add more information? Have you made any changes to the config files, maybe not even on purpose? Any other differences from what you posted above? – Metrist
