I have been running Loki in a test EKS environment for a few weeks. So far I have deployed tens of Loki pods in monolithic mode, and each Loki pod receives logs from multiple EC2 instances.
However, Promtail keeps returning error messages like the ones below:
error sending batch, will retry" status=500 error="server returned HTTP status 500 Internal Server Error (500): empty ring"
or
error sending batch, will retry" status=-1 error="Post \"loki:port/loki/api/v1/push\": context deadline exceeded"
When I check the logs from Loki, I see matching error messages:
level=warn ts=2022-11-07T08:59:39.648738164Z caller=logging.go:86 traceID=414f3905fdec9c5b orgID=fake msg="POST /loki/api/v1/push (500) 5.863871ms Response: \"empty ring\\n\" ws: false; Content-Length: 267464; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; X-Amzn-Trace-Id: Root=id; X-Forwarded-For: IP; X-Forwarded-Port: PORT; X-Forwarded-Proto: http; "
level=warn ts=2022-11-07T09:23:23.193157476Z caller=logging.go:86 traceID=0a15f11b53b377ce orgID=fake msg="POST /loki/api/v1/push (500) 9.998004113s Response: \"context canceled\\n\" ws: false; Content-Length: 235297; Content-Type: application/x-protobuf; User-Agent: promtail/2.6.1; X-Amzn-Trace-Id: Root=id; X-Forwarded-For: IP; X-Forwarded-Port: PORT; X-Forwarded-Proto: http; "
Since this issue started, Grafana has failed to query Loki on and off: a Loki pod is reachable for a short period, busily ingests logs, and then fails again.
I am not sure why this happens in only some of my Loki pods, since their configs are nearly identical.
Does anyone know how to solve this problem? What does "empty ring" mean?
Thank you!
My config:
promtail.yaml
server:
  http_listen_port: 9080

clients:
  - url: http://ip_where_Loki_run:3100/loki/api/v1/push

positions:
  filename: /usr/local/promtail/positions.yaml

scrape_configs:
  - job_name: server_log
    static_configs:
      - targets:
          - localhost
        labels:
          job: server_log
          hostname: ab
          __path__: /var/log/server.log
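For reference, the clients block above relies on Promtail's defaults, including the 10s push timeout, which seems to line up with the ~10s "context canceled" entry in the Loki log. In case it helps, this is roughly how I understand the timeout and retry backoff could be made explicit (the values below are only illustrative, not something I have verified as a fix):

clients:
  - url: http://ip_where_Loki_run:3100/loki/api/v1/push
    # allow more than the default 10s before a push attempt is cancelled
    timeout: 30s
    # retry behaviour when a push fails (e.g. the 500 "empty ring" responses)
    backoff_config:
      min_period: 500ms
      max_period: 5m
      max_retries: 10
    # batching: a batch is sent when either limit is reached
    batchwait: 1s
    batchsize: 1048576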
Loki.yaml
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  grpc_server_max_recv_msg_size: 104857600
  grpc_server_max_send_msg_size: 104857600
  http_server_read_timeout: 300s
  http_server_write_timeout: 300s
  http_server_idle_timeout: 300s

ingester:
  wal:
    enabled: true
    dir: /loki/wal
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 3m
  chunk_retain_period: 30s
  chunk_encoding: lz4
  max_transfer_retries: 0
  chunk_target_size: 1048576
  max_chunk_age: 1h

schema_config:
  configs:
    - from: 2022-10-05
      store: boltdb-shipper
      object_store: aws
      schema: v12
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    bucketnames: bucketnames
    endpoint: s3.us-west-2.amazonaws.com
    region: us-west-2
    access_key_id: access_key_id
    secret_access_key: secret_access_key
    sse_encryption: true

compactor:
  working_directory: /loki/compactor
  shared_store: s3
  compaction_interval: 5m
  retention_enabled: true

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 720h
  retention_period: 720h
  per_stream_rate_limit: 15MB
  per_stream_rate_limit_burst: 30MB
  ingestion_rate_mb: 15
  ingestion_burst_size_mb: 30

chunk_store_config:
  max_look_back_period: 0s

querier:
  query_ingesters_within: 0
  engine:
    max_look_back_period: 3m

query_scheduler:
  max_outstanding_requests_per_tenant: 2048

query_range:
  parallelise_shardable_queries: false
  split_queries_by_interval: 0

frontend:
  max_outstanding_per_tenant: 10240

ingester_client:
  remote_timeout: 30s
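One more thing I am unsure about: each pod uses an in-memory kvstore (kvstore.store: inmemory), so every monolithic Loki instance keeps its own single-member ring rather than sharing one. If the pods were supposed to form a shared ring, my understanding from the docs is that a memberlist kvstore would look roughly like the sketch below (the join address is just a placeholder for whatever DNS name resolves to the Loki pods; I have not tested this):

memberlist:
  # placeholder: a headless Service / DNS name that resolves to all Loki pods
  join_members:
    - loki-memberlist.monitoring.svc.cluster.local:7946
  bind_port: 7946

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist
      replication_factor: 1

I do not know whether this is actually related to the "empty ring" error, so please correct me if it is a red herring.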
I am seeing the same "context deadline exceeded" error but no corresponding messages in the Loki logs. Did you manage to address the issue? – Marden