Redis pod failing

I have a Redis DB set up on my minikube cluster. I shut minikube down, started it again after 3 days, and now the Redis pod fails to come up with the following error in the pod log:

Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>.

Below is the StatefulSet YAML for the Redis master, deployed via a Helm chart:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem
    status:
      phase: Pending

Please let me know your suggestions on how I can fix this.

Doykos answered 23/8, 2021 at 8:43 Comment(7)
Could you add your Pod / Deployment configuration? - Lindbom
Thanks @Lindbom for the help. I have added the StatefulSet YAML, deployed via a Helm chart, to the question. - Doykos
Please provide your helm install options, any additional configuration used for Redis, and kubectl describe pod <your redis pod> with all useful information from the events section; this looks like a problem with config file validation during pod startup. - Lorola
Also add your cluster version... - Limiter
@Chandra Sekar Any progress with your issue? - Lorola
Hey @Lorola, I deleted the entire Redis DB and recreated a fresh one, as I couldn't recover from this. - Doykos
Did you try repairing the appendonly.aof file using a redis-client helper pod, as in the example provided below? - Lorola

I am not a Redis expert, but from what I can see:

kubectl describe pod red3-redis-master-0
...
Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
...

This means that your appendonly.aof file was corrupted with invalid byte sequences in the middle.

How can we proceed if redis-master is not working?

  • Verify the PVC attached to the redis-master pod:
kubectl get pvc

NAME                               STATUS   VOLUME                                    
redis-data-red3-redis-master-0     Bound    pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359  
  • Create a new redis-client pod with the same PVC, redis-data-red3-redis-master-0:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: redis-client
spec:
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: redis-data-red3-redis-master-0
  containers:
    - name: redis
      image: docker.io/bitnami/redis:6.2.3-debian-10-r0
      command: ["/bin/bash"]
      args: ["-c", "sleep infinity"]
      volumeMounts:
        - mountPath: "/tmp"
          name: data
EOF
  • Back up your files:
kubectl cp redis-client:/tmp .
  • Repair the appendonly.aof file:
kubectl exec -it redis-client -- /bin/bash

cd /tmp

# make copy of appendonly.aof file:
cp appendonly.aof appendonly.aofbackup

# verify appendonly.aof file:
redis-check-aof appendonly.aof

...
0x              38: Expected prefix '*', got: '"'
AOF analyzed: size=62, ok_up_to=56, ok_up_to_line=13, diff=6
AOF is not valid. Use the --fix option to try fixing it.
...

# repair appendonly.aof file:
redis-check-aof --fix appendonly.aof

# compare files using diff:
diff appendonly.aof appendonly.aofbackup

Note:

As per docs:

The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump at the given offset in the file, and see if it is possible to manually repair the file: the AOF uses the same format of the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.
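To make the docs' point concrete: each AOF record is a RESP-encoded command (the same wire format the Redis protocol uses), which is why a corrupted record is often repairable by hand. A minimal sketch of what one valid SET entry looks like, using plain printf (no Redis required):

```shell
# A healthy AOF is a stream of RESP-encoded commands. A SET entry is a
# 3-element array (*3) of bulk strings, each prefixed with its byte
# length ($<n>) and terminated by \r\n:
printf '*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n' > /tmp/aof-entry.bin

# od -c makes the \r\n terminators visible, mirroring what you would see
# when inspecting appendonly.aof at the offset redis-check-aof reports:
od -c /tmp/aof-entry.bin
```

The `Expected prefix '*'` message in the checker output above means it hit a byte other than the `*` that should begin the next command array.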

In addition, as described in the comments by @Miffa Young, you can check where your data is stored by the k8s.io/minikube-hostpath provisioner:

kubectl get pv 
...
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                      
pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359   8Gi        RWO            Delete           Bound    default/redis-data-red3-redis-master-0     
...

kubectl describe pv pvc-cf59a0b2-a3ee-4f7f-9f07-8f4922518359
...
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
...

Your Redis instance is failing because your appendonly.aof is malformed, and it is stored persistently under this location.

You can SSH into your VM:

minikube -p redis ssh 
cd /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0
# from there you can backup/repair/remove your files:
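A sketch of those backup/repair/remove steps inside the VM (paths as shown above). Two caveats: redis-check-aof ships with Redis and may not exist in the minikube VM itself, in which case run the repair from a pod mounting the PVC as shown earlier; and with AOF enabled, deleting appendonly.aof means Redis starts with an empty dataset.

```shell
cd /tmp/hostpath-provisioner/default/redis-data-red3-redis-master-0

# always back up first
cp appendonly.aof appendonly.aof.bak

# option 1: repair in place (only if redis-check-aof is available here)
# redis-check-aof --fix appendonly.aof

# option 2: if the data is disposable, remove the file so Redis starts fresh
# rm appendonly.aof
```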

Another solution is to install this chart under a new name; in that case a new set of PV/PVC for the Redis StatefulSet will be created.
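A sketch of that approach (the release name test-redis-2 and the bitnami repo alias are assumptions; adjust to your environment). Installing under a new name provisions a fresh PVC; once the new instance is up, you can remove the old release and its corrupted volume:

```shell
# hypothetical new release name; creates a new PV/PVC pair
helm install test-redis-2 bitnami/redis --namespace test

# after verifying the new instance, clean up the old release and its PVC
# (this permanently discards the corrupted data)
helm uninstall test-redis --namespace test
kubectl delete pvc redis-data-test-redis-master-0 --namespace test
```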

Lorola answered 26/8, 2021 at 12:50 Comment(0)
  • I think your Redis did not quit gracefully, so the AOF file is in a bad format (see What is AOF).

  • You should repair the AOF file using an initContainer that runs redis-check-aof --fix /data/appendonly.aof, for example:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: test-redis
    meta.helm.sh/release-namespace: test
  generation: 1
  labels:
    app.kubernetes.io/component: master
    app.kubernetes.io/instance: test-redis
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: redis
    helm.sh/chart: redis-14.8.11
  name: test-redis-master
  namespace: test
  resourceVersion: "191902"
  uid: 3a4e541f-154f-4c54-a379-63974d90089e
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: master
      app.kubernetes.io/instance: test-redis
      app.kubernetes.io/name: redis
  serviceName: test-redis-headless
  template:
    metadata:
      annotations:
        checksum/configmap: dd1f90e0231e5f9ebd1f3f687d534d9ec53df571cba9c23274b749c01e5bc2bb
        checksum/health: xxxxx
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: redis
        helm.sh/chart: redis-14.8.11
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchLabels:
                  app.kubernetes.io/component: master
                  app.kubernetes.io/instance: test-redis
                  app.kubernetes.io/name: redis
              namespaces:
              - tyk
              topologyKey: kubernetes.io/hostname
            weight: 1
      initContainers:
      - name: repair-redis
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        command: ['sh', '-c', "redis-check-aof --fix /data/appendonly.aof"]
      containers:
      - args:
        - -c
        - /opt/bitnami/scripts/start-scripts/start-master.sh
        command:
        - /bin/bash
        env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: REDIS_REPLICATION_MODE
          value: master
        - name: ALLOW_EMPTY_PASSWORD
          value: "no"
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: redis-password
              name: test-redis
        - name: REDIS_TLS_ENABLED
          value: "no"
        - name: REDIS_PORT
          value: "6379"
        image: docker.io/bitnami/redis:6.2.5-debian-10-r11
        imagePullPolicy: IfNotPresent
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_liveness_local.sh 5
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 6
        name: redis
        ports:
        - containerPort: 6379
          name: redis
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - /health/ping_readiness_local.sh 1
          failureThreshold: 5
          initialDelaySeconds: 20
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 2
        resources: {}
        securityContext:
          runAsUser: 1001
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/bitnami/scripts/start-scripts
          name: start-scripts
        - mountPath: /health
          name: health
        - mountPath: /data
          name: redis-data
        - mountPath: /opt/bitnami/redis/mounted-etc
          name: config
        - mountPath: /opt/bitnami/redis/etc/
          name: redis-tmp-conf
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
      serviceAccount: test-redis
      serviceAccountName: test-redis
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 493
          name: test-redis-scripts
        name: start-scripts
      - configMap:
          defaultMode: 493
          name: test-redis-health
        name: health
      - configMap:
          defaultMode: 420
          name: test-redis-configuration
        name: config
      - emptyDir: {}
        name: redis-tmp-conf
      - emptyDir: {}
        name: tmp
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/component: master
        app.kubernetes.io/instance: test-redis
        app.kubernetes.io/name: redis
      name: redis-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      volumeMode: Filesystem

Convulsive answered 24/8, 2021 at 6:41 Comment(9)
I am getting an error "Invalid value: "The edited file failed validation": ValidationError(StatefulSet.spec.template.spec): unknown field "initcontainers" in io.k8s.api.core.v1.PodSpec" when I add the init containers field to the STS manifest. - Doykos
Sorry, it's initContainers. I fixed this typo. - Convulsive
After adding initContainers it is not starting, saying Init:Error :( - Doykos
kubectl logs your_podname -n your_namespace init: check the initContainer logs. - Convulsive
@Chandra Sekar did you try to fix the appendonly.aof file with the redis-check-aof binary? - Lorola
If the data in Redis is not important and is discardable, you can delete this workload and recreate it. - Convulsive
@Lorola how to do that when the Redis pod is not even starting? - Doykos
@MiffaYoung I even deleted and recreated the STS, and it still fails with the same error. - Doykos
You must delete the PVC mounted by Redis; even though you recreate Redis, the Redis PVC (data dir) is not deleted, so the bad AOF file is still there. - Convulsive
