Kafka 1.0 stops with FATAL SHUTDOWN error. Logs directory failed
I have just upgraded to Kafka 1.0 and ZooKeeper 3.4.10. At first, it all started fine. A stand-alone producer and consumer worked as expected. After my code had run for about 10 minutes, Kafka failed with this error:

[2017-11-07 16:48:01,304] INFO Stopping serving logs in dir C:\Kafka\kafka_2.12-1.0.0\kafka-logs (kafka.log.LogManager)

[2017-11-07 16:48:01,320] FATAL Shutdown broker because all log dirs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs have failed (kafka.log.LogManager)

I have reinstalled and reconfigured Kafka 1.0, and the same thing happened. If I try to restart, the same error occurs.

Deleting the log files helps Kafka start, but it fails again after a short run.

I had been running version 0.10.2 for a long while and never encountered anything like this; it was very stable over long periods of time.

I have tried to find a solution and followed instructions in the documentation.

This is not yet a production environment; it is a fairly simple setup: one producer and one consumer reading from one topic.

I am not sure if this could have anything to do with ZooKeeper.

**Update:** the issue has been posted on the Apache JIRA board. The consensus so far seems to be that it is a Windows issue.

Hanan answered 7/11, 2017 at 22:16 Comment(10)
Windows is not a supported platform for Kafka brokers. Similar issues are reported on Windows (link1, link2). Feel free to file a bug and provide details here. – Emerick
Version 0.10.2.1 worked just fine on Windows; we are still running an instance on a different server. Thank you for the link. – Hanan
I am facing exactly the same problem here. I am using the AWS EFS file system to store the Kafka log files. My error log: Caused by: java.nio.file.FileSystemException: /var/lib/kafka/data/ksql_transient_8376289768731246768_1513675960541-KSTREAM-REDUCE-STATE-STORE-0000000003-changelog-1.a9edc755278d425e9227bb03eb0cd55f-delete/.nfs937861751206a94a00000fa2: Device or resource busy – Howard
Looks like the only solution at this point, when this happens, is to delete all temporary files from the tmp folder. – Induct
David, thanks for the comment. Which tmp folder do you refer to? Can you add your path? – Hanan
Deleting the contents of the kafka-logs directory did the trick for me. Problem is, this happens every time I start Kafka. – Valuator
Or this one, https://mcmap.net/q/394203/-apache-kafka-failed-to-acquire-lock-on-file-lock-in-tmp-kafka-logs, works for older versions of Kafka; I haven't tested it extensively with 1.0. That would be removing the .lock file in the kafka-logs folder, wherever that folder is on your Windows machine. – Hanan
Try looking at your Java version. I produced this error with JDK 1.8 (32-bit), then changed to JRE 9 (64-bit), which solved the issue. – Singly
Havvy Liu, thank you for the suggestion; we are already running the 64-bit version of Java. – Hanan
I have the same issue. I installed Apache Kafka in a WSL environment on Windows 10 Home. When I tried to delete a topic, the Kafka broker crashed. I had to delete the data of both ZooKeeper and the Kafka broker. I run them on Java SDK 11. – Chinatown

I ran into this issue as well, and clearing only the kafka-logs directory did not work. You'll also have to clear ZooKeeper's data.

Steps to resolve:

  1. Make sure to stop ZooKeeper (and the Kafka broker, if it is still running).
  2. Take a look at your server.properties file and locate the logs directory under the following entry.

    Example:
    log.dirs=/tmp/kafka-logs/
    
  3. Delete the log directory and its contents. Kafka will recreate the directory once it's started again.

  4. Take a look at the zookeeper.properties file and locate the data directory under the following entry.

    Example:
    dataDir=/tmp/zookeeper
    
  5. Delete the data directory and its contents. Zookeeper will recreate the directory once it's started again.

  6. Start ZooKeeper.

    <KAFKA_HOME>/bin/zookeeper-server-start.sh -daemon <KAFKA_HOME>/config/zookeeper.properties
    
  7. Start the Kafka broker.

    <KAFKA_HOME>/bin/kafka-server-start.sh -daemon <KAFKA_HOME>/config/server.properties
    
  8. Verify the broker has started with no issues by looking at the logs/kafkaServer.out log file.
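
For convenience, the steps above can be scripted. Below is a minimal sketch for a Unix-style setup, assuming KAFKA_HOME points at the Kafka installation and that log.dirs/dataDir use the default /tmp locations shown above; adjust the paths to match your server.properties and zookeeper.properties before running. Note that it destroys all topic data.

    #!/usr/bin/env bash
    # Reset a broken single-node Kafka/ZooKeeper dev setup (deletes all topic data!)
    set -euo pipefail
    KAFKA_HOME="${KAFKA_HOME:?set KAFKA_HOME to your Kafka installation directory}"

    # Step 1: stop the broker and ZooKeeper (ignore errors if they are not running)
    "$KAFKA_HOME/bin/kafka-server-stop.sh" || true
    "$KAFKA_HOME/bin/zookeeper-server-stop.sh" || true
    sleep 5

    # Steps 2-5: delete the log and data directories; both are recreated on startup.
    # These paths are assumptions matching the defaults above -- check your configs.
    rm -rf /tmp/kafka-logs /tmp/zookeeper

    # Steps 6-7: start ZooKeeper first, then the broker
    "$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon "$KAFKA_HOME/config/zookeeper.properties"
    sleep 5
    "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$KAFKA_HOME/config/server.properties"

    # Step 8: check the broker log for a clean start
    tail -n 20 "$KAFKA_HOME/logs/kafkaServer.out"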

Klemens answered 6/11, 2018 at 19:57 Comment(5)
Thanks, this is a compilation of all the steps in previous answers, and yes, this works when testing; I have tried it more than once. However, it does not resolve the issue when Kafka is in production and fails, because it requires manual intervention and deletion of all log files, and thus of all data from the stream. An automatic restart will fail, so this is not a permanent solution that resolves the issue. – Hanan
This is a temporary fix. This problem continues to happen over and over. Is there really no actual fix for this? – Lobo
@Lobo I also kept running into this issue after posting the solution. I found two causes: when killing the process I was using kill -9 <process_id>; after switching to kill -s TERM <process_id>, killing the process no longer caused the issue. Also, I was running out of disk space and didn't even realize it. So far, it has been running since late November until now with no issues, even when I had to kill the process for one reason or another during testing. – Klemens
Thanks a lot. I ran into this issue when I started Kafka from a different account in Windows. I'm not sure if that is what caused the issue, but your solution sure did fix it. – Powerboat
Probably the SIGKILL is the issue; I was doing the same. I haven't tried your solution, but the best approach would be to run bin/windows/kafka-server-stop.bat. Anyway, this is a bug and should be reported @TeilaRei, in the sense that if the server fails (as in an outage), Kafka should recover. I haven't tried later versions of Kafka; maybe it is fixed already? – Clabo

I've tried all the solutions, like:

  • Clearing the Kafka logs and ZooKeeper data (the issue reoccurred after creating a new topic)
  • Changing the log.dirs path from forward slashes "/" to backslashes "\" (like log.dirs=C:\kafka_2.12-2.1.1\data\kafka); a folder named C:\kafka_2.12-2.1.1\kafka_2.12-2.1.1datakafka was created, after which the issue stopped and was resolved.

Finally I found this link on DZone explaining log.dirs on Windows; you'll find it if you google "kafka log.dirs windows".
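
The strangely named folder appears because Java .properties files treat the backslash as an escape character, so single backslashes in log.dirs get swallowed. A minimal sketch of the two safe ways to write a Windows path in server.properties (the install path here is just an example):

    # server.properties: either escape each backslash...
    log.dirs=C:\\kafka_2.12-2.1.1\\data\\kafka
    # ...or use forward slashes, which Kafka accepts on Windows
    log.dirs=C:/kafka_2.12-2.1.1/data/kafka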

Powerboat answered 13/4, 2019 at 12:35 Comment(0)

Just clean the logs in C:\Kafka\kafka_2.12-1.0.0\kafka-logs and restart Kafka.

Descendent answered 19/10, 2018 at 7:51 Comment(2)
It's only a temporary solution, and not acceptable in production. After the manual clean-up (which already negates one of Kafka's main functionalities) it will start all right, but it will never automatically restart if Kafka fails, e.g. after a power outage or an upgrade. Also, your data will be lost. – Hanan
It works. Maybe it is temporary, but it is OK if you are just discovering Kafka and need to quickly get around the problem. – Hawkweed

If you are running on a Windows machine, try changing the log.dirs parameter to a Windows-style path (like log.dirs=C:\some_path\some_path_kafLogs) in server.properties, found in the /config folder.

By default, this path is written in the Unix style (like /unix/path/).

This worked for me on a Windows machine.

Sasin answered 12/10, 2018 at 11:15 Comment(3)
I have done that from the start and tried different paths on different machines. Kafka starts fine, but when it restarts it can't find these files, or it sees them as locked. Tried this with Kafka 1.0 and now with 2.0. – Hanan
Setting the path to log.dirs=C:\kafka\kafka_2.11-2.1.0\kafka-logs caused this error: ERROR Shutdown broker because all log dirs in C:\Kafka\kafka_2.11-2.1.0\kafkakafka_2.11-2.1.0kafka-logs have failed (kafka.log.LogManager) – Lobo
This is not a problem that occurs during setup; it occurs suddenly. – Febrifuge

So this seems to be a Windows issue.

https://issues.apache.org/jira/browse/KAFKA-6188

The JIRA is resolved, and there is an unmerged patch attached to it.

https://github.com/apache/kafka/pull/6403

So your options are:

  • build Kafka with the patch and run it on Windows
  • run it on a Unix-style filesystem (Linux or Mac)
  • perhaps running it in Docker on Windows is worth a shot (see the sketch below)
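
A minimal sketch of the Docker option, using the third-party Confluent images; the image names, versions, and ports here are my assumptions, not part of the original answer:

    # run ZooKeeper and a single Kafka broker on a shared Docker network
    docker network create kafka-net
    docker run -d --name zookeeper --network kafka-net \
      -e ZOOKEEPER_CLIENT_PORT=2181 \
      confluentinc/cp-zookeeper:5.4.0
    docker run -d --name kafka --network kafka-net -p 9092:9092 \
      -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
      -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:9092 \
      -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
      confluentinc/cp-kafka:5.4.0

The broker's log files then live on a Linux filesystem inside the container, so the Windows-specific file handling behind this error does not apply.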
Chastain answered 7/2, 2020 at 16:11 Comment(2)
Thanks for sharing this. Have you had a chance to test any of these solutions? – Hanan
Not yet. I will update once I have tested the branch. I have Docker on Windows running and will try that. I wanted to see if I could get away with increasing the retention time to avoid the crash, but that did not help. Running it in WSL does not help, as far as I can tell. – Chastain

The problem is concurrent work on Kafka's log files: segment files are deleted or changed while other Kafka threads still hold them open. The idea is to delay those external log-file changes. Topic configuration can help:

import static org.apache.kafka.common.config.TopicConfig.*;
import java.util.HashMap;
import java.util.Map;

Map<String, String> config = new HashMap<>();
config.put(CLEANUP_POLICY_CONFIG, CLEANUP_POLICY_COMPACT); // compact segments instead of deleting them
config.put(FILE_DELETE_DELAY_MS_CONFIG, "3600000");        // wait 1 hour before deleting a file from disk
config.put(DELETE_RETENTION_MS_CONFIG, "864000000");       // keep delete tombstones for 10 days
config.put(RETENTION_MS_CONFIG, "86400000");               // retain records for 1 day
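
For reference, the same overrides can be applied to an existing topic from the command line with the kafka-configs tool; the topic name and ZooKeeper address below are placeholders:

    bin/kafka-configs.sh --zookeeper localhost:2181 \
      --entity-type topics --entity-name my-topic --alter \
      --add-config cleanup.policy=compact,file.delete.delay.ms=3600000,delete.retention.ms=864000000,retention.ms=86400000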
Seventieth answered 23/4, 2020 at 10:41 Comment(1)
Are you saying that the error occurs only for compacted topics? – Lefevre

What worked for me was deleting both the Kafka and ZooKeeper log directories, then changing the log directory paths in both the Kafka and ZooKeeper properties files (found in kafka/config/) from the usual forward slash '/' to a backslash '\'.

Firedrake answered 8/3, 2020 at 20:0 Comment(0)

On Windows, changing the path separators resolved the issue; each one required a double backslash, e.g. C:\\path\\logs.

Compelling answered 20/6, 2021 at 10:27 Comment(0)

If all of the above methods don't work in your case, or you have already done everything correctly, try changing broker.id in server.properties; this particular error should then be gone.

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
Jungian answered 2/5 at 13:22 Comment(0)

Simply delete all the logs from:

C:\tmp\kafka-logs

and restart the ZooKeeper and Kafka servers.

Lucic answered 10/11, 2020 at 16:20 Comment(1)
Thanks for the suggestion; it is not a permanent solution when the system is in production. It does indeed work when done manually or by script, but it requires stopping all the processes and eliminates the benefit of using Kafka as a self-maintaining stream. – Hanan
