Flume sink to HDFS error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument

With:

  • Java 1.8.0_231
  • Hadoop 3.2.1
  • Flume 1.8.0

I have created an HDFS service on port 9000.
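
To confirm the service is reachable, here is a quick check (a sketch; it assumes $HADOOP_HOME/bin is on the PATH):

# List the HDFS root through the NameNode RPC port
hdfs dfs -ls hdfs://cluster01:9000/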

jps:

11688 DataNode
10120 Jps
11465 NameNode
11964 SecondaryNameNode
12621 NodeManager
12239 ResourceManager

Flume conf:

agent1.channels.memory-channel.type=memory
agent1.sources.tail-source.type=exec
agent1.sources.tail-source.command=tail -F /var/log/nginx/access.log
agent1.sources.tail-source.channels=memory-channel

#hdfs sink
agent1.sinks.hdfs-sink.channel=memory-channel
agent1.sinks.hdfs-sink.type=hdfs
agent1.sinks.hdfs-sink.hdfs.path=hdfs://cluster01:9000/system.log
agent1.sinks.hdfs-sink.hdfs.fileType=DataStream
agent1.channels=memory-channel
agent1.sources=tail-source
agent1.sinks=log-sink hdfs-sink

Then I start Flume:

./bin/flume-ng agent --conf conf --conf-file conf/test1.conf --name agent1 -Dflume.root.logger=INFO,console

Then I get this error:

Info: Including Hadoop libraries found via (/usr/local/hadoop-3.2.1/bin/hadoop) for HDFS access
...
2019-11-04 14:48:24,818 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: hdfs-sink started
2019-11-04 14:48:28,823 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-11-04 14:48:28,836 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1679)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:226)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1679)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:226)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:401)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)

I have searched for a while but haven't found the same error on the net. Is there any advice for solving this problem?

Decameter answered 4/11, 2019 at 6:51

That may be caused by lib/guava.

I removed lib/guava-11.0.2.jar, restarted Flume, and found that it works.

Output:

2019-11-04 16:52:58,062 (hdfs-hdfs-sink-call-runner-0) [WARN - org.apache.hadoop.util.NativeCodeLoader.<clinit>(NativeCodeLoader.java:60)] Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-11-04 16:53:01,532 (Thread-9) [INFO - org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:239)] SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

But I still don't know which version of Guava it is using now.
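
One way to check is to list the Guava jars on each side (a sketch; the paths assume default install locations for Flume and Hadoop):

# Guava that Flume ships (the one I deleted)
ls $FLUME_HOME/lib | grep -i guava
# Guava that Hadoop 3.2.1 ships, which is what Flume picks up now
ls $HADOOP_HOME/share/hadoop/common/lib | grep -i guava

On this Hadoop version the second command should show guava-27.0-jre.jar, so that is presumably the version in use now.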

Decameter answered 4/11, 2019 at 9:16

I had the same issue. It seems to be a bug in Flume: the code references a method signature that does not exist in that version of Guava.
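
You can confirm this with javap from the JDK. The stack trace wants the (boolean, String, Object) overload of checkArgument, and the Guava 11 jar that Flume ships only has the varargs variant (the jar path below is an assumption):

# Show which checkArgument overloads Guava 11 actually provides
javap -classpath $FLUME_HOME/lib/guava-11.0.2.jar com.google.common.base.Preconditions | grep checkArgument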

Inalterable answered 5/2, 2020 at 20:59

Replace the guava-11.x.x.jar file with the guava-27.x.x.jar from the Hadoop 3 common library; this will work.

Take hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar and put it in your Flume library directory, and don't forget to delete the older version from the Flume library first.
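
A minimal sketch of that procedure (the paths assume default install locations; adjust the version numbers to your distributions):

# Remove the old Guava bundled with Flume
rm $FLUME_HOME/lib/guava-11.0.2.jar
# Copy the newer Guava from the Hadoop 3 common libraries
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $FLUME_HOME/lib/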

Matriarchate answered 24/1, 2021 at 4:11

As others said, there is a clash between guava-11 (Hadoop 2 / Flume 1.8.0/1.9.0) and guava-27 (Hadoop 3).

The other answers don't explain the root cause of the issue: the script at $FLUME_HOME/bin/flume-ng puts all the jars from your Hadoop distribution onto the Flume classpath if the $HADOOP_HOME environment variable is set.
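
You can see what gets added by asking Hadoop for its classpath yourself (a sketch; --glob expands the wildcard entries so grep can see individual jars):

# Show every Guava jar that Hadoop contributes to the Flume classpath
$HADOOP_HOME/bin/hadoop classpath --glob | tr ':' '\n' | grep -i guava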

A few words on why the suggested actions "fix" the problem: deleting $FLUME_HOME/lib/guava-11.0.2.jar leaves only guava-27.0-jre.jar on the classpath, so there is no more clash.

So there is no need to copy it under $FLUME_HOME/lib, and it's not a bug in Flume, just a version incompatibility: Flume did not upgrade Guava, while Hadoop 3 did.

I did not dig into the details of the changes between those Guava versions, so it may happen that everything works fine until it doesn't (for instance, if there is a backward-incompatible change between the two).

So, before using this "fix" in a production environment, I suggest testing extensively to reduce the risk of unexpected problems.

The best solution would be to wait for (or contribute) a new Flume version where Guava is upgraded to v27.

Bamberg answered 2/2, 2021 at 16:39

Comment from Unwarranted: Thanks for the description. I just had to rename the library for it to start working correctly: mv guava-11.0.2.jar guava-11.0.2.jar.back. I also moved the log4j library to remove the duplicated binding warnings: mv log4j-slf4j-impl-2.17.1.jar log4j-slf4j-impl-2.17.1.jar.back. By the way, this is in the context of a GCP Dataproc VM instance with a single node.

I do agree with Alessandro S.

Flume communicates with HDFS via the HDFS APIs, so it doesn't matter which version the Hadoop platform runs as long as the APIs do not change, which is the case most of the time. Flume is actually built against a specific version of the Hadoop libraries. The problem is that you are running Flume with the wrong Hadoop library version.

So just use the Hadoop libraries from a 2.x.x release to run your Flume 1.8.0.
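
For example, pointing the launcher at a Hadoop 2 install before starting the agent (the hadoop-2.10.1 path is just a placeholder for whatever 2.x distribution you have):

# Make flume-ng pick up Guava-11-compatible Hadoop 2.x jars instead of Hadoop 3's
export HADOOP_HOME=/usr/local/hadoop-2.10.1
./bin/flume-ng agent --conf conf --conf-file conf/test1.conf --name agent1 -Dflume.root.logger=INFO,console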

Occupant answered 9/5, 2021 at 8:40
