ClassNotFound exception when attempting to use DataflowRunner

I'm trying to launch a Dataflow job on GCP using Apache Beam 0.6.0. I am building an uber jar with the Maven Shade plugin because I cannot launch the job using "mvn exec:java". I'm including this dependency:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>0.6.0-SNAPSHOT</version>
</dependency>

I am getting the following exception:

Exception in thread "main" java.lang.IllegalArgumentException: Unknown 'runner' specified 'DataflowRunner', supported pipeline runners [DirectRunner]
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1609)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.access$400(PipelineOptionsFactory.java:104)
    at org.apache.beam.sdk.options.PipelineOptionsFactory$Builder.as(PipelineOptionsFactory.java:289)
    at com.disney.dtss.desa.tools.SpannerSinkTest.main(SpannerSinkTest.java:116)
Caused by: java.lang.ClassNotFoundException: DataflowRunner
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.beam.sdk.options.PipelineOptionsFactory.parseObjects(PipelineOptionsFactory.java:1595)

Am I missing something else?

Appurtenant answered 21/3, 2017 at 20:13 Comment(2)
That is definitely the expected output if the DataflowRunner is not registered. Can you share anything more about your pom.xml, your mvn invocation, or perhaps a listing of the contents of your uber jar and how you invoke it? - Brynn
I'm having the same issue. It works fine when I start the pipeline through mvn compile exec:java, but it fails when I build a jar. The uber jar contains the necessary classes. - Carlson
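To answer the question about what the uber jar actually exposes at run time, a small classpath probe can help. This is only a sketch: the class name RunnerClasspathCheck and its two helper methods are hypothetical, and the service-file name is an assumption based on Beam discovering runners through registrar entries under META-INF/services. Run it with the uber jar on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper, not part of Beam.
public class RunnerClasspathCheck {

    // Returns true if the named class can be located by this class's
    // class loader. The class is not initialized.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, RunnerClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Lists every copy of a resource visible on the classpath,
    // for example a META-INF/services registration file.
    public static List<URL> findResources(String resourceName) {
        try {
            ClassLoader loader = RunnerClasspathCheck.class.getClassLoader();
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fully qualified name of the Dataflow runner class.
        String runner = "org.apache.beam.runners.dataflow.DataflowRunner";
        // Assumption: runner registrars are listed in this service file.
        String registrarFile =
            "META-INF/services/org.apache.beam.sdk.runners.PipelineRunnerRegistrar";

        System.out.println("Runner class found: " + isOnClasspath(runner));
        for (URL url : findResources(registrarFile)) {
            System.out.println("Registrar file: " + url);
        }
    }
}
```

If the class is found but no registrar file mentions the Dataflow runner, the shaded jar may have overwritten the service files instead of merging them. The Maven Shade plugin's ServicesResourceTransformer is intended for that case, though whether it applies here depends on the pom.xml.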

Try:

mvn compile exec:java -Dexec.mainClass=YourMainClass -Pdataflow-runner

Note the -Pdataflow-runner at the end, and replace YourMainClass with the fully qualified name of your main class.

Terrenceterrene answered 5/5, 2017 at 20:2 Comment(1)
In pom.xml, if the dependency is defined as part of a profile, make sure to specify the profile for the mvn command. The default WordCount example from Apache Beam does this for the DataflowRunner. If you don't care about profiles, just move the dependency definition to the <dependencies> section of the pom file. - Deyo

Following @Andrew Nguonly's comment, I copied the DataflowRunner dependency to the outer scope (the top-level <dependencies> tag) in the pom.xml file.

Basically, I added this:

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
  <version>${beam.version}</version>
  <scope>runtime</scope>
</dependency>

I placed it just before the closing </dependencies> tag in the pom.xml from the Beam WordCount example.

Forestay answered 18/3, 2020 at 8:3 Comment(1)
For VSCode users, the above method might be the best bet as there isn't yet a clean way to switch profiles: github.com/microsoft/vscode-maven/issues/465 - Ache
