Apache Spark: akka.version error when building a jar with all dependencies

I have built a jar file from my Spark app with Maven (mvn clean compile assembly:single) and the following pom file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>mgm.tp.bigdata</groupId>
  <artifactId>ma-spark</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>ma-spark</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <repositories>
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.1.0-cdh5.2.5</version>
    </dependency>
    <dependency>
        <groupId>mgm.tp.bigdata</groupId>
        <artifactId>ma-commons</artifactId>
        <version>0.0.1-SNAPSHOT</version>
    </dependency>
  </dependencies>

  <build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>mgm.tp.bigdata.ma_spark.SparkMain</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
    </plugin>
  </plugins>
</build>
</project>

If I run my app with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar in the terminal, I get the following error message:

VirtualBox:~/Schreibtisch$ java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar
2015-Jun-02 12:53:36,348 [main] org.apache.spark.util.Utils
 WARN  - Your hostname, proewer-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface eth0)
2015-Jun-02 12:53:36,350 [main] org.apache.spark.util.Utils
 WARN  - Set SPARK_LOCAL_IP if you need to bind to another address
2015-Jun-02 12:53:36,401 [main] org.apache.spark.SecurityManager
 INFO  - Changing view acls to: proewer
2015-Jun-02 12:53:36,402 [main] org.apache.spark.SecurityManager
 INFO  - Changing modify acls to: proewer
2015-Jun-02 12:53:36,403 [main] org.apache.spark.SecurityManager
 INFO  - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(proewer); users with modify permissions: Set(proewer)
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'
    at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:115)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:136)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:142)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:150)
    at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:155)
    at com.typesafe.config.impl.SimpleConfig.getString(SimpleConfig.java:197)
    at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:136)
    at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:470)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:111)
    at akka.actor.ActorSystem$.apply(ActorSystem.scala:104)
    at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
    at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
    at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1454)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1450)
    at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:156)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:203)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:53)
    at mgm.tp.bigdata.ma_spark.SparkMain.main(SparkMain.java:38)

What am I doing wrong?

Best regards, Paul

Bautista answered 2/6, 2015 at 13:21 Comment(0)

This is what you are doing wrong:

I run my app with java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar

Once you have built your application, you should launch it using the spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and it supports the different cluster managers and deploy modes that Spark offers:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
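
For example, with the jar and main class from the question (the local[2] master is just an assumption for a quick local test), the call could look like this:

./bin/spark-submit \
  --class mgm.tp.bigdata.ma_spark.SparkMain \
  --master local[2] \
  ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar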

I strongly advise you to read the official documentation about Submitting Applications.

Spurt answered 2/6, 2015 at 16:19 Comment(13)
I want to run it as a java app on the cluster. – Conoid
This is how it's done! I use it on a daily basis. I won't repeat the answer in a comment; read the doc in the link I gave in my answer! – Spurt
How can I test my app in local mode? – Conoid
You can: set your master to 'local[2]', for example. Again, it's on the documentation page linked in my answer. Please accept the answer if it solved your problem. Thanks! – Spurt
Maybe you know how I can run the submit script on a Cloudera Live cluster? – Conoid
The answer is too long, I can't write it in a comment. – Spurt
You CAN run java -jar ma-spark-0.0.1-SNAPSHOT-jar-with-dependencies.jar. spark-submit does a lot of configuration and classpath setup for you, so it's easier and hence what Spark documents. But you can certainly do all of that configuration and classpath setup via SparkConf and SparkContext yourself. I agree, though: unless you have a strong reason not to, just use spark-submit. – Delapaz
Thanks for the answer. It worked partially. Now I am stuck at this error: "java.sql.SQLException: No suitable driver found for jdbc:mysql:" – Conclusion
@Conclusion This is a different issue. I've also written an answer on it, though. Can you try to look it up in my answers? – Spurt
Where is that answer? – Conclusion
This really helped; I had struggled with this for 6+ hours. – Irby
This answer doesn't work if you run the jar outside the cluster. – Peculation
What does "running the jar outside the cluster" mean? Just to be sure we are talking about the same thing: did you build an uber jar? – Spurt

This is most likely because the Akka config file (reference.conf) from the Akka jar got overridden or dropped while packaging the fat jar.

You can try another plugin, maven-shade-plugin. In the pom.xml you need to specify how to resolve conflicts between resources with the same name. Below is an example:

             <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <minimizeJar>false</minimizeJar>
                            <createDependencyReducedPom>false</createDependencyReducedPom>
                            <artifactSet>
                                <includes>
                                    <!-- Include here the dependencies you want to be packed in your fat jar -->
                                    <include>my.package.etc....:*</include>
                                </includes>
                            </artifactSet>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

Please note the <transformers> section, which instructs the shade plugin to append the content of the reference.conf resources instead of replacing it.
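
As a quick sanity check, you can print the reference.conf that ends up inside the shaded jar and confirm the Akka settings were merged in (the jar path below is only an assumption based on the question's artifact coordinates):

# dump the merged reference.conf from the shaded jar and look for the Akka version entry
unzip -p target/ma-spark-0.0.1-SNAPSHOT.jar reference.conf | grep -i version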

Woe answered 3/6, 2015 at 0:32 Comment(0)

This worked for me.

 <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>1.5</version>
      <executions>
        <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <shadedArtifactAttached>true</shadedArtifactAttached>
              <shadedClassifierName>allinone</shadedClassifierName>
              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
          <transformers>
            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
              <resource>reference.conf</resource>
            </transformer>
            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
              <resource>META-INF/spring.handlers</resource>
            </transformer>
            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
              <resource>META-INF/spring.schemas</resource>
            </transformer>
            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <manifestEntries>
                <Main-Class>com.echoed.chamber.Main</Main-Class>
              </manifestEntries>
            </transformer>
          </transformers>
        </configuration>
          </execution>
        </executions>
    </plugin>
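
Note that com.echoed.chamber.Main in the manifest entry above is not from the question; you would put your own main class there (mgm.tp.bigdata.ma_spark.SparkMain in this case). Because shadedArtifactAttached is true with the allinone classifier, the shaded jar is attached next to the regular artifact, so with the question's coordinates the launch would presumably be:

# run the attached shaded jar (named artifactId-version-classifier by convention)
java -jar target/ma-spark-0.0.1-SNAPSHOT-allinone.jar
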
Pottery answered 22/8, 2016 at 10:49 Comment(0)

The ConfigException$Missing error indicates that the Akka config file, i.e. reference.conf, is not bundled in the application jar file. The reason could be that when multiple files with the same name exist in different dependent jars, the default strategy checks whether they are all the same; if not, it omits that file.

I had the same issue and I resolved it as follows:

Generate a merged reference.conf using the AppendingTransformer: by a merged reference.conf file I mean that the reference.conf resources of all the dependent modules such as akka-core, akka-http, akka-remoting etc. are appended together by the AppendingTransformer. We add the AppendingTransformer to the pom file as follows:

 <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
     <resource>reference.conf</resource>
 </transformer>

mvn clean install will now generate a fat jar with the merged reference.conf file.

Still the same error: spark-submit <main-class> <app.jar> still gave the same error when I deployed my Spark app on EMR.

Reason: since HDFS is the configured filesystem, Spark jobs on an EMR cluster read from HDFS by default. So the file you want to use must already exist in HDFS. I added the reference.conf file to HDFS using the following approach:

1. Extract reference.conf file from app.jar into /tmp folder
    `cd /tmp`
    `jar xvf path_to_application.jar reference.conf` 
2. Copy extracted reference.conf from local-path (in this case /tmp) to HDFS-path (ex: /user/hadoop)
    `hdfs dfs -put /tmp/reference.conf /user/hadoop`
3. Load config as follows:
   `val parsedConfig = ConfigFactory.parseFile(new File("/user/hadoop/reference.conf"))`                                   
   `val config = ConfigFactory.load(parsedConfig)`

Alternate solution:

  • Extract the reference.conf file from app.jar and copy it to all the nodes of the EMR cluster at the same path, for both drivers and executors.
  • ConfigFactory.parseFile(new File("/tmp/reference.conf")) will now read reference.conf from the local file system. Hope that helps and saves you some debugging time!
Lubin answered 3/9, 2017 at 11:50 Comment(0)
