Sometimes getting NullPointerException while saving into Cassandra
I have the following method to write into Cassandra. Sometimes it saves the data fine, but when I run it again it sometimes throws a NullPointerException. I'm not sure what is going wrong here. Can you please help me?

  import java.io.IOException
  import org.apache.spark.sql.{DataFrame, SaveMode}

  @throws(classOf[IOException])
  def writeDfToCassandra(o_model_family: DataFrame, keyspace: String, columnFamilyName: String) = {
    logger.info(s"writeDfToCassandra")

    o_model_family.write.format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> columnFamilyName, "keyspace" -> keyspace))
      .mode(SaveMode.Append)
      .save()
  }

18/10/29 05:23:56 ERROR BMValsProcessor: java.lang.NullPointerException
    at java.util.regex.Matcher.getTextLength(Matcher.java:1283)
    at java.util.regex.Matcher.reset(Matcher.java:309)
    at java.util.regex.Matcher.<init>(Matcher.java:229)
    at java.util.regex.Pattern.matcher(Pattern.java:1093)
    at scala.util.matching.Regex.findFirstIn(Regex.scala:388)
    at org.apache.spark.util.Utils$$anonfun$redact$1$$anonfun$apply$15.apply(Utils.scala:2698)
    at org.apache.spark.util.Utils$$anonfun$redact$1$$anonfun$apply$15.apply(Utils.scala:2698)
    at scala.Option.orElse(Option.scala:289)
    at org.apache.spark.util.Utils$$anonfun$redact$1.apply(Utils.scala:2698)
    at org.apache.spark.util.Utils$$anonfun$redact$1.apply(Utils.scala:2696)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at org.apache.spark.util.Utils$.redact(Utils.scala:2696)
    at org.apache.spark.util.Utils$.redact(Utils.scala:2663)
    at org.apache.spark.sql.internal.SQLConf$$anonfun$redactOptions$1.apply(SQLConf.scala:1650)
    at org.apache.spark.sql.internal.SQLConf$$anonfun$redactOptions$1.apply(SQLConf.scala:1650)
    at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
    at scala.collection.immutable.List.foldLeft(List.scala:84)
    at org.apache.spark.sql.internal.SQLConf.redactOptions(SQLConf.scala:1650)
    at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.simpleString(SaveIntoDataSourceCommand.scala:52)
    at org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:178)
    at org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
    at org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$4.apply(QueryExecution.scala:198)
    at org.apache.spark.sql.execution.QueryExecution$$anonfun$4.apply(QueryExecution.scala:198)
    at org.apache.spark.sql.execution.QueryExecution.stringOrError(QueryExecution.scala:100)
    at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:198)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
    at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:654)
    at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:273)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:267)
    at com.snp.utils.DbUtils$.writeDfToCassandra(DbUtils.scala:47)
Wandering answered 29/10, 2018 at 9:36 Comment(2)
@Alexott sir, any clue what am I doing wrong here? – Wandering
@jrook sir, any idea how to fix it? – Wandering

Oddly, this is failing in the `redact` function of Spark's Utils. That function is applied to options passed to Spark in order to strip sensitive data out of UIs and the like. I can't imagine why a null key name would pop up in your SQLConf (since I believe you can only have empty strings there), but I would check that. Could the conf be getting mutated while the method is executing?
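The first frames of the trace can be reproduced in isolation: Scala's `Regex.findFirstIn` hands its argument straight to `java.util.regex.Pattern.matcher`, which NPEs on a null input. A minimal standalone sketch (the pattern below is only a stand-in for whatever Spark's redaction regex actually is, not Spark code):

```scala
// Minimal reproduction of the failure mode in the trace:
// Regex.findFirstIn(null) throws a NullPointerException inside
// java.util.regex.Matcher -- exactly the Matcher.getTextLength /
// reset / <init> frames shown above.
object RedactNpeDemo {
  import scala.util.matching.Regex

  // Stand-in for Spark's redaction pattern (the real one comes from
  // spark.redaction.regex); this particular regex is an assumption.
  val redactionPattern: Regex = "(?i)secret|password|token".r

  def matchesSensitiveKey(key: String): Boolean =
    redactionPattern.findFirstIn(key).isDefined
}
```

`matchesSensitiveKey("db.password")` returns true, while `matchesSensitiveKey(null)` throws the same NullPointerException from `Matcher.getTextLength` as the trace above.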

Highjack answered 29/10, 2018 at 14:57 Comment(9)
Thank you for your quick reply, sir. I'm not sure about these "redact" settings; I am new to Spark. Here is the config I set up: gist.github.com/shatestest/7b0c0723d000a4e84d2aeb58352cf445 ... In one case I need to create and save an RDD into Cassandra, for which I set config options apart from setCassandraConf. – Wandering
Sir, any clue what went wrong here and what the fix is? I am stuck. I would be very thankful for help fixing this issue. – Wandering
That's what I said already: the error is an NPE in the Spark UI's redaction code. It attempts to redact things like passwords or secrets, by matching a regex against certain config keys. Somehow one of those keys was null. – Highjack
Sir, when I run a single query/process it runs fine, i.e. with the same settings and configs. When these processors are run in parallel over a map, I get this issue: gist.github.com/shatestest/… – Wandering
I am getting this issue with: `val procs: LinkedHashMap[String, (DataFrameReader, SparkSession, String) => Unit] = getAllDefinedProcessors(); for (key <- procs.keys) { procs.get(key).map { println("process started for loading column family : " + key); fun => fun(ora_df_options_conf, spark, columnFamilyName) } }` gist.github.com/shatestest/… – Wandering
Sir, I downloaded the spark-catalyst_2.11-2.3.1-sources.jar. Where should I look, i.e. are there any specific configuration parameters I need to check? If you give me some clue I will debug and try to provide a workaround. – Wandering
Sir, I debugged it: the table name was going in as a null string. I fixed it. Thanks a lot for your time, and sorry to trouble you. – Wandering
Sir, you guys developed Cassandra in Java, but the spark-cassandra-connector in Scala. Why? Any specific reasons or advantages? In other words, what issues do you see with Java in this use case? – Wandering
Spark is written in Scala, so we wrote the connector in Scala. Some things are more difficult to do from Java because we still haven't fleshed out Java wrappers for some of the Scala pieces. It is possible to use Scala from Java, though, so it's not a big deal most of the time. – Highjack
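For anyone hitting the same trace: as the comments above conclude, the NPE came from a null table name reaching the write options. A fail-fast guard on the arguments (a sketch of one possible safeguard, not the poster's actual fix) turns the deep NPE inside Spark's redaction code into a clear error at the call site:

```scala
// Validates Cassandra write options before they reach Spark.
// A null or blank keyspace/table fails immediately with a
// descriptive message instead of an NPE deep in Utils.redact.
object CassandraWriteGuard {
  def validatedOptions(keyspace: String, table: String): Map[String, String] = {
    require(keyspace != null && keyspace.trim.nonEmpty,
      s"keyspace must be non-null and non-empty, got: '$keyspace'")
    require(table != null && table.trim.nonEmpty,
      s"table (column family) must be non-null and non-empty, got: '$table'")
    Map("table" -> table, "keyspace" -> keyspace)
  }
}
```

The resulting map can then be passed to `df.write.format("org.apache.spark.sql.cassandra").options(...)` exactly as in the question's method.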
