How to use Avro on HDInsight Spark/Jupyter?
I am trying to read an Avro file on an HDInsight Spark/Jupyter cluster, but I get the following error:

u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;'
Traceback (most recent call last):
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/readwriter.py", line 159, in load
    return self._df(self._jreader.load(path))
  File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;'

df = spark.read.format("com.databricks.spark.avro").load("wasb://[email protected]/...")

How do I resolve this? It seems I need to install the package, but how can I do that on HDInsight?

Biplane answered 1/4, 2018 at 9:49 Comment(0)

Follow the article below to load the package from within the notebook:

https://learn.microsoft.com/en-in/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages

For HDInsight 3.3 and HDInsight 3.4

Add the following cell to your notebook:

%%configure 
{ "packages":["com.databricks:spark-avro_2.10:0.1"] }

For HDInsight 3.5

Add the following cell to your notebook:

%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.10:0.1" }}

For HDInsight 3.6

Add the following cell to your notebook:

%%configure
{ "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }}
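The three variants above differ only in the JSON key and the Maven coordinate, so the choice can be sketched as a small lookup. This is a minimal illustration, not part of HDInsight or sparkmagic: `AVRO_PACKAGE` and `configure_cell` are hypothetical names, and the version-to-package mapping is copied from the cells above (the 3.5 entry is unverified, as noted in the comments).

```python
import json

# Maven coordinates copied from the cells above, keyed by HDInsight version.
# Assumption: the 3.5 entry is the one given in this answer and is unverified.
AVRO_PACKAGE = {
    "3.3": "com.databricks:spark-avro_2.10:0.1",
    "3.4": "com.databricks:spark-avro_2.10:0.1",
    "3.5": "com.databricks:spark-avro_2.10:0.1",
    "3.6": "com.databricks:spark-avro_2.11:4.0.0",
}


def configure_cell(hdi_version):
    """Return the text of a %%configure cell for an HDInsight version (hypothetical helper)."""
    package = AVRO_PACKAGE[hdi_version]
    if hdi_version in ("3.3", "3.4"):
        # Older clusters take a top-level "packages" list.
        body = {"packages": [package]}
    else:
        # 3.5 and 3.6 pass the coordinate through the Spark configuration.
        body = {"conf": {"spark.jars.packages": package}}
    return "%%configure\n" + json.dumps(body)


print(configure_cell("3.6"))
```

The %%configure cell generally has to run before the Spark session starts (or with `%%configure -f` to restart an existing session). Once the package resolves, the `spark.read.format("com.databricks.spark.avro")` call from the question should find the data source.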
Victorvictoria answered 9/4, 2018 at 6:44 Comment(2)
I run on HDInsight 3.6, and I had to use a newer version of the Avro package: %%configure { "conf": {"spark.jars.packages": "com.databricks:spark-avro_2.11:4.0.0" }} Supervisor
@HaimBendanan, thanks for the feedback. I updated the answer with those details. I left the 3.5 version unchanged, so feedback from someone who uses 3.5 may still be needed. Victorvictoria
