Is there a Spark SQL JDBC driver?

I'm looking for a client JDBC driver that supports Spark SQL.

I have been using Jupyter so far to run SQL statements on Spark (running on HDInsight) and I'd like to be able to connect using JDBC so I can use third-party SQL clients (e.g. SQuirreL, SQL Explorer, etc.) instead of the notebook interface.

I found an ODBC driver from Microsoft, but that doesn't help me with Java-based SQL clients. I also tried downloading the Hive JDBC driver from my cluster, but the Hive JDBC driver does not appear to support the more advanced SQL features that Spark does. For example, the Hive driver complains about join conditions that are not equi-joins, where I know this is a supported feature of Spark because I've executed the same SQL in Jupyter successfully.

Essie answered 9/6, 2016 at 18:27 Comment(7)
Questions asking for recommendations or help with finding a library or other off-site resources are off-topic.Tropology
simba.com/drivers/spark-jdbc-odbc Simba’s Apache Spark ODBC and JDBC Drivers efficiently map SQL to Spark SQL by transforming an application’s SQL query into the equivalent form in Spark SQL, enabling direct standard SQL-92 access to Apache Spark distributions. TheMadelyn
I would try the hive jdbc driver to talk to it.Rachele
@Madelyn - The Simba driver is expensive, and I was hoping for something that's part of the platform. Sounds like this is not available today: although the Hive driver ships as part of the stack, there is no Spark JDBC driver available in a similar capacity.Essie
@Rachele - The problem with the Hive driver is that it doesn't accept the broader SQL features supported today by Spark. I'm confused why the Hive JDBC driver is included as a downloadable component on the server, but nothing similar exists on the Spark SQL side. Maybe it's just a matter of time?...Essie
I submitted an HDInsight feature request here: feedback.azure.com/forums/34192--general-feedback/suggestions/…Essie
So when you start up the beeline that comes with Spark, this is what the java command looks like: /usr/jdk64/jdk1.7.0_67/bin/java -cp $SPARK_HOME/conf/:$SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:$SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar:$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar:$SPARK_HOME/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/hadoop-client/conf/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.hive.beeline.BeeLine; maybe one of these jars has the magic in them.Rachele

the Hive JDBC driver does not appear to support the more advanced SQL features that Spark does

Regardless of what the Hive driver itself supports, the Spark Thrift Server is fully compatible with Hive/Beeline's JDBC connection: the driver only ships the SQL text over Thrift, and it is Spark, not Hive, that parses and executes it.

Therefore, that is the JAR you need to use. I have verified this works in DBVisualizer.
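
For example, here is a minimal sketch of such a connection through the Hive JDBC driver. The URL, credentials, and table/column names are placeholders: the Thrift Server's default port is 10000, but on HDInsight it sits behind the cluster gateway, so your connection string will differ.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SparkThriftJdbcDemo {
        public static void main(String[] args) throws Exception {
            // Older hive-jdbc jars need the driver registered explicitly.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Placeholder URL: adjust host, port, database, and credentials
            // for your cluster.
            String url = "jdbc:hive2://localhost:10000/default";

            try (Connection conn = DriverManager.getConnection(url, "spark_user", "");
                 Statement stmt = conn.createStatement();
                 // A non-equi join, like the one from the question: HiveServer2
                 // rejects these, but the Spark Thrift Server hands the SQL to
                 // Spark, which accepts it. Table and column names are made up.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT a.id, b.label FROM events a JOIN ranges b "
                   + "ON a.ts BETWEEN b.start_ts AND b.end_ts")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(2));
                }
            }
        }
    }

On the client side, the standalone hive-jdbc jar is usually all you need on the classpath; depending on the driver version, a few Hadoop client jars may also be required.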

The alternative solution would be to run Spark code directly in your Java clients (rather than in third-party tools) and skip the need for a JDBC connection.
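
Here is a minimal sketch of that approach, assuming the spark-sql artifact (and spark-hive, if you want metastore tables) is on the application's classpath. The local[*] master runs Spark inside the client JVM:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class EmbeddedSparkSql {
        public static void main(String[] args) {
            // local[*] embeds Spark in this JVM; no spark-submit required.
            SparkSession spark = SparkSession.builder()
                    .appName("embedded-spark-sql")
                    .master("local[*]")
                    .enableHiveSupport() // optional: query Hive metastore tables
                    .getOrCreate();

            Dataset<Row> result = spark.sql("SELECT 1 AS answer");
            result.show();

            spark.stop();
        }
    }

In local mode the application is a plain Java program; the spark-submit and class-loader questions in the comments below only arise when shipping the code to a cluster.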

Fathom answered 7/7, 2017 at 17:47 Comment(8)
How do you run Spark code in your Java clients? How are the queries submitted?Chengtu
You just compile and run it... Feel free to post your own question outside the comments to get more in-depth answers.Fathom
I am not sure how one can just compile and run without going through spark-submit? spark-submit has its own class loader, which is not the default Java class loader.Chengtu
I've set up both IntelliJ and Eclipse for Java/Scala, and Hue/Jupyter/Zeppelin for Python/Scala/R. They don't use spark-submit.Fathom
Are you running your Java code from outside the cluster or inside it, after compilation? I know you can use Livy to connect to Spark as a REST service from outside the cluster; how did you achieve it without a JDBC driver?Compton
@sri Well, you can only run code after compilation. JDBC has nothing to do with adding Spark code into an existing JVM application.Fathom
@cricket_007, my question is: how do you query data from a Hive table from outside the cluster using Spark on a client (Windows, for example)? You have to use the Hive or Spark JDBC drivers, right? Right now we connect to Impala from outside the cluster using the Impala JDBC driver and certificates from the cluster we are connecting to.Compton
@sri There is no "Spark" JDBC driver. The Hive JDBC driver connects to the Spark Thrift Server, which I mentioned in my answer. You can connect Tableau or other BI tools to that, for example.Fathom

I think the correct answer to the question is to download the driver from Databricks: https://www.databricks.com/spark/odbc-drivers-download

Kora answered 25/9, 2024 at 23:53 Comment(0)
