Is there a Spark SQL JDBC driver?

I'm looking for a client JDBC driver that supports Spark SQL.

I have been using Jupyter so far to run SQL statements on Spark (running on HDInsight) and I'd like to be able to connect using JDBC so I can use third-party SQL clients (e.g. SQuirreL, SQL Explorer, etc.) instead of the notebook interface.

I found an ODBC driver from Microsoft, but that doesn't help me with Java-based SQL clients. I also tried downloading the Hive JDBC driver from my cluster, but the Hive JDBC driver does not appear to support the more advanced SQL features that Spark does. For example, the Hive driver complains about join conditions that are not equi-joins, where I know this is a supported feature of Spark because I've executed the same SQL in Jupyter successfully.

Essie answered 9/6, 2016 at 18:27 Comment(7)
Questions asking for recommendations or help with finding a library or other off-site resources are off-topic.Tropology
simba.com/drivers/spark-jdbc-odbc Simba’s Apache Spark ODBC and JDBC Drivers efficiently map SQL to Spark SQL by transforming an application’s SQL query into the equivalent form in Spark SQL, enabling direct standard SQL-92 access to Apache Spark distributions. TheMadelyn
I would try the hive jdbc driver to talk to it.Rachele
@Madelyn - The Simba driver is expensive, and I was hoping for something that's part of the platform. Sounds like this is not available today: although the Hive driver ships as part of the stack, there is no Spark JDBC driver available in a similar capacity.Essie
@Rachele - The problem with the Hive driver is that it doesn't accept the broader SQL features supported today by Spark. I'm confused why the Hive JDBC driver is included as a downloadable component on the server, but nothing similar exists on the Spark SQL side. Maybe it's just a matter of time?...Essie
I submitted an HDInsight feature request here: feedback.azure.com/forums/34192--general-feedback/suggestions/…Essie
So when you start up the beeline that comes with Spark, this is what the java command looks like: /usr/jdk64/jdk1.7.0_67/bin/java -cp $SPARK_HOME/conf/:$SPARK_HOME/lib/spark-assembly-1.6.1-hadoop2.6.0.jar:$SPARK_HOME/lib/datanucleus-api-jdo-3.2.6.jar:$SPARK_HOME/lib/datanucleus-rdbms-3.2.9.jar:$SPARK_HOME/lib/datanucleus-core-3.2.10.jar:/usr/hdp/current/hadoop-client/conf/ -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.hive.beeline.BeeLine; maybe one of these jars has the magic in them.Rachele

the Hive JDBC driver does not appear to support the more advanced SQL features that Spark does

Regardless of what the Hive driver itself supports, the Spark Thrift Server is fully compatible with Hive/Beeline's JDBC connection: the driver only ships the SQL text over Thrift, and it is Spark, not Hive, that parses and executes it.

Therefore, that is the JAR you need to use. I have verified this works in DBVisualizer.
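
For example, here is a minimal sketch of such a connection through the Hive JDBC driver. The URL, credentials, and table/column names are placeholders: the Thrift Server's default port is 10000, but on HDInsight it sits behind the cluster gateway, so your connection string will differ.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SparkThriftJdbcDemo {
        public static void main(String[] args) throws Exception {
            // Older hive-jdbc jars need the driver registered explicitly.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Placeholder URL: adjust host, port, database, and credentials
            // for your cluster.
            String url = "jdbc:hive2://localhost:10000/default";

            try (Connection conn = DriverManager.getConnection(url, "spark_user", "");
                 Statement stmt = conn.createStatement();
                 // A non-equi join, like the one from the question: HiveServer2
                 // rejects these, but the Spark Thrift Server hands the SQL to
                 // Spark, which accepts it. Table and column names are made up.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT a.id, b.label FROM events a JOIN ranges b "
                   + "ON a.ts BETWEEN b.start_ts AND b.end_ts")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getString(2));
                }
            }
        }
    }

On the client side, the standalone hive-jdbc jar is usually all you need on the classpath; depending on the driver version, a few Hadoop client jars may also be required.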

The alternative solution would be to run Spark code directly in your Java clients (rather than in third-party tools) and skip the need for a JDBC connection.
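
Here is a minimal sketch of that approach, assuming the spark-sql artifact (and spark-hive, if you want metastore tables) is on the application's classpath. The local[*] master runs Spark inside the client JVM:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class EmbeddedSparkSql {
        public static void main(String[] args) {
            // local[*] embeds Spark in this JVM; no spark-submit required.
            SparkSession spark = SparkSession.builder()
                    .appName("embedded-spark-sql")
                    .master("local[*]")
                    .enableHiveSupport() // optional: query Hive metastore tables
                    .getOrCreate();

            Dataset<Row> result = spark.sql("SELECT 1 AS answer");
            result.show();

            spark.stop();
        }
    }

In local mode the application is a plain Java program; the spark-submit and class-loader questions in the comments below only arise when shipping the code to a cluster.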

Fathom answered 7/7, 2017 at 17:47 Comment(8)
How do you run Spark code in your Java clients? How are the queries submitted?Chengtu
You just compile and run it... Feel free to post your own question outside the comments to get more in-depth answers.Fathom
I am not sure how one can just compile and run without going through spark-submit? spark-submit has its own class loader, which is not the default Java class loader.Chengtu
I've set up both IntelliJ and Eclipse for Java/Scala, and Hue/Jupyter/Zeppelin for Python/Scala/R. They don't use spark-submit.Fathom
Are you running your Java code from outside the cluster or inside it, after compilation? I know you can use Livy to connect to Spark as a REST service from outside the cluster; how did you achieve it without a JDBC driver?Compton
@sri Well, you can only run code after compilation. JDBC has nothing to do with adding Spark code into an existing JVM application.Fathom
@cricket_007, my question is: how do you query data from a Hive table from outside the cluster using Spark on a client (Windows, for example)? You have to use the Hive or Spark JDBC drivers, right? Right now we connect to Impala from outside the cluster using the Impala JDBC driver and certificates from the cluster we are connecting to.Compton
@sri There is no "Spark" JDBC driver. The Hive JDBC driver connects to the Spark Thrift Server, which I mentioned in my answer. You can connect Tableau or other BI tools to that, for example.Fathom

I think the correct answer to the question is to download the driver from Databricks: https://www.databricks.com/spark/odbc-drivers-download

Kora answered 25/9, 2024 at 23:53 Comment(0)
