Spark-sqlserver connection

M

3

9

Can we connect spark with sql-server? If so, how? I am new to spark, I want to connect the server to spark and work directly from sql-server instead of uploading .txt or .csv file. Please help, Thank you.

Melantha answered 17/1, 2018 at 7:12 Comment(0)

I

4

Here are some code snippets. A DataFrame is used to create the table t2 and insert data. The SqlContext is used to load the data from the t2 table into a DataFrame. I added the spark.driver.extraClassPath and spark.executor.extraClassPath to my spark-default.conf file.

//Spark 1.4.1

//Insert data from DataFrame

case class Conf(mykey: String, myvalue: String)

val data = sc.parallelize( Seq(Conf("1", "Delaware"), Conf("2", "Virginia"), Conf("3", "Maryland"), Conf("4", "South Carolina") ))

val df = data.toDF()

val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"

val table = "t2"

df.insertIntoJDBC(url, table, true)

//Load from database using SqlContext

val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"

val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

val tbl = { sqlContext.load("jdbc", Map( "url" -> url, "driver" -> driver, "dbtable" -> "t2", "partitionColumn" -> "mykey", "lowerBound" -> "0", "upperBound" -> "100", "numPartitions" -> "1" ))}

tbl.show()

Some issue to consider are:

Make sure firewall ports are open for port 1433. If using Microsoft Azure SQL Server DB, tables require a primary key. Some of the methods create the table, but Spark's code is not creating the primary key so the table creation fails.

Other details to take care: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html

source: https://blogs.msdn.microsoft.com/bigdatasupport/2015/10/22/how-to-allow-spark-to-access-microsoft-sql-server/

Ingrate answered 18/1, 2018 at 2:36 Comment(2)

I'm getting an exception for this---->>scala> Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver") java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver – Melantha 18/1, 2018 at 9:25

I hope you have downloaded sql jdbc driver and put it in class path. Else download it from microsoft.com/en-in/download/details.aspx?id=11774 – Ingrate 18/1, 2018 at 9:37

P

10

// Spark 2.x
import org.apache.spark.SparkContext

// Create dataframe on top of SQLServer database table
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val jdbcDF = sqlContext.read.format("jdbc").option("driver" , "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
           .option("url", "jdbc:sqlserver://XXXXX.com:port;databaseName=xxx") \
           .option("dbtable", "(SELECT * FROM xxxx) tmp") \
           .option("user", "xxx") \
           .option("password", "xxx") \
           .load()

// show sample records from data frame

jdbcDF.show(5)

Picket answered 29/10, 2018 at 15:24 Comment(4)

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Carner 29/10, 2018 at 15:42

If you can see in code, its simple, here first I am creating the required Spark SQL context using Spark context and once you have SparkSQL context is ready then i have used same SparkSQL context(sqlcontext) to connect to Microsoft SQLServer database, where I have provided the driver details such as JDBC url, database table and user credential etc. and this will create Spark data frame on top of it. – Picket 29/10, 2018 at 16:15

class SQLContext is deprecated: Use SparkSession.builder instead – Guinness 30/5, 2019 at 23:33

Hi, i am getting below error when running this code: The driver could not establish a secure connection to SQL Server by using Secure Sockets Layer (SSL) encryption. Can you help me? – Oconnell 3/12, 2020 at 15:35

I

4

Here are some code snippets. A DataFrame is used to create the table t2 and insert data. The SqlContext is used to load the data from the t2 table into a DataFrame. I added the spark.driver.extraClassPath and spark.executor.extraClassPath to my spark-default.conf file.

//Spark 1.4.1

//Insert data from DataFrame

case class Conf(mykey: String, myvalue: String)

val data = sc.parallelize( Seq(Conf("1", "Delaware"), Conf("2", "Virginia"), Conf("3", "Maryland"), Conf("4", "South Carolina") ))

val df = data.toDF()

val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"

val table = "t2"

df.insertIntoJDBC(url, table, true)

//Load from database using SqlContext

val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"

val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

val tbl = { sqlContext.load("jdbc", Map( "url" -> url, "driver" -> driver, "dbtable" -> "t2", "partitionColumn" -> "mykey", "lowerBound" -> "0", "upperBound" -> "100", "numPartitions" -> "1" ))}

tbl.show()

Some issue to consider are:

Make sure firewall ports are open for port 1433. If using Microsoft Azure SQL Server DB, tables require a primary key. Some of the methods create the table, but Spark's code is not creating the primary key so the table creation fails.

Other details to take care: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html

source: https://blogs.msdn.microsoft.com/bigdatasupport/2015/10/22/how-to-allow-spark-to-access-microsoft-sql-server/

Ingrate answered 18/1, 2018 at 2:36 Comment(2)

I'm getting an exception for this---->>scala> Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver") java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver – Melantha 18/1, 2018 at 9:25

I hope you have downloaded sql jdbc driver and put it in class path. Else download it from microsoft.com/en-in/download/details.aspx?id=11774 – Ingrate 18/1, 2018 at 9:37

B

0

Inside SQL Server Big Data Clusters, Spark its also included. Starting in SQL Server 2019 version , big data clusters allows for large-scale, near real-time processing of data over the HDFS file system and other data sources. It also leverages the Apache Spark framework which is integrated into one environment for management, monitoring, and security of your environment.

Weissman, B.& Van de Laar E. (2019). SQL Server Big Data Clusters: Early First Edition Based on Release Candidate 1. The Netherlands: Apress.

Buckjumper answered 19/1, 2020 at 2:43 Comment(0)

Recommended topics

Hot tags