Can we connect Spark with SQL Server? If so, how? I am new to Spark, and I want to connect SQL Server to Spark so I can work directly against the database instead of uploading .txt or .csv files. Please help, thank you.
Here are some code snippets. A DataFrame is used to create table t2 and insert data, and SQLContext is used to load the data from t2 back into a DataFrame. I added spark.driver.extraClassPath and spark.executor.extraClassPath to my spark-defaults.conf file so the Microsoft SQL Server JDBC driver is on the classpath.
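For reference, those entries look something like this (the jar name and path are assumptions; point them at wherever your Microsoft JDBC driver jar actually lives):
# spark-defaults.conf
spark.driver.extraClassPath /opt/jars/sqljdbc42.jar
spark.executor.extraClassPath /opt/jars/sqljdbc42.jar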
//Spark 1.4.1
//Insert data from a DataFrame
import sqlContext.implicits._ // needed for .toDF() when not in spark-shell
case class Conf(mykey: String, myvalue: String)
val data = sc.parallelize(Seq(
  Conf("1", "Delaware"),
  Conf("2", "Virginia"),
  Conf("3", "Maryland"),
  Conf("4", "South Carolina")))
val df = data.toDF()
val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"
val table = "t2"
// third argument = overwrite: replaces any existing data in t2
df.insertIntoJDBC(url, table, true)
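Note that insertIntoJDBC is deprecated as of Spark 1.4 in favor of the DataFrameWriter API; a minimal sketch of the equivalent write, using the same url and table as above:
// Equivalent write via DataFrameWriter (credentials are already in the url,
// so the Properties object can stay empty)
val props = new java.util.Properties()
df.write.mode("overwrite").jdbc(url, table, props)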
//Load from the database using SQLContext
val url = "jdbc:sqlserver://wcarroll3:1433;database=mydb;user=ReportUser;password=ReportUser"
val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
val tbl = sqlContext.load("jdbc", Map(
  "url" -> url,
  "driver" -> driver,
  "dbtable" -> "t2",
  "partitionColumn" -> "mykey",
  "lowerBound" -> "0",
  "upperBound" -> "100",
  "numPartitions" -> "1"))
tbl.show()
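As a usage note: partitionColumn, lowerBound, upperBound and numPartitions control how Spark parallelizes the read. With numPartitions set to 1 the whole table comes back in a single task, and the bounds only take effect once you ask for more than one partition.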
Some issues to consider are:
Make sure the firewall allows traffic on port 1433. If you are using Microsoft Azure SQL Database, tables require a primary key; some of the methods above create the table for you, but Spark does not create a primary key, so the table creation fails.
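One way around the Azure primary-key issue (a sketch, assuming the t2 schema from the snippets above) is to pre-create the table with a primary key over plain JDBC, then let Spark insert into it instead of creating it:
// Hypothetical pre-creation of t2 with a primary key, so Azure SQL DB accepts it
import java.sql.DriverManager
val conn = DriverManager.getConnection(url) // credentials are in the url
try {
  conn.createStatement().execute(
    "IF OBJECT_ID('t2') IS NULL " +
    "CREATE TABLE t2 (mykey VARCHAR(10) PRIMARY KEY, myvalue VARCHAR(100))")
} finally {
  conn.close()
}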
Other details to take care of are covered here: https://docs.databricks.com/spark/latest/data-sources/sql-databases.html
// Spark 2.x
// Create a DataFrame on top of a SQL Server database table.
// In Spark 2.x the SparkSession (spark) replaces SQLContext, and Scala
// does not use backslash line continuations; just chain the calls.
val jdbcDF = spark.read
  .format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://XXXXX.com:port;databaseName=xxx")
  .option("dbtable", "(SELECT * FROM xxxx) tmp")
  .option("user", "xxx")
  .option("password", "xxx")
  .load()
// Show sample records from the DataFrame
jdbcDF.show(5)
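Writing back works the same way through the DataFrameWriter; here is a sketch with the same placeholder connection options (target_table is an assumed name):
// Append the DataFrame to a SQL Server table using the same JDBC options
jdbcDF.write
  .format("jdbc")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("url", "jdbc:sqlserver://XXXXX.com:port;databaseName=xxx")
  .option("dbtable", "target_table")
  .option("user", "xxx")
  .option("password", "xxx")
  .mode("append")
  .save()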
Spark is also included in SQL Server Big Data Clusters. Starting with SQL Server 2019, Big Data Clusters allow for large-scale, near real-time processing of data over the HDFS file system and other data sources. They also leverage the Apache Spark framework, which is integrated into one environment for management, monitoring, and security.
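From the Spark side, code running in a Big Data Cluster looks like ordinary Spark code; a purely illustrative sketch (the HDFS path is made up):
// Illustrative only: reading a CSV stored in the cluster's HDFS
val hdfsDF = spark.read
  .option("header", "true")
  .csv("hdfs:///example/data/sample.csv")
hdfsDF.show(5)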
Weissman, B. & van de Laar, E. (2019). SQL Server Big Data Clusters: Early First Edition Based on Release Candidate 1. The Netherlands: Apress.