azure-hdinsight Questions

2

I'm looking for a client JDBC driver that supports Spark SQL. I have been using Jupyter so far to run SQL statements on Spark (running on HDInsight) and I'd like to be able to connect using JDBC s...
Essie asked 9/6, 2016 at 18:27

3

Solved

I am using PySpark as the code language. I added a column to get the filename with path. from pyspark.sql.functions import input_file_name data = data.withColumn("sourcefile",input_file_name()) I want t...
Meave asked 17/5, 2018 at 12:57

3

Solved

Problem: I am trying to run a remote Spark job through IntelliJ with a Spark HDInsight cluster (HDI 4.0). In my Spark application I am trying to read an input stream from a folder of parquet files f...

2

Solved

While programming for HDInsight I came across lines like $storageAccountKey = Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName | %{ $_.Key1 } I ...
Diagnosis asked 8/2, 2016 at 14:27

4

Solved

I am trying to use Hadoop on Azure HDInsight. I am logging into the cluster by SSH and running the following: hadoop jar jar_name class_name wasb://[email protected]/inputdir wasb://[email...
Sham asked 9/11, 2015 at 1:15

3

Solved

I need to read some JSON data from a web service that's providing REST interfaces to query the data from my Spark SQL code for analysis. I am able to read a JSON stored in the blob store and use it....
Quinnquinol asked 9/5, 2016 at 10:6

6

Solved

As I recently started working with Windows Azure, I've come to a situation where I need to decide which one to go for between Block Blob & Page Blob. I'm currently in the process of uploading some ...
Meant asked 16/3, 2015 at 14:25

1

Solved

If no, is it possible to use WebHDFS API from HDInsight to connect with ADL Gen2?
Cockchafer asked 10/10, 2019 at 6:45

3

Situation: I've started a new job and been assigned the task of figuring out what to do with their sensor data table. It has 1.3 billion rows of sensor data. The data is pretty simple: basically ju...
Greathearted asked 10/1, 2016 at 18:31

4

Solved

As per the title, I would like to request a calculation from a Spark cluster (local/HDInsight in Azure) and get the results back from a C# application. I am aware of the existence of Livy, which I under...
Estrus asked 30/6, 2017 at 13:47

3

Solved

I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calcu...
Gonagle asked 29/5, 2015 at 19:25

2

I'm trying to run a Spark-based application on an Azure HDInsight on-demand cluster, and am seeing lots of SparkExceptions (caused by ConcurrentModificationExceptions) being logged. The application...
Hukill asked 26/11, 2018 at 15:53

5

I was going through the Microsoft documents: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview. I'm new to Azure Data Lake and HDInsight. There is a statement in the...
Shekinah asked 4/6, 2018 at 11:48

1

I am trying to read in an Avro file inside an HDInsight Spark/Jupyter cluster but got u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/t...
Biplane asked 1/4, 2018 at 9:49

0

I have created an HDI (3.6) Spark (2.1.0) cluster in Azure and installed my custom application. When I start my application, I am getting the following error in my custom application log. Error log:-...
Lil asked 28/4, 2017 at 9:48

1

Solved

I have data saved as parquet files in Azure blob storage. Data is partitioned by year, month, day and hour like: cont/data/year=2017/month=02/day=01/ I want to create an external table in Hive using...
Pauperism asked 11/4, 2017 at 12:46

2

I have a timestamp field in a CSV file that I load to a dataframe using the Spark CSV library. The same piece of code works on my local machine with Spark 2.0 but throws an error on Azure Horto...

2

We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at the time of cluster creation. So I was creating this edge/gateway node by installing echo 'deb http...
Cytotaxonomy asked 7/7, 2016 at 20:32

3

Solved

This says the function quarter() was introduced in Hive 1.3 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions I am using the default version of H...
Kolkhoz asked 28/7, 2015 at 22:57

3

Solved

I am trying to automatically launch a Spark job on an HDInsight cluster from Microsoft Azure. I am aware that several methods exist to automate Hadoop job submission (provided by Azure itself), but...
Vituline asked 16/2, 2015 at 13:22

2

Looking to do real-time metric calculations on event streams, what is a good choice in Azure? Stream Analytics or Storm? I am comfortable with either SQL or Java, so wondering what are the other di...
Serpentiform asked 30/6, 2015 at 5:11

2

Solved

I would like to add a new column to a table, but only if that column does not already exist. This works if the column does not exist: ALTER TABLE MyTable ADD COLUMNS (mycolumn string); But when...
Summerlin asked 13/8, 2014 at 17:57

1

Solved

I am attempting to insert into a table by selecting from another: INSERT OVERWRITE TABLE testtable1 select * from testtable0 The error: Moving data to: wasb://{container}@{storage}.blob.core.wi...
Constrain asked 1/6, 2015 at 15:13

1

Solved

I have created a table: DROP TABLE IF EXISTS sampleout; CREATE EXTERNAL TABLE sampleout( id bigint, LNG FLOAT, LAT FLOAT, GMTDateTime TIMESTAMP, calculatedcolumn FLOAT ) ROW FORMAT DELIMIT...
Baranowski asked 1/6, 2015 at 17:58

3

Solved

I am very excited that HDInsight switched to Hadoop version 2, which supports Apache Spark through YARN. Apache Spark is a much better fitting parallel programming paradigm than MapReduce for the t...
Luff asked 10/7, 2014 at 9:14
