ORC Questions

3

I am using Spark 1.6.1 and I am trying to save a DataFrame to ORC format. The problem I am facing is that the save method is very slow, and it takes about 6 minutes for a 50M ORC file on each exe...
Dulcle asked 22/7, 2016 at 16:13
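
A minimal sketch of the DataFrameWriter call in question, assuming an existing DataFrame named df and a placeholder output path (on Spark 1.6 the ORC data source also requires a HiveContext):

    import org.apache.spark.sql.SaveMode

    // `df` is an existing DataFrame; the output path is a placeholder
    df.write
      .mode(SaveMode.Overwrite)
      .format("orc")
      .save("/tmp/output_orc")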

3

I am new to ORC files. I went through many blogs but didn't get a clear understanding. Please help and clarify the questions below. Can I fetch the schema from an ORC file? I know that in Avro the schema can be fetched....
Gaekwar asked 7/5, 2015 at 7:32
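
ORC files carry their schema in the file footer, so one way to inspect it is simply to open the file, sketched here with Spark (a SparkSession named spark and the path are assumptions):

    val df = spark.read.orc("/tmp/some_file.orc")
    df.printSchema()        // schema read from the ORC file footer
    val schema = df.schema  // StructType, usable programmatically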

3

Is it possible to convert a Pandas dataframe from/to an ORC file? I can write the df to a Parquet file, but the library doesn't seem to have ORC support. Is there an available solution in Pytho...
Reign asked 6/11, 2019 at 11:2

6

I am running a few tests on the storage formats available with Hive and using Parquet and ORC as major options. I included ORC once with default compression and once with Snappy. I have read many ...
Mide asked 3/9, 2015 at 10:45
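
If reproducing the comparison from Spark rather than Hive is an option, a rough sketch that writes the same DataFrame in each format and codec (df and the paths are assumptions):

    // `df` is the test DataFrame; paths are placeholders
    df.write.option("compression", "snappy").orc("/tmp/bench/orc_snappy")
    df.write.option("compression", "zlib").orc("/tmp/bench/orc_zlib")            // ORC's traditional default codec
    df.write.option("compression", "snappy").parquet("/tmp/bench/parquet_snappy")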

6

Can I think of an ORC file as similar to a CSV file with column headings and row labels containing data? If so, can I somehow read it into a simple pandas dataframe? I am not that familiar with too...
Dreamy asked 19/10, 2018 at 9:33

4

Solved

How do you read an ORC file in Java? I want to read in a small file for some unit-test output verification, but I can't find a solution.
Lantha asked 22/9, 2015 at 9:9
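
A sketch of reading a small ORC file with the orc-core API, written in Scala to match the other snippets here, although the Java calls are identical (the path and the long column type are assumptions):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
    import org.apache.orc.OrcFile

    val reader = OrcFile.createReader(new Path("/tmp/test.orc"), OrcFile.readerOptions(new Configuration()))
    val rows   = reader.rows()
    val batch  = reader.getSchema.createRowBatch()
    while (rows.nextBatch(batch)) {
      val col0 = batch.cols(0).asInstanceOf[LongColumnVector]   // assumes the first column is a long
      for (i <- 0 until batch.size) println(col0.vector(i))
    }
    rows.close()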

3

Solved

Most questions/answers on SO and the web discuss using Hive to combine a bunch of small ORC files into a larger one; however, my ORC files are log files that are separated by day, and I need to kee...
Hudak asked 26/4, 2018 at 11:48
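
One per-day compaction sketch in Spark, assuming the daily logs sit under date-named directories (the layout, paths, and file count are assumptions); each day is read, coalesced, and written to a separate compacted location, so the per-day separation is preserved:

    // hypothetical layout: /logs/orc/<yyyy-MM-dd>/part-*.orc; assumes a SparkSession named `spark`
    val day   = "2018-04-25"
    val daily = spark.read.orc(s"/logs/orc/$day")
    daily.coalesce(1)                          // or a small number, sized toward ~128-256 MB files
      .write.mode("overwrite")
      .orc(s"/logs/orc_compacted/$day")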

1

Solved

After applying sortWithinPartitions to a df and writing the output to a table, I'm getting a result I'm not sure how to interpret. df .select($"type", $"id", $"time") ....
Topo asked 8/3, 2021 at 17:13
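
For context, sortWithinPartitions only orders rows inside each partition rather than globally, so output files can overlap in key range unless the data is first repartitioned by the sort key. A sketch using the columns from the snippet (everything else is assumed):

    import spark.implicits._                   // for the $ column syntax; assumes a SparkSession named `spark`

    df.select($"type", $"id", $"time")
      .repartition($"id")                      // co-locate each id in a single partition (assumed intent)
      .sortWithinPartitions($"id", $"time")
      .write.mode("overwrite")
      .orc("/tmp/sorted_table")                // placeholder output path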

2

Solved

I have an ORC file on my local machine and I need any reasonable format from it (e.g. CSV, JSON, YAML, ...). How can I convert ORC to CSV?
Orangutan asked 1/2, 2019 at 15:49
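
One way to do the conversion on a local machine is Spark in local mode; a sketch with placeholder paths:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("orc2csv").getOrCreate()
    spark.read.orc("/path/to/file.orc")
      .coalesce(1)                             // a single CSV part file
      .write.option("header", "true")
      .csv("/path/to/output_csv")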

4

Solved

I am trying to read a schema file (which is a text file) and apply it to my CSV file, which has no header. Since I already have a schema file, I don't want to use the inferSchema option, which is an overhead...
Encyclopedic asked 24/5, 2018 at 4:8
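
A sketch of one approach, assuming the schema file holds a DDL-style string such as "name STRING, age INT" (that format, the paths, and Spark 2.3+ for StructType.fromDDL are assumptions):

    import scala.io.Source
    import org.apache.spark.sql.types.StructType

    val ddl    = Source.fromFile("/path/to/schema.txt").mkString.trim   // e.g. "name STRING, age INT"
    val schema = StructType.fromDDL(ddl)

    val df = spark.read
      .schema(schema)                          // no inferSchema pass needed
      .option("header", "false")
      .csv("/path/to/data.csv")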

5

Solved

I was wondering if there is some way to specify a custom aggregation function for Spark dataframes over multiple columns. I have a table like this of the type (name, item, price): john | tomato ...
Horlacher asked 9/6, 2016 at 23:38
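
For the (name, item, price) shape in the snippet, one common pattern is to pack the columns into a struct and aggregate that, instead of writing a full UDAF; a sketch (the column names are from the snippet, the aggregates themselves are assumptions):

    import spark.implicits._
    import org.apache.spark.sql.functions.{collect_list, struct, sum}

    val result = df
      .groupBy($"name")
      .agg(
        collect_list(struct($"item", $"price")).as("items"),   // every (item, price) pair per name
        sum($"price").as("total_spent")
      )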

2

Solved

What is the option to enable ORC indexing from Spark? df .write() .option("mode", "DROPMALFORMED") .option("compression", "snappy") .mode("overwrite") .format("orc") .option("index", "user_...
Dyane asked 29/10, 2017 at 21:9
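
If I recall correctly, Spark 2.3+'s native ORC writer passes orc.* writer properties through as options; treat the exact property names and version support as assumptions to verify. A sketch:

    df.write
      .mode("overwrite")
      .option("compression", "snappy")
      .option("orc.create.index", "true")             // row index creation (on by default in ORC)
      .option("orc.bloom.filter.columns", "user_id")  // hypothetical column for a bloom filter
      .format("orc")
      .save("/tmp/indexed_orc")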

3

When reading in an ORC file in Spark, if you specify the partition column in the path, that column will not be included in the dataset. For example, if we have val dfWithColumn = spark.read.orc("/...
Recourse asked 12/9, 2018 at 20:23
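
One workaround is the basePath option: point basePath at the table root and the path at the partition directory, and Spark keeps the partition column in the resulting DataFrame (the paths and column name below are hypothetical):

    val df = spark.read
      .option("basePath", "/data/events")        // table root containing the date=... directories
      .orc("/data/events/date=2018-09-12")       // only this partition is read
    // df still includes the `date` column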

2

Solved

Here is a simple program that writes records into an ORC file and then tries to read the file back using predicate pushdown (searchArgument). Questions: Is this the right way to use predicate push do...
Sakovich asked 22/6, 2017 at 6:9

1

Solved

I am trying to use Spark Structured Streaming - writeStream API to write to an External Partitioned Hive table. CREATE EXTERNAL TABLE `XX`( `a` string, `b` string, `b` string, `happened` timestamp...
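
A rough writeStream sketch for an ORC sink partitioned on disk; the partition column, paths, and trigger are assumptions, and registering new partitions with the external Hive table (for example via MSCK REPAIR TABLE) is a separate step:

    import org.apache.spark.sql.streaming.Trigger

    // `streamingDf` is the streaming DataFrame to be written
    val query = streamingDf.writeStream
      .format("orc")
      .partitionBy("happened_date")                   // hypothetical partition column
      .option("path", "/warehouse/xx")                // location of the external table
      .option("checkpointLocation", "/checkpoints/xx")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()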

1

I have an external table mapped in Hive (v2.3.2 on EMR-5.11.0) that I need to update with new data around once a week. The merge consists of a conditional upsert statement. The table's location is...
Cleres asked 2/1, 2018 at 13:7

1

Solved

I have searched through all the documentation and still haven't found why there is a prefix, and what c000 is, in the file naming convention below: file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a4...
Argolis asked 8/3, 2018 at 4:57

0

I am working with the Apache orc-core Java API. I have noticed a couple of things and was wondering if there are options to control them. First, it does not overwrite files: the call to OrcFile.createWriter...
Lenka asked 24/1, 2018 at 4:43
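
As far as I know, orc-core's createWriter simply fails if the target file already exists (newer releases add an overwrite flag on the writer options, if memory serves); one sketch is to delete the file through the Hadoop FileSystem API first. The schema and path here are made up:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.orc.{OrcFile, TypeDescription}

    val conf = new Configuration()
    val path = new Path("/tmp/out.orc")
    val fs   = path.getFileSystem(conf)
    if (fs.exists(path)) fs.delete(path, false)        // createWriter would otherwise fail on the existing file

    val schema = TypeDescription.fromString("struct<x:int,y:string>")
    val writer = OrcFile.createWriter(path, OrcFile.writerOptions(conf).setSchema(schema))
    writer.close()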

1

I have a job to transfer Hive tables between Hadoop clusters. What I did was download the ORC file from the source Hadoop cluster and then upload the ORC file into the target HDFS cluster using the ...
Eachelle asked 13/9, 2016 at 9:51

2

Solved

I hit an issue when executing a SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using show create table, you get this: STORED AS INPUTFORMAT ‘org.apache.had...
Winegar asked 8/6, 2017 at 19:3

0

I am trying to import a table from a Netezza DB using Sqoop with HCatalog (see below) in ORC format, as suggested here. Sqoop command: sqoop import -m 1 --connect <jdbc_url> --driver <database_dr...
Gigi asked 22/4, 2016 at 0:9

1

Solved

In the previous version, we used to have a 'saveAsOrcFile()' method on RDD. This is now gone! How do I save the data in a DataFrame in ORC file format? def main(args: Array[String]) { println("Creating ...
Proboscidean asked 16/9, 2015 at 19:13
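
For what it's worth, the DataFrame writer API replaced saveAsOrcFile; a minimal sketch (the path is a placeholder, and on Spark 1.x the ORC source needs a HiveContext):

    // `df` is the DataFrame built in main(); the path is hypothetical
    df.write.format("orc").save("/tmp/people_orc")
    // or, equivalently:
    df.write.orc("/tmp/people_orc")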
