ORC Questions

3

I am using Spark 1.6.1 and I am trying to save a DataFrame to ORC format. The problem I am facing is that the save method is very slow, and it takes about 6 minutes for a 50M ORC file on each exe...
Dulcle asked 22/7, 2016 at 16:13
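
A minimal sketch of the DataFrameWriter call in question, assuming an existing DataFrame named df and a placeholder output path (on Spark 1.6 the ORC data source also requires a HiveContext):

    import org.apache.spark.sql.SaveMode

    // `df` is an existing DataFrame; the output path is a placeholder
    df.write
      .mode(SaveMode.Overwrite)
      .format("orc")
      .save("/tmp/output_orc")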

3

I am new to ORC files. I went through many blogs but didn't get a clear understanding. Please help and clarify the questions below. Can I fetch the schema from an ORC file? I know that in Avro the schema can be fetched....
Gaekwar asked 7/5, 2015 at 7:32
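
ORC files carry their schema in the file footer, so one way to inspect it is simply to open the file, sketched here with Spark (a SparkSession named spark and the path are assumptions):

    val df = spark.read.orc("/tmp/some_file.orc")
    df.printSchema()        // schema read from the ORC file footer
    val schema = df.schema  // StructType, usable programmatically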

3

Is it possible to convert a Pandas dataframe from/to an ORC file? I can write the df to a Parquet file, but the library doesn't seem to have ORC support. Is there an available solution in Pytho...
Reign asked 6/11, 2019 at 11:2

6

I am running a few tests on the storage formats available with Hive and using Parquet and ORC as major options. I included ORC once with default compression and once with Snappy. I have read many ...
Mide asked 3/9, 2015 at 10:45
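
If reproducing the comparison from Spark rather than Hive is an option, a rough sketch that writes the same DataFrame in each format and codec (df and the paths are assumptions):

    // `df` is the test DataFrame; paths are placeholders
    df.write.option("compression", "snappy").orc("/tmp/bench/orc_snappy")
    df.write.option("compression", "zlib").orc("/tmp/bench/orc_zlib")            // ORC's traditional default codec
    df.write.option("compression", "snappy").parquet("/tmp/bench/parquet_snappy")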

6

Can I think of an ORC file as similar to a CSV file with column headings and row labels containing data? If so, can I somehow read it into a simple pandas dataframe? I am not that familiar with too...
Dreamy asked 19/10, 2018 at 9:33

4

Solved

How do you read an ORC file in Java? I want to read in a small file for some unit-test output verification, but I can't find a solution.
Lantha asked 22/9, 2015 at 9:9
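
A sketch of reading a small ORC file with the orc-core API, written in Scala to match the other snippets here, although the Java calls are identical (the path and the long column type are assumptions):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector
    import org.apache.orc.OrcFile

    val reader = OrcFile.createReader(new Path("/tmp/test.orc"), OrcFile.readerOptions(new Configuration()))
    val rows   = reader.rows()
    val batch  = reader.getSchema.createRowBatch()
    while (rows.nextBatch(batch)) {
      val col0 = batch.cols(0).asInstanceOf[LongColumnVector]   // assumes the first column is a long
      for (i <- 0 until batch.size) println(col0.vector(i))
    }
    rows.close()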

3

Solved

Most questions/answers on SO and the web discuss using Hive to combine a bunch of small ORC files into a larger one; however, my ORC files are log files that are separated by day, and I need to kee...
Hudak asked 26/4, 2018 at 11:48
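
One per-day compaction sketch in Spark, assuming the daily logs sit under date-named directories (the layout, paths, and file count are assumptions); each day is read, coalesced, and written to a separate compacted location, so the per-day separation is preserved:

    // hypothetical layout: /logs/orc/<yyyy-MM-dd>/part-*.orc; assumes a SparkSession named `spark`
    val day   = "2018-04-25"
    val daily = spark.read.orc(s"/logs/orc/$day")
    daily.coalesce(1)                          // or a small number, sized toward ~128-256 MB files
      .write.mode("overwrite")
      .orc(s"/logs/orc_compacted/$day")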

1

Solved

After applying sortWithinPartitions to a df and writing the output to a table, I'm getting a result I'm not sure how to interpret. df .select($"type", $"id", $"time") ....
Topo asked 8/3, 2021 at 17:13
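
For context, sortWithinPartitions only orders rows inside each partition rather than globally, so output files can overlap in key range unless the data is first repartitioned by the sort key. A sketch using the columns from the snippet (everything else is assumed):

    import spark.implicits._                   // for the $ column syntax; assumes a SparkSession named `spark`

    df.select($"type", $"id", $"time")
      .repartition($"id")                      // co-locate each id in a single partition (assumed intent)
      .sortWithinPartitions($"id", $"time")
      .write.mode("overwrite")
      .orc("/tmp/sorted_table")                // placeholder output path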

2

Solved

I have an ORC file on my local machine and I need any reasonable format from it (e.g. CSV, JSON, YAML, ...). How can I convert ORC to CSV?
Orangutan asked 1/2, 2019 at 15:49
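
One way to do the conversion on a local machine is Spark in local mode; a sketch with placeholder paths:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("orc2csv").getOrCreate()
    spark.read.orc("/path/to/file.orc")
      .coalesce(1)                             // a single CSV part file
      .write.option("header", "true")
      .csv("/path/to/output_csv")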

4

Solved

I am trying to read a schema file (which is a text file) and apply it to my CSV file, which has no header. Since I already have a schema file, I don't want to use the inferSchema option, which is an overhead...
Encyclopedic asked 24/5, 2018 at 4:8
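
A sketch of one approach, assuming the schema file holds a DDL-style string such as "name STRING, age INT" (that format, the paths, and Spark 2.3+ for StructType.fromDDL are assumptions):

    import scala.io.Source
    import org.apache.spark.sql.types.StructType

    val ddl    = Source.fromFile("/path/to/schema.txt").mkString.trim   // e.g. "name STRING, age INT"
    val schema = StructType.fromDDL(ddl)

    val df = spark.read
      .schema(schema)                          // no inferSchema pass needed
      .option("header", "false")
      .csv("/path/to/data.csv")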

5

Solved

I was wondering if there is some way to specify a custom aggregation function for Spark dataframes over multiple columns. I have a table like this of the type (name, item, price): john | tomato ...
Horlacher asked 9/6, 2016 at 23:38
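
For the (name, item, price) shape in the snippet, one common pattern is to pack the columns into a struct and aggregate that, instead of writing a full UDAF; a sketch (the column names are from the snippet, the aggregates themselves are assumptions):

    import spark.implicits._
    import org.apache.spark.sql.functions.{collect_list, struct, sum}

    val result = df
      .groupBy($"name")
      .agg(
        collect_list(struct($"item", $"price")).as("items"),   // every (item, price) pair per name
        sum($"price").as("total_spent")
      )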

2

Solved

What is the option to enable ORC indexing from Spark? df .write() .option("mode", "DROPMALFORMED") .option("compression", "snappy") .mode("overwrite") .format("orc") .option("index", "user_...
Dyane asked 29/10, 2017 at 21:9
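
If I recall correctly, Spark 2.3+'s native ORC writer passes orc.* writer properties through as options; treat the exact property names and version support as assumptions to verify. A sketch:

    df.write
      .mode("overwrite")
      .option("compression", "snappy")
      .option("orc.create.index", "true")             // row index creation (on by default in ORC)
      .option("orc.bloom.filter.columns", "user_id")  // hypothetical column for a bloom filter
      .format("orc")
      .save("/tmp/indexed_orc")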

3

When reading in an ORC file in Spark, if you specify the partition column in the path, that column will not be included in the dataset. For example, if we have val dfWithColumn = spark.read.orc("/...
Recourse asked 12/9, 2018 at 20:23
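
One workaround is the basePath option: point basePath at the table root and the path at the partition directory, and Spark keeps the partition column in the resulting DataFrame (the paths and column name below are hypothetical):

    val df = spark.read
      .option("basePath", "/data/events")        // table root containing the date=... directories
      .orc("/data/events/date=2018-09-12")       // only this partition is read
    // df still includes the `date` column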

2

Solved

Here is a simple program that writes records into an ORC file and then tries to read the file back using predicate pushdown (searchArgument). Questions: Is this the right way to use predicate push do...
Sakovich asked 22/6, 2017 at 6:9

1

Solved

I am trying to use Spark Structured Streaming - writeStream API to write to an External Partitioned Hive table. CREATE EXTERNAL TABLE `XX`( `a` string, `b` string, `b` string, `happened` timestamp...
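
A rough writeStream sketch for an ORC sink partitioned on disk; the partition column, paths, and trigger are assumptions, and registering new partitions with the external Hive table (for example via MSCK REPAIR TABLE) is a separate step:

    import org.apache.spark.sql.streaming.Trigger

    // `streamingDf` is the streaming DataFrame to be written
    val query = streamingDf.writeStream
      .format("orc")
      .partitionBy("happened_date")                   // hypothetical partition column
      .option("path", "/warehouse/xx")                // location of the external table
      .option("checkpointLocation", "/checkpoints/xx")
      .trigger(Trigger.ProcessingTime("1 minute"))
      .start()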

1

I have an external table mapped in Hive (v2.3.2 on EMR-5.11.0) that I need to update with new data around once a week. The merge consists of a conditional upsert statement. The table's location is...
Cleres asked 2/1, 2018 at 13:7

1

Solved

I have searched through all the documentation and still haven't found why there is a prefix, and what c000 is, in the file naming convention below: file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a4...
Argolis asked 8/3, 2018 at 4:57

0

I am working with the Apache orc-core Java API. I have noticed a couple of things and was wondering if there are options to control them. First, it does not overwrite files: the call to OrcFile.createWriter...
Lenka asked 24/1, 2018 at 4:43
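
As far as I know, orc-core's createWriter simply fails if the target file already exists (newer releases add an overwrite flag on the writer options, if memory serves); one sketch is to delete the file through the Hadoop FileSystem API first. The schema and path here are made up:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.orc.{OrcFile, TypeDescription}

    val conf = new Configuration()
    val path = new Path("/tmp/out.orc")
    val fs   = path.getFileSystem(conf)
    if (fs.exists(path)) fs.delete(path, false)        // createWriter would otherwise fail on the existing file

    val schema = TypeDescription.fromString("struct<x:int,y:string>")
    val writer = OrcFile.createWriter(path, OrcFile.writerOptions(conf).setSchema(schema))
    writer.close()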

1

I have a job to transfer Hive tables between Hadoop clusters. What I did was download the ORC file from the source Hadoop cluster and then upload the ORC file into the target HDFS cluster using the ...
Eachelle asked 13/9, 2016 at 9:51

2

Solved

I hit an issue when executing a SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using show create table, you get this: STORED AS INPUTFORMAT ‘org.apache.had...
Winegar asked 8/6, 2017 at 19:3

0

I am trying to import a table from a Netezza DB using Sqoop with HCatalog (see below) in ORC format, as suggested here. Sqoop command: sqoop import -m 1 --connect <jdbc_url> --driver <database_dr...
Gigi asked 22/4, 2016 at 0:9

1

Solved

In the previous version, we used to have a 'saveAsOrcFile()' method on RDD. This is now gone! How do I save the data in a DataFrame in ORC file format? def main(args: Array[String]) { println("Creating ...
Proboscidean asked 16/9, 2015 at 19:13
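
For what it's worth, the DataFrame writer API replaced saveAsOrcFile; a minimal sketch (the path is a placeholder, and on Spark 1.x the ORC source needs a HiveContext):

    // `df` is the DataFrame built in main(); the path is hypothetical
    df.write.format("orc").save("/tmp/people_orc")
    // or, equivalently:
    df.write.orc("/tmp/people_orc")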
