orc Questions
3
I am using Spark 1.6.1 and I am trying to save a DataFrame in ORC format.
The problem I am facing is that the save method is very slow: it takes about 6 minutes for a 50M ORC file on each exe...
Dulcle asked 22/7, 2016 at 16:13
3
I am new to ORC files. I went through many blogs but didn't get a clear understanding. Please help clarify the questions below.
Can I fetch the schema from an ORC file? I know that in Avro the schema can be fetched....
Gaekwar asked 7/5, 2015 at 7:32
3
Is it possible to convert a pandas DataFrame from/to an ORC file? I can transform the df into a Parquet file, but the library doesn't seem to have ORC support. Is there an available solution in Pytho...
6
6
Can I think of an ORC file as similar to a CSV file with column headings and row labels containing data? If so, can I somehow read it into a simple pandas dataframe? I am not that familiar with too...
Dreamy asked 19/10, 2018 at 9:33
4
Solved
How do you read an ORC file in Java? I want to read in a small file for some unit test output verification, but I can't find a solution.
3
Solved
Most questions/answers on SO and the web discuss using Hive to combine a bunch of small ORC files into a larger one. However, my ORC files are log files which are separated by day and I need to kee...
1
Solved
After applying sortWithinPartitions to a df and writing the output to a table I'm getting a result I'm not sure how to interpret.
df
.select($"type", $"id", $"time")
....
Topo asked 8/3, 2021 at 17:13
2
Solved
I have an ORC file on my local machine and I need any reasonable format from it (e.g. CSV, JSON, YAML, ...).
How can I convert ORC to CSV?
4
Solved
I am trying to read a Schema file (which is a text file) and apply it to my CSV file without a header. Since I already have a schema file I don't want to use InferSchema option which is an overhead...
Encyclopedic asked 24/5, 2018 at 4:8
5
Solved
I was wondering if there is some way to specify a custom aggregation function for Spark DataFrames over multiple columns.
I have a table like this of the type (name, item, price):
john | tomato ...
Horlacher asked 9/6, 2016 at 23:38
2
Solved
What is the option to enable ORC indexing from Spark?
df
.write()
.option("mode", "DROPMALFORMED")
.option("compression", "snappy")
.mode("overwrite")
.format("orc")
.option("index", "user_...
Dyane asked 29/10, 2017 at 21:9
3
When reading in an ORC file in Spark, if you specify the partition column in the path, that column will not be included in the dataset. For example, if we have
val dfWithColumn = spark.read.orc("/...
Recourse asked 12/9, 2018 at 20:23
2
Solved
Here is a simple program that:
Writes records into an ORC file
Then tries to read the file using predicate pushdown (searchArgument)
Questions:
Is this the right way to use predicate push do...
1
Solved
I am trying to use Spark Structured Streaming - writeStream API to write to an External Partitioned Hive table.
CREATE EXTERNAL TABLE `XX`(
`a` string,
`b` string,
`b` string,
`happened` timestamp...
Nellnella asked 11/8, 2018 at 22:29
1
I have an external table mapped in Hive (v2.3.2 on EMR-5.11.0) that I need to update with new data around once a week. The merge consists of a conditional upsert statement.
The table's location is...
1
Solved
I have searched through all the documentation and still can't find why there is a prefix, or what c000 is, in the file naming convention below:
file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a4...
Argolis asked 8/3, 2018 at 4:57
0
I am working with the Apache orc-core Java API. I have noticed a couple of things and was wondering if there are options to control them.
Does not overwrite files. The call to OrcFile.createWriter...
1
I have a job to transfer Hive tables between Hadoop clusters.
What I did was download the ORC file from the source Hadoop cluster and then upload the ORC file into the target HDFS cluster using the ...
2
Solved
Issue when executing a show create table and then executing the resulting create table statement if the table is ORC.
Using show create table, you get this:
STORED AS INPUTFORMAT
‘org.apache.had...
Winegar asked 8/6, 2017 at 19:3
0
I am trying to import a table from a Netezza DB using Sqoop HCatalog (see below) in ORC format, as suggested here
Sqoop command:
sqoop import
-m 1
--connect <jdbc_url>
--driver <database_dr...
1
Solved
In the previous version, we used to have a 'saveAsOrcFile()' method on RDD. This is now gone! How do I save data in a DataFrame in ORC file format?
def main(args: Array[String]) {
println("Creating ...
Proboscidean asked 16/9, 2015 at 19:13
1