How do I use Spark ORC indexes?
What is the option to enable ORC indexing from Spark?

    df.write()
      .option("mode", "DROPMALFORMED")
      .option("compression", "snappy")
      .mode("overwrite")
      .format("orc")
      .option("index", "user_id")
      .save(...);

I made up .option("index", "user_id"); what would I actually have to put there to index the column "user_id" in ORC?

Dyane answered 29/10, 2017 at 21:9

Have you tried .partitionBy("user_id")?

    df.write()
      .option("mode", "DROPMALFORMED")
      .option("compression", "snappy")
      .mode("overwrite")
      .format("orc")
      .partitionBy("user_id")
      .save(...)
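
For what it's worth, partitionBy gives you Hive-style directory partitioning rather than an in-file index: Spark writes one subdirectory per distinct user_id value, and a filter on that column prunes whole directories at read time. Here is a minimal sketch of that behavior; the output path and the sample data are made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Illustrative data; any DataFrame with a user_id column works.
    df = spark.createDataFrame(
        [(1, "click"), (1, "view"), (2, "click")],
        ["user_id", "event"],
    )

    (df.write
       .mode("overwrite")
       .format("orc")
       .partitionBy("user_id")
       .save("/tmp/events"))  # hypothetical path

    # Resulting layout: one directory per distinct user_id, not an index:
    #   /tmp/events/user_id=1/part-....orc
    #   /tmp/events/user_id=2/part-....orc

    # A filter on the partition column prunes whole directories at read time:
    spark.read.orc("/tmp/events").where("user_id = 1").show()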
Pacifist answered 8/11, 2017 at 18:8
I think partitionBy will create a new file per user rather than create an index. But you're the only one who answered, so I'm giving you the bounty. - Dyane
@Dyane I am researching this. Will let you know soon. - Intermingle
@Achyuth, have you found any approach to create an index in an ORC file? I have found nothing so far. It seems to me the only way to leverage an index in an ORC file is through Hive. Please correct me if that's wrong. Thanks! - Faun
@JamesGan That's true. The file is already indexed on many things: statistics, column offsets, and so on. The best way to get good performance is to create ORC files that are around 200 to 300 MB each. - Intermingle
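
A minimal sketch of that sizing advice, assuming you already know roughly how much compressed output you are writing; the ~256 MB target, the size estimate, and the output path are all illustrative, since Spark of that era had no built-in target-file-size knob:

    # df is an existing DataFrame, as in the question above.
    target_file_bytes = 256 * 1024 * 1024      # aim for ~256 MB per file
    estimated_output_bytes = 10 * 1024 ** 3    # assume ~10 GB of output

    num_files = max(1, estimated_output_bytes // target_file_bytes)

    (df.repartition(int(num_files))  # one output file per partition
       .write
       .mode("overwrite")
       .format("orc")
       .save("/tmp/events_sized"))   # hypothetical path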

According to the original blog post on bringing ORC support into Apache Spark, there is a configuration knob to turn on in your Spark context: it pushes filters down into the ORC reader, which is what lets ORC's built-in indexes be used.

# enable filter pushdown so Spark can use ORC's built-in indexes
sqlContext.setConf("spark.sql.orc.filterPushdown", "true")

Reference: https://databricks.com/blog/2015/07/16/joint-blog-post-bringing-orc-support-into-apache-spark.html
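
ORC files already store min/max statistics per stripe and per row group, so the "index" largely comes for free when the file is written; the setting above just lets Spark use it. A rough end-to-end sketch on a Spark 2.x+ session, where df, the path, and the filter value are assumptions for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Push filters down into the ORC reader so stripe-level
    # min/max statistics can be used to skip data.
    spark.conf.set("spark.sql.orc.filterPushdown", "true")

    # Sorting within partitions clusters user_id values, which keeps
    # each stripe's min/max range narrow and therefore useful.
    (df.sortWithinPartitions("user_id")
       .write
       .mode("overwrite")
       .format("orc")
       .save("/tmp/events_indexed"))  # hypothetical path

    # This filter can now be evaluated against stripe statistics
    # instead of scanning every row.
    spark.read.orc("/tmp/events_indexed").filter(col("user_id") == 42).show()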

Wadesworth answered 23/2, 2020 at 9:15
