For example, I want to save a table. What is the difference between the following two strategies?
bucketBy:
import org.apache.spark.sql.SaveMode

someDF.write.format("parquet")
  .bucketBy(4, "country")
  .mode(SaveMode.Overwrite)
  .saveAsTable("someTable")
partitionBy:
someDF.write.format("parquet")
  .partitionBy("country") // <-- this is the only difference
  .mode(SaveMode.Overwrite)
  .saveAsTable("someTable")
My guess is that bucketBy in the first case creates 4 directories with countries, while partitionBy creates as many directories as there are unique values in the "country" column. Is this understanding correct?