How to reduce generating files of SQL "Alter Table/Partition Concatenate" in Hive?
Asked Answered
H

2

6

Hive version: 1.2.1

Configuration:

set hive.execution.engine=tez;
set hive.merge.mapredfiles=true;
set hive.merge.smallfiles.avgsize=256000000;
set hive.merge.tezfiles=true;

HQL:

ALTER TABLE `table_name` PARTITION (partion_name1 = 'val1', partion_name2='val2', partion_name3='val3', partion_name4='val4') CONCATENATE;

I use the HQL to merge files of specific table / partition. However, after execution there are still many files in output directory; and their size are far less than 256000000. So how to decrease the number of output files.

BTW, use MapReduce instead of Tez also didn't work.

Handley answered 16/10, 2015 at 9:1 Comment(1)
can you post the table DDL ?Olivaolivaceous
T
-2

You may set your reducer number to 1 then, it would only create one output file.

You may do it with the following;

set mapred.reduce.tasks=1
Tarbox answered 19/4, 2016 at 13:20 Comment(6)
Please check the comment: BTW, use MapReduce instead of Tez also didn't work. So he may use MapReduce as well if he wants to. In addition, you may use the configuration above also for Tez.Tarbox
I can also ensure you that it does solve the issue. Maybe tez is slightly different story but it does work for MapReduce and it is what user asked.Tarbox
I have tried it rigth now and the result is that I have 25 files. Moreover the triggered MapReduce job is a map-only job. Maybe you are using a different Hive version. I'm using Hive 1.2.1 and the files are ORC. And in these conditions, your solution doesn't work.Arterialize
Well, can you please try to set the mappers to 1 ?Tarbox
Well, then i am out of ideas sorry for that.Tarbox
this does not workMarylouisemaryly
G
-2

Maybe u can try insert overwrite table ... partition ( ... ) select * from ...

This one can use the merge setting for tezfiles.

Guarantor answered 16/8, 2017 at 0:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.