Exporting a Hive table to an S3 bucket

I've created a Hive table through an Elastic MapReduce interactive session and populated it from a CSV file like this:

CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;

I now want to store the Hive table in an S3 bucket so the table is preserved once I terminate the MapReduce instance.

Does anyone know how to do this?

Tremulant answered 28/2, 2012 at 20:48 Comment(0)

Yes, you have to export your data at the end of your Hive session and import it again at the start of the next one.

To do this, you need to create a table that is mapped onto an S3 bucket and directory:

CREATE TABLE csvexport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

Insert data into the S3-backed table; when the insert is complete, the directory will contain a CSV file:

INSERT OVERWRITE TABLE csvexport
SELECT id, time, log
FROM csvimport;

Your table is now preserved, and when you create a new Hive instance you can reimport your data, as sketched below.
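
A minimal sketch of the reimport step, assuming the same bucket path and schema (the table name csvreimport is illustrative):

CREATE EXTERNAL TABLE csvreimport (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';

-- No LOAD step is needed; the table reads the files already sitting in S3.
SELECT COUNT(*) FROM csvreimport;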

Your table can be stored in a few different formats, depending on where you want to use it; for example:
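
A sketch of one such variation, storing the same data as a SequenceFile instead of delimited text (the table name csvexport_seq and the target directory are illustrative; SEQUENCEFILE is one of Hive's built-in storage formats):

CREATE EXTERNAL TABLE csvexport_seq (
  id BIGINT, time STRING, log STRING
)
STORED AS SEQUENCEFILE
LOCATION 's3n://bucket/directory-seq/';

INSERT OVERWRITE TABLE csvexport_seq
SELECT id, time, log
FROM csvimport;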

Quittor answered 6/3, 2012 at 15:52 Comment(2)
The file in S3 is created without headers and with a random file name. Is there a way to solve this problem? I checked some of the posts on SO but couldn't find any relevant answers.Taciturnity
Same comment as above, plus: how do you fill in the AWS credentials there? I mean the Access Key ID and Secret Access Key. Thanks.Orgeat

The query above needs the EXTERNAL keyword; otherwise, dropping the table would also delete the underlying data in S3:

CREATE EXTERNAL TABLE csvexport (id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION 's3n://bucket/directory/';
INSERT OVERWRITE TABLE csvexport SELECT id, time, log FROM csvimport;

Another alternative is to use the query

INSERT OVERWRITE DIRECTORY 's3n://bucket/directory/'  select id, time, log from csvimport;

The data is then stored in the S3 directory with Hive's default delimiters.
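
To read that output back in a later session, you would declare a table whose field delimiter matches Hive's default separator, Ctrl-A ('\001'). A minimal sketch, assuming the same path (the table name csvreload is illustrative):

CREATE EXTERNAL TABLE csvreload (
  id BIGINT, time STRING, log STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
LOCATION 's3n://bucket/directory/';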

Laius answered 27/3, 2012 at 5:2 Comment(3)
This doesn't copy the header. Is there a way to copy the header as well?Affront
@MohammadAdnan did you find a way to copy the header?Membership
@MohammadAdnan did you find a way to copy the headers?Diatomite

If you can access the AWS console and have the "Access Key Id" and "Secret Access Key" for your account, you can try this too:

CREATE TABLE csvexport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION 's3n://"access id":"secret key"@bucket/folder/path';

Now insert the data as others have shown above:

INSERT OVERWRITE TABLE csvexport select id, time, log from csvimport;
Bielefeld answered 5/11, 2015 at 16:20 Comment(0)
