Export as csv in beeline hive
Asked Answered
I

5

24

I am trying to export my hive table as a csv in beeline hive. When I run the command !sql select * from database1 > /user/bob/output.csv it gives me syntax error.

I have successfully connected to the database at this point using the below command. The query outputs the correct results on console.

beeline -u 'jdbc:hive2://[databaseaddress]' --outputformat=csv

Also, not very clear where the file ends up. It should be the file path in hdfs correct?

Indurate answered 19/9, 2016 at 15:40 Comment(0)
A
40

When hive version is at least 0.11.0 you can execute:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/directoryWhereToStoreData' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','  
LINES TERMINATED BY "\n"
SELECT * FROM yourTable;

from hive/beeline to store the table into a directory on the local filesystem.


Alternatively, with beeline, save your SELECT query in yourSQLFile.sql and run:

beeline -u 'jdbc:hive2://[databaseaddress]' --outputformat=csv2 -f yourSQlFile.sql > theFileWhereToStoreTheData.csv 

Also this will store the result into a file in the local file system.


From hive, to store the data somewhere into HDFS:

CREATE EXTERNAL TABLE output 
LIKE yourTable 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hfds://WhereDoYou/Like';

INSERT OVERWRITE TABLE output SELECT * from yourTable;

then you can collect the data to a local file using:

hdfs dfs -getmerge /WhereDoYou/Like

This is another option to get the data using beeline only:

env HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline -u "jdbc:hive2://your.hive.server.address:10000/" --incremental=true --outputformat=csv2 -e "select * from youdatabase.yourtable" 

Working on:

Connected to: Apache Hive (version 1.1.0-cdh5.10.1)
Driver: Hive JDBC (version 1.1.0-cdh5.10.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.10.1 by Apache Hive
Adorable answered 23/9, 2016 at 20:17 Comment(6)
One weird thing I noticed, --outputformat doesn't work if you place it after the -e query or -f file with query switches. Strange I thought these were named arguments but seems beeline ignores any args after the query. I kept getting the default table format.Ralph
Please add the versions since Hadoop ecosystem is a work in progress.Adorable
I'm using this on AWS, if I run beeline --version it shows me Hive 2.3.2-amzn-0Ralph
Thanks so much for this. Your CLI statement really helped me move my project forward!Cnidus
this certainly make the fields delimited by comma but it doesn't guarantee it is a CSV, right? is it possible to specify the escape character, quote character or all those goodies have already been taking care of and this will guarantee a legit CSV.Wandie
Don't know it. I can read it using the Python library CSV and other minor tools I use. Since I never faced any issues in importing this file, I never feel the need to find a way to set so many details in the answer.Adorable
E
4

You can use this command to save output in CSV format from beeline:

beeline -u 'jdbc:hive2://bigdataplatform-dev.nam.nsroot.net:10000/;principal=hive/[email protected];ssl=true' --outputformat=csv2 --verbose=false  --fastConnect=true   --silent=true -f $query_file>out.csv

Save your SQL query file into $query_file.

Result will be in out.csv.

I have complete example here: hivehoney

Encrata answered 14/6, 2019 at 13:16 Comment(0)
S
3

Following worked for me

hive --silent=true --verbose=false --outputformat=csv2 -e "use <db_name>; select * from <table_name>" > table_name.csv


One advantage over using beeline is that you don't have have to provide hostname or user/pwd if you are running on hive node.

When some of the columns have string values having commas, tsv (tab separated) works better

hive --silent=true --verbose=false --outputformat=tsv -e "use <db_name>; select * from <table_name>" > table_name.tsv
Selfpity answered 16/7, 2019 at 23:1 Comment(0)
M
3

Output format in CSV:

$ beeline -u jdbc:hive2://192.168.0.41:10000/test_db -n user1 -p password **--outputformat=csv2** -e "select * from t1";

Output format in custom delimiter:

$ beeline -u jdbc:hive2://192.168.0.41:10000/test_db -n user1 -p password **--outputformat=dsv** **--delimiterForDSV='|'** -e "select * from t1";

Running command in background and redirect out to file:

$nohup `$ beeline -u jdbc:hive2://192.168.0.41:10000/test_db -n user1 -p password --outputformat=csv2 -e "select * from t1"; > output.csv 2> log` &

Reference URLs:

https://dwgeek.com/export-hive-table-into-csv-format-using-beeline-client-example.html/ https://dwgeek.com/hiveserver2-beeline-command-line-shell-options-examples.html/

Mutiny answered 17/10, 2020 at 4:12 Comment(0)
Z
0

From Beeline

beeline -u 'jdbc:hive2://123.12.4132:345/database_name' --outputformat=csv2 -e "select col1, col2, col3 from table_name" > /path/to/dump.csv

Zygophyllaceous answered 5/12, 2022 at 13:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.