MySQL export into outfile : CSV escaping chars
Asked Answered
B

6

50

I've a database table of timesheets with some common feilds.

id, client_id, project_id, task_id, description, time, date 

There are more but thats the gist of it.

I have an export running on that table to a CSV file overnight to give the user a backup of their data. It also is used as a data import for a macro Excel file with some custom reports.

This all works with me looping through the timesheets with php and printing the lines to a file.

The problem is with a big database it can take hours to run which isn't acceptable. So I rewrote it with the MySQL INTO OUTFILE command and it reduced it down to a few seconds to run which was great.

The problem now is I can't seem to escape all the new line characters, etc., in the description field. Really, a user can type potentially any combination of characters in here including carriage returns/new lines.

This is a snippet of the MySQL code I have:

SELECT id, 
       client,
       project,
       task,
       REPLACE(REPLACE(ifnull(ts.description,''),'\n',' '),'\r',' ') AS description, 
       time,
       date  
      INTO OUTFILE '/path/to/file.csv'
      FIELDS ESCAPED BY '""'
      TERMINATED BY ',' ENCLOSED BY '"'
      LINES TERMINATED BY '\n'
      FROM ....

But...

When I try look at the source of the output file, newlines still exist in the file, therefore the CSV import for the Excel breaks all the fancy macros and pivot tables the Excel wizard has created.

Any thoughts on a best course of action?

Barrie answered 13/7, 2009 at 13:1 Comment(0)
S
86

I think your statement should look like:

SELECT id, 
   client,
   project,
   task,
   description, 
   time,
   date  
  INTO OUTFILE '/path/to/file.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM ts

Mainly without the FIELDS ESCAPED BY '""' option, OPTIONALLY ENCLOSED BY '"' will do the trick for description fields etc and your numbers will be treated as numbers in Excel (not strings comprising of numerics)

Also try calling:

SET NAMES utf8;

before your outfile select, that might help getting the character encodings inline (all UTF8)

Let us know how you get on.

Streak answered 28/7, 2009 at 22:6 Comment(5)
yes this is the statement i'm using now actually and it appears to be working. I am replacing the \n and , in description though as well just in case.Barrie
How exactly do you use the SET NAMES utf8;?Bolton
dayne, use SET NAMES utf8; as a normal query (just as you would query a SELECT or UPDATE statement, i.e. in the mysql command line or via PHP or Ruby or.... ) before your SELECT statementStreak
If you're having file permission issues, just write to /tmp like @Felix Z did and that should solve any issues.Derna
OPTIONALLY was the missing piece of the puzzle! OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"' did the trick for me (Windows Office style escapes).Increasing
B
19

Here is what worked here: Simulates Excel 2003 (Save as CSV format)

SELECT 
REPLACE( IFNULL(notes, ''), '\r\n' , '\n' )   AS notes
FROM sometables
INTO OUTFILE '/tmp/test.csv' 
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\r\n';
  1. Excel saves \r\n for line separators.
  2. Excel saves \n for newline characters within column data
  3. Have to replace \r\n inside your data first otherwise Excel will think its a start of the next line.
Brunhilda answered 18/5, 2010 at 16:12 Comment(1)
thanks! this was the only one here that worked for meShowalter
P
2

What happens if you try the following?

Instead of your double REPLACE statement, try:

REPLACE(IFNULL(ts.description, ''),'\r\n', '\n')

Also, I think it should be LINES TERMINATED BY '\r\n' instead of just '\n'

Parlour answered 14/7, 2009 at 6:22 Comment(0)
U
2

Without actually seeing your output file for confirmation, my guess is that you've got to get rid of the FIELDS ESCAPED BY value.

MySQL's FIELDS ESCAPED BY is probably behaving in two ways that you were not counting on: (1) it is only meant to be one character, so in your case it is probably equal to just one quotation mark; (2) it is used to precede each character that MySQL thinks needs escaping, including the FIELDS TERMINATED BY and LINES TERMINATED BY values. This makes sense to most of the computing world, but it isn't the way Excel does escaping.

I think your double REPLACE is working, and that you are successfully replacing literal newlines with spaces (two spaces in the case of Windows-style newlines). But if you have any commas in your data (literals, not field separators), these are being preceded by quotation marks, which Excel treats much differently than MySQL. If that's the case, then the erroneous newlines that are tripping up Excel are actually newlines that MySQL had intended as line terminators.

Uprise answered 15/7, 2009 at 3:25 Comment(0)
P
0

Probably won't help but you could try creating a CSV table with that content:

DROP TABLE IF EXISTS foo_export;
CREATE TABLE foo_export LIKE foo;
ALTER TABLE foo_export ENGINE=CSV;
INSERT INTO foo_export SELECT id, 
   client,
   project,
   task,
   REPLACE(REPLACE(ifnull(ts.description,''),'\n',' '),'\r',' ') AS description, 
   time,
   date
  FROM ....
Personnel answered 29/7, 2009 at 14:45 Comment(0)
L
0

Below procedure worked for me to resolve all the escaping issues and have the procedure more a generic utility.

CREATE PROCEDURE `export_table`(
IN tab_name varchar(50), 
IN select_columns varchar(1000),
IN filename varchar(100),
IN where_clause varchar(1000),
IN header_row varchar(2000))

BEGIN
INSERT INTO impl_log_activities(TABLE_NAME, LOG_MESSAGE,CREATED_TS) values(tab_name, where_clause,sysdate());
COMMIT;
SELECT CONCAT( "SELECT ", header_row,
    " UNION ALL ",
    "SELECT ", select_columns, 
    " INTO OUTFILE ", "'",filename,"'"
    " FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '""' ",
    " LINES TERMINATED BY '\n'"
    " FROM ", tab_name, " ",
    (case when where_clause is null then "" else where_clause end)
) INTO @SQL_QUERY;

INSERT INTO impl_log_activities(TABLE_NAME, LOG_MESSAGE,CREATED_TS) values(tab_name, @SQL_QUERY, sysdate());
COMMIT;

PREPARE stmt FROM @SQL_QUERY;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

END

Linskey answered 2/7, 2020 at 11:6 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.