Overview
fgetcsv and fputcsv support an $escape argument; however, it's either broken or I'm not understanding how it's supposed to work. Ignore the fact that the $escape parameter isn't documented for fputcsv: it is supported in the PHP source, and a small bug prevents it from appearing in the documentation.
Both functions also support $delimiter and $enclosure parameters, defaulting to a comma and a double quote respectively. I would expect the $escape parameter to be what allows a field to contain any one of those metacharacters (backslash, comma or double quote); however, this certainly isn't the case. (I now understand from reading Wikipedia that such fields are to be enclosed in double quotes.)
What I've tried
Take for example the pitfall that has affected numerous posters in the comments section of the fgetcsv documentation: the case where we'd like to write a single backslash to a field.
$r = fopen('/tmp/test.csv', 'w');
fwrite($r, '"\"');
fclose($r);
$r = fopen('/tmp/test.csv', 'r');
var_dump(fgetcsv($r));
fclose($r);
This returns false. I've also tried "\\", however that also returns false. Padding the backslash(es) with some nebulous text gives fgetcsv the boost it needs: "hi\\there" and "hi\there" both parse and have the same result, but the result has only 1 backslash, so what's the point of the $escape at all?
I've observed the same behavior when not enclosing the backslash in double quotes. Writing a 'CSV' file containing the string \ and one containing \\ have the same result when parsed by fgetcsv: 1 backslash.
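One thing that makes these experiments hard to read is that a PHP string literal and the bytes it puts in the file are not the same thing. The sketch below is only a diagnostic, not a fix: it records the raw bytes and length of each literal next to whatever fgetcsv returns for it. The $candidates list and the use of php://memory in place of /tmp/test.csv are my own choices for illustration, and the default arguments are passed explicitly.

```php
<?php
// PHP single-quoted literals and the raw bytes they actually produce.
// In single quotes, \\ collapses to one backslash; \" and \t stay as written.
$candidates = ['"\"', '"\\"', '"\\\\"', '"hi\there"', '"hi\\there"'];

$results = [];
foreach ($candidates as $raw) {
    // php://memory stands in for a temporary file (illustrative choice).
    $fh = fopen('php://memory', 'w+');
    fwrite($fh, $raw);
    rewind($fh);
    $results[] = [
        'bytes'  => $raw,
        'length' => strlen($raw),
        // Defaults spelled out: no length limit, comma, double quote, backslash.
        'parsed' => fgetcsv($fh, 0, ',', '"', '\\'),
    ];
    fclose($fh);
}

// Print the file contents, their length, and the parse result side by side.
foreach ($results as $r) {
    printf("%s (%d bytes) => %s\n", $r['bytes'], $r['length'], var_export($r['parsed'], true));
}
```

The first two literals come out as the same three bytes, and the last two as the same ten bytes, so each pair is really one test case rather than two.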
Let's ask PHP how it might encode a backslash as a field in a CSV using fputcsv:
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, array('\\'));
fclose($r);
echo file_get_contents('/tmp/test.csv');
The result is a single backslash enclosed in double quotes (and I've tried 3 versions of PHP > 5.5.4, when $escape support was supposedly added to fputcsv). The hilarity of this is that fgetcsv can't even read it properly per my notes above; it returns false. I'd expect fputcsv not to enclose the backslash in double quotes, or fgetcsv to be able to read "\" as fputcsv has written it, or really, in my apparently misconstrued mind, for fputcsv to write a pair of backslashes enclosed in double quotes and for fgetcsv to be able to parse it properly!
Reproducible Test
Try writing a single backslash to a file using fputcsv, then reading it back via fgetcsv.
$aBackslash = array('\\');
// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash);
fclose($r);
// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r);
fclose($r);
// Compare the read value from fgetcsv to our original value.
// fgetcsv returns false on failure, so compare directly instead of using array_diff.
if ($aFgetcsv !== $aBackslash)
    echo "PHP CSV support is broken\n";
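For comparison, here is a minimal sketch of the same round trip with the escape mechanism switched off. It assumes PHP 7.4 or later, where fputcsv and fgetcsv accept an empty string for $escape; on older versions this is not available. php://memory is used in place of /tmp/test.csv purely for convenience, and the variable names are mine.

```php
<?php
// Assumption: PHP 7.4+, where '' is accepted as $escape and disables escaping.
$aBackslash = ['\\']; // one field holding a single backslash

// php://memory stands in for a temporary file (illustrative choice).
$fh = fopen('php://memory', 'w+');
fputcsv($fh, $aBackslash, ',', '"', '');

// Keep a copy of the raw bytes that were written, for inspection.
rewind($fh);
$written = stream_get_contents($fh);

// Read the row back with the same delimiter, enclosure and (empty) escape.
rewind($fh);
$aFgetcsv = fgetcsv($fh, 0, ',', '"', '');
fclose($fh);

var_dump($written);
var_dump($aFgetcsv === $aBackslash);
```

The point of passing the same three arguments on both sides is that the writer and the reader then agree on what a backslash means, which is exactly what the default settings fail to guarantee in the test above.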
Questions
Taking a step back, I have some questions:
- What's the point of the $escape parameter?
- Given the loose definition of CSV files, can it be said PHP is supporting them correctly?
- What's the 'proper' way to encode a backslash in a CSV file?
Background
I initially discovered this when a co-worker provided me a CSV file produced from Python, which wrote out a single backslash enclosed by double quotes, and fgetcsv failed to read it. I had the gall to ask him if he could use a standard Python function. Little did I know the PHP CSV toolkit is a tangled mess! (FWIW: the Python dev tells me he's using the CSV writing module.)
'"\\"' stands for the string "\". If you want two backslashes in your string you need to write '"\\\\"'. I think half of your complaints about a single backslash is based on this misunderstanding, no? – Carraway
$escape argument in fgetcsv and fputcsv then? – Decor
"\" is invalid CSV. It means an opening enclosure, followed by a literal double quote character, without a terminating enclosure. Unfortunately, since reading your question and the answers, I've done some experimenting and I've discovered that, contrary to what you'd expect, the returned string is not unescaped. So, to encode a single backslash, the CSV needs to be "\\" (four characters, or "\\\\" in PHP code) which will return `\`. It's then up to you to unescape the escape characters. It's actually not broken, but you have to realise how unintuitive it is first. – Aeroscope