Python csv writer : "Unknown Dialect" Error
I have a very large string in the CSV format that will be written to a CSV file.

I tried to write it to a CSV file using the simplest of Python scripts:

results = """ "2013-12-03 23:59:52","/core/log","79.223.39.000","logging-4.0",iPad,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,3,"1385593191.865",true,ERROR,"app_error","iPad/Unknown/webkit/537.51.1",NA,"Does+not",false
"2013-12-03 23:58:41","/core/log","217.7.59.000","logging-4.0",Win32,Unknown,"1.0.1.59-266060",NA,NA,NA,NA,4,"1385593120.68",true,ERROR,"app_error","Win32/Unknown/msie/9.0",NA,"Does+not,false
"2013-12-03 23:58:19","/core/client_log","79.240.195.000","logging-4.0",Win32,"5.1","1.0.1.59-266060",NA,NA,NA,NA,6,"1385593099.001",true,ERROR,"app_error","Win32/5.1/mozilla/25.0",NA,"Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}",false"""
resultArray = results.split('\n')
with open(csvfile, 'wb') as f:
    writer = csv.writer(f)
    for row in resultArray:
        writer.writerows(row)

The code returns an "Unknown Dialect" error.

Is the error because of the script or is it due to the string that is being written?

EDIT

If the problem is bad input how do I sanitize it so that it can be used by the csv.writer() method?

Eulaliaeulaliah asked 4/12, 2013 at 7:22 Comment(1)
Can you split the string manually into the number of columns you need in your output? That way the real problem can be handled (which isn't writing out a csv; but parsing the original string into the correct number of columns).Zena

You need to specify the format of your string:

with open(csvfile, 'wb') as f:
    writer = csv.writer(f, delimiter=',', quotechar="'", quoting=csv.QUOTE_ALL)

You might also want to re-visit your writing loop; the way you have it written you will get one column in your file, and each row will be one character from the results string.
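To see that failure mode concretely, here is a small demonstration (not from the question; it writes to an io.StringIO buffer instead of a file so the result is easy to inspect) of what happens when a bare string is handed to writerows:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# Iterating over a string yields its characters, so writerows treats
# each character as its own row, and each row has a single column.
writer.writerows('abc')

print(buf.getvalue())  # a, b and c, each on its own line
```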

To really exploit the module, try this:

import csv

lines = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]

reader = csv.reader(lines, quotechar="'")

with open('out.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(list(reader))

out.csv will have:

A,bunch+of,multiline,"CSV,LIKE,STRING"

If you want to quote all the column values, then add quoting=csv.QUOTE_ALL to the writer object; then your file will have:

"A","bunch+of","multiline","CSV,LIKE,STRING"

To change the quotes to ', add quotechar="'" to the writer object.
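Combining those two options, a minimal sketch (writing to an io.StringIO buffer rather than a file, purely so the result can be printed):

```python
import csv
import io

buf = io.StringIO()
# Quote every field, and use single quotes as the quote character.
writer = csv.writer(buf, quotechar="'", quoting=csv.QUOTE_ALL)
writer.writerow(['A', 'bunch+of', 'multiline', 'CSV,LIKE,STRING'])

print(buf.getvalue())  # 'A','bunch+of','multiline','CSV,LIKE,STRING'
```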

Zena answered 4/12, 2013 at 7:25 Comment(11)
Not necessarily. writer defaults to using the excel dialect, which should be fine in general. I think the problem is bad input.Hollington
Is resultsArray getting split like you think it is?Elongate
@PeterDeGlopper the quotechar in excel dialect is ", and quoting is set to minimal which is not what the input string is.Zena
Also, the sample string is just a sample. The original string that I have is huge. Approximately 14,000 line long, each with 7 fields. So, I take each line and write it as a row in the csv.Eulaliaeulaliah
But the writer doesn't care what the dialect of the input string is. If the OP were using the csv library to turn his quoted strings into a list before writing (as I expect he will have to in the end) the dialect setting would be important. But it should be fine to pass even a string containing many characters that need escaping to a writer using any dialect, as long as you pass it to the writer in the data structure expected (eg not just a string when it expects a sequence of sequences of strings).Hollington
If your original list contains strings that are already comma separated and quoted, you really don't need the csv module. Simply loop through the list and write each line to the file.Zena
Bottom line, it is perfectly valid to call writer.writerow(["'A','bunch+of','multiline','CSV,LIKE,STRING'"]) regardless of the writer's dialect. It is not valid to call writer.writerows("'A','bunch+of','multiline','CSV,LIKE,STRING'") - again, regardless of the writer's dialect. In either case it's probably not what you actually want to do, but one will do what you asked (although not what you meant) while the other will throw an exception.Hollington
If you do writer.writerow(["'A', 'bunch+of', 'multiline', 'CSV,LIKE,STRING'"]) you will get a file with one line which will be literal string "'A', 'bunch+of', 'multiline', 'CSV,LIKE,STRING'", which is not what is required; and if you do writer.writerows("'A','bunch+of','multiline','CSV,LIKE,STRING'") you will not get any exception, your file will end up with one column, with each row being each character of the string.Zena
Yes, that's why I said it isn't what you actually want to do. But there's a difference between a call that successfully executes but does something undesirable and a call that raises an exception. I am explaining why messing about with the dialect of the writer is completely irrelevant.Hollington
It is relevant if you care about the output of the file :)Zena
But unless the OP fixes the way he's calling the writer, there will be no output to care about.Hollington

The code above does not give csv.writer.writerows the input it expects. Specifically:

resultArray = results.split('\n')

This creates a list of strings. Then each of those strings is passed to the writer's writerows method:

for row in resultArray:
    writer.writerows(row)

But writerows does not expect a single string. From the docs:

csvwriter.writerows(rows)

Write all the rows parameters (a list of row objects as described above) to the writer’s file object, formatted according to the current dialect.

So you're passing a string to a method that expects its argument to be a list of row objects, where a row object is itself expected to be a sequence of strings or numbers:

A row must be a sequence of strings or numbers for Writer objects

Are you sure your listed example code accurately reflects your attempt? While it certainly won't work, I would expect the exception produced to be different.

For a possible fix - if all you are trying to do is to write a big string to a file, you don't need the csv library at all. You can just write the string directly. Even splitting on newlines is unnecessary unless you need to do something like replacing Unix-style linefeeds with DOS-style linefeeds.

If you need to use the csv module after all, you need to give your writer something it understands - in this example, that would be something like writer.writerow(['A','bunch+of','multiline','CSV,LIKE,STRING']). Note that that's a true Python list of strings. If you need to turn your raw string "'A','bunch+of','multiline','CSV,LIKE,STRING'" into such a list, I think you'll find the csv library useful as a reader - no need to reinvent the wheel to handle the quoted commas in the substring 'CSV,LIKE,STRING'. And in that case you would need to care about your dialect.
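A sketch of that round trip, assuming the input uses single quotes as in the example string (an io.StringIO buffer stands in for the output file):

```python
import csv
import io

raw = ["'A','bunch+of','multiline','CSV,LIKE,STRING'"]

# Let the csv reader handle the quoted comma in 'CSV,LIKE,STRING'.
rows = list(csv.reader(raw, quotechar="'"))

# rows is now a true list of lists of strings, which writerows expects.
buf = io.StringIO()
csv.writer(buf).writerows(rows)

print(buf.getvalue())  # A,bunch+of,multiline,"CSV,LIKE,STRING"
```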

Hollington answered 4/12, 2013 at 7:33 Comment(7)
I have added a couple of lines from the original string, if that may be of help.Eulaliaeulaliah
@Eulaliaeulaliah - assuming the line breaks in your triple-quoted string are \n characters in the original string, and that your closing "does+not" actually has a closing " character separate from the """, I would expect most csv-aware tools to be able to open the results of just f.write(results). What cleanup do you need to apply?Hollington
That is exactly the problem..! If you look at the third line, there is this field: "Could+not:+{"url":"/all.json?status=ongoing,scheduled,conflict","code":0,"data":"","success":false,"error":true,"cached":false,"jqXhr":{"readyState":0,"responseText":"","status":0,"statusText":"error"}}" This field contains ',' — the same character as the delimiter between fields — inside the escaping '"'. Excel truncates this field wrongly if I use the f.write method. So I need to make this CSV-compliant to make it work properly.Eulaliaeulaliah
Yeah, I saw that once you edited in the third line. Once you have double quotes nested inside other double quotes in ways that you expect a parser to handle, well, you have a very complicated problem. I do not expect that the csv library can help you with it - if I remember my computational theory correctly only recursive parsers can handle such arbitrarily nested delimiters. If you can in any way control your input, you should. If you cannot, you have a mess to deal with.Hollington
That was exactly what I feared. I have no way of controlling the input.Eulaliaeulaliah
I am sorry to say that as far as I can tell you will have to write a recursive parser, or maybe a slightly shallower one if you know how deeply nested your quotes can be, and I cannot quickly find a useful summary of what's involved with that.Hollington
Your example input only includes unescaped double quotes inside a JSON string - is that reliable? You can write a parsing routine that escapes double quotes if they're inside a curly-brace block without it being too horrible, since curly braces are directional and you can count nesting depth. Without some specialized knowledge like that, there are actually multiple valid interpretations of your example third line.Hollington

You can use csv.register_dialect — for example, for escaped formatting:

csv.register_dialect('escaped', escapechar='\\', doublequote=True, quoting=csv.QUOTE_ALL)
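A minimal sketch of registering and then using such a dialect (the sample row is illustrative, not from the question; output goes to an io.StringIO buffer so it can be printed):

```python
import csv
import io

csv.register_dialect('escaped', escapechar='\\', doublequote=True,
                     quoting=csv.QUOTE_ALL)

buf = io.StringIO()
# Refer to the registered dialect by name.
writer = csv.writer(buf, dialect='escaped')
writer.writerow(['plain', 'has "quotes"', 'has,comma'])

# With doublequote=True, embedded quote characters are doubled.
print(buf.getvalue())  # "plain","has ""quotes""","has,comma"
```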
Apologue answered 11/2, 2018 at 12:32 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.