I'm loading a CSV file from S3 into Redshift. This CSV file is analytics data which contains the PageUrl (which may contain user search info inside a query string for example).
It chokes on rows where there is a single, double-quote character, for example if there is a page for a 14" toy then the PageUrl would contain:
http://www.mywebsite.com/a-14"-toy/1234.html
Redshift understandably can't handle this as it is expecting a closing double quote character.
The way I see it my options are:
- Pre-process the input and remove these characters
- Configure the COPY command in Redshift to ignore these characters but still load the row
- Set MAXERRORS to a high value and sweep up the errors using a separate process
Option 2 would be ideal, but I can't find it!
Any other suggestions if I'm just not looking hard enough?
Thanks
Duncan