I am new to AWS Data Pipeline. I created a pipeline that successfully pulls all the content from RDS into an S3 bucket. Everything works, and I see my .csv file in the S3 bucket. But I am storing Spanish names in my table, and in the CSV I see "Garc�a" instead of "García".
Looks like the wrong codepage is used. Just reference the correct codepage and you should be fine. The following topic might help: Text files uploaded to S3 are encoded strangely?
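To illustrate the kind of mismatch that would produce this symptom, here is a minimal Python sketch. It assumes the bytes were written as latin1 and then read as UTF-8, which is one plausible cause and not something confirmed for this pipeline. The function name is made up for the example.

```python
def read_latin1_bytes_as_utf8(text: str) -> str:
    """Encode text as latin1, then decode the bytes as UTF-8 (the mismatch)."""
    raw = text.encode("latin-1")  # "í" becomes the single byte 0xED
    # 0xED is not a valid standalone UTF-8 sequence, so the decoder
    # substitutes U+FFFD (the replacement character) for it.
    return raw.decode("utf-8", errors="replace")


garbled = read_latin1_bytes_as_utf8("García")
print(garbled)

# Decoding the same bytes with the codepage they were written in is lossless.
restored = "García".encode("latin-1").decode("latin-1")
print(restored)
```

If the file in S3 really is latin1, opening it with that encoding should show the name correctly. If the characters are already lost in the file, the fix has to happen on the database or connection side.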
AWS DataPipeline is implemented in Java, and uses JDBC (Java Database Connectivity) drivers (specifically, MySQL Connector/J for MySQL in your case) to connect to the database. According to the Using Character Sets and Unicode section of the documentation, the character set used by the connector is automatically determined based on the character_set_server system variable on the RDS/MySQL server, which is set to latin1 by default.
If this setting is not correct for your application (run SHOW VARIABLES LIKE 'character%'; in a MySQL client to confirm), you have two options to correct this:
- Set character_set_server to utf8 on your RDS/MySQL server. To make this change permanently from the RDS console, see Modifying Parameters in a DB Parameter Group for instructions.
- Pass additional JDBC properties in your DataPipeline configuration to override the character set used by the JDBC connection. For this approach, add the following JDBC properties to your RdsDatabase or JdbcDatabase object (see properties reference): "jdbcProperties": "useUnicode=true,characterEncoding=UTF-8"
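As a rough sketch of the second option, the Python below builds an RdsDatabase object for a pipeline definition and prints it as JSON. The object id, instance id, user name, password placeholder, and database name are all hypothetical, and the comma-separated form of the jdbcProperties value is taken from this answer and has not been verified against the service.

```python
import json


def rds_database_object(instance_id: str, username: str, password: str, database: str) -> dict:
    """Build an RdsDatabase pipeline object with the charset JDBC properties."""
    return {
        "id": "rds_mysql",            # hypothetical object id
        "name": "rds_mysql",
        "type": "RdsDatabase",
        "rdsInstanceId": instance_id,
        "username": username,
        "*password": password,
        "databaseName": database,
        # Assumption: properties are given as one comma-separated string.
        "jdbcProperties": "useUnicode=true,characterEncoding=UTF-8",
    }


# All argument values below are placeholders for illustration only.
definition = {
    "objects": [
        rds_database_object("my-rds-instance", "my_user", "REPLACE_ME", "my_db")
    ]
}
print(json.dumps(definition, indent=2))
```

The printed fragment would be merged into the rest of the pipeline definition. If the export still shows garbled characters, the value format of jdbcProperties is the first thing to revisit.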
jdbcProperties - the documentation only says "Pairs of the form A=B that will be set as properties on jdbc connections for this database". It might instead be useUnicode=true&characterEncoding=UTF-8 or something else entirely. Let me know if either form works if you try this option. – Singularize
jdbcProperties keys, one for each property you wish to set: "jdbcProperties": "useUnicode=true", "jdbcProperties": "characterEncoding=UTF-8"; 2. Pass an array to jdbcProperties: "jdbcProperties": ["useUnicode=true", "characterEncoding=UTF-8"]. Let me know if either works. – Singularize
This question is a little similar to this one: Text files uploaded to S3 are encoded strangely? If so, kindly reference my answer there.