utf8_encode does not produce the right result

My problem is the following:

I have an array with keys like "e", "f", etc., and at some point I have to get the value of a key. This works well. But if I use "í", "é", etc. as keys, the output is wrong (it shows �). My page has to be in UTF-8. Looking into the problem, I read that utf8_encode should help. It didn't: although it produced a more readable character, it was still completely different from what I want. In case it matters, phpinfo gives:

Directive                Local Value  Master Value
iconv.input_encoding     ISO-8859-1   ISO-8859-1
iconv.internal_encoding  ISO-8859-1   ISO-8859-1
iconv.output_encoding    ISO-8859-1   ISO-8859-1

What could help the problem?

Edit: I think that using the strings as array keys causes some data loss. Is that true? If so, how can I prevent it?

Edit2: Solutions I've tried so far: getting the array key value: failed; making an array with the same keys but UTF-8 characters as values: failed; utf8_encode: failed; [tried with both] iconv_set_encoding: failed; ini_set: failed; mb_internal_encoding: failed. All of them returned either à or �.

Ingraham answered 11/2, 2012 at 16:11 Comment(11)
Your output might be ISO-8859-1 encoded according to these settings. That's totally unrelated to utf8_encode. Check in your browser which encoding applies. - Janie
Yes, the output seems to be ISO-8859-1 somehow. How can I fix that without editing php.ini? - Ingraham
Send a header that signals the encoding. Disable iconv output encoding as well; if you don't know what that is, you won't need it. Check the PHP Manual for how to change that at runtime. Good luck! - Janie
I did send the header. The problem was that the browser interpreted the ISO-8859-1 output as UTF-8. I could not find out how to disable iconv at runtime. :( - Ingraham
You don't need to disable iconv, you just need to change the output setting (and the other two settings as well if you like it more direct). - Janie
I added iconv_set_encoding("input_encoding", "UTF-8"); iconv_set_encoding("internal_encoding", "UTF-8"); iconv_set_encoding("output_encoding", "UTF-8"); to my code, but the issue is still there. - Ingraham
Please read this: Handling Unicode In A Web App and possibly this: What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text. - Harwin
Read it. This helped a little bit: í and é are good, but I still have a problem with ő and ű. - Ingraham
What did you change/fix? It's simply a problem of your source code/source data being encoded in one encoding while the client/browser/whatever tries to interpret it in another. Where is the data coming from? Is it hardcoded in the source code? Then make sure the file is saved as UTF-8. Is it coming from the database? Then make sure it's stored correctly there and the database connection encoding is set to UTF-8. Where are you displaying it? Make sure it's interpreted as UTF-8 there. - Harwin
It's coming from a UTF-8 encoded form via POST. I think the problem is that array keys are in ISO-8859-1; I have to echo the keys as well, and ISO-8859-1 does not support ő and ű. The browser is definitely interpreting the page as UTF-8, I checked. My question is now basically this: is my theory about array keys true, and if not, how can I make them UTF-8? - Ingraham
No, keys are strings (well, or numeric types). PHP strings are byte arrays and have no associated encoding. That doesn't change when they're used as array keys. If the string is UTF-8 encoded, the key is UTF-8 encoded. - Harwin
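
To illustrate the last comment, here is a minimal sketch, assuming the script file itself is saved as UTF-8 and the mbstring extension is loaded. The variable names are made up for the example. It shows that an array key comes back byte for byte, and what probably happens when text that is already UTF-8 is converted from ISO-8859-1 to UTF-8 a second time (which is the conversion utf8_encode performs):

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$key = 'ő';                         // UTF-8 bytes 0xC5 0x91
$table = [$key => 'value'];

// The key is stored and returned byte for byte; no conversion happens.
$storedKey = array_keys($table)[0];
$keyHex = bin2hex($storedKey);      // "c591"

// Converting ISO-8859-1 -> UTF-8 on text that is already UTF-8 treats
// each byte as its own character and encodes it again.
$double = mb_convert_encoding('é', 'UTF-8', 'ISO-8859-1');
$doubleHex = bin2hex($double);      // "c383c2a9", which a UTF-8 page shows as "Ã©"

echo $keyHex, "\n", $doubleHex, "\n";
```

If the form data really was UTF-8 already, this double conversion would explain the "more readable but still wrong" characters described in the question.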

I've put together some solutions and finally it works.

Here is what I did. First, I combined all the suggestions by adding these lines:

ini_set('default_charset', 'UTF-8');
iconv_set_encoding("input_encoding", "UTF-8");
iconv_set_encoding("internal_encoding", "UTF-8");
iconv_set_encoding("output_encoding", "UTF-8");
mb_internal_encoding("UTF-8");

This did not work.

I looked at all the links; the utf8_encode / utf8_decode approach didn't work either. Then I took a closer look at the string functions and found the mbstring extension, so I replaced every string function with its mbstring equivalent.

This worked fine. Then, I figured out that mb_internal_encoding("UTF-8"); is enough. So now it works. Thanks for all the suggestions!
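
A minimal sketch of the difference, assuming the file is saved as UTF-8 and mbstring is loaded (the sample word and variable names are only for illustration): the plain string functions count and cut bytes, while the mb_* functions work on characters.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
mb_internal_encoding('UTF-8');

$word = 'ősz';                          // "ő" takes two bytes in UTF-8

// Byte-oriented functions count and cut bytes, not characters.
$byteLength  = strlen($word);           // 4
$brokenFirst = substr($word, 0, 1);     // "\xC5": half of "ő", shown as � in a browser

// The mbstring equivalents respect character boundaries.
$charLength = mb_strlen($word);         // 3
$wholeFirst = mb_substr($word, 0, 1);   // "ő"

echo $byteLength, ' ', $charLength, "\n";
echo bin2hex($brokenFirst), ' ', $wholeFirst, "\n";
```

Cutting a multibyte character in half like this is one way a string can end up displaying as �.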

Ingraham answered 12/2, 2012 at 9:46 Comment(5)
So basically you were destroying your strings by manipulating them with encoding-unaware string functions? - Harwin
It seems so, yes. The interesting part is that previously they weren't destroyed, perhaps because this ran on another server rather than on localhost. - Ingraham
Worth a read: kunststube.net/encoding (good section on PHP, plus follow the link to Joel's article too). - Flamsteed
I had already read this too. The problem was that I thought PHP's string functions support UTF-8 (and that sending one header is enough), whereas they only do with mbstring (or at least in the version and configuration I use). - Ingraham
@Harwin what would be the best approach in this case for you? - Soda

Try adding this line at the top of all scripts that'll have to deal with UTF-8 data:

mb_internal_encoding("UTF-8");

or even better, edit the internal encoding in your php.ini file.

Desantis answered 11/2, 2012 at 16:20 Comment(1)
Sadly, I can't edit php.ini (shared hosting). :( Also, this does not seem to work; the output is still �. - Ingraham

Try setting the default_charset directive:

ini_set('default_charset', 'UTF-8');

This sets the character encoding which is sent to the browser in the Content-Type header.
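
As a rough sketch, assuming an HTML page and that nothing has been output yet, the directive can be combined with an explicit header so the declared charset does not depend on server defaults:

```php
<?php
// Must run before any output is sent to the browser.
ini_set('default_charset', 'UTF-8');

// Explicit equivalent; assumes the response is an HTML page.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Read the setting back to confirm it took effect at runtime.
$charset = ini_get('default_charset');
```

This only declares the encoding; the bytes the script outputs still have to be valid UTF-8.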

Flamsteed answered 11/2, 2012 at 21:30 Comment(0)
