Tidy replacing &amp;nbsp; with a weird character

I am using Tidy (with PHP5) with UTF-8 set as the input, output, and char encoding. When I clean a string that contains an &amp;nbsp; entity, Tidy replaces it with an odd character. I've tried changing the Tidy config, but nothing I try seems to work.

Before Tidy:

This is a test. &amp;nbsp;Why does this not work?

After Tidy:

This is a test. ▒Why does this not work?

I don't know what the character is, but I assume it has something to do with how the entities are encoded in UTF-8. Any ideas as to how I can get Tidy to just leave the &amp;nbsp; alone?

Juliusjullundur answered 12/7, 2011 at 17:5 Comment(4)
No-break space is a different character from a regular space in UTF-8: utf8-chartable.de. I guess you'll have to use str_replace before Tidy. Sclerodermatous
I need the &amp;nbsp; in there though, because without it HTML won't render two spaces on the screen. Juliusjullundur
What about using &amp;#160; instead of &amp;nbsp;? Maybe Tidy's looking for it explicitly? Fawkes
I tried this, and it gives the same result. I think it's trying to encode an actual non-breaking space character instead of leaving the entity alone. I would like Tidy to just treat it like plain text and skip any conversion of the entity itself. Juliusjullundur

Have you tried the preserve-entities config option?
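
For illustration, here is a minimal sketch of how that option might be passed through PHP's tidy extension. The helper name `tidy_keep_entities` and the `show-body-only` setting are assumptions added for the example, not something from the original question; only `preserve-entities` is the option being suggested.

```php
<?php
// Hypothetical helper (not from the original thread): run Tidy over an HTML
// fragment while asking it to leave entities such as &nbsp; as written.
function tidy_keep_entities($html)
{
    $config = array(
        'preserve-entities' => true, // keep well-formed entities from the input
        'show-body-only'    => true, // assumption: we are cleaning a fragment
    );

    // The third argument sets the input and output encoding to UTF-8.
    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();

    return tidy_get_output($tidy);
}

echo tidy_keep_entities('<p>This is a test. &nbsp;Why does this not work?</p>');
```

With `preserve-entities` enabled, Tidy should pass the entity through as text rather than converting it to a literal no-break space character, which is what appears to show up as the odd glyph when the output is viewed in the wrong encoding.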

Dispense answered 12/7, 2011 at 19:29 Comment(1)
Go figure, the only option I missed. That works, thank you very much! Juliusjullundur
