ctype_print() best solution for UTF-8 characters
3

9

In PHP, what is the best approach to let ctype_print() (or is this not possible?) work with UTF-8? Currently when I use it with some UTF-8 characters it fails, for example:

ctype_print("Curaçao");

(after the Dutch independent island Curaçao) returns false.

Thanks in advance for your time and help.

Ambrotype answered 10/8, 2014 at 13:34 Comment(1)
This thread talks about this issue and offers solutions: here - Cultivation
S
1

There is a PCRE/POSIX regular expression character class that matches any printable character (reference). When used with the u modifier, it matches any printable UTF-8 character:

[:print:] This matches the same characters as [:graph:] plus space characters that are not controls, that is, characters with the Zs property.

If you are only interested in glyphs and spaces should not count, you would use:

[:graph:] This matches characters that have glyphs that mark the page when printed. In Unicode property terms, it matches all characters with the L, M, N, P, S, or Cf properties
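As a rough illustration of the difference between the two classes quoted above, the sketch below checks the same strings against [:print:] and [:graph:]. The helper name all_match_class is hypothetical and only exists for this example; it assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): true when every character of
// $subject matches the given POSIX class under the u modifier.
function all_match_class(string $class, string $subject): bool
{
    return preg_match('~\A[[:' . $class . ':]]+\z~u', $subject) === 1;
}

var_dump(all_match_class('print', 'São Tomé'));   // bool(true): letters plus a space
var_dump(all_match_class('graph', 'São Tomé'));   // bool(false): the space is not a glyph
var_dump(all_match_class('print', "São\tTomé"));  // bool(false): tab is a control character
```

So [:graph:] is the stricter choice when whitespace should be rejected as well.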

Then, your regular expression would look like:

preg_match('~^[[:print:]]+$~u', $string)

And a possible function would look like:

function is_utf8_printable($string): bool {
    return (bool) preg_match('~^[[:print:]]+$~u', $string);
}

This will tell you whether all characters, from the ^ beginning to the $ end of the string, match the :print: character class. Please see here for several test string iterations.
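One edge case worth knowing: without the D modifier, $ also matches just before a trailing newline, so a string such as "Curaçao\n" still passes the check above. A minimal sketch of a stricter variant follows, assuming that behaviour is unwanted. The name is_utf8_printable_strict is made up for this example.

```php
<?php
// Hypothetical stricter variant of the function above.
function is_utf8_printable_strict(string $string): bool
{
    // \A and \z anchor to the absolute start and end of the subject,
    // so a trailing "\n" is not accepted the way it is with $.
    // With the u modifier, preg_match() returns false for invalid UTF-8;
    // the strict comparison maps that to false as well.
    return preg_match('~\A[[:print:]]+\z~u', $string) === 1;
}

var_dump(is_utf8_printable_strict('Curaçao'));      // bool(true)
var_dump(is_utf8_printable_strict("Curaçao\n"));    // bool(false)
var_dump(is_utf8_printable_strict("Cura\xE7ao"));   // bool(false): Latin-1 byte, not valid UTF-8
```

Comparing against 1 instead of casting to bool keeps the three possible results of preg_match() (1, 0, false) explicit.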

Skeens answered 16/12, 2021 at 12:13 Comment(1)
@Petah any follow-up on this oldie you resurrected? (OP hasn't been around for ages.) - Skeens
C
0

The ctype_print() function only handles ASCII characters:

It's a wrapper around the C isprint() function.

Use a regex instead:

function ctype_print_utf(string $string): bool
{
    return preg_match('/[[:cntrl:]]/', $string) === 0;
}
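To make the ASCII limitation concrete, here is a small sketch that lists the bytes of 'ç' as a byte-oriented check would see them. The helper names utf8_bytes and is_ascii_printable_byte are assumptions for this example, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: the raw bytes of a string as a list of integers.
function utf8_bytes(string $s): array
{
    return array_values(unpack('C*', $s));
}

// Hypothetical helper: printable ASCII occupies 0x20..0x7E.
function is_ascii_printable_byte(int $byte): bool
{
    return $byte >= 0x20 && $byte <= 0x7E;
}

// 'ç' (U+00E7) is encoded in UTF-8 as the two bytes 0xC3 0xA7.
foreach (utf8_bytes('ç') as $byte) {
    printf("0x%02X printable ASCII: %s\n", $byte, is_ascii_printable_byte($byte) ? 'yes' : 'no');
}
// 0xC3 printable ASCII: no
// 0xA7 printable ASCII: no
```

Neither byte falls inside the printable ASCII range, which is why a check that works byte by byte rejects the string.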
Chyme answered 11/12, 2021 at 22:50 Comment(2)
Doesn't seem to work: 3v4l.org/L9CW9 - Unbodied
Works as ctype_print: 3v4l.org/N6NRK - Chyme
C
0

You can try the code below:

function ctype_print_unicode($input) {
    return preg_match("~^[\pL\pN\s\"\~". preg_quote("!#$%&'()*+,-./:;<=>?@[\]^_`{|}´") ."]+$~u", $input);
}

print ctype_print_unicode('Curaçao');

Output:

1
Callow answered 12/12, 2021 at 11:49 Comment(1)
3v4l.org/bGbMl - Unbodied
