Find out if a string contains only ASCII characters
Asked Answered
E

3

3

I need to know whether a string contains only ASCII characters. So far I use this REGEX:

DECLARE
    str VARCHAR2(100) := 'xyz';
BEGIN
    IF REGEXP_LIKE(str, '^[ -~]+$') THEN
        DBMS_OUTPUT.PUT_LINE('Pure ASCII');
    END IF;
END;
/

Pure ASCII

' ' and ~ are the first, resp. last character in ASCII.

Problem is, this REGEXP_LIKE fails on certain NLS-Settings:

ALTER SESSION SET NLS_SORT = 'GERMAN'; 

DECLARE
    str VARCHAR2(100) := 'xyz';
BEGIN
    IF REGEXP_LIKE(str, '^[ -~]+$') THEN
        DBMS_OUTPUT.PUT_LINE('Pure ASCII');
    END IF;
END;
/

ORA-12728: invalid range in regular expression
ORA-06512: at line 4

Do anybody knows a solution which works independently from current user NLS-Settings? Is this behavior on purpose or should it be considered as a bug?

Esemplastic answered 18/6, 2018 at 17:48 Comment(11)
Would this work: IF (ASCIISTR(str) = str) THEN?Demotic
[ -~] is printable ascii, but ascii is really 7-bit [\x00-\x7F]Caracole
@sln, you are right, I mean "... contains only printable ASCII characters"Esemplastic
Then you would want [\x20-\x7E] Using byte syntax should never throw a range error.Caracole
Interestingly a backslash seems to fail that test; asciistr('\') gives \005C, presumably to avoid confusion as that has a special meaning in the output.Deathbed
@sln, [\x20-\x7E] also raises ORA-12728Esemplastic
@WernfriedDomscheit - what's that exception ? If it's invalid range, then there is something modifying all your text, not just the regex. Or, it could be that this is a new bug Oracle doesn't know about.Caracole
@sln - in Oracle's German linguistic sort order, the character x20 is "greater" than character x7E. x7E is #63 and x20 is #66. The full printable ASCII character range is x21 - x39.Culinarian
@Culinarian - That's the thing. In regex land, \xDD is not a character, it is a number, the code of the character. There is no translation from code to character, there is only translation from char to code, in regex landia. Where the range integer arithmetic is done. Hope you understand that. That's why the syntax \xDD is unique to regex parsers, so language does not play a factor.Caracole
@Culinarian - It would seem impossible for a regex engine to throw a range error when the first number is lower than the second number, if you catch my drift. There could always be the problem of surrogate pairs inside of classes, which can cause a range error.Caracole
Basically the same question >> #2236975Avertin
E
0

I think I will go for one of these two

IF CONVERT(str, 'US7ASCII') = str THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;



IF ASCIISTR(REPLACE(str, '\', '/')) = REPLACE(str, '\', '/') THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;
Esemplastic answered 19/6, 2018 at 7:16 Comment(2)
Those will include non-printable characters at the low end of the range, won't they?Reproachful
@MatthewMcPeak, yes but my main interest is to find special characters like öäü. Non-printable characters are not relevant in my case.Esemplastic
R
1

You can use TRANSLATE to do this. Basically, translate away all the ASCII printable characters (there aren't that many of them) and see what you have left.

Here is a query that does it:

WITH input ( p_string_to_test) AS ( 
SELECT 'This this string' FROM DUAL UNION ALL
SELECT 'Test this ' || CHR(7) || ' string too!' FROM DUAL UNION ALL
SELECT 'xxx' FROM DUAL)
SELECT p_string_to_test, 
       case when translate(p_string_to_test, 
       chr(0) || q'[ !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]', 
       chr(0)) is null then 'Yes' else 'No' END is_ascii
FROM input;
+-------------------------+----------+
|    P_STRING_TO_TEST     | IS_ASCII |
+-------------------------+----------+
| This this string        | Yes      |
| Test this  string too!  | No       |
| xxx                     | Yes      |
+-------------------------+----------+
Reproachful answered 18/6, 2018 at 18:27 Comment(1)
[:print:] includes non-ascii characters.Culinarian
L
1

ASCII function with upper limit of 127 may be used :

declare
    str nvarchar2(100) := '\xyz~*-=)(/&%+$#£>|"éß';
    a   nvarchar2(1);
    b   number := 0;
begin
    for i in 1..length(str)
    loop                 
      a := substrc(str,i,1);
      b := greatest(ascii(a),b);      
    end loop;

    if b < 128 then  
     dbms_output.put_line('String is composed of Pure ASCII characters');
    else
     dbms_output.put_line('String has non-ASCII characters');      
    end if; 
end;
Lieu answered 18/6, 2018 at 22:38 Comment(0)
E
0

I think I will go for one of these two

IF CONVERT(str, 'US7ASCII') = str THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;



IF ASCIISTR(REPLACE(str, '\', '/')) = REPLACE(str, '\', '/') THEN
    DBMS_OUTPUT.PUT_LINE('Pure ASCII');
END IF;
Esemplastic answered 19/6, 2018 at 7:16 Comment(2)
Those will include non-printable characters at the low end of the range, won't they?Reproachful
@MatthewMcPeak, yes but my main interest is to find special characters like öäü. Non-printable characters are not relevant in my case.Esemplastic

© 2022 - 2024 — McMap. All rights reserved.