How to determine if a string is a valid variable name?
P

7

35

I'm looking for a quick way (in C#) to determine if a string is a valid variable name. My first intuition is to whip up some regex to do it, but I'm wondering if there's a better way to do it. Like maybe some kind of a secret method hidden deep somewhere called IsThisAValidVariableName(string name), or some other slick way to do it that is not prone to errors that might arise due to lack of regex prowess.

Pulcheria answered 1/12, 2009 at 23:28 Comment(5)
Do you mean a C# variable name? I think a regex is your best bet unless you roll your own little parser (which is overkill for such a small check). - Donovan
One thing to be careful of if you're using a regex is that there are several Unicode character classes that you might need to take into account: msdn.microsoft.com/en-us/library/aa664670%28VS.71%29.aspx - Mutule
Do you care about whether the variable is valid in a specific context or only about whether it can ever be a valid identifier in any context? - Uninstructed
It would be helpful to know WHY you want this information. Are you writing a C# compiler? Are you generating code based on a user-supplied string? - Shifrah
Dupe: #1904752 - Tammitammie
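For the regex route raised in the question and comments, here is a minimal sketch built on the Unicode categories the C# specification lists for identifiers. The class and method names are made up for illustration. It does not reject keywords, does not decode \u escapes, and, because .NET regexes match UTF-16 code units, it will not accept letters outside the Basic Multilingual Plane. Treat it as a rough approximation rather than a complete check.

```csharp
using System.Text.RegularExpressions;

public static class IdentifierRegexSketch
{
    // Optional '@' prefix, then one start character (letter, letter-number, or '_'),
    // then any number of part characters (letters, digits, connectors, combining
    // marks, formatting characters). \z anchors at the very end of the input, so a
    // trailing newline is not accepted the way it would be with '$'.
    private static readonly Regex Pattern = new Regex(
        @"^@?[_\p{L}\p{Nl}][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}]*\z",
        RegexOptions.CultureInvariant);

    // Hypothetical helper: true when the string has the shape of an identifier.
    // Keywords such as "class" still pass, because the pattern only checks characters.
    public static bool LooksLikeIdentifier(string name)
    {
        return name != null && Pattern.IsMatch(name);
    }
}
```

If keywords must be rejected as well, a keyword lookup would have to be added in front of the pattern match.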
S
57

Try this:

// using System.CodeDom.Compiler;
CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
if (provider.IsValidIdentifier (YOUR_VARIABLE_NAME)) {
      // Valid
} else {
      // Not valid
}
Smithery answered 1/12, 2009 at 23:31 Comment(6)
You need to reference the System.CodeDom.Compiler namespace for that :-) - Labialize
Yes. You also need to put that code inside a method, and the method in a class and a variable named YOUR_VARIABLE_NAME and... ;-) - Smithery
How expensive is a CodeDomProvider? - Guy
@Loadmaster: I think only CreateProvider will be 'expensive' (it reads from configuration files). Everything else is 'cheap'. - Smithery
That's what I meant: how much overhead is needed to instantiate provider? - Guy
Using new Microsoft.CSharp.CSharpCodeProvider().IsValidIdentifier(...) directly requires significantly less overhead; no parsing to find the provider. - Obediah
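Following up on the overhead discussion in the comments, one way to pay the construction cost only once is to keep a single CSharpCodeProvider in a static field. This is a sketch, and the wrapper class and method names are invented for illustration. On .NET Core and later the provider comes from the System.CodeDom NuGet package rather than the framework itself.

```csharp
using Microsoft.CSharp;

public static class IdentifierCheck
{
    // Created once when the class is first used and reused for every call,
    // so the construction cost is not paid per check.
    private static readonly CSharpCodeProvider Provider = new CSharpCodeProvider();

    // Hypothetical wrapper around CodeDomProvider.IsValidIdentifier.
    public static bool IsValid(string name)
    {
        return Provider.IsValidIdentifier(name);
    }
}
```

With this provider, a bare keyword such as "class" is reported as invalid, while the verbatim form "@class" is accepted.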
T
5
public static bool IsIdentifier(string text)
{
    if (string.IsNullOrEmpty(text))
        return false;
    if (!char.IsLetter(text[0]) && text[0] != '_')
        return false;
    for (int ix = 1; ix < text.Length; ++ix)
        if (!char.IsLetterOrDigit(text[ix]) && text[ix] != '_')
           return false;
    return true;
}
Teresiateresina answered 19/7, 2017 at 21:42 Comment(0)
U
2

There are a couple of special cases around the @ character that are easy to forget to check - namely, '@' by itself is not a valid identifier, and neither is "@1foo". To catch these, you can first check if the string is a keyword, then remove @ from the start of the string, and then check if what's left is a valid identifier (disallowing @ characters).

Here I've combined this with a method to parse Unicode escape sequences in identifiers, and hopefully complete C# (5.0) Unicode character checking. To use it, first call TryParseRawIdentifier() to handle keywords, escape sequences, formatting characters (which are removed), and verbatim identifiers. Next, pass the result to IsValidParsedIdentifier() to check if the first and subsequent characters are valid. Note that the strings returned from TryParseRawIdentifier() are equal if and only if the identifiers are considered identical by C#.

using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class CSharpIdentifiers
{
    private static HashSet<string> _keywords = new HashSet<string> {
        "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char", "checked",
        "class", "const", "continue", "decimal", "default", "delegate", "do", "double", "else",
        "enum", "event", "explicit", "extern", "false", "finally", "fixed", "float", "for",
        "foreach", "goto", "if", "implicit", "in", "int", "interface", "internal", "is", "lock",
        "long", "namespace", "new", "null", "object", "operator", "out", "override", "params",
        "private", "protected", "public", "readonly", "ref", "return", "sbyte", "sealed",
        "short", "sizeof", "stackalloc", "static", "string", "struct", "switch", "this", "throw",
        "true", "try", "typeof", "uint", "ulong", "unchecked", "unsafe", "ushort", "using",
        "virtual", "void", "volatile", "while"
    };

    public static IReadOnlyCollection<string> Keywords { get { return _keywords; } }


    public static bool TryParseRawIdentifier(string str, out string parsed)
    {
        if (string.IsNullOrEmpty(str) || _keywords.Contains(str)) { parsed = null; return false; }

        StringBuilder sb = new StringBuilder(str.Length);

        int verbatimCharWidth = str[0] == '@' ? 1 : 0;

        for (int i = verbatimCharWidth; i < str.Length; ) //Manual increment
        {
            char c = str[i];

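            if (c == '\\')
            {
                //A trailing backslash cannot start an escape sequence
                if (i + 1 >= str.Length) { parsed = null; return false; }
                char next = str[i + 1];

                int charCodeLength;
                if (next == 'u') charCodeLength = 4;
                else if (next == 'U') charCodeLength = 8;
                else { parsed = null; return false; }
                //No need to check for escaped backslashes or special sequences like \n,
                //as they are not valid identifier characters

                //Reject truncated escapes before Substring() can throw
                if (i + 2 + charCodeLength > str.Length) { parsed = null; return false; }

                int charCode;
                if (!TryParseHex(str.Substring(i + 2, charCodeLength), out charCode)) { parsed = null; return false; }

                //ConvertFromUtf32() throws for surrogate code points and for values outside 0..0x10FFFF
                if (charCode < 0 || charCode > 0x10FFFF || (charCode >= 0xD800 && charCode <= 0xDFFF)) { parsed = null; return false; }

                sb.Append(char.ConvertFromUtf32(charCode)); //Handle characters above 2^16 by converting them to a surrogate pair
                i += 2 + charCodeLength;
            }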
            else if (char.GetUnicodeCategory(str, i) == UnicodeCategory.Format)
            {
                //Use (string, index) in order to handle surrogate pairs
                //Skip this character
                if (char.IsSurrogatePair(str, i)) i += 2;
                else i += 1;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }

        parsed = sb.ToString();
        return true;
    }

    private static bool TryParseHex(string str, out int result)
    {
        return int.TryParse(str, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out result);
        //NumberStyles.AllowHexSpecifier forces all characters to be hex digits
    }

    public static bool IsValidParsedIdentifier(string str)
    {
        if (string.IsNullOrEmpty(str)) return false;

        if (!IsValidParsedIdentifierStart(str, 0)) return false;

        int firstCharWidth = char.IsSurrogatePair(str, 0) ? 2 : 1;

        for (int i = firstCharWidth; i < str.Length; ) //Manual increment
        {
            if (!IsValidParsedIdentifierPart(str, i)) return false;
            if (char.IsSurrogatePair(str, i)) i += 2;
            else i += 1;
        }

        return true;
    }

    //(String, index) pairs are used instead of chars in order to support surrogate pairs
    //(Unicode code-points above 2^16 represented using two 16-bit characters)

    public static bool IsValidParsedIdentifierStart(string s, int index)
    {
        return s[index] == '_' || char.IsLetter(s, index) || char.GetUnicodeCategory(s, index) == UnicodeCategory.LetterNumber;
    }

    public static bool IsValidParsedIdentifierPart(string s, int index)
    {
        if (s[index] == '_' || (s[index] >= '0' && s[index] <= '9') || char.IsLetter(s, index)) return true;

        switch (char.GetUnicodeCategory(s, index))
        {
            case UnicodeCategory.LetterNumber: //Eg. Special Roman numeral characters (not covered by IsLetter())
            case UnicodeCategory.DecimalDigitNumber: //Includes decimal digits in other cultures
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            //UnicodeCategory.Format handled in TryParseRawIdentifier()
                return true;
            default:
                return false;
        }
    }
}
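To isolate just the '@' handling described at the top of this answer (keyword check first, then strip one leading '@', then validate what remains), here is a much smaller sketch. It is not a substitute for the class above: the keyword set is deliberately truncated for illustration, the names are invented, and Unicode escapes, formatting characters, and surrogate pairs are ignored.

```csharp
using System.Collections.Generic;

public static class VerbatimIdentifierSketch
{
    // Deliberately truncated keyword list, for illustration only.
    private static readonly HashSet<string> SampleKeywords =
        new HashSet<string> { "class", "int", "public", "return", "void" };

    public static bool IsValid(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;

        // Step 1: a bare keyword is never a valid identifier.
        if (SampleKeywords.Contains(name)) return false;

        // Step 2: strip a single leading '@' (verbatim identifier prefix).
        string rest = name[0] == '@' ? name.Substring(1) : name;

        // Step 3: what is left must be a plain identifier; any further '@'
        // fails the letter/digit/underscore checks below.
        if (rest.Length == 0) return false;
        if (!char.IsLetter(rest[0]) && rest[0] != '_') return false;
        for (int i = 1; i < rest.Length; i++)
        {
            if (!char.IsLetterOrDigit(rest[i]) && rest[i] != '_') return false;
        }
        return true;
    }
}
```

Because the keyword test runs before the prefix is removed, "@class" passes while "class" does not, and both "@" and "@1foo" are rejected in step 3.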
Unmeriting answered 11/3, 2017 at 10:57 Comment(0)
L
1

A longer and much slower way is to use reflection: iterate over the members of a class or namespace and check whether each reflected member's ToString() matches the input string. This requires the assembly to be loaded beforehand.

Another, even more roundabout way (which avoids regex by reusing an existing ANTLR scanner/parser) is to lex and parse the C# source, scan it for member names (i.e. variables), and compare them to the input string. For example, given the input 'fooBar', you specify the source (an assembly or C# code) and scan it for member declarations such as:

private int fooBar;

Yes, it is complex, but working through it shows you what compiler writers are doing and gets you closely acquainted with the C# syntax and its peculiarities.
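The reflection idea from the first paragraph might look roughly like the sketch below. The Sample type and the helper name are hypothetical. Note that this answers "is a member with this name already declared on a loaded type", not "is this string a syntactically valid identifier". It also compares the member's Name rather than ToString(), since a field's ToString() includes its type as well.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class MemberLookupSketch
{
    // Hypothetical sample type used only for this demonstration.
    public class Sample
    {
        private int fooBar = 0;
        public int Read() { return fooBar; }
    }

    // True when the type declares or inherits a visible member with exactly this name.
    public static bool HasMemberNamed(Type type, string name)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                                   BindingFlags.Instance | BindingFlags.Static;
        return type.GetMembers(flags).Any(member => member.Name == name);
    }
}
```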

Luisluisa answered 1/12, 2009 at 23:42 Comment(0)
B
1

A more recent solution is to use the Roslyn APIs from Microsoft.CodeAnalysis.CSharp (available as a NuGet package of the same name):

 SyntaxFacts.IsValidIdentifier("identifierToCheck")


Bleed answered 19/1 at 15:21 Comment(0)
M
1

A slight improvement over romfir's answer using Microsoft.CodeAnalysis.CSharp, to also treat reserved keywords as invalid member names:

public static bool IsValidMemberName(string name)
{
    return SyntaxFacts.IsValidIdentifier(name) &&
           SyntaxFacts.GetKeywordKind(name) == SyntaxKind.None;
}
Mechelle answered 18/6 at 10:24 Comment(0)
H
0

In WPF this can be used to check whether a string is a valid variable name. However, it does not recognize reserved words such as "public".

// works only in WPF!
public static bool CheckIfStringIsValidVarName(string stringToCheck)
{
    if (string.IsNullOrWhiteSpace(stringToCheck))
        return false;

    TextBox textBox = new TextBox();

    try
    {
        // stringToCheck == ""; // !!! does NOT throw !!!
        // stringToCheck == "Name$"; // throws
        // stringToCheck == "0"; // throws
        // stringToCheck == "name with blank"; // throws
        // stringToCheck == "public"; // does NOT throw
        // stringToCheck == "ValidName";

        textBox.Name = stringToCheck;
    }
    catch (ArgumentException) // the exception object itself is not needed
    {
        return false;
    }

    return true;
}
Honaker answered 20/4, 2019 at 9:43 Comment(0)
