I'm looking for a quick way (in C#) to determine if a string is a valid variable name. My first intuition is to whip up some regex to do it, but I'm wondering if there's a better way to do it. Like maybe some kind of a secret method hidden deep somewhere called IsThisAValidVariableName(string name), or some other slick way to do it that is not prone to errors that might arise due to lack of regex prowess.
Try this:
// using System.CodeDom.Compiler;
CodeDomProvider provider = CodeDomProvider.CreateProvider("C#");
if (provider.IsValidIdentifier(YOUR_VARIABLE_NAME)) {
    // Valid
} else {
    // Not valid
}
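As a variant of the snippet above, the C# provider can be constructed directly instead of being looked up by name, which is what a later comment in this thread suggests. This is only a sketch: the wrapper class and method names are made up for illustration, and it assumes `Microsoft.CSharp.CSharpCodeProvider` is available (built in on .NET Framework; on newer .NET it is assumed to come from the System.CodeDom package).

```csharp
using System;
using Microsoft.CSharp;

// Hypothetical wrapper for illustration; not part of any library.
public static class CodeDomIdentifierCheck
{
    public static bool IsValidIdentifier(string name)
    {
        // Guard here so the result for null/empty does not depend on the provider.
        if (string.IsNullOrEmpty(name)) return false;

        // The provider is disposable, so release it when done.
        using (var provider = new CSharpCodeProvider())
        {
            return provider.IsValidIdentifier(name);
        }
    }
}
```

If many names are checked, keeping one provider instance around rather than creating one per call would probably be cheaper.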
System.CodeDom.Compiler namespace for that :-) – Labialize
CodeDomProvider? – Guy
provider? – Guy
new Microsoft.CSharp.CSharpCodeProvider().IsValidIdentifier(...) directly requires significantly less overhead; no parsing to find the provider – Obediah
public static bool IsIdentifier(string text)
{
    if (string.IsNullOrEmpty(text))
        return false;
    if (!char.IsLetter(text[0]) && text[0] != '_')
        return false;
    for (int ix = 1; ix < text.Length; ++ix)
        if (!char.IsLetterOrDigit(text[ix]) && text[ix] != '_')
            return false;
    return true;
}
There are a couple of special cases around the @ character that are easy to forget to check: namely, "@" by itself is not a valid identifier, and neither is "@1foo". To catch these, you can first check if the string is a keyword, then remove @ from the start of the string, and then check if what's left is a valid identifier (disallowing @ characters).
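The three steps just described could be sketched roughly like this. It is a simplified illustration, not a full implementation: the class name and the abbreviated keyword list are made up for the example, and it ignores Unicode escapes and formatting characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class VerbatimIdentifierSketch
{
    // Abbreviated on purpose; a real check needs the full C# keyword list.
    private static readonly HashSet<string> SampleKeywords =
        new HashSet<string> { "class", "int", "public", "static", "void" };

    public static bool IsValidIdentifier(string text)
    {
        if (string.IsNullOrEmpty(text)) return false;

        // Step 1: a bare keyword is not an identifier ("@class" is not in the set).
        if (SampleKeywords.Contains(text)) return false;

        // Step 2: remove a single leading '@'.
        string rest = text[0] == '@' ? text.Substring(1) : text;

        // Step 3: the remainder must be a plain identifier. '@' is not allowed
        // here, so "@" (empty remainder), "@1foo" and "@@foo" are all rejected.
        if (rest.Length == 0) return false;
        if (!char.IsLetter(rest[0]) && rest[0] != '_') return false;
        for (int i = 1; i < rest.Length; i++)
            if (!char.IsLetterOrDigit(rest[i]) && rest[i] != '_') return false;
        return true;
    }
}
```

With this sketch, "@class" and "foo" are accepted, while "class", "@", "@1foo" and "@@foo" are rejected.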
Here I've combined this with a method to parse Unicode escape sequences in identifiers, and what is hopefully complete C# (5.0) Unicode character checking. To use it, first call TryParseRawIdentifier() to handle keywords, escape sequences, formatting characters (which are removed), and verbatim identifiers. Next, pass the result to IsValidParsedIdentifier() to check if the first and subsequent characters are valid. Note that the strings returned from TryParseRawIdentifier() are equal if and only if the identifiers are considered identical by C#.
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class CSharpIdentifiers
{
    private static HashSet<string> _keywords = new HashSet<string> {
        "abstract", "as", "base", "bool", "break", "byte", "case", "catch", "char", "checked",
        "class", "const", "continue", "decimal", "default", "delegate", "do", "double", "else",
        "enum", "event", "explicit", "extern", "false", "finally", "fixed", "float", "for",
        "foreach", "goto", "if", "implicit", "in", "int", "interface", "internal", "is", "lock",
        "long", "namespace", "new", "null", "object", "operator", "out", "override", "params",
        "private", "protected", "public", "readonly", "ref", "return", "sbyte", "sealed",
        "short", "sizeof", "stackalloc", "static", "string", "struct", "switch", "this", "throw",
        "true", "try", "typeof", "uint", "ulong", "unchecked", "unsafe", "ushort", "using",
        "virtual", "void", "volatile", "while"
    };

    public static IReadOnlyCollection<string> Keywords { get { return _keywords; } }

    public static bool TryParseRawIdentifier(string str, out string parsed)
    {
        if (string.IsNullOrEmpty(str) || _keywords.Contains(str)) { parsed = null; return false; }
        StringBuilder sb = new StringBuilder(str.Length);
        int verbatimCharWidth = str[0] == '@' ? 1 : 0;
        for (int i = verbatimCharWidth; i < str.Length; ) //Manual increment
        {
            char c = str[i];
            if (c == '\\')
            {
                //A trailing backslash has no escape character after it
                if (i + 1 >= str.Length) { parsed = null; return false; }
                char next = str[i + 1];
                int charCodeLength;
                if (next == 'u') charCodeLength = 4;
                else if (next == 'U') charCodeLength = 8;
                else { parsed = null; return false; }
                //No need to check for escaped backslashes or special sequences like \n,
                //as they are not valid identifier characters
                //Reject truncated escapes instead of letting Substring() throw
                if (i + 2 + charCodeLength > str.Length) { parsed = null; return false; }
                int charCode;
                if (!TryParseHex(str.Substring(i + 2, charCodeLength), out charCode)) { parsed = null; return false; }
                //ConvertFromUtf32() throws for surrogate code points and values outside 0..0x10FFFF
                if (charCode < 0 || charCode > 0x10FFFF || (charCode >= 0xD800 && charCode <= 0xDFFF)) { parsed = null; return false; }
                sb.Append(char.ConvertFromUtf32(charCode)); //Handle characters above 2^16 by converting them to a surrogate pair
                i += 2 + charCodeLength;
            }
            else if (char.GetUnicodeCategory(str, i) == UnicodeCategory.Format)
            {
                //Use (string, index) in order to handle surrogate pairs
                //Skip this character
                if (char.IsSurrogatePair(str, i)) i += 2;
                else i += 1;
            }
            else
            {
                sb.Append(c);
                i++;
            }
        }
        parsed = sb.ToString();
        return true;
    }

    private static bool TryParseHex(string str, out int result)
    {
        return int.TryParse(str, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out result);
        //NumberStyles.AllowHexSpecifier forces all characters to be hex digits
    }

    public static bool IsValidParsedIdentifier(string str)
    {
        if (string.IsNullOrEmpty(str)) return false;
        if (!IsValidParsedIdentifierStart(str, 0)) return false;
        int firstCharWidth = char.IsSurrogatePair(str, 0) ? 2 : 1;
        for (int i = firstCharWidth; i < str.Length; ) //Manual increment
        {
            if (!IsValidParsedIdentifierPart(str, i)) return false;
            if (char.IsSurrogatePair(str, i)) i += 2;
            else i += 1;
        }
        return true;
    }

    //(String, index) pairs are used instead of chars in order to support surrogate pairs
    //(Unicode code-points above 2^16 represented using two 16-bit characters)
    public static bool IsValidParsedIdentifierStart(string s, int index)
    {
        return s[index] == '_' || char.IsLetter(s, index) || char.GetUnicodeCategory(s, index) == UnicodeCategory.LetterNumber;
    }

    public static bool IsValidParsedIdentifierPart(string s, int index)
    {
        if (s[index] == '_' || (s[index] >= '0' && s[index] <= '9') || char.IsLetter(s, index)) return true;
        switch (char.GetUnicodeCategory(s, index))
        {
            case UnicodeCategory.LetterNumber: //Eg. Special Roman numeral characters (not covered by IsLetter())
            case UnicodeCategory.DecimalDigitNumber: //Includes decimal digits in other cultures
            case UnicodeCategory.ConnectorPunctuation:
            case UnicodeCategory.NonSpacingMark:
            case UnicodeCategory.SpacingCombiningMark:
            //UnicodeCategory.Format handled in TryParseRawIdentifier()
                return true;
            default:
                return false;
        }
    }
}
A longer and much slower way is to use reflection to iterate over the members of a class or namespace and check whether each reflected member's ToString() matches the input string. This requires the assembly to be loaded beforehand.
Another way (much more roundabout, but it avoids regex by using an already available ANTLR scanner/parser) borders on lexing and parsing C# code, then scanning for member names (i.e. variables) and comparing them to the input string. For example, take an input string 'fooBar', specify the source (such as an assembly or C# code), and scan it specifically for member declarations such as:
private int fooBar;
Yes, it is complex, but it gives you a solid understanding of what compiler writers do, and it will deepen your knowledge of the C# language to the point where you become quite intimate with the syntax and its peculiarities.
A more recent solution is to use the Roslyn APIs from Microsoft.CodeAnalysis.CSharp:
SyntaxFacts.IsValidIdentifier("identifierToCheck")
A slight improvement over romfir's answer using Microsoft.CodeAnalysis.CSharp, to also treat reserved keywords as invalid member names:
public static bool IsValidMemberName(string name)
{
    return SyntaxFacts.IsValidIdentifier(name) &&
           SyntaxFacts.GetKeywordKind(name) == SyntaxKind.None;
}
In WPF this can be used to check if a string is a valid variable name. But it does not recognize reserved words like "public".
// works only in WPF!
public static bool CheckIfStringIsValidVarName(string stringToCheck)
{
    if (string.IsNullOrWhiteSpace(stringToCheck))
        return false;
    TextBox textBox = new TextBox();
    try
    {
        // stringToCheck == ""; // !!! does NOT throw !!!
        // stringToCheck == "Name$"; // throws
        // stringToCheck == "0"; // throws
        // stringToCheck == "name with blank"; // throws
        // stringToCheck == "public"; // does NOT throw
        // stringToCheck == "ValidName";
        textBox.Name = stringToCheck;
    }
    catch (ArgumentException)
    {
        return false;
    }
    return true;
}