How to convert camel case to snake case with two capitals next to each other

10

20

I am trying to convert camel case to snake case.

Like this:

"LiveKarma" -> "live_karma"
"youGO" -> "you_g_o"

I cannot get the second example to work like that: it always outputs 'you_go'. How can I get it to output 'you_g_o'?

My code:

(Regex.Replace(line, "(?<=[a-z0-9])[A-Z]", "_$0", RegexOptions.Compiled)).ToLowerInvariant()
Urion answered 23/7, 2020 at 13:36 Comment(9)
Do you need to use a regex? Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~Jamie Zawinski - Grip
No, I don't need regex. - Urion
Your regular expression looks for a lowercase letter or a number followed by an uppercase letter. That seems at odds with what you're asking for in your question title. - Febrifuge
Perhaps change [a-z0-9] to [a-zA-Z0-9]: regex101.com/r/Otna7T/1 - Sisely
I'm not even sure that your code treats LiveKarma correctly: is the L really replaced with l in your tests? - Interment
Sorry, I forgot to say I also call ToLowerInvariant(). - Urion
Oh, so your code is only for adding underscores! - Interment
(?<!^)[A-Z] might be a clearer regex, i.e., match an uppercase character if you aren't at the beginning of the string. - Kingkingbird
I ran into the same issue. I tried to find out what the actual standard is for two consecutive upper-case characters and couldn't find any documentation. Looking at how Newtonsoft does it, they don't add separators between upper-case characters: github.com/JamesNK/Newtonsoft.Json/blob/… If this is the standard, then the mapping isn't bijective, since e.g. FOoBAr -> foo_bar -> FooBar and FooBar -> foo_bar -> FooBar. I changed my names to avoid consecutive upper-case characters. - Wispy
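The regex suggested in the comments, (?<!^)[A-Z], can be tried out with a minimal sketch like the one below. The class and method names (SnakeCaseSketch, ToSnakeCaseEveryCapital) are made up for illustration, and the sketch assumes plain ASCII letters. It puts an underscore before every capital except one at the very start of the string, which is the behaviour the question asks for.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SnakeCaseSketch
{
    // Hypothetical helper: inserts '_' before every upper-case ASCII letter
    // that is not the first character, then lower-cases the whole string.
    public static string ToSnakeCaseEveryCapital(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text; // nothing to convert
        }

        // (?<!^) is a negative lookbehind: the match must not start at position 0.
        return Regex.Replace(text, "(?<!^)[A-Z]", "_$0").ToLowerInvariant();
    }
}

// SnakeCaseSketch.ToSnakeCaseEveryCapital("LiveKarma") returns "live_karma"
// SnakeCaseSketch.ToSnakeCaseEveryCapital("youGO")     returns "you_g_o"
```

Note that this sketch does not collapse existing underscores, so an input such as "test_SC" comes out as "test__s_c".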
22

Here is an extension method that converts text to snake case, putting an underscore before every upper-case letter except the first character:

using System;
using System.Text;

public static string ToSnakeCase(this string text)
{
    if(text == null) {
        throw new ArgumentNullException(nameof(text));
    }
    if(text.Length < 2) {
        return text.ToLowerInvariant();
    }
    var sb = new StringBuilder();
    sb.Append(char.ToLowerInvariant(text[0]));
    for(int i = 1; i < text.Length; ++i) {
        char c = text[i];
        if(char.IsUpper(c)) {
            sb.Append('_');
            sb.Append(char.ToLowerInvariant(c));
        } else {
            sb.Append(c);
        }
    }
    return sb.ToString();
}

Put it into a static class somewhere (named for example StringExtensions) and use it like this:

string text = "LiveKarma";
string snakeCaseText = text.ToSnakeCase();
// snakeCaseText => "live_karma"
Reeva answered 23/7, 2020 at 13:55 Comment(1)
In "if(text.Length < 2)" I would also go for "return text.ToLower();" - Doherty
18

Since treating each letter of an abbreviation as a separate word is unsuitable for many use cases, here is a complete solution I found in the EFCore.NamingConventions project (linked below).

Here are a couple of examples of how the code works:

TestSC -> test_sc
testSC -> test_sc
TestSnakeCase -> test_snake_case
testSnakeCase -> test_snake_case
TestSnakeCase123 -> test_snake_case123
_testSnakeCase123 -> _test_snake_case123
test_SC -> test_sc

I rewrote it a bit so you can copy it as a ready-to-use string extension:

using System;
using System.Globalization;
using System.Text;

namespace Extensions
{
    public static class StringExtensions
    {
        public static string ToSnakeCase(this string text)
        {
            if (string.IsNullOrEmpty(text))
            {
                return text;
            }

            var builder = new StringBuilder(text.Length + Math.Min(2, text.Length / 5));
            var previousCategory = default(UnicodeCategory?);

            for (var currentIndex = 0; currentIndex < text.Length; currentIndex++)
            {
                var currentChar = text[currentIndex];
                if (currentChar == '_')
                {
                    builder.Append('_');
                    previousCategory = null;
                    continue;
                }

                var currentCategory = char.GetUnicodeCategory(currentChar);
                switch (currentCategory)
                {
                    case UnicodeCategory.UppercaseLetter:
                    case UnicodeCategory.TitlecaseLetter:
                        if (previousCategory == UnicodeCategory.SpaceSeparator ||
                            previousCategory == UnicodeCategory.LowercaseLetter ||
                            previousCategory != UnicodeCategory.DecimalDigitNumber &&
                            previousCategory != null &&
                            currentIndex > 0 &&
                            currentIndex + 1 < text.Length &&
                            char.IsLower(text[currentIndex + 1]))
                        {
                            builder.Append('_');
                        }

                        currentChar = char.ToLower(currentChar, CultureInfo.InvariantCulture);
                        break;

                    case UnicodeCategory.LowercaseLetter:
                    case UnicodeCategory.DecimalDigitNumber:
                        if (previousCategory == UnicodeCategory.SpaceSeparator)
                        {
                            builder.Append('_');
                        }
                        break;

                    default:
                        if (previousCategory != null)
                        {
                            previousCategory = UnicodeCategory.SpaceSeparator;
                        }
                        continue;
                }

                builder.Append(currentChar);
                previousCategory = currentCategory;
            }

            return builder.ToString();
        }
    }
}

You can find the original code here: https://github.com/efcore/EFCore.NamingConventions/blob/main/EFCore.NamingConventions/Internal/SnakeCaseNameRewriter.cs

UPD 27.04.2022:

Also, you can use the Newtonsoft library if you're looking for a ready-to-use third-party solution. Its output is the same as that of the code above.

// using Newtonsoft.Json.Serialization;
var snakeCaseStrategy = new SnakeCaseNamingStrategy();
var snakeCaseResult = snakeCaseStrategy.GetPropertyName(text, false);
Entrant answered 30/4, 2021 at 11:19 Comment(2)
This also produces the same results: Regex.Replace(Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2"), "([a-z0-9])([A-Z])", "$1_$2").ToLower() - Jacquiline
The question is about how to add a separator between two consecutive upper-case characters, and this answer doesn't provide a solution for that. - Wispy
9

A simple LINQ-based solution; no idea whether it's faster or not. It basically ignores consecutive upper-case letters.

public static string ToUnderscoreCase(this string str)
    => string.Concat((str ?? string.Empty).Select((x, i) => i > 0 && i < str.Length - 1 && char.IsUpper(x) && !char.IsUpper(str[i-1]) ? $"_{x}" : x.ToString())).ToLower();
Fry answered 22/5, 2021 at 17:53 Comment(1)
I just tried to use this one, and it throws exceptions if the string ends with a capital letter because it's attempting to look at the next character in the string, but it's outside of the bounds of the array. - Cybil
7

Using the Newtonsoft.Json package (requires using Newtonsoft.Json.Serialization;):

    public static string? ToCamelCase(this string? str) => str is null
        ? null
        : new DefaultContractResolver() { NamingStrategy = new CamelCaseNamingStrategy() }.GetResolvedPropertyName(str);

    public static string? ToSnakeCase(this string? str) => str is null
        ? null
        : new DefaultContractResolver() { NamingStrategy = new SnakeCaseNamingStrategy() }.GetResolvedPropertyName(str);
Berg answered 25/8, 2021 at 6:45 Comment(2)
Maybe mention that you need to import the Newtonsoft JSON package for this solution. - Bignoniaceous
In a situation where we already use the package (Newtonsoft.Json), I think this can be used though. - Winslow
4

RegEx Solution

A quick internet search turned up this site which has an answer using RegEx, which I had to modify to grab the Value portion in order for it to work on my machine (but it has the RegEx you're looking for). I also modified it to handle null input, rather than throwing an exception:

public static string ToSnakeCase2(string str)
{
    var pattern = 
        new Regex(@"[A-Z]{2,}(?=[A-Z][a-z]+[0-9]*|\b)|[A-Z]?[a-z]+[0-9]*|[A-Z]|[0-9]+");

    return str == null
        ? null
        : string
            .Join("_", pattern.Matches(str).Cast<Match>().Select(m => m.Value))
            .ToLower();
}

Non-RegEx Solution

For a non-regex solution, we can do the following:

  1. Replace each run of whitespace with a single '_' character by
    • using string.Split with an empty array as the first parameter, which splits on all whitespace
    • joining those parts back together with the '_' character
  2. Prefix all upper-case characters with '_' and lower-case them
  3. Split and re-join the resulting string on the _ character to remove any instances of multiple consecutive underscores ("__") and to remove any leading or trailing instances of the character.

For example:

public static string ToSnakeCase(string str)
{
    return str == null
        ? null
        : string.Join("_", string.Concat(string.Join("_", str.Split(new char[] {},
            StringSplitOptions.RemoveEmptyEntries))
            .Select(c => char.IsUpper(c)
                ? $"_{c}".ToLower()
                : $"{c}"))
            .Split(new[] {'_'}, StringSplitOptions.RemoveEmptyEntries));
}
Byroad answered 23/7, 2020 at 13:54 Comment(1)
I haven't tested yours, but this one works on all the cases I threw at it: Regex.Replace(Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2"), "([a-z0-9])([A-Z])", "$1_$2").ToLower() - Jacquiline
2

If you're into micro-optimizations and want to avoid unnecessary conversions wherever possible, this one might also work:

    public static string ToSnakeCase(this string text)
    {
        static IEnumerable<char> Convert(CharEnumerator e)
        {
            if (!e.MoveNext()) yield break;
            yield return char.ToLower(e.Current);
            while (e.MoveNext())
            {
                if (char.IsUpper(e.Current))
                {
                    yield return '_';
                    yield return char.ToLower(e.Current);
                }
                else
                {
                    yield return e.Current;
                }
            }
        }

        return new string(Convert(text.GetEnumerator()).ToArray());
    }
Araby answered 23/9, 2021 at 9:9 Comment(0)
2

There is a well-maintained EF Core community project called EFCore.NamingConventions that implements a number of naming convention rewriters. The rewriters don't have any internal dependencies, so if you don't want to bring in an EF Core related package you can just copy the rewriter code out.

Here is the snake case rewriter: https://github.com/efcore/EFCore.NamingConventions/blob/main/EFCore.NamingConventions/Internal/SnakeCaseNameRewriter.cs

Foushee answered 21/11, 2021 at 15:22 Comment(0)
1

A rough sketch is below. In essence, check whether each character is upper case; if it is, add a _, then append the character in lower case.

// Assumes s is a non-null, non-empty string.
// The first character is lower-cased without a leading underscore.
var newString = s.Substring(0, 1).ToLowerInvariant();
foreach (char c in s.Substring(1))
{
    if (char.IsUpper(c))
    {
        newString += "_";
    }
    newString += char.ToLowerInvariant(c);
}
Grip answered 23/7, 2020 at 13:44 Comment(1)
You'd have to treat the first character differently, as you do not want _live_karma with input LiveKarma. Also, when building a string like this, you might prefer to use a StringBuilder. - Interment
1

Might as well toss this one out. Very simple, and it worked for me.

public static string ToSnakeCase(this string text)
{
    text = Regex.Replace(text, "(.)([A-Z][a-z]+)", "$1_$2");
    text = Regex.Replace(text, "([a-z0-9])([A-Z])", "$1_$2");
    return text.ToLower();
}

Testing it with some samples (borrowed from @GeekInside's answer):

var samples = new List<string>() { "TestSC", "testSC", "TestSnakeCase", "testSnakeCase", "TestSnakeCase123", "_testSnakeCase123", "test_SC" };
var results = new List<string>() { "test_sc", "test_sc", "test_snake_case", "test_snake_case", "test_snake_case123", "_test_snake_case123", "test_sc" };
for (int i = 0; i < samples.Count; i++)
{
    var sample = samples[i];
    Console.WriteLine("Test success: " + (sample.ToSnakeCase() == results[i] ? "true" : "false"));
}

Produced the following output:

Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true 
Test success: true
Jacquiline answered 6/10, 2022 at 11:15 Comment(0)
1

Something like this ;)

static class StringExtensions {
  public static string ToSnakeCase(this string text) => 
    Regex.Replace(text, "(?<=[a-z0-9])[A-Z]|(?<=[A-Z])[A-Z][a-z]", "_$0").ToLower();
}

Test it:

var a = "TestSnakeCase11_MabyISTheBEST_SnakeIS22case_IS33CaseSNAKE00";
var b = a.ToSnakeCase();
Console.WriteLine(a);
Console.WriteLine(b);

Console out:

TestSnakeCase11_MabyISTheBEST_SnakeIS22case_IS33CaseSNAKE00

test_snake_case11_maby_is_the_best_snake_is22case_is33_case_snake00

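(?<=[a-z0-9])[A-Z] - matches an upper-case letter that follows a lower-case letter or digit; this finds every word boundary except the one inside ISThe

(?<=[A-Z])[A-Z][a-z] - matches an upper-case letter that follows another upper-case letter and precedes a lower-case letter; this finds that one remaining case (the T in ISThe)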
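To see what each alternative picks up on its own, here is a small sketch that lists the matches of a single pattern. BoundaryDemo and MatchesOf are names made up for this illustration; they are not part of the answer's code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: returns every substring matched by `pattern`
    // in `input`, joined with commas, so the two alternatives can be compared.
    public static string MatchesOf(string input, string pattern) =>
        string.Join(",", Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
}

// BoundaryDemo.MatchesOf("MabyISTheBEST", "(?<=[a-z0-9])[A-Z]")   returns "I,B"
// BoundaryDemo.MatchesOf("MabyISTheBEST", "(?<=[A-Z])[A-Z][a-z]") returns "Th"
```

The first pattern misses the T in ISThe because the character before it is an upper-case S; the second pattern catches exactly that case.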


Short, and a one-liner. Enjoy!


Vasiliu answered 24/3 at 9:36 Comment(0)
