Split PascalCase string except for acronyms

3

7

I have a list of words that need to be made human-readable, for example FirstName to First Name and LastName to Last Name, while acronyms such as ARBs must remain as is. The acronym case was introduced recently and causes a display issue, because our regular expression returns AR Bs. Here's what we have, which I know is insufficient for acronyms:

([A-Z][a-z]+)

I've found other expressions on SO and on other sites that are able to work with acronyms, however they work on strings where the acronym is within the string rather than being the entire string. I can do simple regular expressions, but this is too tricky for my skills. I would provide other examples for testing if I had them, but all of the strings work fine except the new one, ARBs. Thank you.

Update: Here's the code usage

string friendlyName = Regex.Replace(field.Name, "([A-Z][a-z]+)", " $1", RegexOptions.Compiled).Trim();
Tetrafluoroethylene answered 13/12, 2011 at 20:6 Comment(1)
Unfortunately, this problem may end up with unresolved ambiguity. For example, does ConfigureIDEAsUser split to "Configure IDE as user" or "Configure IDEAs user"? - Ikeda
A
12

Wouldn't [A-Z]+[a-z]* do it? That should match one or more upper-case letters followed by zero or more lower-case letters. So ARBs would remain a single entity, but CamelCase would be split into Camel Case.
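
To show how this pattern could slot into the question's `Regex.Replace` call, here is a minimal sketch. The class and method names (`FriendlyNames`, `ToFriendlyName`) are hypothetical, and the capture group around the pattern is an assumption added so that `" $1"` has something to refer to.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FriendlyNames
{
    // The suggested pattern, wrapped in a capture group (an assumption)
    // so the replacement string " $1" can refer to the matched word.
    private static readonly Regex WordPattern =
        new Regex("([A-Z]+[a-z]*)", RegexOptions.Compiled);

    public static string ToFriendlyName(string name)
    {
        // Prefix every matched word with a space, then trim the leading one.
        return WordPattern.Replace(name, " $1").Trim();
    }
}
```

With this sketch, `FriendlyNames.ToFriendlyName("ARBs")` returns `"ARBs"` and `"FirstName"` becomes `"First Name"`. An acronym inside a longer name stays glued to the word that follows it: `"HereIsGPSData"` becomes `"Here Is GPSData"`.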

Alb answered 13/12, 2011 at 21:44 Comment(3)
Great solution. Is there a way to get "HereIsGPSData" to be "Here Is GPS Data"? I can see where it would be difficult, but I would assume that if a lowercase character follows a run of uppercase characters, the last uppercase character is not part of the acronym. - Savill
Actually, I figured it out. Use (?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z]), which I found elsewhere. - Savill
@Grandizer: The question says otherwise -- "ARBs" has a lowercase character after a group of uppercase, and the last upper (B) is part of the acronym. - Ikeda
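
The trade-off discussed in these comments can be checked with a short sketch. It applies the pattern from the comment with a `" $1"` replacement; the class and method names (`AcronymAwareSplitter`, `Split`) are hypothetical, and the replacement string is an assumption carried over from the question's usage.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AcronymAwareSplitter
{
    // (?<!^)          : never match at the very start of the string
    // [A-Z][a-z]      : a capital that begins a normal word ("Da" in "Data")
    // (?<=[a-z])[A-Z] : a capital right after a lowercase letter ("G" in "IsGPS")
    private static readonly Regex Boundary =
        new Regex("(?<!^)([A-Z][a-z]|(?<=[a-z])[A-Z])", RegexOptions.Compiled);

    public static string Split(string name)
    {
        // Insert a space in front of every detected word boundary.
        return Boundary.Replace(name, " $1");
    }
}
```

`AcronymAwareSplitter.Split("HereIsGPSData")` returns `"Here Is GPS Data"`, but `AcronymAwareSplitter.Split("ARBs")` returns `"AR Bs"`, which is the behavior the question wants to avoid.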
R
0

How about this?

[A-Z][a-z]+|[A-Z]
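
A quick way to see what this pattern does is to run it through the same kind of replacement as in the question. This is only a sketch: the names `PerCapitalSplitter` and `Split` are hypothetical, and using `" $0"` (the whole match) as the replacement is an assumption, since the pattern has no capture group.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PerCapitalSplitter
{
    // First alternative: a capitalized word; second: any single capital.
    private static readonly Regex Token =
        new Regex("[A-Z][a-z]+|[A-Z]", RegexOptions.Compiled);

    public static string Split(string name)
    {
        // "$0" stands for the entire match; trim the leading space afterwards.
        return Token.Replace(name, " $0").Trim();
    }
}
```

Under these assumptions `"FirstName"` becomes `"First Name"`, but every capital in an acronym is separated, so `"ARBs"` becomes `"A R Bs"`.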
Rabush answered 13/12, 2011 at 21:46 Comment(0)
I
0

A string, sentence, or paragraph that contains acronyms can be converted into human-readable text. I started out formatting only Pascal-cased strings, then investigated further and extended the approach so that acronyms are also split into a readable form.

Test Data:

Input: "QWERTYSomeThing OmitTRYSomeThing MayBeWorkingFYI"

Output: "QWERTY Some Thing Omit TRY Some Thing May Be Working FYI"

Code: pass the input string to the method below.

    // Requires: using System.Text;
    private static string FormatPascalAndAcronym(string input)
    {
        // An empty or null input has no characters to inspect.
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        var builder = new StringBuilder(input.Length * 2);
        builder.Append(input[0]);

        for (var index = 1; index < input.Length; index++)
        {
            char prevChar = input[index - 1];
            char currentChar = input[index];
            // '\0' is neither upper nor lower case, so it stands in for "no next character".
            char nextChar = index + 1 < input.Length ? input[index + 1] : '\0';

            bool isPrevLower = char.IsLower(prevChar);
            bool isPrevUpper = char.IsUpper(prevChar);
            bool isCurrentUpper = char.IsUpper(currentChar);
            bool isNextLower = char.IsLower(nextChar);
            bool isNextUpper = char.IsUpper(nextChar);

            // A new word starts at an upper-case letter that either
            //  - follows a capital and precedes a lower-case letter ("TRYSome" -> "TRY Some"), or
            //  - follows a lower-case letter and precedes another letter ("SomeThing", "WorkingFYI").
            bool startsNewWord = isCurrentUpper &&
                ((isPrevUpper && isNextLower) ||
                 (isPrevLower && (isNextLower || isNextUpper)));

            if (startsNewWord)
            {
                builder.Append(' ');
            }
            builder.Append(currentChar);
        }

        return builder.ToString();
    }
Itchy answered 30/5, 2014 at 19:56 Comment(1)
This does the wrong thing for the only test case mentioned in the question, even though it's probably the right rule for everyone else. - Ikeda
