Add spaces before Capital Letters
A

32

222

Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals", what is the best way to add spaces before the capital letters? The end result would be "This String Has No Spaces But It Does Have Capitals".

Here is my attempt with a regex:

System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")
Aromatic answered 7/11, 2008 at 16:33 Comment(5)
Do you have a particular complaint about the approach you've taken? That might help us improve upon your method.Assessor
If the regex works, then I'd stick with that. Regex is optimized for string manipulation.Peake
I am just curious if there is a better or perhaps even a built-in approach. I'd even be curious to see other approaches in other languages.Aromatic
Your code simply didn't work because the modified string is the return value of the 'Replace' function. With this code line: 'System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0").Trim();' it would work perfectly. (Just commenting because I stumbled over this post and nobody really saw, what was wrong with your code.)Square
Regex.Replace("ThisStringHasNoSpacesButItDoesHaveCapitals", @"\B[A-Z]", m => " " + m);Jemma
U
224

The regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).

This function

string AddSpacesToSentence(string text, bool preserveAcronyms)
{
    if (string.IsNullOrWhiteSpace(text))
        return string.Empty;
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]))
            // Add a space before a capital that follows a non-space, non-capital character,
            // or (acronym mode) before a capital that ends a run of capitals
            // and is followed by a non-capital.
            if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
                (preserveAcronyms && char.IsUpper(text[i - 1]) &&
                 i < text.Length - 1 && !char.IsUpper(text[i + 1])))
                newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

will do it 100,000 times in 2,968,750 ticks; the regex takes 25,000,000 ticks (and that's with the regex compiled).

It's better, for a given value of better (i.e. faster); however, it's more code to maintain. "Better" is often a compromise between competing requirements.

Update
It's a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).

On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 characters), a run of 100,000 conversions takes the hand-coded function 4,517,177 ticks and the regex below takes 59,435,719, so the hand-coded function runs in 7.6% of the time the regex takes.
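A rough way to reproduce this kind of comparison is sketched below. It is not the harness used for the numbers above: it assumes the simple (non-acronym) loop and the "([a-z])([A-Z])" pattern from another answer, and the names SpacingBenchmark, AddSpacesLoop, AddSpacesRegex and TimeTicks are made up for illustration. Tick counts will vary with machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SpacingBenchmark
{
    // Built once with RegexOptions.Compiled, matching the "regex compiled" note above.
    private static readonly Regex LowerUpper =
        new Regex("([a-z])([A-Z])", RegexOptions.Compiled);

    // Hand-coded version: space before a capital unless a space is already there.
    public static string AddSpacesLoop(string text)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        var sb = new StringBuilder(text.Length * 2);
        sb.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && text[i - 1] != ' ')
                sb.Append(' ');
            sb.Append(text[i]);
        }
        return sb.ToString();
    }

    // Regex version: space between a lower-case and an upper-case ASCII letter.
    public static string AddSpacesRegex(string text)
    {
        return LowerUpper.Replace(text, "$1 $2");
    }

    // Runs the conversion `iterations` times and returns the elapsed Stopwatch ticks.
    public static long TimeTicks(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up call, so JIT and regex compilation are not timed
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            convert(input);
        watch.Stop();
        return watch.ElapsedTicks;
    }
}
```

Calling SpacingBenchmark.TimeTicks(SpacingBenchmark.AddSpacesLoop, input, 100000) and the same with AddSpacesRegex gives two tick counts to compare. Note that the two methods only agree on inputs without consecutive capitals, such as the repeated 'Abbbbbbbbb' string.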

Update 2 Will it take acronyms into account? It will now! The logic of the if statement is fairly obscure; as you can see, expanding it to this ...

if (char.IsUpper(text[i]))
    if (char.IsUpper(text[i - 1]))
        if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
            newText.Append(' ');
        else ;
    else if (text[i - 1] != ' ')
        newText.Append(' ');

... doesn't help at all!

Here's the original simple method that doesn't worry about acronyms:

string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}
Ussery answered 7/11, 2008 at 16:34 Comment(13)
if (char.IsUpper (text [i]) && text[i - 1] != ' ') If you re-run the code above it keeps adding spaces, this will stop spaces being added if there is a space before the capital letter.Apparel
I am not sure so I thought I would ask, does this method handle acronyms as described in Martin Brown's answer "DriveIsSCSICompatible" would ideally become "Drive Is SCSI Compatible"Bloodstock
That made it 1 character by replacing the contents of your for statement with the newly updated if statements, I may be doing something wrong?Bloodstock
I think so, I just pasted the full function into a test project and it worked a treat, sorry.Ussery
with this solution, "407 ETR Customer Service" is converted to "407 ET R Customer Service" and "PAR-MED" is converted to "PA R-ME D", both incorrectAzole
"2ND" gets changed to "2 ND", "SPECTRE-DVD" gets changed to "SPECTRE- DVD", both seem incorrect to meAzole
Expensive? I've never had a regular expression cause a performance issue of any kind. In many cases, they are faster than the long form algorithm.Clowers
Adding a check for char.IsLetter(text[i + 1]) helps with acronyms with special characters and digits (i.e. ABC_DEF won't get split as AB C_DEF).Oilstone
I am getting a space before some acronyms. I suggest adding text[i + 1] != ' ' at the end: if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) || (preserveAcronyms && char.IsUpper(text[i - 1]) && i < text.Length - 1 && !char.IsUpper(text[i + 1]) && text[i + 1] != ' '))Avon
I'm not sure the acronyms part is correct when it's turned OFF. I just ran a test: "ASentenceABC" expands to "ASentence A B C". It should be "A Sentence A B C".Orb
You should make it a string extension this string textTeraterai
I think this is what you're after as with the acronym part you just care if the previous value wasn't upper I think var previousWasntUpper = previous != ' ' && !char.IsUpper(previous); if (preserveAcronyms || previousWasntUpper)Grater
In AddSpacesToSentence(string text, bool preserveAcronyms), once you determine text[i] is uppercase, the first condition works better as (text[i - 1] != ' ' && (!preserveAcronyms || !char.IsUpper(text[i - 1]))) so the word A gets handled correctly, e.g. YouAreAGenius becomes You Are A Genius rather than You Are AGenius.Trimer
U
181

Your solution has an issue in that it puts a space before the first letter T so you get

" This String..." instead of "This String..."

To get around this look for the lower case letter preceding it as well and then insert the space in the middle:

newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");

Edit 1:

If you use @"(\p{Ll})(\p{Lu})" it will pick up accented characters as well.

Edit 2:

If your strings can contain acronyms you may want to use this:

newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");

So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible"
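As a side-by-side sketch, the three patterns from this answer can be wrapped in small helpers. The class and method names (CapitalSpacing, Ascii, Unicode, WithAcronyms) are illustrative only and not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class CapitalSpacing
{
    // ASCII only: a lower-case letter directly followed by an upper-case letter.
    public static string Ascii(string value) =>
        Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");

    // Unicode categories: Ll = lower-case letter, Lu = upper-case letter.
    public static string Unicode(string value) =>
        Regex.Replace(value, @"(\p{Ll})(\p{Lu})", "$1 $2");

    // Acronym-aware: a capital after a lower-case letter, or a capital that
    // starts a word (followed by a lower-case letter) anywhere but the string start.
    public static string WithAcronyms(string value) =>
        Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
}
```

With these, CapitalSpacing.Ascii("ElÁlamoTejano") leaves "ElÁlamo" joined because Á is outside [A-Z], while CapitalSpacing.Unicode splits it into "El Álamo Tejano", and CapitalSpacing.WithAcronyms("DriveIsSCSICompatible") keeps "SCSI" together.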

Upsurge answered 7/11, 2008 at 17:19 Comment(3)
Couldn't you also just keep the original RegEx and Trim() the result?Erminiaerminie
@Erminiaerminie you could but it would require another memory allocation and string copy. That said if performance is a worry a Regex is probably not the best way to go anyhow.Upsurge
Could you also use "([^A-Z\\s])([A-Z])", even with acronyms?Pastoral
C
103

I didn't test performance, but here it is in one line with LINQ:

var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');
Cleliaclellan answered 16/2, 2011 at 20:10 Comment(0)
T
25

I know this is an old one, but this is an extension I use when I need to do this:

public static class Extensions
{
    public static string ToSentence( this string Input )
    {
        return new string(Input.SelectMany((c, i) => i > 0 && char.IsUpper(c) ? new[] { ' ', c } : new[] { c }).ToArray());
    }
}

This will allow you to use MyCasedString.ToSentence()

Tranquil answered 27/9, 2013 at 13:31 Comment(3)
I like the idea of this as an extension method, if you add TrimStart(' ') it will remove the leading space.Canter
Thanks @user1069816. I have changed the extension to use the overload of SelectMany which includes an index, this way it avoids the first letter and the unnecessary potential overhead of an additional call to TrimStart(' '). Rob.Tranquil
Does not handle acronyms. HasCICDHidden => Has C I C D HiddenBackup
A
11

I set out to make a simple extension method based on Binary Worrier's code which will handle acronyms properly, and is repeatable (won't mangle already spaced words). Here is my result.

public static string UnPascalCase(this string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    var newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        var currentUpper = char.IsUpper(text[i]);
        var prevUpper = char.IsUpper(text[i - 1]);
        var nextUpper = (text.Length > i + 1) ? char.IsUpper(text[i + 1]) || char.IsWhiteSpace(text[i + 1]): prevUpper;
        var spaceExists = char.IsWhiteSpace(text[i - 1]);
        if (currentUpper && !spaceExists && (!nextUpper || !prevUpper))
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}

Here are the unit test cases this function passes. I added most of tchrist's suggested cases to this list. The three of those it doesn't pass (two are just Roman numerals) are commented out:

Assert.AreEqual("For You And I", "ForYouAndI".UnPascalCase());
Assert.AreEqual("For You And The FBI", "ForYouAndTheFBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "AManAPlanACanalPanama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNSServer".UnPascalCase());
Assert.AreEqual("For You And I", "For You And I".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "MountMᶜKinleyNationalPark".UnPascalCase());
Assert.AreEqual("El Álamo Tejano", "ElÁlamoTejano".UnPascalCase());
Assert.AreEqual("The Ævar Arnfjörð Bjarmason", "TheÆvarArnfjörðBjarmason".UnPascalCase());
Assert.AreEqual("Il Caffè Macchiato", "IlCaffèMacchiato".UnPascalCase());
//Assert.AreEqual("Mister Dženan Ljubović", "MisterDženanLjubović".UnPascalCase());
//Assert.AreEqual("Ole King Henry Ⅷ", "OleKingHenryⅧ".UnPascalCase());
//Assert.AreEqual("Carlos Ⅴº El Emperador", "CarlosⅤºElEmperador".UnPascalCase());
Assert.AreEqual("For You And The FBI", "For You And The FBI".UnPascalCase());
Assert.AreEqual("A Man A Plan A Canal Panama", "A Man A Plan A Canal Panama".UnPascalCase());
Assert.AreEqual("DNS Server", "DNS Server".UnPascalCase());
Assert.AreEqual("Mount Mᶜ Kinley National Park", "Mount Mᶜ Kinley National Park".UnPascalCase());
Ancelin answered 4/10, 2012 at 17:33 Comment(1)
Similar to other solution posted here, it fails with string "RegularOTs". It returns "Regular O Ts"Ferryboat
O
9

Welcome to Unicode

All these solutions are essentially wrong for modern text. You need to use something that understands case. Since Bob asked for other languages, I'll give a couple for Perl.

I provide four solutions, ranging from worst to best. Only the best one is always right. The others have problems. Here is a test run to show you what works and what doesn’t, and where. I’ve used underscores so that you can see where the spaces have been put, and I’ve marked as wrong anything that is, well, wrong.

Testing TheLoneRanger
               Worst:    The_Lone_Ranger
               Ok:       The_Lone_Ranger
               Better:   The_Lone_Ranger
               Best:     The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
     [WRONG]   Worst:    Mount_MᶜKinley_National_Park
     [WRONG]   Ok:       Mount_MᶜKinley_National_Park
     [WRONG]   Better:   Mount_MᶜKinley_National_Park
               Best:     Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
     [WRONG]   Worst:    ElÁlamo_Tejano
               Ok:       El_Álamo_Tejano
               Better:   El_Álamo_Tejano
               Best:     El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
     [WRONG]   Worst:    TheÆvar_ArnfjörðBjarmason
               Ok:       The_Ævar_Arnfjörð_Bjarmason
               Better:   The_Ævar_Arnfjörð_Bjarmason
               Best:     The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
     [WRONG]   Worst:    Il_CaffèMacchiato
               Ok:       Il_Caffè_Macchiato
               Better:   Il_Caffè_Macchiato
               Best:     Il_Caffè_Macchiato
Testing MisterDženanLjubović
     [WRONG]   Worst:    MisterDženanLjubović
     [WRONG]   Ok:       MisterDženanLjubović
               Better:   Mister_Dženan_Ljubović
               Best:     Mister_Dženan_Ljubović
Testing OleKingHenryⅧ
     [WRONG]   Worst:    Ole_King_HenryⅧ
     [WRONG]   Ok:       Ole_King_HenryⅧ
     [WRONG]   Better:   Ole_King_HenryⅧ
               Best:     Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
     [WRONG]   Worst:    CarlosⅤºEl_Emperador
     [WRONG]   Ok:       CarlosⅤº_El_Emperador
     [WRONG]   Better:   CarlosⅤº_El_Emperador
               Best:     Carlos_Ⅴº_El_Emperador

BTW, almost everyone here has selected the first way, the one marked "Worst". A few have selected the second way, marked "OK". But no one else before me has shown you how to do either the "Better" or the "Best" approach.

Here is the test program with its four methods:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);  # write UTF-8 to STDOUT so the non-ASCII names print cleanly

# First I'll prove these are fine variable names:
my (
    $TheLoneRanger              ,
    $MountMᶜKinleyNationalPark  ,
    $ElÁlamoTejano              ,
    $TheÆvarArnfjörðBjarmason   ,
    $IlCaffèMacchiato           ,
    $MisterDženanLjubović         ,
    $OleKingHenryⅧ              ,
    $CarlosⅤºElEmperador        ,
);

# Now I'll load up some strings with those values in them:
my @strings = qw{
    TheLoneRanger
    MountMᶜKinleyNationalPark
    ElÁlamoTejano
    TheÆvarArnfjörðBjarmason
    IlCaffèMacchiato
    MisterDženanLjubović
    OleKingHenryⅧ
    CarlosⅤºElEmperador
};

my($new, $best, $ok);
my $mask = "  %10s   %-8s  %s\n";

for my $old (@strings) {
    print "Testing $old\n";
    ($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;

    ($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Worst:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Ok:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Better:", $new;

    ($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Best:", $new;
}

When you can score the same as the "Best" on this dataset, you’ll know you’ve done it correctly. Until then, you haven’t. No one else here has done better than "Ok", and most have done it "Worst". I look forward to seeing someone post the correct ℂ♯ code.

I notice that StackOverflow’s highlighting code is miserably stoopid again. They’re making all the same old lame mistakes that most (but not all) of the other poor approaches mentioned here have made. Isn’t it long past time to put ASCII to rest? It doesn’t make sense anymore, and pretending it’s all you have is simply wrong. It makes for bad code.
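For readers who want to try this in C#, here is a tentative port. As far as I know, .NET's regex engine accepts Unicode general categories such as \p{Ll}, \p{Lu} and \p{Lt} but not the \p{Lowercase} and \p{Uppercase} properties used in the "Best" pattern, so this sketch corresponds to the "Better" row only. The names UnicodeSpacing and Split are invented for the example.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeSpacing
{
    // Zero-width match between a lower-case letter (Ll) and an upper-case (Lu)
    // or title-case (Lt) letter; no characters are consumed by the match.
    private static readonly Regex Boundary =
        new Regex(@"(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])");

    // Inserts `separator` at every such boundary.
    public static string Split(string text, string separator = " ") =>
        Boundary.Replace(text, separator);
}
```

UnicodeSpacing.Split("TheLoneRanger", "_") gives "The_Lone_Ranger", and a title-case digraph such as U+01C5 (Dž) after a lower-case letter is also split off. Inputs like Mᶜ or the Roman numeral Ⅷ would presumably still come out as in the "Better" row, because those characters are not in the Ll, Lu or Lt categories.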

Observer answered 1/3, 2011 at 0:57 Comment(3)
your 'Best' answer seems the closest so far, but it doesn't seem like it accounts for leading punctuation or other leading non-lowercase letters. This seems to work best for me (in java): replaceAll("(?<=[^^\\p{javaUpperCase}])(?=[\\p{javaUpperCase}])"," ");Tyus
Hmm. I'm not sure Roman numerals should really count as uppercase in this example. The letter modifier example definitely shouldn't be counted. If you go to McDonalds.com you will see it is written without a space.Upsurge
It should also be noted that you will never get this to be perfect. For example I would like to see an example that sorts out "AlexandervonHumboldt", which should end up as "Alexander von Humboldt". Then there are of course languages that don't have the distinction of capital and lowercase.Upsurge
B
5

This Regex places a space character in front of every capital letter:

using System.Text.RegularExpressions;

const string myStringWithoutSpaces = "ThisIsAStringWithoutSpaces";
var myStringWithSpaces = Regex.Replace(myStringWithoutSpaces, "([A-Z])([a-z]*)", " $1$2").TrimStart();

Mind the space in front of "$1$2"; this is what gets it done. The replacement also puts a space before the first capital, so TrimStart() removes it.

This is the outcome:

"This Is A String Without Spaces"
Baggott answered 18/2, 2016 at 15:54 Comment(1)
If you want numbers also to be separated, use this regex pattern instead: "([A-Z0-9])([a-z]*)"Baggott
C
4

Binary Worrier, I have used your suggested code, and it is rather good, I have just one minor addition to it:

public static string AddSpacesToSentence(string text)
{
    if (string.IsNullOrEmpty(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]) && !char.IsUpper(text[i - 1]))
        {
            newText.Append(' ');
        }
        else if (i < text.Length - 1)
        {
            // text[i + 1] is only read when it exists
            if (char.IsUpper(text[i]) && !char.IsUpper(text[i + 1]))
                newText.Append(' ');
        }
        newText.Append(text[i]);
    }
    return newText.ToString();
}

I have added a condition !char.IsUpper(text[i - 1]). This fixed a bug that would cause something like 'AverageNOX' to be turned into 'Average N O X', which is obviously wrong, as it should read 'Average NOX'.

Sadly this still has the bug that if you have the text 'FromAStart', you would get 'From AStart' out.

Any thoughts on fixing this?

Canterbury answered 24/3, 2009 at 13:57 Comment(4)
Maybe something like this would work: char.IsUpper(text[i]) && (char.IsLower(text[i - 1]) || (char.IsLower(text[i+1]))Upsurge
This is the correct one: if (char.IsUpper(text[i]) && !(char.IsUpper(text[i - 1]) && char.IsUpper(text[i + 1]))) Test result: "From Start", "From THE Start", "From A Start" but you need i < text.Length - 1 in the for loop condition to ignore the last character and prevent out of range exception.Pressurecook
Oh, it's just the same. !(a && b) and (!a || !b) are equivalent, because lower = !upper.Pressurecook
what is the result?Leonie
F
4

Make sure you aren't putting spaces at the beginning of the string, but you are putting them between consecutive capitals. Some of the answers here don't address one or both of those points. There are other ways than regex, but if you prefer to use that, try this:

Regex.Replace(value, @"\B[A-Z]", " $0")

The \B is a negated \b, so it represents a non-word-boundary. It means the pattern matches "Y" in XYzabc, but not in Yzabc or X Yzabc. As a little bonus, you can use this on a string with spaces in it and it won't double them.
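The cases described above can be checked with a small helper. This is only a sketch, and the names NonBoundarySpacing and AddSpaces are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class NonBoundarySpacing
{
    // \B matches only where the capital has a word character right before it,
    // so a capital at the start of the string or after a space is left alone.
    public static string AddSpaces(string value) =>
        Regex.Replace(value, @"\B[A-Z]", " $0");
}
```

NonBoundarySpacing.AddSpaces("XYzabc") returns "X Yzabc", while "Yzabc" and "X Yzabc" come back unchanged. Because every capital inside a word matches, a run such as "HTMLParser" becomes "H T M L Parser", which is the "spaces between consecutive capitals" behaviour this answer is aiming for.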

Fluidextract answered 16/2, 2011 at 20:24 Comment(0)
E
4

Inspired by @MartinBrown: two lines of simple regex that will resolve your name, including acronyms anywhere in the string.

public string ResolveName(string name)
{
   var tmpDisplay = Regex.Replace(name, "([^A-Z ])([A-Z])", "$1 $2");
   return Regex.Replace(tmpDisplay, "([A-Z]+)([A-Z][^A-Z$])", "$1 $2").Trim();
}
Engedus answered 20/4, 2016 at 21:11 Comment(13)
I like this solution. It is short and fast. However, similar to other solutions, It fails with string "RegularOTs". Every solution I tried here returns "Regular O Ts"Ferryboat
@PateeGutee the OP wanted a space before capitals; he didn’t mention abbreviations. We have a fix for that in production code.Engedus
Can you show the fix? I have strings like this in my data and it is giving me incorrect result. Thanks.Ferryboat
@PateeGutee Sorry, I misread what you wanted. Pluralization is a different issue. For "RegularOTs", what are you expecting to happen: "Regular OTs" or "Regular OT s"?Engedus
I'm expecting something like the following: "RegularOTs" -> "Regular OTs"... "BrowseFAQsPartly" -> "Browse FAQs Partly"Ferryboat
@PateeGutee Are you expecting this to work only with 's'? What about 'es'? Basically, what rules are you expecting: an acronym followed by an 's', or an acronym followed by any single lower-case letter?Engedus
@PateeGutee I've updated my answer for you, I believe that should workEngedus
Your updated solution handles plural acronyms pretty well. However, it fails with string "G7799CertifiedFRs". I am expecting "G7799 Certified FRs" but your updated solution only gave "Certified FRs" missing "G7799". Your original solution gave "G7799 Certified F Rs".Ferryboat
Try replacing [A-Z] with [A-Z1-9]Engedus
Getting there. "Y2000CertifiedTRs" -> "Y2000Certified TRs", "Y2000TRs" -> "Y2000TRs" which are both missing single space after "Y2000".Ferryboat
The following tests are successful: "AdvanceABCs" -> "Advance ABCs", "Advance123s" -> "Advance 123s", "ABCsAdvance" -> "ABCs Advance", "123sAdvance" -> "123s Advance"Ferryboat
I’ve removed the edit since my solution was incomplete. Just wondering why you’re using regex for this? Wouldn’t it be a whole lot quicker to just manually write a function that splits after the last capital letter, number or s?Engedus
@PateeGutee, the real issue with regex is that it's time-consuming to understand and hard to maintain. Imagine that a year from now you need to add something else, like support for pluralization with 'es'; it also forces everyone working on the code to know regex. It seems you have a lot of tests for this already, so why not apply some TDD and outline a set of definitive rules, such as handling accented characters, etc.Engedus
Q
3

Here's mine:

private string SplitCamelCase(string s) 
{ 
    Regex upperCaseRegex = new Regex(@"[A-Z]{1}[a-z]*"); 
    MatchCollection matches = upperCaseRegex.Matches(s); 
    List<string> words = new List<string>(); 
    foreach (Match match in matches) 
    { 
        words.Add(match.Value); 
    } 
    return String.Join(" ", words.ToArray()); 
}
Quagga answered 7/11, 2008 at 17:38 Comment(3)
Is that supposed to be C#? If so what namespace is List in? Do you mean ArrayList or List<string>?Upsurge
List<string> would be fine. Sorry about that.Quagga
@Martin He always had the correct syntax, it was just hidden in a <pre><code>code</code></pre> block instead of Markdown syntax. No need to downvote him (if that was you).Florous
G
2

What you have works perfectly. Just remember to reassign value to the return value of this function.

value = System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0");
Gombach answered 7/11, 2008 at 16:44 Comment(0)
V
2

Here is how you could do it in SQL

CREATE FUNCTION dbo.PascalCaseWithSpace(@pInput AS VARCHAR(MAX)) RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @output VARCHAR(8000)
    SET @output = ''

    DECLARE @vInputLength INT
    DECLARE @vIndex       INT
    DECLARE @vCount       INT
    DECLARE @PrevLetter   VARCHAR(50)
    SET @PrevLetter = ''

    SET @vCount = 0
    SET @vIndex = 1
    SET @vInputLength = LEN(@pInput)

    WHILE @vIndex <= @vInputLength
    BEGIN
        IF ASCII(SUBSTRING(@pInput, @vIndex, 1)) = ASCII(UPPER(SUBSTRING(@pInput, @vIndex, 1)))
        BEGIN
            IF (@PrevLetter != '' AND ASCII(@PrevLetter) = ASCII(LOWER(@PrevLetter)))
                SET @output = @output + ' ' + SUBSTRING(@pInput, @vIndex, 1)
            ELSE
                SET @output = @output + SUBSTRING(@pInput, @vIndex, 1)
        END
        ELSE
        BEGIN
            SET @output = @output + SUBSTRING(@pInput, @vIndex, 1)
        END

        SET @PrevLetter = SUBSTRING(@pInput, @vIndex, 1)
        SET @vIndex = @vIndex + 1
    END

    RETURN @output
END
Vandavandal answered 15/10, 2012 at 19:22 Comment(0)
T
1
replaceAll("(?<=[^^\\p{Uppercase}])(?=[\\p{Uppercase}])"," ");
Tyus answered 30/6, 2011 at 14:19 Comment(0)
C
1
static string AddSpacesToColumnName(string columnCaption)
    {
        if (string.IsNullOrWhiteSpace(columnCaption))
            return "";
        StringBuilder newCaption = new StringBuilder(columnCaption.Length * 2);
        newCaption.Append(columnCaption[0]);
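        int pos;
        // Stop one short of the end so columnCaption[pos + 1] is always valid.
        for (pos = 1; pos < columnCaption.Length - 1; pos++)
        {
            if (char.IsUpper(columnCaption[pos]) && !(char.IsUpper(columnCaption[pos - 1]) && char.IsUpper(columnCaption[pos + 1])))
                newCaption.Append(' ');
            newCaption.Append(columnCaption[pos]);
        }
        // Append the final character, if any (a one-character caption has none left).
        if (pos < columnCaption.Length)
            newCaption.Append(columnCaption[pos]);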
        return newCaption.ToString();
    }
Chromo answered 14/6, 2012 at 17:5 Comment(0)
S
1

In Ruby, via Regexp:

"FooBarBaz".gsub(/(?!^)(?=[A-Z])/, ' ') # => "Foo Bar Baz"
Shlomo answered 26/7, 2012 at 20:24 Comment(1)
Oops, sorry. I've missed that it's C#-specific question and posted here Ruby answer :(Shlomo
G
1

I took Kevin Striker's excellent solution and converted it to VB. Since I'm locked into .NET 3.5, I also had to write IsNullOrWhiteSpace. This passes all of his tests.

<Extension()>
Public Function IsNullOrWhiteSpace(value As String) As Boolean
    If value Is Nothing Then
        Return True
    End If
    For i As Integer = 0 To value.Length - 1
        If Not Char.IsWhiteSpace(value(i)) Then
            Return False
        End If
    Next
    Return True
End Function

<Extension()>
Public Function UnPascalCase(text As String) As String
    If text.IsNullOrWhiteSpace Then
        Return String.Empty
    End If

    Dim newText = New StringBuilder()
    newText.Append(text(0))
    For i As Integer = 1 To text.Length - 1
        Dim currentUpper = Char.IsUpper(text(i))
        Dim prevUpper = Char.IsUpper(text(i - 1))
        Dim nextUpper = If(text.Length > i + 1, Char.IsUpper(text(i + 1)) Or Char.IsWhiteSpace(text(i + 1)), prevUpper)
        Dim spaceExists = Char.IsWhiteSpace(text(i - 1))
        If (currentUpper And Not spaceExists And (Not nextUpper Or Not prevUpper)) Then
            newText.Append(" ")
        End If
        newText.Append(text(i))
    Next
    Return newText.ToString()
End Function
Granule answered 14/10, 2013 at 13:17 Comment(0)
O
1

The question is a bit old, but nowadays there is a nice library on NuGet that does exactly this, as well as many other conversions to human-readable text.

Check out Humanizer on GitHub or NuGet.

Example

"PascalCaseInputStringIsTurnedIntoSentence".Humanize() => "Pascal case input string is turned into sentence"
"Underscored_input_string_is_turned_into_sentence".Humanize() => "Underscored input string is turned into sentence"
"Underscored_input_String_is_turned_INTO_sentence".Humanize() => "Underscored input String is turned INTO sentence"

// acronyms are left intact
"HTML".Humanize() => "HTML"
Originality answered 12/6, 2014 at 5:37 Comment(1)
Just tried that and the first link is now broken. NuGet works, but the package doesn't compile in my solution. A nice idea, if it worked.Saransarangi
F
1

Seems like a good opportunity for Aggregate. This is designed to be readable, not necessarily especially fast.

someString
.Aggregate(
   new StringBuilder(),
   (str, ch) => {
      if (char.IsUpper(ch) && str.Length > 0)
         str.Append(" ");
      str.Append(ch);
      return str;
   }
).ToString();
Fairy answered 5/4, 2017 at 18:25 Comment(0)
C
1

I found a lot of these answers rather obtuse. I haven't fully tested my solution, but it works for what I need, should handle acronyms, and is much more compact/readable than the others IMO:

private string CamelCaseToSpaces(string s)
    {
        if (string.IsNullOrEmpty(s)) return string.Empty;

        StringBuilder stringBuilder = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            stringBuilder.Append(s[i]);

            int nextChar = i + 1;
            if (nextChar < s.Length && char.IsUpper(s[nextChar]) && !char.IsUpper(s[i]))
            {
                stringBuilder.Append(" ");
            }
        }

        return stringBuilder.ToString();
    }
Campos answered 27/8, 2020 at 14:48 Comment(0)
D
1

I want to use this one, thanks to @Sean:

string InsertSpace(string text) {
        return string.Join("", text.Select(ch => (char.IsUpper(ch) ? " " : "") + ch));
    }

In 2023, I want to use this one, which keeps runs of uppercase letters together:

class USpace {
    public static string Create(string text) {
        int l = 0;
        return string.Join("", text.Select(ch => {
            if (char.IsUpper(ch)) {
                string res = (l == 0 ? "" : " ") + ch;
                l = 0;
                return res;
            }
            l++;
            return ch.ToString();
        }));
    }
}
Demetri answered 31/10, 2021 at 9:20 Comment(0)
D
0

In addition to Martin Brown's Answer, I had an issue with numbers as well. For Example: "Location2", or "Jan22" should be "Location 2", and "Jan 22" respectively.

Here is my Regular Expression for doing that, using Martin Brown's answer:

"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})"

Here are a couple great sites for figuring out what each part means as well:

Java Based Regular Expression Analyzer (but works for most .net regex's)

Action Script Based Analyzer

The above regex won't work on the action script site unless you replace all of the \p{Ll} with [a-z], the \p{Lu} with [A-Z], and \p{Nd} with [0-9].
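To see how the four alternatives in the pattern fit together, here is a minimal sketch that applies it with " $0" as the replacement, as in Martin Brown's answer. The names DigitAwareSpacing and AddSpaces are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class DigitAwareSpacing
{
    private const string Pattern =
        @"((?<=\p{Ll})\p{Lu})" +           // capital after a lower-case letter
        @"|((?!\A)\p{Lu}(?>\p{Ll}))" +     // capital starting a word, not at the string start
        @"|((?<=[\p{Ll}\p{Lu}])\p{Nd})" +  // digit directly after a letter
        @"|((?<=\p{Nd})\p{Lu})";           // capital directly after a digit

    // Prefixes every match with a single space.
    public static string AddSpaces(string value) =>
        Regex.Replace(value, Pattern, " $0");
}
```

DigitAwareSpacing.AddSpaces("Location2") returns "Location 2" and DigitAwareSpacing.AddSpaces("Jan22") returns "Jan 22". Only the first digit of a run is preceded by a letter, so the digits themselves stay together.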

Diplosis answered 16/2, 2012 at 18:11 Comment(0)
H
0

Here's my solution, based on Binary Worrier's suggestion and building in Richard Priddy's comments, but also taking into account that white space may exist in the provided string, so it won't add white space next to existing white space.

public string AddSpacesBeforeUpperCase(string nonSpacedString)
    {
        if (string.IsNullOrEmpty(nonSpacedString))
            return string.Empty;

        StringBuilder newText = new StringBuilder(nonSpacedString.Length * 2);
        newText.Append(nonSpacedString[0]);

        for (int i = 1; i < nonSpacedString.Length; i++)
        {
            char currentChar = nonSpacedString[i];

            // Keep existing whitespace as-is; we do not need to add another space next to it
            if (char.IsWhiteSpace(currentChar))
            {
                newText.Append(currentChar);
                continue;
            }

            char previousChar = nonSpacedString[i - 1];
            char nextChar = i < nonSpacedString.Length - 1 ? nonSpacedString[i + 1] : nonSpacedString[i];

            // Insert a space before a capital unless whitespace is already adjacent
            // or the capital sits inside a run of capitals
            if (char.IsUpper(currentChar) && !char.IsWhiteSpace(previousChar) && !char.IsWhiteSpace(nextChar)
                && !(char.IsUpper(previousChar) && char.IsUpper(nextChar)))
            {
                newText.Append(' ');
            }

            newText.Append(currentChar);
        }

        return newText.ToString();
    }
Hinch answered 24/9, 2012 at 13:48 Comment(0)
E
0

For anyone who is looking for a C++ function answering this same question, you can use the following. This is modeled after the answer given by @Binary Worrier. This method just preserves Acronyms automatically.

#include <cctype>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

void AddSpacesToSentence(string& testString)
{
    if (testString.empty())
        return;  // at(0) would throw on an empty string

    stringstream ss;
    ss << testString.at(0);
    for (auto it = testString.begin() + 1; it != testString.end(); ++it)
    {
        size_t index = it - testString.begin();
        char c = (*it);
        if (isupper(c))
        {
            char prev = testString.at(index - 1);
            if (isupper(prev))
            {
                // Inside a run of capitals: only split before the capital that starts the next word
                if (index < testString.length() - 1)
                {
                    char next = testString.at(index + 1);
                    if (!isupper(next) && next != ' ')
                    {
                        ss << ' ';
                    }
                }
            }
            else if (islower(prev))
            {
                ss << ' ';
            }
        }

        ss << c;
    }

    cout << ss.str() << endl;
}

The test strings I used for this function, and the results, are:

  • "helloWorld" -> "hello World"
  • "HelloWorld" -> "Hello World"
  • "HelloABCWorld" -> "Hello ABC World"
  • "HelloWorldABC" -> "Hello World ABC"
  • "ABCHelloWorld" -> "ABC Hello World"
  • "ABC HELLO WORLD" -> "ABC HELLO WORLD"
  • "ABCHELLOWORLD" -> "ABCHELLOWORLD"
  • "A" -> "A"
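
The function above prints its result, which makes the listed cases awkward to check automatically. Below is a minimal sketch of a variant that returns the spaced string instead, assuming the same splitting rules as above. The name `SpacedSentence` is hypothetical and not part of the original answer.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (the name is an assumption, not from the answer):
// applies the same rules as AddSpacesToSentence but returns the result.
std::string SpacedSentence(const std::string& input)
{
    std::string out;
    out.reserve(input.size() * 2);

    for (std::size_t i = 0; i < input.size(); ++i)
    {
        // Cast to unsigned char before calling the <cctype> classifiers.
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (i > 0 && std::isupper(c))
        {
            const unsigned char prev = static_cast<unsigned char>(input[i - 1]);
            if (std::islower(prev))
            {
                // lower -> UPPER transition starts a new word.
                out += ' ';
            }
            else if (std::isupper(prev) && i + 1 < input.size())
            {
                // Last capital of a run, followed by a non-capital, non-space
                // character: it starts the word after an acronym.
                const unsigned char next = static_cast<unsigned char>(input[i + 1]);
                if (!std::isupper(next) && next != ' ')
                    out += ' ';
            }
        }
        out += input[i];
    }
    return out;
}
```

With this shape, each listed case can be checked with a plain comparison such as `SpacedSentence("HelloABCWorld") == "Hello ABC World"`, and an empty input simply returns an empty string.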
Edmundoedmunds answered 26/7, 2013 at 18:35 Comment(0)
A
0

A C# solution for an input string that consists only of ASCII characters. The regex uses a negative lookbehind to skip a capital (upper-case) letter at the very beginning of the string, and Regex.Replace() returns the spaced string.

Also see regex101.com demo.

using System;
using System.Text.RegularExpressions;

public class RegexExample
{
    public static void Main()
    {
        var text = "ThisStringHasNoSpacesButItDoesHaveCapitals";

        // Use negative lookbehind to match all capital letters
        // that do not appear at the beginning of the string.
        var pattern = "(?<!^)([A-Z])";

        var rgx = new Regex(pattern);
        var result = rgx.Replace(text, " $1");
        Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
    }
}

Expected Output:

Input: [ThisStringHasNoSpacesButItDoesHaveCapitals]
Output: [This String Has No Spaces But It Does Have Capitals]

Update: Here's a variation that will also handle acronyms (sequences of upper-case letters).

Also see regex101.com demo and ideone.com demo.

using System;
using System.Text.RegularExpressions;

public class RegexExample
{
    public static void Main()
    {
        var text = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";

        // Use positive lookbehind to locate all upper-case letters
        // that are preceded by a lower-case letter.
        var patternPart1 = "(?<=[a-z])([A-Z])";

        // Use positive lookbehind and lookahead to locate all
        // upper-case letters that are preceded by an upper-case
        // letter and followed by a lower-case letter.
        var patternPart2 = "(?<=[A-Z])([A-Z])(?=[a-z])";

        var pattern = patternPart1 + "|" + patternPart2;
        var rgx = new Regex(pattern);
        var result = rgx.Replace(text, " $1$2");

        Console.WriteLine("Input: [{0}]\nOutput: [{1}]", text, result);
    }
}

Expected Output:

Input: [ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ]
Output: [This String Has No Spaces ASCII But It Does Have Capitals LINQ]
Allina answered 6/6, 2014 at 19:12 Comment(0)
T
0

This one handles acronyms and acronym plurals and is a bit faster than the accepted answer:

public string Sentencify(string value)
{
    if (string.IsNullOrWhiteSpace(value))
        return string.Empty;

    string final = string.Empty;
    for (int i = 0; i < value.Length; i++)
    {
        if (i != 0 && Char.IsUpper(value[i]))
        {
            if (!Char.IsUpper(value[i - 1]))
                final += " ";
            else if (i < (value.Length - 1))
            {
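                // Space before the capital that follows an acronym, unless the acronym is
                // followed by a plural suffix ("s" or "es"), as in "OTs".
                // The i + 2 bounds check avoids reading past the end of the string.
                char next = value[i + 1];
                bool pluralS = next == 's';
                bool pluralEs = next == 'e' && i + 2 < value.Length && value[i + 2] == 's';
                if (!Char.IsUpper(next) && !pluralS && !pluralEs)
                    final += " ";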
            }
        }

        final += value[i];
    }

    return final;
}

Passes these tests:

string test1 = "RegularOTs";
string test2 = "ThisStringHasNoSpacesASCIIButItDoesHaveCapitalsLINQ";
string test3 = "ThisStringHasNoSpacesButItDoesHaveCapitals";
Trista answered 5/12, 2014 at 23:56 Comment(5)
the accepted answer deals with the case where value is nullChemotropism
This adds an extra space in front of the output, ie HireDate => " Hire Date". Needs a final.TrimStart or something. I think that's what one of the other answers is pointing out below but because of the reordering I'm not sure if he was talking to you since his answer is RegEx based.Vmail
Good catch...should have added a start and end marker to my tests...fixed now.Trista
Similar to other solution posted here, it fails with string "RegularOTs". It returns "Regular O Ts"Ferryboat
Thanks for bringing up abbreviation plurals, I've updated to work for this as well.Trista
G
0

Here is a more thorough solution that doesn't put spaces in front of words:

Note: I have used multiple regexes (not concise, but this also handles acronyms and single-letter words).

Dim s As String = "ThisStringHasNoSpacesButItDoesHaveCapitals"
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z](?=[A-Z])[a-z]*)", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([A-Z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2")
s = System.Text.RegularExpressions.Regex.Replace(s, "([a-z])([A-Z][a-z])", "$1 $2") ' repeat a second time

In:

"ThisStringHasNoSpacesButItDoesHaveCapitals"
"IAmNotAGoat"
"LOLThatsHilarious!"
"ThisIsASMSMessage"

Out:

"This String Has No Spaces But It Does Have Capitals"
"I Am Not A Goat"
"LOL Thats Hilarious!"
"This Is ASMS Message" // (Difficult to handle single letter words when they are next to acronyms.)
Gutta answered 23/3, 2015 at 22:59 Comment(2)
This outputs "This String Has NoSpaces But ItDoes Have Capitals"Hittite
Hi @AndyRobinson, thanks. I changed to use multiple Regex replaces. Not sure if there is a more concise way, but it works now.Gutta
M
0

All the previous responses looked overcomplicated.

I had a string with a mixture of capitals and underscores, so I used string.Replace() to turn each _ into " " and then used the following to add a space before each capital letter.

for (int i = 0; i < result.Length; i++)
{
    if (char.IsUpper(result[i]))
    {
        counter++;
        if (i > 0) // skip index 0 so a leading capital gets no space in front of it
        {
            result = result.Insert(i, " ");
            i++; // Required: step past the capital, which has just moved one position
                 // to the right; otherwise the loop inserts spaces before it forever.
        }
    }
}
Motion answered 18/1, 2016 at 14:24 Comment(0)
N
0

Inspired by Binary Worrier's answer, I took a swing at this.

Here's the result:

/// <summary>
/// String Extension Method
/// Adds white space to strings based on Upper Case Letters
/// </summary>
/// <example>
/// strIn => "HateJPMorgan"
/// preserveAcronyms false => "Hate JPMorgan"
/// preserveAcronyms true => "Hate JP Morgan"
/// </example>
/// <param name="strIn">to evaluate</param>
/// <param name="preserveAcronyms" >determines saving acronyms (Optional => false) </param>
public static string AddSpaces(this string strIn, bool preserveAcronyms = false)
{
    if (string.IsNullOrWhiteSpace(strIn))
        return String.Empty;

    var stringBuilder = new StringBuilder(strIn.Length * 2)
        .Append(strIn[0]);

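    // Loop through the last character too, so a trailing capital ("TypeA") is spaced
    // and a one-character input no longer indexes past the end of the string.
    for (var i = 1; i < strIn.Length; i++)
    {
        var c = strIn[i];
        var nextIsLower = i + 1 < strIn.Length && Char.IsLower(strIn[i + 1]);

        if (Char.IsUpper(c) && (Char.IsLower(strIn[i - 1]) || (preserveAcronyms && nextIsLower)))
            stringBuilder.Append(' ');

        stringBuilder.Append(c);
    }

    return stringBuilder.ToString();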
}

I tested this with a Stopwatch over 10,000,000 iterations, using various string lengths and combinations.

On average it was about 50% (maybe a bit more) faster than Binary Worrier's answer.

Nonjoinder answered 8/3, 2017 at 12:37 Comment(0)
G
0
    // Requires: using System.Linq;
    private string GetProperName(string Header)
    {
        // Nothing to split when the header is null, empty, or has at most one capital.
        if (string.IsNullOrEmpty(Header) || Header.Count(char.IsUpper) <= 1)
        {
            return Header;
        }

        string ReturnHeader = Header[0].ToString();
        for (int i = 1; i < Header.Length; i++)
        {
            // Insert a space only at a lower-case to upper-case transition.
            if (char.IsLower(Header[i - 1]) && char.IsUpper(Header[i]))
            {
                ReturnHeader += " " + Header[i].ToString();
            }
            else
            {
                ReturnHeader += Header[i].ToString();
            }
        }

        return ReturnHeader;
    }
Glisten answered 4/7, 2017 at 7:41 Comment(0)
E
0

An implementation with fold, also known as Aggregate:

    public static string SpaceCapitals(this string arg) =>
       new string(arg.Aggregate(new List<Char>(),
                      (accum, x) => 
                      {
                          if (Char.IsUpper(x) &&
                              accum.Any() &&
                              // prevent double spacing
                              accum.Last() != ' ' &&
                              // prevent spacing acronyms (ASCII, SCSI)
                              !Char.IsUpper(accum.Last()))
                          {
                              accum.Add(' ');
                          }

                          accum.Add(x);

                          return accum;
                      }).ToArray());

Beyond what was asked, this implementation preserves leading, inner, and trailing spaces and keeps a run of capitals (an acronym) together, for example:

" SpacedWord " => " Spaced Word "

"Inner Space" => "Inner Space"

"SomeACRONYM" => "Some ACRONYM"
Enarthrosis answered 23/5, 2018 at 5:34 Comment(1)
Does not actually handle acronyms: HasCICDHidden => Has CICDHiddenBackup
T
0

A simple way to add a space after each run of lower-case letters or digits.

    string AddSpacesToSentence(string value, bool spaceLowerChar = true, bool spaceDigitChar = true, bool spaceSymbolChar = false)
    {
        var result = "";

        for (int i = 0; i < value.Length; i++)
        {
            char currentChar = value[i];
            char nextChar = value[i < value.Length - 1 ? i + 1 : value.Length - 1];

            if (spaceLowerChar && char.IsLower(currentChar) && !char.IsLower(nextChar))
            {
                result += value[i] + " ";
            }
            else if (spaceDigitChar && char.IsDigit(currentChar) && !char.IsDigit(nextChar))
            {
                result += value[i] + " ";
            }
            else if(spaceSymbolChar && char.IsSymbol(currentChar) && !char.IsSymbol(nextChar))
            {
                result += value[i] + " ";
            }
            else
            {
                result += value[i];
            }
        }

        return result;
    }
Teodor answered 8/4, 2019 at 1:8 Comment(1)
Code-only answers are discouraged. Please click on edit and add some words summarising how your code addresses the question, or perhaps explain how your answer differs from the previous answer/answers. From ReviewAdditament
