How to do CamelCase split in python
C

16

77

What I was trying to achieve was something like this:

>>> camel_case_split("CamelCaseXYZ")
['Camel', 'Case', 'XYZ']
>>> camel_case_split("XYZCamelCase")
['XYZ', 'Camel', 'Case']

So I searched and found this perfect regular expression:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

As the next logical step I tried:

>>> re.split("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "CamelCaseXYZ")
['CamelCaseXYZ']

Why does this not work, and how do I achieve the result from the linked question in python?

Edit: Solution summary

I tested all provided solutions with a few test cases:

string:                 ''
AplusKminus:            ['']
casimir_et_hippolyte:   []
two_hundred_success:    []
kalefranz:              string index out of range # with modification: either [] or ['']

string:                 ' '
AplusKminus:            [' ']
casimir_et_hippolyte:   []
two_hundred_success:    [' ']
kalefranz:              [' ']

string:                 'lower'
all algorithms:         ['lower']

string:                 'UPPER'
all algorithms:         ['UPPER']

string:                 'Initial'
all algorithms:         ['Initial']

string:                 'dromedaryCase'
AplusKminus:            ['dromedary', 'Case']
casimir_et_hippolyte:   ['dromedary', 'Case']
two_hundred_success:    ['dromedary', 'Case']
kalefranz:              ['Dromedary', 'Case'] # with modification: ['dromedary', 'Case']

string:                 'CamelCase'
all algorithms:         ['Camel', 'Case']

string:                 'ABCWordDEF'
AplusKminus:            ['ABC', 'Word', 'DEF']
casimir_et_hippolyte:   ['ABC', 'Word', 'DEF']
two_hundred_success:    ['ABC', 'Word', 'DEF']
kalefranz:              ['ABCWord', 'DEF']

In summary you could say the solution by @kalefranz does not match the question (see the last case) and the solution by @casimir et hippolyte eats a single space, and thereby violates the idea that a split should not change the individual parts. The only difference among the remaining two alternatives is that my solution returns a list with the empty string on an empty string input and the solution by @200_success returns an empty list. I don't know how the python community stands on that issue, so I say: I am fine with either one. And since 200_success's solution is simpler, I accepted it as the correct answer.

Commentate answered 28/4, 2015 at 9:52 Comment(7)
Other Qs to do what you're trying to do: first, second and I'm pretty sure there are others.Affirmative
How is it ABC CamelCase?!Convection
@Mihai I do not understand your question. If you wonder how the regex performs on "ABCCamelCase", it works as expected: ['ABC', 'Camel', 'Case']. If you interpreted ABC to stand for AbstractBaseClass, then I am sorry for the confusion, as ABC is just three arbitrary uppercase letters in my question.Commentate
Read my answer to a similar question.Punctilious
Also a good answer, but I did not find the question as the wording was too specific for my search. Also your answer does not quite do what is asked for here, as it produces a converted string with an arbitrary separation character which you would need to split with str.split(' '), instead of a (more versatile) list of its parts.Commentate
@SheridanVespo, ABC is just uppercase, not camel case.Convection
Look at the questions linked. I included the upper case part to address the common wish of being able to split something like "someHTMLFile" into ['some', 'HTML', 'File'].Commentate
L
67

As @AplusKminus has explained, re.split() never splits on an empty pattern match (this was the behavior before Python 3.7; later versions do split on empty matches). Therefore, instead of splitting, you should try finding the components you are interested in.

Here is a solution using re.finditer() that emulates splitting:

from re import finditer

def camel_case_split(identifier):
    matches = finditer('.+?(?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)', identifier)
    return [m.group(0) for m in matches]
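For readers puzzling over the pattern, here is a sketch of the same regular expression written with re.VERBOSE so that each piece can carry a comment. The name CAMEL_PART is made up for illustration; the pattern itself is meant to be unchanged from the one above.

```python
import re

# CAMEL_PART is a hypothetical name; the pattern is the one from the answer,
# only spread over several lines with re.VERBOSE.
CAMEL_PART = re.compile(r"""
    .+?                            # as few characters as possible, at least one,
    (?:                            # up to the first position where one of these holds:
        (?<=[a-z])(?=[A-Z])        #   a lowercase letter is followed by an uppercase one
      | (?<=[A-Z])(?=[A-Z][a-z])   #   an uppercase run ends right before a capitalized word
      | $                          #   the end of the string is reached
    )
""", re.VERBOSE)


def camel_case_split(identifier):
    # Every match is one "word"; the lookarounds consume no characters,
    # so nothing is lost between consecutive matches.
    return [m.group(0) for m in CAMEL_PART.finditer(identifier)]


print(camel_case_split("CamelCaseXYZ"))
print(camel_case_split("XYZCamelCase"))
```

The (?:...) group only groups the three alternatives without capturing anything, and the lazy .+? is what turns each boundary into the end of a returned piece.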
Lectionary answered 28/4, 2015 at 12:49 Comment(8)
I found one difference (according to my test cases) between your solution and mine: camel_case_split("") returns []in your case and [""] in mine. The question is, which of those you would rather consider to be expected. Since either one works in my application, I consider this to be a valid answer!Commentate
Another question that remains, is whether this, or my proposed solution performs better. I am no expert on the complexity of regular expressions, so this would have to be evaluated by someone else.Commentate
Our regexes are basically the same, except that mine starts with a .+? that captures the text instead of discarding it, and ends with a $ to make it go all the way to the end. Neither change changes the search strategy.Lectionary
Doesn't support digits. For example, "L2S" is not split into ["L2", "S"] . Use [a-z0-9] rather than [a-z] in the above regular expression to fix this.Cimbalom
@Cimbalom The question seemed not to want a split there.Lectionary
Parse 1 .+? My Doubt : What is the use of .+? here .(any character) +(one or more) ?(zero or one) This is highly level group (?:(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$)Rebus
Parse 2 ?: My Doubt : What is the use of ?: here ? (?:...) Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern This part contains 3 regular expression with or (?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|$) (?<=[a-z])(?=[A-Z]) (?<=...) Matches if the current position in the string is preceded by a match for ... (?=[A-Z][a-z]) (?=...) Matches if ... matches next, but doesn’t consume any of the string. '$' Matches the end of the stringRebus
@Lectionary Parse 1 and Parse 2 are my analysis, and I didn't really get the regular expression. Can you help with this here?Rebus
L
55

Use re.sub() and split()

import re

name = 'CamelCaseTest123'
split = re.sub('([A-Z][a-z]+)', r' \1', re.sub('([A-Z]+)', r' \1', name)).split()

Result

'CamelCaseTest123' -> ['Camel', 'Case', 'Test123']
'CamelCaseXYZ' -> ['Camel', 'Case', 'XYZ']
'XYZCamelCase' -> ['XYZ', 'Camel', 'Case']
'XYZ' -> ['XYZ']
'IPAddress' -> ['IP', 'Address']
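To see what each of the two re.sub() passes contributes, here is a small sketch that returns the intermediate strings as well. The helper name split_steps is hypothetical; the two patterns are the ones from the one-liner above.

```python
import re


def split_steps(name):
    # Pass 1: put a space in front of every run of uppercase letters.
    step1 = re.sub('([A-Z]+)', r' \1', name)
    # Pass 2: put a space in front of every capital that is followed by
    # lowercase letters, which detaches the last capital of an acronym
    # from the acronym itself ("XYZCamel" becomes "XYZ Camel").
    step2 = re.sub('([A-Z][a-z]+)', r' \1', step1)
    # str.split() without arguments discards the surplus whitespace.
    return step1, step2, step2.split()


for name in ('XYZCamelCase', 'IPAddress', 'CamelCaseTest123'):
    step1, step2, parts = split_steps(name)
    print(repr(step1), repr(step2), parts)
```

The doubled spaces that appear after the second pass are harmless because split() treats any run of whitespace as a single separator.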
Literature answered 8/6, 2016 at 8:26 Comment(4)
Best answer so far IMHO, elegant and effective, should be the selected answer.Tweak
Nice, even just re.sub('([A-Z]+)', r' \1', name).split() works for simple cases when you don't have inputs like 'XYZCamelCase' and 'IPAddress' (or if you're ok with getting ['XYZCamel', 'Case'] and ['IPAddress'] for them). The other re.sub accounts for these cases too (making each sequence of lowercase letters be attached to only one preceding uppercase letter).Autoradiograph
@PierrickBruneau, while I agree that this answer is elegant and effective, I find it lacking in an important aspect of general Q&A-site etiquette: It does not answer the question. Well, at least not fully, since no explanation is given as to why the attempt of the question does not work.Commentate
@AplusKminus, I'm answering new visitors who google "python camel case split" and land here. IMO they seek a general copy-pasteable snippet and do not have your specific issue (since they start from scratch). Therefore no need for such an explanation. This is why all of my "late" answers are like this. I'm doing this purposely. If I were answering in 2015 and targeting this answer to you, you would see such an explanationLiterature
G
14

Most of the time, when you don't need to check the format of a string, a global search is simpler than a split (for the same result):

re.findall(r'[A-Z](?:[a-z]+|[A-Z]*(?=[A-Z]|$))', 'CamelCaseXYZ')

returns

['Camel', 'Case', 'XYZ']

To deal with dromedary too, you can use:

re.findall(r'[A-Z]?[a-z]+|[A-Z]+(?=[A-Z]|$)', 'camelCaseXYZ')

Note: (?=[A-Z]|$) can be shortened using a double negation (a negative lookahead with a negated character class): (?![^A-Z])
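A quick way to convince yourself that the two lookaheads behave alike is to run both variants over a few inputs. This is only a sketch: the constant names and the sample list are made up, and as far as I can tell the two forms can differ for strings that end in a newline, because $ also matches just before a trailing "\n".

```python
import re

# Hypothetical names for the two spellings of the same idea.
WITH_ALTERNATION = re.compile(r'[A-Z]?[a-z]+|[A-Z]+(?=[A-Z]|$)')
WITH_DOUBLE_NEGATION = re.compile(r'[A-Z]?[a-z]+|[A-Z]+(?![^A-Z])')

samples = ['camelCaseXYZ', 'XYZCamelCase', 'CamelCaseXYZ', 'lower', 'UPPER', '']

for s in samples:
    a = WITH_ALTERNATION.findall(s)
    b = WITH_DOUBLE_NEGATION.findall(s)
    # The last column shows whether both patterns agree on this input.
    print(repr(s), a, a == b)
```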

Greta answered 28/4, 2015 at 14:13 Comment(3)
@SheridanVespo: This is a way only for camel, not for dromedary (as asked). But it's possible to do it in the same way with few changes.Greta
@SheridanVespo: Yes "dromedary-case" doesn't exist, but since the dromedary has only one hump, and the camel two... About efficiency: it is not the pattern itself but all the code after that you avoid since you obtain directly the list of strings you want. About lookarounds in general: lookarounds do not come straight from hell and are not so slow (they can slow down a pattern only if they are badly used). As I was saying to another SO user a few minutes ago, there are cases where you can optimize a pattern with lookaheads.Greta
Measured all posted solutions. Yours and mnesarco's one passed all of the Setop's tests and turned out to be the fastest.Coetaneous
C
14

Working solution, without regexp

I am not that good at regexp. I like to use them for search/replace in my IDE but I try to avoid them in programs.

Here is a quite straightforward solution in pure python:

def camel_case_split(s):
    idx = list(map(str.isupper, s))
    # mark change of case
    l = [0]
    for (i, (x, y)) in enumerate(zip(idx, idx[1:])):
        if x and not y:  # "Ul"
            l.append(i)
        elif not x and y:  # "lU"
            l.append(i+1)
    l.append(len(s))
    # for "lUl", index of "U" will pop twice, have to filter that
    return [s[x:y] for x, y in zip(l, l[1:]) if x < y]

And some tests

TESTS = [
    ("XYZCamelCase", ['XYZ', 'Camel', 'Case']),
    ("CamelCaseXYZ", ['Camel', 'Case', 'XYZ']),
    ("CamelCaseXYZa", ['Camel', 'Case', 'XY', 'Za']),
    ("XYZCamelCaseXYZ", ['XYZ', 'Camel', 'Case', 'XYZ']),
    ("aCamelCaseWordT", ['a', 'Camel', 'Case', 'Word', 'T']),
    ("CamelCaseWordT", ['Camel', 'Case', 'Word', 'T']),
    ("CamelCaseWordTa", ['Camel', 'Case', 'Word', 'Ta']),
    ("aCamelCaseWordTa", ['a', 'Camel', 'Case', 'Word', 'Ta']),
    ("Ta", ['Ta']),
    ("aT", ['a', 'T']),
    ("a", ['a']),
    ("T", ['T']),
    ("", []),
]

def test():
    for (q,a) in TESTS:
        assert camel_case_split(q) == a

if __name__ == "__main__":
    test()

Edit: a solution which streams data in one pass

This solution leverages the fact that the decision whether to split can be made locally, just considering the current character and the previous one.

def camel_case_split(s):
    u = True  # case of previous char
    w = b = ''  # current word, buffer for last uppercase letter
    for c in s:
        o = c.isupper()
        if u and o:
            w += b
            b = c
        elif u and not o:
            if len(w)>0:
                yield w
            w = b + c
            b = ''
        elif not u and o:
            yield w
            w = ''
            b = c
        else:  # not u and not o:
            w += c
        u = o
    if len(w)>0 or len(b)>0:  # flush
        yield w + b

It is theoretically faster and uses less memory.

The same test suite applies, but the list must be built by the caller:

def test():
    for (q,a) in TESTS:
        r = list(camel_case_split(q))
        print(q,a,r)
        assert r == a


Christianachristiane answered 22/11, 2019 at 14:43 Comment(3)
Thank you, this is readable, it works, and it has tests! Much better than the regexp solutions, in my opinion.Nowt
Just a heads up this breaks on World_Wide_Web => ['World_', 'Wide_', 'Web']. Also it breaks here ISO100 => ['IS', 'O100']Dehiscence
@stwhite, these inputs are not considered in the original question. And if underscores and digits are considered lowercase, the output is correct. So this does not break, it just does what it has to do. Other solutions may have different behaviors but again, this is not part of the initial problem.Christianachristiane
N
6

I just stumbled upon this case and wrote a regular expression to solve it. It should work for any group of words, actually.

import re

RE_WORDS = re.compile(r'''
    # Find words in a string. Order matters!
    [A-Z]+(?=[A-Z][a-z]) |  # All upper case before a capitalized word
    [A-Z]?[a-z]+ |  # Capitalized words / all lower case
    [A-Z]+ |  # All upper case
    \d+  # Numbers
''', re.VERBOSE)

The key here is the lookahead on the first possible case. It will match (and preserve) uppercase words before capitalized ones:

assert RE_WORDS.findall('FOOBar') == ['FOO', 'Bar']
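Here is a usage sketch on a few more inputs, including the digits and separators the pattern is meant to cope with. The helper name split_words and the sample strings are my own; the alternatives are copied from the pattern above.

```python
import re

RE_WORDS = re.compile(r'''
    [A-Z]+(?=[A-Z][a-z]) |  # All upper case before a capitalized word
    [A-Z]?[a-z]+ |          # Capitalized words / all lower case
    [A-Z]+ |                # All upper case
    \d+                     # Numbers
''', re.VERBOSE)


def split_words(text):
    # Hypothetical wrapper: anything that matches none of the alternatives
    # (underscores, spaces, punctuation) is skipped by findall().
    return RE_WORDS.findall(text)


for sample in ('listURLReader', 'URLFinder', 'Camel5Case', 'snake_case'):
    print(sample, '=>', split_words(sample))
```

Because alternatives are tried from left to right, moving the plain [A-Z]+ branch to the front would swallow the first letter of the following capitalized word, which is why the order matters.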
Natality answered 6/1, 2017 at 16:32 Comment(1)
I like this one because it's clearer, and it does a better job for "strings people enter in real-life" like URLFinder and listURLReader.Celloidin
E
6
import re

re.split('(?<=[a-z])(?=[A-Z])', 'camelCamelCAMEL')
# ['camel', 'Camel', 'CAMEL'] <-- result

# '(?<=[a-z])'         --> means preceding lowercase char (group A)
# '(?=[A-Z])'          --> means following UPPERCASE char (group B)
# '(group A)(group B)' --> 'aA' or 'aB' or 'bA' and so on
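The snippet above works because, from Python 3.7 on, re.split() does split on zero-width matches. Under that assumption, the full pattern from the question can be used directly, which also handles acronyms in front of a capitalized word. This is a sketch, not a drop-in for older interpreters; the version check is only there to make the assumption explicit.

```python
import re
import sys

# The pattern from the question, unchanged.
PATTERN = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')


def camel_case_split(identifier):
    # Assumption: Python 3.7 or newer, where re.split() accepts empty matches.
    if sys.version_info < (3, 7):
        raise RuntimeError('splitting on zero-width matches needs Python 3.7+')
    return PATTERN.split(identifier)


print(camel_case_split('CamelCaseXYZ'))
print(camel_case_split('XYZCamelCase'))
```

Note that an empty input gives [''] here rather than [], the same difference that the question's summary discusses.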
Emersion answered 10/6, 2021 at 12:56 Comment(1)
Why not just use re.split('(?<=[a-z])(?=[A-Z])', 'camelCamelCAMEL')Apposite
G
4

This solution also supports numbers and spaces, and automatically removes underscores:

import re

def camel_terms(value):
    return re.findall('[A-Z][a-z]+|[0-9A-Z]+(?=[A-Z][a-z])|[0-9A-Z]{2,}|[a-z0-9]{2,}|[a-zA-Z0-9]', value)

Some tests:

tests = [
    "XYZCamelCase",
    "CamelCaseXYZ",
    "Camel_CaseXYZ",
    "3DCamelCase",
    "Camel5Case",
    "Camel5Case5D",
    "Camel Case XYZ"
]

for test in tests:
    print(test, "=>", camel_terms(test))

results:

XYZCamelCase => ['XYZ', 'Camel', 'Case']
CamelCaseXYZ => ['Camel', 'Case', 'XYZ']
Camel_CaseXYZ => ['Camel', 'Case', 'XYZ']
3DCamelCase => ['3D', 'Camel', 'Case']
Camel5Case => ['Camel', '5', 'Case']
Camel5Case5D => ['Camel', '5', 'Case', '5D']
Camel Case XYZ => ['Camel', 'Case', 'XYZ']
Gentoo answered 10/2, 2021 at 0:38 Comment(2)
Is this regex utilizing the fact that the first matching alternative will stop the processor from looking at the others? Otherwise I don't understand [a-z0-9]{2,} or [a-zA-Z0-9].Commentate
It is because in my usecase, i need to support "3D", but also need to support "3 D" if the input is already separated with spaces or underscores. This solution comes from my own requirement which has more cases than the original question. And yes, I use the fact that first match wins.Gentoo
C
3

The documentation for Python's re.split says (this applies to versions before Python 3.7):

Note that split will never split a string on an empty pattern match.

When seeing this:

>>> re.findall("(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])", "CamelCaseXYZ")
['', '']

it becomes clear why the split does not work as expected. The re module finds empty matches, just as intended by the regular expression.
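To make "empty matches" a bit more concrete, this sketch prints where the matches are rather than what they contain. The variable names are illustrative only.

```python
import re

pattern = r"(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])"
text = "CamelCaseXYZ"

# Each match is zero characters wide, so its start and end offsets coincide.
spans = [m.span() for m in re.finditer(pattern, text)]
print(spans)

# The offsets are exactly the positions where the string should be cut.
print([text[start:] for start, _ in spans])
```

So the pattern does locate the right boundaries; it is only the splitting step that ignores them.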

Since the documentation states that this is not a bug, but rather intended behavior, you have to work around that when trying to create a camel case split:

from re import finditer

def camel_case_split(identifier):
    matches = finditer('(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])', identifier)
    split_string = []
    # index of beginning of slice
    previous = 0
    for match in matches:
        # get slice
        split_string.append(identifier[previous:match.start()])
        # advance index
        previous = match.start()
    # get remaining string
    split_string.append(identifier[previous:])
    return split_string
Commentate answered 28/4, 2015 at 9:54 Comment(0)
N
2

Simple solution:

re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", str(text))
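If a list of parts is wanted, as in the question, the substitution can be followed by a split. This is a minimal sketch under the assumption that the input contains no spaces of its own; the function name is reused from the question for convenience.

```python
import re


def camel_case_split(text):
    # Insert a space wherever a lowercase letter or digit meets an uppercase letter.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", str(text))
    # Turn the space-separated string into a list of parts.
    return spaced.split()


print(camel_case_split("camelCaseXYZ"))
print(camel_case_split("XYZCamelCase"))
```

Be aware that this pattern has no rule for an acronym followed by a capitalized word, so "XYZCamelCase" keeps "XYZCamel" as one part.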
Nikolai answered 5/5, 2021 at 21:48 Comment(1)
This creates spaces between the parts, however the question asked to create an array of the parts.Commentate
D
2

Based on @Setop's answer, I added support for numbers, whitespaces, underscores and dots:

from typing import Iterable


def _camel_case_split_iter(string: str) -> Iterable[str]:
    previous_char_upper = True
    previous_char_digit = True
    curr_word = ""
    upper_buffer = ""  # buffer for last uppercase letter
    for c in string:
        curr_char_upper = c.isupper()
        curr_char_digit = c.isdigit()
        if c.isspace() or c in ["_", "."]:
            if len(curr_word) > 0 or len(upper_buffer) > 0:
                yield curr_word + upper_buffer
                curr_word = upper_buffer = ""
        elif previous_char_upper and curr_char_upper:
            curr_word += upper_buffer
            upper_buffer = c
        elif previous_char_upper and not curr_char_upper and not curr_char_digit:
            if len(curr_word) > 0:
                yield curr_word
            curr_word = upper_buffer + c
            upper_buffer = ""
        elif not previous_char_upper and curr_char_upper:
            if len(curr_word) > 0:
                yield curr_word
                curr_word = ""
            upper_buffer = c
        elif (not previous_char_digit and curr_char_digit) or (previous_char_digit and not curr_char_digit):
            if len(curr_word) > 0 or len(upper_buffer) > 0:
                yield curr_word + upper_buffer
                upper_buffer = ""
            curr_word = c
        else:
            curr_word += c
        previous_char_upper = curr_char_upper
        previous_char_digit = curr_char_digit
    if len(curr_word) > 0 or len(upper_buffer) > 0:  # flush
        yield curr_word + upper_buffer


def camel_case_split(string: str) -> list[str]:
    """
    Split CamelCase string to words.

    >>> camel_case_split("XYZCamelCaseXYZ")
    ['XYZ', 'Camel', 'Case', 'XYZ']
    >>> camel_case_split("Ta")
    ['Ta']
    >>> camel_case_split("aT")
    ['a', 'T']
    >>> camel_case_split("_aAa_bBb__CCC__")
    ['a', 'Aa', 'b', 'Bb', 'CCC']
    >>> camel_case_split("10Camel20CaseXYZ30")
    ['10', 'Camel', '20', 'Case', 'XYZ', '30']
    >>> camel_case_split(" CamelCase camel case ")
    ['Camel', 'Case', 'camel', 'case']
    """
    return list(_camel_case_split_iter(string))

All tests:

import pytest


@pytest.mark.parametrize(
    "string,expected",
    [
        ("XYZCamelCase", ["XYZ", "Camel", "Case"]),
        ("CamelCaseXYZ", ["Camel", "Case", "XYZ"]),
        ("CamelCaseXYZa", ["Camel", "Case", "XY", "Za"]),
        ("XYZCamelCaseXYZ", ["XYZ", "Camel", "Case", "XYZ"]),
        ("aCamelCaseWordT", ["a", "Camel", "Case", "Word", "T"]),
        ("CamelCaseWordT", ["Camel", "Case", "Word", "T"]),
        ("CamelCaseWordTa", ["Camel", "Case", "Word", "Ta"]),
        ("aCamelCaseWordTa", ["a", "Camel", "Case", "Word", "Ta"]),
        ("Ta", ["Ta"]),
        ("aT", ["a", "T"]),
        ("a", ["a"]),
        ("T", ["T"]),
        ("", []),
        ("A_B", ["A", "B"]),
        ("a_b", ["a", "b"]),
        ("Camel_CaseXYZ", ["Camel", "Case", "XYZ"]),
        ("aAa_bBb", ["a", "Aa", "b", "Bb"]),
        ("aAaTTT_b", ["a", "Aa", "TTT", "b"]),
        ("__CCcCccc__DDD__eee_fGG__", ["C", "Cc", "Cccc", "DDD", "eee", "f", "GG"]),
        ("__a", ["a"]),
        ("__A", ["A"]),
        ("a__", ["a"]),
        ("A__", ["A"]),
        ("____", []),
        ("3DCamelCase", ["3", "D", "Camel", "Case"]),
        ("330DCamelCase", ["330", "D", "Camel", "Case"]),
        ("330CamelCase", ["330", "Camel", "Case"]),
        ("Camel5Case", ["Camel", "5", "Case"]),
        ("Camel50Case", ["Camel", "50", "Case"]),
        ("Camel501Case", ["Camel", "501", "Case"]),
        ("CamelCase501", ["Camel", "Case", "501"]),
        ("CamelCaseA501", ["Camel", "Case", "A", "501"]),
        ("CamelCaseAA501", ["Camel", "Case", "AA", "501"]),
        ("CamelCase501a", ["Camel", "Case", "501", "a"]),
        ("Camel5Case5D", ["Camel", "5", "Case", "5", "D"]),
        ("Camel5Case50DC", ["Camel", "5", "Case", "50", "DC"]),
        ("Camel5Case50DCCase", ["Camel", "5", "Case", "50", "DC", "Case"]),
        ("camel.case", ["camel", "case"]),
        ("Camel Case XYZ", ["Camel", "Case", "XYZ"]),
        (" Camel Case 1 3XYZ _ AA ", ["Camel", "Case", "1", "3", "XYZ", "AA"]),
        ("camel\ncase", ["camel", "case"]),
    ],
)
def test_camel_case_split(string, expected):
    res = camel_case_split(string)
    assert res == expected

But I believe @mnesarco's answer is also very good: it's 5x faster and behaves almost the same.

The only difference (that I know of) is how numbers with uppercase are handled:

"3DAndD3ARESoComplicated" -> 
# My answer:
['3', 'D', 'And', 'D', '3', 'ARE', 'So', 'Complicated'] 
# mnesarco's answer:
['3D', 'And', 'D3ARE', 'So', 'Complicated'] 
Dundee answered 19/3, 2023 at 13:33 Comment(0)
S
1

Here's another solution that requires less code and no complicated regular expressions:

def camel_case_split(string):
    bldrs = [[string[0].upper()]]
    for c in string[1:]:
        if bldrs[-1][-1].islower() and c.isupper():
            bldrs.append([c])
        else:
            bldrs[-1].append(c)
    return [''.join(bldr) for bldr in bldrs]

Edit

The above code contains an optimization that avoids rebuilding the entire string with every appended character. Leaving out that optimization, a simpler version (with comments) might look like

def camel_case_split2(string):
    # set the logic for creating a "break"
    def is_transition(c1, c2):
      return c1.islower() and c2.isupper()

    # start the builder list with the first character
    # enforce upper case
    bldr = [string[0].upper()]
    for c in string[1:]:
        # get the last character in the last element in the builder
        # note that strings can be addressed just like lists
        previous_character = bldr[-1][-1]
        if is_transition(previous_character, c):
            # start a new element in the list
            bldr.append(c)
        else:
            # append the character to the last string
            bldr[-1] += c
    return bldr
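For reference, here is a sketch of the first version with the two issues raised on this page addressed: the forced .upper() on the first character is dropped, and an empty input is guarded against instead of raising an IndexError. It still only splits at a lowercase-to-uppercase transition, so acronyms stay attached to the word that follows them.

```python
def camel_case_split(string):
    # Guard: string[0] would raise IndexError on an empty input.
    if not string:
        return []
    # Each part is collected as a list of characters and joined at the end.
    parts = [[string[0]]]
    for c in string[1:]:
        if parts[-1][-1].islower() and c.isupper():
            # lowercase followed by uppercase: start a new part
            parts.append([c])
        else:
            parts[-1].append(c)
    return [''.join(part) for part in parts]


print(camel_case_split('dromedaryCase'))
print(camel_case_split('ABCWordDEF'))
```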
Sombrous answered 28/4, 2015 at 12:24 Comment(3)
@SheridanVespo I think the first version may have had an extraneous ) that you caught and corrected for me :)Sombrous
@SheridanVespo Apparently there are varied definitions for camel case. Some definitions (and the one I was originally assuming) enforce the first letter being capitalized. No worries; the "bug" is an easy fix. Just remove the .upper() call when initializing the list.Sombrous
Can you create a version that satisfies the cases in the linked answer? Also, is there a way to compare performance of your method and the one by @Casimir et Hippolyte?Commentate
S
0

I know that the question added the tag of regex. But still, I always try to stay as far away from regex as possible. So, here is my solution without regex:

def split_camel(text, char):
    if len(text) <= 1: # To avoid adding a wrong space in the beginning
        return text+char
    if char.isupper() and text[-1].islower(): # Regular Camel case
        return text + " " + char
    elif text[-1].isupper() and char.islower() and text[-2] != " ": # Detect Camel case in case of abbreviations
        return text[:-1] + " " + text[-1] + char
    else: # Do nothing part
        return text + char

from functools import reduce  # a builtin on Python 2, must be imported on Python 3

text = "PathURLFinder"
text = reduce(split_camel, text, "")
print(text)
# prints "Path URL Finder"
print(text.split(" "))
# prints "['Path', 'URL', 'Finder']"

EDIT: As suggested, here is the code to put the functionality in a single function.

from functools import reduce  # needed on Python 3

def split_camel(text):
    def splitter(text, char):
        if len(text) <= 1: # To avoid adding a wrong space in the beginning
            return text+char
        if char.isupper() and text[-1].islower(): # Regular Camel case
            return text + " " + char
        elif text[-1].isupper() and char.islower() and text[-2] != " ": # Detect Camel case in case of abbreviations
            return text[:-1] + " " + text[-1] + char
        else: # Do nothing part
            return text + char
    converted_text = reduce(splitter, text, "")
    return converted_text.split(" ")

split_camel("PathURLFinder")
# returns ['Path', 'URL', 'Finder']
Shakhty answered 20/12, 2018 at 9:56 Comment(1)
Could you incorporate the reduce and the split into the method? Would make your method better testableCommentate
T
0

Putting a more comprehensive approach out there. It takes care of several issues like numbers, strings starting with lower case, single-letter words, etc.

import re

def camel_case_split(identifier, remove_single_letter_words=False):
    """Parses CamelCase and Snake naming"""
    concat_words = re.split('[^a-zA-Z]+', identifier)

    def camel_case_split(string):
        bldrs = [[string[0].upper()]]
        string = string[1:]
        for idx, c in enumerate(string):
            if bldrs[-1][-1].islower() and c.isupper():
                bldrs.append([c])
            elif c.isupper() and (idx+1) < len(string) and string[idx+1].islower():
                bldrs.append([c])
            else:
                bldrs[-1].append(c)

        words = [''.join(bldr) for bldr in bldrs]
        words = [word.lower() for word in words]
        return words
    words = []
    for word in concat_words:
        if len(word) > 0:
            words.extend(camel_case_split(word))
    if remove_single_letter_words:
        subset_words = []
        for word in words:
            if len(word) > 1:
                subset_words.append(word)
        if len(subset_words) > 0:
            words = subset_words
    return words
Timmi answered 31/8, 2019 at 1:0 Comment(1)
Could you add more comments to the code, so a person not well-versed in python will have it easier to understand what it does?Commentate
M
0

My requirement was a bit more specific than the OP. In particular, in addition to handling all OP cases, I needed the following, which the other solutions do not provide:
- treat all non-alphanumeric input (e.g. !@#$%^&*() etc) as a word separator
- handle digits as follows:
  - cannot be in the middle of a word
  - cannot be at the beginning of the word unless the phrase starts with a digit

import re

def splitWords(s):
    new_s = re.sub(r'[^a-zA-Z0-9]', ' ',                  # not alphanumeric
        re.sub(r'([0-9]+)([^0-9])', '\\1 \\2',            # digit followed by non-digit
            re.sub(r'([a-z])([A-Z])','\\1 \\2',           # lower case followed by upper case
                re.sub(r'([A-Z])([A-Z][a-z])', '\\1 \\2', # upper case followed by upper case followed by lower case
                    s
                )
            )
        )
    )
    return [x for x in new_s.split(' ') if x]

Output:

for test in ['', ' ', 'lower', 'UPPER', 'Initial', 'dromedaryCase', 'CamelCase', 'ABCWordDEF', 'CamelCaseXYZand123.how23^ar23e you doing AndABC123XYZdf']:
    print(test + ':' + str(splitWords(test)))
:[]
 :[]
lower:['lower']
UPPER:['UPPER']
Initial:['Initial']
dromedaryCase:['dromedary', 'Case']
CamelCase:['Camel', 'Case']
ABCWordDEF:['ABC', 'Word', 'DEF']
CamelCaseXYZand123.how23^ar23e you doing AndABC123XYZdf:['Camel', 'Case', 'XY', 'Zand123', 'how23', 'ar23', 'e', 'you', 'doing', 'And', 'ABC123', 'XY', 'Zdf']
Maier answered 12/11, 2019 at 21:10 Comment(0)
H
0

Maybe this will be enough for some people:

a = "SomeCamelTextUpper"
def camelText(val):
    return ''.join([' ' + i if i.isupper() else i for i in val]).strip()
print(camelText(a))

It doesn't work with inputs like "CamelXYZ", but for the 'typical' CamelCase scenario it should work just fine.

Hurryscurry answered 25/10, 2022 at 10:24 Comment(0)
A
-2

I think the below is the optimum:

import re

def count_word():
    return re.findall('[A-Z]?[a-z]+', input('please enter your string'))

print(count_word())

Azure answered 1/7, 2018 at 14:55 Comment(1)
Can you elaborate please?Diarist
