All text from camelCase to SNAKE_CASE

Asked 3/5, 2017 at 19:26 Answered 25/8, 2022 at 9:58

Solved regex notepad++ camelcasing snakecasing

I am trying to do some text manipulation using Notepad++ macros. My last step is converting camelCase strings to SNAKE_CASE, so far without luck. I'm not very familiar with regex, so I can't write my own solution.

Example text file input:

firstLine(874),
secondLine(15),
thirdLineOfText87(0x0001);

Desired output:

FIRST_LINE(874),
SECOND_LINE(15),
THIRD_LINE_OF_TEXT_87(0x0001);

Regex or any plugin is an acceptable answer.

Chapa asked 3/5, 2017 at 19:26 Comment(1)

FYI, according to a post at the Notepad++ forum, there are conventions for the case types and their names. So, the correct names for the cases asked about are lowerCamelCase and SCREAMING_SNAKE_CASE. – Puny 5/3, 2020 at 16:43

I suggest the following regex approach:

Find What: (\b[a-z]+|\G(?!^))((?:[A-Z]|\d+)[a-z]*)
Replace With: \U\1_\2
Match Case: ON.

This turns words such as camelCase87LikeThis into CAMEL_CASE_87_LIKE_THIS. To also support camel-cased words that start with an uppercase letter, use the following modification:

(\G(?!^)|\b[a-zA-Z][a-z]*)([A-Z][a-z]*|\d+)

See the regex demo (also tested in Notepad++). Note the placement of the \G inside the regex and added A-Z.

Details:

- (\b[a-z]+|\G(?!^)) - Group 1, capturing either of two alternatives:
  - \b[a-z]+ - the start of a word (\b is the initial word boundary here) followed by 1+ lowercase ASCII letters
  - | - or
  - \G(?!^) - the end position of the previous successful match (the (?!^) lookahead keeps \G from also matching at the start of the text or of a line)
- ((?:[A-Z]|\d+)[a-z]*) - Group 2, capturing:
  - (?:[A-Z]|\d+) - either an uppercase ASCII letter ([A-Z]) or (|) 1+ digits (\d+)
  - [a-z]* - 0+ lowercase ASCII letters.

The \U\1_\2 replacement pattern turns all the chars to uppercase with \U and inserts a _ between the two groups (inserted with \1 and \2 backreferences).
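For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch. Python's standard `re` module has no `\G` anchor, so this is not a direct port of the pattern above: it swaps in a different technique, matching each whole camelCase word and splitting it into its parts in a callback. The names `HUMP`, `WORD` and `to_screaming_snake` are made up for this illustration.

```python
import re

# One "hump": an uppercase letter or a run of digits, then optional lowercase
# letters. This is the same sub-pattern as Group 2 in the Notepad++ regex.
HUMP = r"(?:[A-Z]|\d+)[a-z]*"

# A whole camelCase word: a lowercase head at a word boundary plus 1+ humps.
# (Stands in for the \G-based chaining, which Python's re does not support.)
WORD = re.compile(rf"\b([a-z]+)((?:{HUMP})+)")


def to_screaming_snake(text):
    """Convert every lowerCamelCase word in text to SCREAMING_SNAKE_CASE."""
    def convert(match):
        head, tail = match.group(1), match.group(2)
        parts = [head] + re.findall(HUMP, tail)
        return "_".join(parts).upper()

    return WORD.sub(convert, text)


print(to_screaming_snake("thirdLineOfText87(0x0001);"))
```

Words without any hump, and tokens such as 0x0001, are left untouched because they never match `WORD`.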

Stereobate answered 3/5, 2017 at 19:52 Comment(10)

How should Find What look if the camelCase words start with an uppercase letter, for example FirstLine(874)? – Chapa 4/5, 2017 at 8:38

The first approach misses every _ except the last one, and the second approach turns ThirdLineOfText87(0x0001); into THIRD_LINEOF_TEXT_87(0x0001); (the second _ is missing). – Chapa 4/5, 2017 at 8:49

You may use (\G(?!^)|\b[a-zA-Z][a-z]*)([A-Z][a-z]*|\d+), the trick was to put \G branch as the first alternative in the alternation group and adding support for the uppercase letter at the beginning of the word. – Iphagenia 4/5, 2017 at 9:28

This replaces FfdffDF with FFDFF_D_F which, I believe, isn't the desired behavior. Here's a tweaked version that takes into account cases like SomethingIO and IOSomething: (\G(?!^)|\b(?:[A-Z]{2}|[a-zA-Z][a-z]*))(?=[a-zA-Z]{2,}|\d)([A-Z](?:[A-Z]|[a-z]*)|\d+). – Aryanize 21/8, 2018 at 8:18

hmm.. MySuperCoolTest.js becomes MY_SUPERCoolTest.js -- i can't get the regex to affect the entire string? – Lifegiving 31/12, 2019 at 1:38

@Lifegiving The regex in the answer is for strings like caMelCase, not CaMelCase. If you use the regex from my comment you will get MY_SUPER_COOL_TEST.js (just make sure Match case is on) – Iphagenia 31/12, 2019 at 15:7

@MoacirSchmidt Then you probably want (\G(?!^)|\b[[:alpha:]][[:lower:]]*)([[:upper:]][[:lower:]]*|\d+)|\b([[:upper:]][[:lower:]]*)\b and replace with \U(?{3}$3:$1_$2) – Iphagenia 22/4, 2021 at 11:28

@MoacirSchmidt Not the one in the previous comment. – Iphagenia 22/4, 2021 at 11:57

On the link you referenced Pascal is replaced by a single _ (underscore) – Meerschaum 22/4, 2021 at 12:18

@MoacirSchmidt Because regex101.com is not Notepad++. \U(?{3}$3:$1_$2) replacement will work in Notepad++. – Iphagenia 22/4, 2021 at 12:20

Here is an alternative solution that keeps not only digit runs together but also abbreviations (in PHP):

// Insert "_" at each word boundary, then lowercase the whole result.
$snake = strtolower(preg_replace(
  '/(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])/',
  '_$1',
  $str
));

This regex will work for these cases:

'fat' ---> 'fat'
'fatBat' ---> 'fat_bat'
'FatBat' ---> 'fat_bat'
'camera360' ---> 'camera_360'
'camera360all' ---> 'camera_360all'
'camera360All' ---> 'camera_360_all'
'cameraABC' ---> 'camera_abc'
'cameraABCAll' ---> 'camera_abc_all'
'thirdLineOfText87' ---> 'third_line_of_text_87'

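This solution produces lower case. For upper case in Notepad++, keep in mind that \U only changes the case of the text written by the replacement, not of the unmatched letters around it, so this pattern can only insert the underscores there; the text then has to be uppercased in a separate step (for example, Edit > Convert Case to > UPPERCASE). The / delimiters are PHP syntax and must be left out of the Find What field:

Find What: (?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])
Replace With: _\1
Match Case: ON.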

I found this solution on the PHP manual page for preg_replace: https://www.php.net/manual/en/function.preg-replace.php#111695
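To check the pattern outside PHP, here is a minimal Python sketch. It reuses the regex unchanged and adds the case conversion as a separate step. The function name `to_snake` and its `upper` flag are made up for this illustration and are not part of the PHP note.

```python
import re

# Same pattern as in the PHP snippet: insert "_" before
#   - an uppercase letter followed by a lowercase one,
#   - any non-lowercase character that follows a lowercase letter,
#   - a digit or underscore that follows an uppercase letter,
# but never at the very start of the string.
BOUNDARY = re.compile(r"(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])")


def to_snake(name, upper=False):
    """Return name in snake_case, or SCREAMING_SNAKE_CASE if upper is True."""
    joined = BOUNDARY.sub(r"_\1", name)
    return joined.upper() if upper else joined.lower()


for sample in ["fatBat", "camera360All", "cameraABCAll", "thirdLineOfText87"]:
    print(sample, "--->", to_snake(sample), "/", to_snake(sample, upper=True))
```

Because `[^a-z]` also matches punctuation, feeding a whole line such as `firstLine(874),` through this sketch yields `first_line_(874),`, so it is best applied to bare identifiers only.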

Dusa answered 25/8, 2022 at 9:58 Comment(0)
