All text from camelCase to SNAKE_CASE

I am trying to do some text manipulations using Notepad++ macros. My last step is converting camelCase strings to SNAKE_CASE. So far no luck. I'm not very familiar with regex so can't write my own solution.

Example text file input:

firstLine(874),
secondLine(15),
thirdLineOfText87(0x0001);

Desired output:

FIRST_LINE(874),
SECOND_LINE(15),
THIRD_LINE_OF_TEXT_87(0x0001);

Regex or any plugin is an acceptable answer.

Chapa answered 3/5, 2017 at 19:26 Comment(1)
FYI, according to a post at the Notepad++ forum, there are conventions about the case types and their names. So, the correct names for the cases asked about are lowerCamelCase and SCREAMING_SNAKE_CASE. - Puny

I suggest the following regex approach:

Find What:      (\b[a-z]+|\G(?!^))((?:[A-Z]|\d+)[a-z]*)
Replace With: \U\1_\2
Match Case: ON.

This will turn camelCase87LikeThis words to CAMEL_CASE_87_LIKE_THIS. If you need to add support for those camel words that start with an uppercase letter, use the following regex modification:

(\G(?!^)|\b[a-zA-Z][a-z]*)([A-Z][a-z]*|\d+)

See the regex demo (also tested in Notepad++). Note the placement of the \G inside the regex and added A-Z.

Details:

  • (\b[a-z]+|\G(?!^)) - Group 1 capturing either of the two alternatives:
    • \b[a-z]+ - start of a word (\b is the initial word boundary here) followed with 1+ lowercase ASCII letters
    • | - or
    • \G(?!^) - the end position of the previous successful match
  • ((?:[A-Z]|\d+)[a-z]*) - Group 2 capturing:
    • (?:[A-Z]|\d+) - either an uppercase ASCII letter ([A-Z]) or (|) 1+ digits (\d+)
    • [a-z]* - 0+ lowercase ASCII letters.

The \U\1_\2 replacement pattern turns all the chars to uppercase with \U and inserts a _ between the two groups (inserted with \1 and \2 backreferences).
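
To try the same idea outside Notepad++, here is a minimal Python sketch. Python's built-in re module has no \G anchor, so this is an emulation rather than a direct port: it matches each whole word in one pass and then splits it into the pieces that Group 1 and Group 2 would capture one after another. The function name to_screaming_snake and the leading_upper flag are made up for illustration; leading_upper=True is meant to correspond to the modified regex for words that start with an uppercase letter.

```python
import re

# Each variant is a pair: (whole-word pattern, pattern for the single pieces).
# Lowercase start: first piece [a-z]+, later pieces (?:[A-Z]|\d+)[a-z]*
LOWER_START = (
    re.compile(r"\b[a-z]+(?:(?:[A-Z]|\d+)[a-z]*)+"),
    re.compile(r"^[a-z]+|[A-Z][a-z]*|\d+[a-z]*"),
)
# Any start: first piece [a-zA-Z][a-z]*, later pieces [A-Z][a-z]*|\d+
ANY_START = (
    re.compile(r"\b[a-zA-Z][a-z]*(?:[A-Z][a-z]*|\d+)+"),
    re.compile(r"^[a-zA-Z][a-z]*|[A-Z][a-z]*|\d+"),
)


def to_screaming_snake(text, leading_upper=False):
    """Upper-case camelCase words and join their pieces with underscores."""
    word, piece = ANY_START if leading_upper else LOWER_START

    def convert(match):
        # Split the matched word into its pieces, then join and upper-case,
        # which is what the \U\1_\2 replacement does step by step.
        return "_".join(piece.findall(match.group(0))).upper()

    return word.sub(convert, text)


sample = "firstLine(874),\nsecondLine(15),\nthirdLineOfText87(0x0001);"
print(to_screaming_snake(sample))
```

Words without an inner uppercase letter or digit are left untouched, as in the Notepad++ version, because the whole-word pattern requires at least one piece after the first.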

Stereobate answered 3/5, 2017 at 19:52 Comment(10)
What should Find What look like if the camelCase words start with an uppercase letter, for example FirstLine(874)? - Chapa
The first approach misses every _ except the last one, and the second approach turns ThirdLineOfText87(0x0001); into THIRD_LINEOF_TEXT_87(0x0001); (the second _ is missing). - Chapa
You may use (\G(?!^)|\b[a-zA-Z][a-z]*)([A-Z][a-z]*|\d+). The trick was to put the \G branch as the first alternative in the alternation group and to add support for an uppercase letter at the beginning of the word. - Iphagenia
This replaces FfdffDF with FFDFF_D_F which, I believe, isn't the desired behavior. Here's a tweaked version that takes into account cases like SomethingIO and IOSomething: (\G(?!^)|\b(?:[A-Z]{2}|[a-zA-Z][a-z]*))(?=[a-zA-Z]{2,}|\d)([A-Z](?:[A-Z]|[a-z]*)|\d+). - Aryanize
Hmm, MySuperCoolTest.js becomes MY_SUPERCoolTest.js -- I can't get the regex to affect the entire string? - Lifegiving
@Lifegiving The regex in the answer is for strings like caMelCase, not CaMelCase. If you use the regex from my comment you will get MY_SUPER_COOL_TEST.js (just make sure Match case is on). - Iphagenia
@MoacirSchmidt Then you probably want (\G(?!^)|\b[[:alpha:]][[:lower:]]*)([[:upper:]][[:lower:]]*|\d+)|\b([[:upper:]][[:lower:]]*)\b and replace with \U(?{3}$3:$1_$2) - Iphagenia
@MoacirSchmidt Not the one in the previous comment. - Iphagenia
On the link you referenced, Pascal is replaced by a single _ (underscore). - Meerschaum
@MoacirSchmidt Because regex101.com is not Notepad++. The \U(?{3}$3:$1_$2) replacement will work in Notepad++. - Iphagenia

There is an alternative solution that keeps not only digit runs together but abbreviations too (in PHP):

// preg_replace only inserts the underscores; strtolower gives the lower-case result
strtolower(preg_replace(
  '/(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])/',
  '_$1',
  $str
));

This regex will work for these cases:

'fat' ---> 'fat'
'fatBat' ---> 'fat_bat'
'FatBat' ---> 'fat_bat'
'camera360' ---> 'camera_360'
'camera360all' ---> 'camera_360all'
'camera360All' ---> 'camera_360_all'
'cameraABC' ---> 'camera_abc'
'cameraABCAll' ---> 'camera_abc_all'
'thirdLineOfText87' ---> 'third_line_of_text_87'
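
The listed cases can also be checked outside PHP. Below is a rough Python port, given as a sketch only; to_snake and its upper flag are hypothetical names. The substitution itself only inserts underscores, so the explicit lower() or upper() call is what changes the case.

```python
import re

# Same pattern as in the PHP call, without the / delimiters.
SPLIT = re.compile(r"(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])")


def to_snake(name, upper=False):
    """Insert underscores at word boundaries, then normalise the case."""
    with_underscores = SPLIT.sub(r"_\1", name)
    return with_underscores.upper() if upper else with_underscores.lower()


for name in ["fat", "fatBat", "FatBat", "camera360", "cameraABCAll"]:
    print(name, "--->", to_snake(name))
```

Note that the pattern assumes a bare identifier. On input such as firstLine(874) the [^a-z] branch also matches the ( and the result is first_line_(874).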

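This solution targets lower case. If we want upper case in Notepad++, the same pattern can be used to insert the underscores (written without the PHP / delimiters). Note that a \U in the replacement, as in the solution above, only upper-cases the replaced text, so the rest of each word still has to be upper-cased in a separate step:

  • Find What: (?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])
  • Replace With: _\1
  • Match Case: ON.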

I found this solution on the documentation page of the PHP function preg_replace: https://www.php.net/manual/en/function.preg-replace.php#111695

Dusa answered 25/8, 2022 at 9:58 Comment(0)
