Regex to change to sentence case
E

4

44

I'm using Notepad++ to do some text replacement in a 5453-row language file. The format of the file's rows is:

variable.name = Variable Value Over Here, that''s for sure, Really

Double apostrophe is intentional.

I need to convert the value to sentence case, except for the words "Here" and "Really" which are proper and should remain capitalized. As you can see, the case within the value is typically mixed to begin with.

I've worked on this for a little while. All I've got so far is:

 (. )([A-Z])(.+)

which seems to at least select the proper strings. The replacement piece is where I'm struggling.

Earthshaker answered 24/6, 2009 at 15:49 Comment(1)
Why are "Here" and "Really" proper?Berri
T
12

Regex replacement cannot execute functions (like capitalization) on matches. You'd have to script that, e.g. in PHP or JavaScript.

Update: See Jonas' answer.

I built myself a Web page called Text Utilities to do that sort of thing:

  • paste your text
  • go in "Find, regexp & replace" (or press Ctrl+Shift+F)
  • enter your regex (mine would be ^(.*?\=\s*\w)(.*)$)
  • check the "^$ match line limits" option
  • choose "Apply JS function to matches"
  • add arguments (first is the match, then sub patterns), here s, start, rest
  • change the return statement to return start + rest.toLowerCase();

The final function in the text area looks like this:

return function (s, start, rest) {
     return start + rest.toLowerCase();
};

Maybe add some code to capitalize some words like "Really" and "Here".
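
That last step could look roughly like the sketch below. It is written in Python rather than the JavaScript used on the Text Utilities page, and it assumes the "key = value" row format from the question. The names `PROPER_WORDS` and `sentence_case_row` are made up for illustration; the word list comes from the question.

```python
import re

# Words that must stay capitalized (taken from the question; adjust as needed).
PROPER_WORDS = ("Here", "Really")

# Same idea as ^(.*?\=\s*\w)(.*)$ : group 1 runs through the first character
# of the value, group 2 is the rest of the value.
ROW = re.compile(r"^(.*?=\s*\w)(.*)$")


def sentence_case_row(row: str) -> str:
    """Lower-case everything after the value's first character, then restore proper words."""
    match = ROW.match(row)
    if match is None:
        return row  # not a "key = value" row; leave it alone
    start, rest = match.groups()
    rest = rest.lower()
    for word in PROPER_WORDS:
        # \b keeps "here" inside a longer word such as "there" from being capitalized.
        rest = re.sub(r"\b" + re.escape(word.lower()) + r"\b", word, rest)
    return start + rest


print(sentence_case_row("variable.name = Variable Value Over Here, that''s for sure, Really"))
# variable.name = Variable value over Here, that''s for sure, Really
```

Like the JavaScript function, this leaves the first character of the value as it is and only lower-cases what follows.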

Tasso answered 24/6, 2009 at 15:55 Comment(3)
Thanks for the help streetpc. In Notepad++ I can apply the replace function using regex which is suh-weet. Then again, your site is suh-weet too. This pretty much nails it unless I have a new sentence within the variable portion, but I can fix that by finding ". [a-z]" and fixing the case on the first letter following a period-space combination. I'm going to leave the question open for a bit to see if any Notepad++ people respond but you definitely solved my problem. Thanks!Earthshaker
This can be done in Vim. vim.wikia.com/wiki/Changing_case_with_regular_expressionsGlutinous
nedit can also do this with its regex search and replace.Advowson
L
162
Find:    (. )([A-Z])(.+)
Replace: \1\U\2\L\3

In Notepad++ 6.0 or better (which comes with built-in PCRE support).
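
To check the effect outside Notepad++, here is a rough Python emulation. Python's `re` module has no `\U` or `\L` replacement escapes, so a callback stands in for them; the function name `apply_replacement` is made up for this sketch.

```python
import re

FIND = re.compile(r"(. )([A-Z])(.+)")


def apply_replacement(line: str) -> str:
    # Emulates the replacement \1\U\2\L\3:
    # group 1 unchanged, group 2 upper-cased, group 3 lower-cased.
    return FIND.sub(
        lambda m: m.group(1) + m.group(2).upper() + m.group(3).lower(),
        line,
    )


print(apply_replacement("variable.name = Variable Value Over Here, that''s for sure, Really"))
# variable.name = Variable value over here, that''s for sure, really
```

Note that "Here" and "Really" are lower-cased as well, so the proper words from the question would still need to be restored in a second pass.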

Laconism answered 30/5, 2013 at 7:28 Comment(9)
Oh wow, I've wanted something like this for so long. Thank you!!Malaspina
I'll add that \u uppercases just the first character of the match. Likewise \l will lowercase just the first character.Malaspina
If you are using the \U and \L commands, you can also use \E to end them. So for example if you wanted to only change part of the replace string and not all of it, you would put a \E at the end of the bit whose case you want to change.Ultramontanism
It should be noted that, at least in version 6.9.1, the \U and \L (and \u, \l) commands do not convert accented letters, only ASCII upper/lowercase letters.Remorseful
Should be the answerCorrinnecorrival
For anyone like me who's wondering how to use this more generally, the \U, \L go before the match marker (eg \1). So the modifier prefixes the match.Compost
Note that if you have "match case" checked, you'll want [a-z] not [A-Z]Campion
Wow! You can even do something like \L\u to title-case a word using a single capture group. But you can't do \e to suppress case conversion for a single character - this just emits an ESC character.Lyall
@JohnC I can confirm that this is still the case in v7.5.9.Lyall
A
5

In Notepad++ you can use a plugin called PythonScript to do the job. If you install the plugin, create a new script like so:

[Screenshot: creating a new script from the Python Script plugin menu]

Then you can use the following script, replacing the regex and function variables as you see fit:

import re

#change these
regex = r"[a-z]+sym"
function = str.upper

def perLine(line, num, total):
    for match in re.finditer(regex, line):
        if match:
            s, e = match.start(), match.end()
            line = line[:s] + function(line[s:e]) + line[e:]
            editor.replaceWholeLine(num, line)

editor.forEachLine(perLine)

This particular example works by finding all the matches in a particular line, then applying the function to each match. If you need multiline support, the Python Script "Context-Help" explains all the functions offered, including the pymlsearch/pymlreplace functions defined under the 'editor' object.

When you're ready to run your script, go to the file you want it to run on first, then go to "Scripts >" in the Python Script menu and run yours.

Note: while you will probably be able to use Notepad++'s undo functionality if you mess up, it might be a good idea to put the text in another file first to verify it works.
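
One way to do that verification is to try the same transformation on plain strings outside Notepad++ first. The sketch below is only an approximation of the script above: it uses `re.sub` with a callback instead of the plugin's `editor` object, and `transform_line` is a made-up helper name.

```python
import re

# Same placeholders as the script above; change these to suit.
regex = r"[a-z]+sym"
function = str.upper


def transform_line(line: str) -> str:
    # re.sub with a callback applies `function` to each match, which is
    # what perLine does before handing the line back to the editor.
    return re.sub(regex, lambda m: function(m.group(0)), line)


for sample in ["foosym and barsym", "nothing to change"]:
    print(repr(sample), "->", repr(transform_line(sample)))
```

Once the output looks right for a few sample lines, the same regex and function can be pasted into the Notepad++ script.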

P.S. You can 'find' and 'mark' every occurrence of a regular expression using Notepad++'s built-in find dialog, and if you could select them all you could use TextFX's "Characters->UPPER CASE" functionality for this particular problem, but I'm not sure how to go from marked or found text to selected text. Still, I thought I would post this in case anyone does...

Edit: In Notepad++ 6.0 or higher, you can use "PCRE (Perl Compatible Regular Expression) Search/Replace" (source: http://sourceforge.net/apps/mediawiki/notepad-plus/?title=Regular_Expressions). So this could have been solved using a regex like (. )([A-Z])(.+) with a replacement argument like \1\U\2\L\3 (\U upper-cases the second group, and \L lower-cases everything after it).

Amphitryon answered 19/2, 2013 at 20:51 Comment(1)
For those looking for a good reference to PCRE search and replace syntax (including case conversion etc.), you may look at this Perldoc site: perldoc.perl.org/perlre.html -- I wasn't able to find any other location where things like \U were documented!Sportsman
P
4

The questioner had a very specific case in mind. As a general "change to sentence case" in Notepad++, the first regexp suggestion did not work properly for me. While not perfect, here is a tweaked version that was a big improvement on the original for my purposes:

find:    ([\.\r\n][ ]*)([A-Za-z\r])([^\.^\r^\n]+) 
replace: \1\U\2\L\3

You still have a problem with lower case nouns, names, dates, countries etc. but a good spellchecker can help with that.
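
For anyone who wants to try the tweaked pattern outside Notepad++, here is a rough Python emulation. As before, a callback stands in for `\U` and `\L`, which Python's `re` module does not support in replacement strings; the name `sentence_case` is made up for this sketch.

```python
import re

# The tweaked pattern from this answer, unchanged.
FIND = re.compile(r"([\.\r\n][ ]*)([A-Za-z\r])([^\.^\r^\n]+)")


def sentence_case(text: str) -> str:
    # Emulates \1\U\2\L\3: keep the period or line break and any spaces,
    # upper-case the first letter, lower-case the rest of the sentence.
    return FIND.sub(
        lambda m: m.group(1) + m.group(2).upper() + m.group(3).lower(),
        text,
    )


print(sentence_case("Intro. hello World. FOO bar"))
# Intro. Hello world. Foo bar
```

Because the pattern needs a preceding period or line break, text before the first one (here "Intro") is left untouched.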

Participate answered 31/1, 2015 at 1:43 Comment(1)
thanks! \U worked for me. I needed to change a bunch of variable names from underscore to camel case, so in Sublime I searched for the expression (.+)_(.+) (<- this isn't a baby monster's eyes) and replaced it with $1\U$2Quadrennium

© 2022 - 2024 — McMap. All rights reserved.