Vim Regex Capture Groups [bau -> byau : ceu -> cyeu]

I have a list of words:

bau
ceu
diu
fou
gau

I want to turn that list into:

byau
cyeu
dyiu
fyou
gyau

I unsuccessfully tried the command:

:%s/(\w)(\w\w)/\1y\2/g

Given that this doesn't work, what do I have to change to make the regex capture groups work in Vim?

Extend asked 11/11, 2013 at 8:43 Comment(3)
possible duplicate of Matching an expression including arbitrary lines with regex in Vim and #18628393 - Reflect
It's a little bit off-topic, so I put it here as a comment, but… I'd do :%norm ay<CR>. - Hepburn
In your case (if it's exactly as described), another option is to: move to the 2nd column with l, enter Visual Block mode with Ctrl+v, mark the whole column with Shift+g followed by l, then enter Insert mode with Shift+i and type 'y'. That is 7 keystrokes, including the final Esc to exit Insert mode. I'm not posting this as an answer because it's not really about capture groups (which is what I searched for when I found this). :-) - Starter

One way to fix this is to escape the grouping parentheses:

:%s/\(\w\)\(\w\w\)/\1y\2/g

Slightly shorter is to start the pattern with \v ("very magic"): after it, all ASCII characters except '0'-'9', 'a'-'z', 'A'-'Z' and '_' have a special meaning, so bare parentheses form groups:

:%s/\v(\w)(\w\w)/\1y\2/g
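sed draws the same line between basic and extended syntax, which may make the two Vim commands above easier to remember. A minimal sketch, assuming GNU sed (\w is a GNU extension there) and using hypothetical function names add_y_basic and add_y_extended: basic mode needs \( \), like Vim's default magic mode, while -E accepts bare parentheses, roughly like \v.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU sed, where \w matches a word character.

# Basic regex: groups must be written \( \), like Vim's default "magic" mode.
add_y_basic() {
    sed 's/\(\w\)\(\w\w\)/\1y\2/g'
}

# Extended regex (-E): bare ( ) form groups, similar to Vim's \v "very magic" mode.
add_y_extended() {
    sed -E 's/(\w)(\w\w)/\1y\2/g'
}

printf '%s\n' bau ceu diu fou gau | add_y_basic
printf '%s\n' bau ceu diu fou gau | add_y_extended
```

Both pipelines should print the same five words, byau through gyau.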


Diabolic answered 11/11, 2013 at 8:46 Comment(0)

You can also use this shorter pattern:

:%s/^./&y
  • %s applies the pattern to the whole file.
  • ^. matches the first character of the line.
  • &y puts the matched character back (& stands for the whole match) and adds a y after it.
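sed uses the same & shorthand for "the whole match" in the replacement, so this trick carries over to the shell. A minimal sketch; the function name append_y_after_first is hypothetical, and ^. is plain POSIX, so no GNU extension is assumed here:

```shell
#!/bin/sh
# Hypothetical helper: insert "y" after the first character of every line.
# ^. matches the first character; & in the replacement re-inserts that match.
append_y_after_first() {
    sed 's/^./&y/'
}

printf '%s\n' bau ceu diu fou gau | append_y_after_first
```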
Koeninger answered 28/5, 2015 at 15:38 Comment(2)
It's amazing how, after more than 10 years and quite a bit of expertise in Vim, I still learn new tricks like using "&" to add rather than to substitute. Thanks! - Erme
@Erme & is actually just another name for \0, which is the capture group containing the entire sequence that was matched. - Griggs

If you don't want to escape the capturing groups with backslashes (which is what you missed), prepend \v to switch the pattern to very magic mode:

:%s/\v(\w)(\w\w)/\1y\2/g
Reflect answered 11/11, 2013 at 8:47 Comment(5)
Ingo, sorry for placing a question in the wrong place: this works fine in Ex mode; is there a way to do it in the gVim find/replace dialog box? - Gradation
@JJoao: No, the find/replace box is for literal search and replacement only. You shouldn't be using that, anyway; it's just training wheels for Notepad users. - Reflect
Ingo, thank you (it is not for me: I am happy with Ex mode, but for linguist collaborators in a dictionary project): it almost works. With \v..., the regexp works fine; in the replacement string, & works but backslashes are escaped (\1 and \r are lost). - Gradation
@JJoao: Yes, that's what I found out while playing with it, too. I'm still skeptical whether using Vim without Ex mode is a good idea, but you could easily build your own search-and-replace dialog (internally powered by :s) via inputdialog() and a bit of Vimscript. - Reflect
Ingo: Thank you again; I share your skepticism. inputdialog() + :s + Vimscript is probably how gVim's find/replace is built. To me, the \1 / \r treatment is a gVim bug; I will try to post it to a Vim-specific list. In the meantime, I will try my own Vimscript inputdialog() version. - Gradation

You also have to escape the grouping parentheses:

:%s/\(\w\)\(\w\w\)/\1y\2/g

That does the trick.

Wenger answered 11/11, 2013 at 8:46 Comment(4)
Coming from Sublime Text 3, this is horrible. Why is the syntax like this? It doesn't make sense to escape characters that aren't literal, normal text. - Retroact
@Retroact the parentheses in this case aren't literal text; they are metacharacters that delimit the groups to save for the replacement expression. Placing an unescaped paren in an expression will match the literal character, as one would expect (this is what tripped up the OP). - Dutch
I'm a regular Vim user and I also think this is terrible. @Retroact - Scornful
@Retroact because Vim is older than the normal regex syntax that we all use nowadays. Most people who use Vim just use the \v version described in other answers, though, rather than escaping every little thing in their regex. - Kevin
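The point that an unescaped parenthesis matches itself can be checked in sed, whose basic regex syntax follows the same convention as Vim's default mode. It is also why the question's original command finds no match: the bare parentheses in (\w)(\w\w) are literal characters, and the word list contains none. A minimal sketch, assuming a POSIX sed; the function name bracket_call is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper. In a basic regex, bare ( and ) are ordinary characters,
# so this pattern only matches the literal text "(x)".
bracket_call() {
    sed 's/(x)/[x]/'
}

echo 'f(x)' | bracket_call    # the parentheses are matched literally
echo 'fx'   | bracket_call    # no "(x)" in this line, so it is printed unchanged
```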

In Vim, on a selection, the following

:'<,'>s/^\(\w\+ - \w\+\).*/\1/

or

:'<,'>s/\v^(\w+ - \w+).*/\1/

transforms

Space - Commercial - Boeing

to

Space - Commercial

Similarly,

apple - banana - cake - donuts - eggs

is transformed into

apple - banana

Explanation

  • ^ : match start of line
  • Escape (, + and ) with a backslash, as in the first regex (the accepted answer), or prepend \v (@ingo-karkat's answer)
  • \w\+ matches a whole word (\w alone matches only a single character); in this example, I search for a word, followed by " - ", followed by another word
  • .* after the capturing group matches the remaining text, so that it is dropped from the result
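For comparison, the same substitution in sed's basic syntax, applied to the two example lines above. A minimal sketch, assuming GNU sed (\w and \+ are GNU extensions there too); the function name first_two_fields is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper; assumes GNU sed.
# Keep "word - word" at the start of the line and drop everything after it.
first_two_fields() {
    sed 's/^\(\w\+ - \w\+\).*/\1/'
}

echo 'Space - Commercial - Boeing' | first_two_fields
echo 'apple - banana - cake - donuts - eggs' | first_two_fields
```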

Addendum: This is a bit off topic, but I would suggest that Vim is not well suited to more complex regular expressions / captures. [I am doing something similar to the following, which is how I found this thread.]

In those instances, it is likely better to dump the lines to a text file and edit it "in place" with

sed -i ...

or write the result to a new file with a redirect:

sed ... > out.txt

In a terminal (or a Bash script, ...):


echo 'Space Sciences - Private Industry - Boeing' | sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/'

Space Sciences - Private Industry 

cat ~/in.txt

Space Sciences - Private Industry - Boeing

sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt > ~/out.txt

cat ~/out.txt 

Space Sciences - Private Industry

## Note: without the > redirect, sed only prints to the terminal; the source file is modified only with -i (see below).
## Subsequent > redirects also overwrite the output; use >> to append
## subsequent iterations to the output (preserving the previous output).
 
## To edit "in place" (`-i` argument/flag):

sed -i -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' ~/in.txt

cat ~/in.txt

Space Sciences - Private Industry 

sed -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/'

(note the {1,2}) matches one or two repetitions of a word followed by a space; in general, {x,y} matches between x and y repetitions. See https://www.gnu.org/software/sed/manual/html_node/Regular-Expressions.html.

Here, since my phrases are separated by -, I can simply tweak those parameters to get what I want.
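If editing in place feels risky (see the caution above), GNU sed's -i also accepts a backup suffix written directly after it, so the original file is kept next to the edited one. A minimal sketch, assuming GNU sed; the temporary directory and the in.txt / in.txt.bak names are only for illustration:

```shell
#!/bin/sh
# Sketch: in-place edit that keeps a backup copy (GNU sed: -i.bak, no space before the suffix).
workdir=$(mktemp -d)
printf '%s\n' 'Space Sciences - Private Industry - Boeing' > "$workdir/in.txt"

# Same expression as above; the untouched original is saved as in.txt.bak.
sed -i.bak -r 's/^((\w+ ){1,2}- (\w+ ){1,2}).*/\1/' "$workdir/in.txt"

cat "$workdir/in.txt"        # edited line (ends with the space captured after "Industry")
cat "$workdir/in.txt.bak"    # original line
# Remove "$workdir" when you are done with it.
```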

Aubreir answered 24/3, 2021 at 19:33 Comment(0)
