Getting the text that follows after the regex match

5

116

I'm new to regex. I've been going through a rake of tutorials, but I haven't found one that applies to what I want to do.

I want to search for something and return everything that follows it, but not the search string itself.

e.g. "Some lame sentence that is awesome"

search for "sentence"

return "that is awesome"

Any help would be much appreciated

This is my regex so far

sentence(.*) 

but it returns: sentence that is awesome

Pattern pattern = Pattern.compile("sentence(.*)");

Matcher matcher = pattern.matcher("some lame sentence that is awesome");

boolean found = false;
while (matcher.find())
{
    System.out.println("I found the text: " + matcher.group().toString());
    found = true;
}
if (!found)
{
    System.out.println("I didn't find the text");
}
Flor answered 15/2, 2011 at 16:52 Comment(6)
What is your actual call? Are you using Matcher?Outlive
I'm using matcher and patternFlor
... and we'd still like to see your actual Java code in order to help evaluate what's wrong.Bequest
System.out.println("I found the text: " + "some lame sentence that is awesome".substring(matcher.end()));Laflam
+1 if you're a grammar nazi like meOria
@DavidIsNotHere Nazi should have a capital N...Sentinel
M
201

You can do this with "just the regular expression" as you asked for in a comment:

(?<=sentence).*

(?<=sentence) is a positive lookbehind assertion. This matches at a certain position in the string, namely at a position right after the text sentence without making that text itself part of the match. Consequently, (?<=sentence).* will match any text after sentence.

This is quite a nice feature of regex. However, in Java this only works for subexpressions whose maximum length is bounded, i.e. (?<=sentence|word|(foo){1,4}) is legal, but (?<=sentence\s*) isn't.
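For illustration, here is a minimal Java sketch of the lookbehind approach applied to the sentence from the question. The class name LookbehindDemo and the helper afterSentence are hypothetical names chosen for this example, not part of any API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
    // The lookbehind asserts that "sentence" precedes the match
    // without making it part of the matched text.
    private static final Pattern AFTER_SENTENCE = Pattern.compile("(?<=sentence).*");

    // Hypothetical helper: returns the text following the first "sentence",
    // or null if the word does not occur in the input.
    static String afterSentence(String input) {
        Matcher m = AFTER_SENTENCE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String result = afterSentence("some lame sentence that is awesome");
        // The match keeps the space that follows "sentence"
        System.out.println("[" + result + "]");        // prints "[ that is awesome]"
        System.out.println("[" + result.trim() + "]"); // prints "[that is awesome]"
    }
}
```

Note that the match starts immediately after the keyword, so it includes the leading space; call trim() if that is not wanted.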

Mccoy answered 15/2, 2011 at 18:17 Comment(4)
You state that it should not include the positive lookbehind assertion. So I assume that ".*(?<=sentence)" should return everything up to, but not including "sentence". But it doesn't, it returns "sentence" as well. What am I missing?Whet
@user2184214: That's because it's a lookbehind assertion. .* matches any text, and then (?<=...) looks backwards for the word sentence, asserting in this case that the match ends with that word. If you want to stop before that word, you need to look ahead: .*(?=sentence) will match any text that is followed by sentence.Mccoy
For anyone looking for a way to match any text after one or another string, regexps like (?<=sentence1|sentence2).*, (?:(?<=sentence1)|(?<=sentence2)).* or even (?:sentence1|sentence2)(.*) might work.Urology
Great thanks! I was using your answer to find everything after a plus sign. So just for another example: (?<=\+).*Gerontocracy
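The lookbehind versus lookahead distinction raised in these comments can be checked with a small Java sketch. The class name LookaroundEndDemo and the helper firstMatch are hypothetical and used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundEndDemo {
    // Hypothetical helper: returns the first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "some lame sentence that is awesome";

        // Lookbehind at the end: the match must END with "sentence",
        // so the word is part of the matched text.
        System.out.println("[" + firstMatch(".*(?<=sentence)", s) + "]"); // prints "[some lame sentence]"

        // Lookahead at the end: the match must be FOLLOWED by "sentence",
        // so the word is not part of the matched text.
        System.out.println("[" + firstMatch(".*(?=sentence)", s) + "]");  // prints "[some lame ]"
    }
}
```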
H
22

Your regex "sentence(.*)" is right. To retrieve the contents of the group in parentheses, you would call:

Pattern p = Pattern.compile( "sentence(.*)" );
Matcher m = p.matcher( "some lame sentence that is awesome" );
if ( m.find() ) {
   String s = m.group(1); // " that is awesome"
}

Note the use of m.find() in this case (it attempts to find a match anywhere in the string) and not m.matches() (which must match the whole string and would fail because of the prefix "some lame"; in that case the regex would need to be ".*sentence(.*)").
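As a rough sketch of the find() versus matches() difference described above, the following compares both calls on the question's input. The class name FindVsMatchesDemo and its two helpers are hypothetical names for this example.

```java
import java.util.regex.Pattern;

class FindVsMatchesDemo {
    // Hypothetical helper: true only if the regex matches the ENTIRE input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Hypothetical helper: true if the regex matches ANY part of the input.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "some lame sentence that is awesome";

        System.out.println(findsAnywhere("sentence(.*)", s));  // prints "true"
        // matches() fails because "some lame " is not covered by the pattern
        System.out.println(matchesWhole("sentence(.*)", s));   // prints "false"
        // Allowing an arbitrary prefix lets the whole string match
        System.out.println(matchesWhole(".*sentence(.*)", s)); // prints "true"
    }
}
```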

Halinahalite answered 15/2, 2011 at 17:1 Comment(3)
Thanks, but what if I just want it to return "that is awesome"?Flor
Thanks man, this worked great. I was hoping there was a way to do this with just the regular expression; if I can't find a way to do it that way, this will work as well.Flor
Likely a bad idea to add a "(.*)" at the end of the regexp for the performance...Siccative
H
10

If the Matcher is initialized with str, then after a successful match you can get the part after the match with

str.substring(matcher.end())

Sample Code:

final String str = "Some lame sentence that is awesome";
final Matcher matcher = Pattern.compile("sentence").matcher(str);
if(matcher.find()){
    System.out.println(str.substring(matcher.end()).trim());
}

Output:

that is awesome

Heirdom answered 15/2, 2011 at 17:2 Comment(2)
matcher.find() is required before this, IMO.Laflam
@Laflam that's what I wrote: "after the match". Added sample code to illustrateHeirdom
J
2

You need to use the group(int) of your matcher - group(0) is the entire match, and group(1) is the first group you marked. In the example you specify, group(1) is what comes after "sentence".
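A short Java sketch of that difference, using the pattern from the question. The class name GroupDemo and the helper groups are hypothetical and exist only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns { group(0), group(1) } for the first match
    // of "sentence(.*)", or an empty array if there is no match.
    static String[] groups(String input) {
        Matcher m = Pattern.compile("sentence(.*)").matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("some lame sentence that is awesome");
        // group(0), same as group(), is the entire match
        System.out.println("[" + g[0] + "]"); // prints "[sentence that is awesome]"
        // group(1) is only what the first pair of parentheses captured
        System.out.println("[" + g[1] + "]"); // prints "[ that is awesome]"
    }
}
```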

Jackofalltrades answered 15/2, 2011 at 17:1 Comment(0)
B
2

You just need to put "group(1)" instead of "group()" in the following line and the return will be the one you expected:

System.out.println("I found the text: " + matcher.group(1));
Brigade answered 17/5, 2012 at 15:29 Comment(0)
