Match the nth word in a line
S

4

5

In the app I use, I cannot select capture group 1.

The only result I can use is the full match of the regex.

However, I need the 5th word, "jumps", as the match result, not the complete match "The quick brown fox jumps".

^(?:[^ ]*\ ){4}([^ ]*)

The quick brown fox jumps over the lazy dog

Here is a link https://regex101.com/r/nB9yD9/6
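To make the problem concrete, here is a minimal sketch of what the pattern above returns. It uses Python's built-in `re` module purely for illustration (the app itself is not Python, and the variable names are made up for this example): the full match is the first five words, while the wanted word is only available as group 1.

```python
import re

line = "The quick brown fox jumps over the lazy dog"

# The pattern from the question: skip four "word + space" pairs,
# then capture the fifth word in group 1.
pattern = re.compile(r"^(?:[^ ]*\ ){4}([^ ]*)")

m = pattern.match(line)
full_match = m.group(0)  # what the app exposes
fifth_word = m.group(1)  # what is actually wanted

print(repr(full_match))
print(repr(fifth_word))
```

So the task is to find a pattern whose full match (group 0) is the fifth word alone.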

Sphene answered 24/1, 2019 at 16:19 Comment(12)
What language/tool are you using? Does it support variable-length lookbehind? – Kentiggerma
The app is ecoDMS 18.09. As far as I know, it is written with Qt 5.7. I don't know whether it supports lookbehind, but I can try it. Can you show me a sample regex? – Sphene
Have a try with ^(?<=(?:\S+ ){4})\S+ or ^(?:\S+ ){4}\S+ – Kentiggerma
I tried "(?<=(\b\s))(\w*)" and it gives me the second word, "quick", but how can I get the third word, "brown", and so on ("fox", ...)? – Sphene
Which word do you need? In the question you said the 5th, but in your comment you say the 3rd. Could you clarify by editing the question? – Kentiggerma
I need all the words from a string of six words, and for every word I need a separate pattern. The only restriction is that the app takes only the full match from the regex: no group 1, group 2, etc. – Sphene
If I understand correctly, you want one regex for the first word, another for the second, and so on? 1st: \w+, 2nd: ^(?<=\w+ )\w+, 3rd: ^(?<=(?:\w+ ){2})\w+, 4th: ^(?<=(?:\w+ ){3})\w+ and so on... Is that what you want? – Kentiggerma
Yes, that is what I need, but if I use something like "^(?<=(?:\w+ ){2})\w+" I get "? The preceding token is not quantifiable". – Sphene
I guess your app doesn't support variable-length lookbehind. Have a try with ^\w+ \K\w+ and ^(?:\w+ ){2}\K\w+ and so on... – Kentiggerma
Using \K, as in "^(?:\w+ ){2}\K\w+", does not work; without it, as in "^(?:\w+ ){2}\w+", I get a full match with three words (regex101.com/r/pR22LK/1). – Sphene
It works with PCRE: regex101.com/r/pR22LK/2. Your app doesn't seem to support it, but I don't know how it works. I think you have to extract all the words into an array and then select the ones you want. – Kentiggerma
Hello Toto, your solution works in the app too, like PCRE, thanks! – Sphene
S
1

It works https://regex101.com/r/pR22LK/2 with PCRE. Your app doesn't seem to support it, but I don't know how it works. I think you have to extract all the words in an array then select the ones you want. – Toto 23 hours ago

Hello Toto, your solution works in the app too, like PCRE, thanks! – gsxr1300 just now
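For readers whose tool does allow scripting, the fallback mentioned above (extract all the words into an array, then select the one you want) could be sketched as follows. This is only an illustration in Python; `nth_word` is a hypothetical helper name, not something ecoDMS provides.

```python
from typing import Optional


def nth_word(line: str, n: int) -> Optional[str]:
    """Return the n-th (1-based) whitespace-separated word, or None if absent."""
    words = line.split()  # split on any run of whitespace
    if 1 <= n <= len(words):
        return words[n - 1]
    return None


line = "The quick brown fox jumps over the lazy dog"
print(nth_word(line, 5))
```

This avoids regex engine differences entirely, at the cost of needing code rather than a single pattern.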

Sphene answered 29/1, 2019 at 13:30 Comment(0)
G
6

Since you need the entire match to be only the n-th word, you can try a positive lookbehind, which matches something only if it is preceded by something else.

To match only the fifth word, you want to match the first word that has four words before it.

To match four words (i.e. word characters followed by a space character):

(\w+\s){4}

To match a single word, but only if it was preceded by four other words:

(?<=(\w+\s){4})(\w+)

Test the result here https://regex101.com/r/QIPEkm/1
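Whether this lookbehind is accepted depends on the regex engine, because the lookbehind body `(\w+\s){4}` can match text of varying length. As one data point, the sketch below checks this with Python's built-in `re` module (chosen here only for illustration; the answer does not say which engine it targets, and `compiles` is a hypothetical helper). That module rejects a variable-width lookbehind but accepts a fixed-width one.

```python
import re

line = "The quick brown fox jumps over the lazy dog"


def compiles(pattern: str) -> bool:
    """Report whether Python's built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Variable-width lookbehind: \w+ has no fixed length.
variable_ok = compiles(r"(?<=(\w+\s){4})(\w+)")

# Fixed-width lookbehind: "fox" plus one whitespace character is always 4 characters.
fixed_ok = compiles(r"(?<=fox\s)\w+")
fixed_result = re.search(r"(?<=fox\s)\w+", line).group(0)

print(variable_ok, fixed_ok, fixed_result)
```

Engines that do support variable-width lookbehind should accept the pattern above unchanged, so it is worth checking the flavor your tool uses before relying on it.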

Glaze answered 25/1, 2019 at 1:43 Comment(2)
"? The preceding token is not quantifiable. Your pattern contains one or more errors, please see the explanation section above." – Sphene
The regex works just fine for me on regex101, in RegexBuddy, and in a simple Python script. I assume you changed something that broke it? – Glaze
K
2

To find the 3rd word of a sentence, use:

^(?:\w+ ){2}\K\w+

Explanation:

^               # beginning of line
    (?:         # start non capture group
        \w+     # 1 or more word characters
                # a space
    ){2}        # group must appear twice (change {2} to {3} to get the 4th word and so on)
    \K          # forget everything matched up to this position
    \w+         # 1 or more word characters

Demo
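Not every engine supports \K (Python's built-in `re` module, for example, does not). Where code is available, the same effect can be approximated in two steps: match the prefix, then match the word starting exactly where the prefix ended. The sketch below is an illustration only, and `nth_word_without_k` is a hypothetical helper name.

```python
import re
from typing import Optional

WORD = re.compile(r"\w+")


def nth_word_without_k(line: str, n: int) -> Optional[str]:
    """Approximate ^(?:\\w+ ){n-1}\\K\\w+ for engines that lack \\K."""
    if n < 1:
        return None
    # Step 1: consume the first n-1 "word + space" pairs from the start of the line.
    prefix = re.match(r"(?:\w+ ){%d}" % (n - 1), line)
    if prefix is None:
        return None
    # Step 2: match a word beginning exactly at the end of the prefix.
    # Discarding the prefix here plays the role of \K.
    word = WORD.match(line, prefix.end())
    return word.group(0) if word else None


line = "The quick brown fox jumps over the lazy dog"
print(nth_word_without_k(line, 3))
```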

Kentiggerma answered 29/1, 2019 at 13:48 Comment(0)
T
0

To match "the first" four words (i.e. word characters followed by a space character):

^(\w+\s){4}

To match a single word, but only if it was preceded by "the first" four other words:

(?<=^(\w+\s){4})(\w+)

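Note the difference: the added ^ anchors the pattern to the start of the line, so the lookbehind can only be satisfied at the fifth word rather than at every word that has four words before it.

If you want to know what "?<=" means, check this: https://mcmap.net/q/25940/-regex-lookahead-lookbehind-and-atomic-groups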

Thaxter answered 23/3, 2021 at 17:10 Comment(1)
Can you edit your answer and explain the difference? What does ?<= do in a regular expression and why is that needed here? – Ditzel
