return the second instance of a regex search in a line

2

10

I have a file with a specific line of interest (say, line 12) that looks like this:

conform: 244216 (packets) exceed: 267093 (packets)

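I've written a script to pull the first number via regex and dump the value into a new file:

import re

# Index 12 is the 13th line of the file, because list indices start at 0.
with open("file1.txt", "r") as inp:
    getexceeds = inp.readlines()[12]
output = re.search(r"\d+", getexceeds).group(0)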

with open("file2.txt", "w") as outp:
    outp.write(output)

I am not quite good enough yet to write the second number in that line to a new file. Can anyone suggest a way?

Thanks as always for any help!

Hirokohiroshi answered 18/9, 2014 at 13:26 Comment(1)
re.findall(r"\d+", getexceeds)[1] - Lillielilliputian
14

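Another possibility would be to use re.findall(), which returns a list of all non-overlapping matches (strg here holds the line of interest), so the second number is m[1]:

>>> m = re.findall(r"\d+", strg)
>>> m
['244216', '267093']
>>> m[1]
'267093'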
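
To tie this back to the question, here is a minimal sketch of the findall approach wrapped in a function. The helper name second_number is hypothetical, and the sample string stands in for the line read from file1.txt; the guard for lines with fewer than two numbers is an added assumption, not something the question asked for.

```python
import re

def second_number(line):
    """Return the second run of digits in line, or None if there are fewer than two."""
    numbers = re.findall(r"\d+", line)
    return numbers[1] if len(numbers) > 1 else None

# Sample line copied from the question; in the real script it would come from the file.
sample = "conform: 244216 (packets) exceed: 267093 (packets)"
result = second_number(sample)
print(result)
```

The returned value is a string, so it can be passed straight to outp.write() as in the original script.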
Disintegrate answered 18/9, 2014 at 13:42 Comment(2)
This answer will also work, yes, but your regex \d{6} is no good. OP's input suggests it's a number of packets. There's no reason to assume that it will always be a 6-digit number. - Neuron
Good point. Thanks for that, FrobberOfBits. I edited my post accordingly. - Disintegrate
12

You've got it almost all right; your regex is only looking for the first match though.

match = re.search(r"(\d+).*?(\d+)", getexceeds)
firstNumber = match.group(1)
secondNumber = match.group(2)

Notice that the regex contains two capturing groups (the parts in parentheses), each matching a sequence of digits. Between them, .*? matches as few characters as possible (any character except a newline), so the second group captures the next complete number after the first.
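
To see why the ? after .* matters, this small sketch compares the non-greedy pattern with a greedy variant on the sample line from the question. The greedy pattern is not part of the answer; it is shown only for contrast.

```python
import re

line = "conform: 244216 (packets) exceed: 267093 (packets)"

# Non-greedy: .*? stops as soon as the next digit run can start.
lazy = re.search(r"(\d+).*?(\d+)", line)

# Greedy: .* swallows as much as it can, leaving only the last digit for group 2.
greedy = re.search(r"(\d+).*(\d+)", line)

print(lazy.group(2))    # 267093
print(greedy.group(2))  # 3
```

With the greedy form, .* consumes most of the second number and backtracks only far enough to leave one digit, which is why the non-greedy form is the one to use here.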

Here's a little test I ran from the shell:

>>> import re
>>> str = 'conform: 244216 (packets) exceed: 267093 (packets)'
>>> match = re.search(r"(\d+).*?(\d+)", str)
>>> print(match.group(1))
244216
>>> print(match.group(2))
267093
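
As an alternative to counting matches, one could anchor on the labels in the line instead. This is only a sketch: the function name packet_counts is hypothetical, and it assumes every counter appears as "label: number", which is inferred from the single sample line in the question.

```python
import re

def packet_counts(line):
    """Map each label (e.g. 'conform', 'exceed') to its integer count."""
    # With two groups, findall returns a list of (label, count) tuples.
    pairs = re.findall(r"(\w+):\s*(\d+)", line)
    return {label: int(count) for label, count in pairs}

counts = packet_counts("conform: 244216 (packets) exceed: 267093 (packets)")
print(counts["exceed"])
```

Looking values up by label means the result does not depend on the order of the numbers in the line.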
Neuron answered 18/9, 2014 at 13:32 Comment(1)
One suggestion: do NOT name a variable str (or any other built-in type name, such as list). Doing so shadows the built-in in that scope, so a later call such as str(42) fails with a TypeError. - Loella
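
A minimal sketch of the problem this comment describes, kept inside a function so the shadowing stays local. The function name shadow_demo is hypothetical.

```python
def shadow_demo():
    str = "conform: 244216 (packets)"  # local name now hides the built-in str type
    try:
        str(42)  # attempts to call the string above, not the built-in
    except TypeError as exc:
        return type(exc).__name__
    return "no error"

print(shadow_demo())
```

Choosing a name such as line or text for the variable avoids the issue.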
