Python Regular Expression Named Capture Groups

I'm learning regular expressions, specifically named capture groups.

I'm having trouble figuring out how to write an if/else statement for my function findVul().

findVul() goes through data1 and data2, which have been added to the list myDATA.

If the regex finds a match for the entire named group, then it should print out the results. It currently works perfectly.

CODE:

import re

data1 = '''

dwadawa231d .2 vulnerabilities discovered dasdfadfad .One vulnerability discovered 123e2121d21 .12 vulnerabilities discovered sgwegew342 dawdwadasf

2r3232r32ee

'''

data2 = ''' d21d21 .2 vul discovered adqdwdawd .One vulnerability disc d12d21d .two vulnerabilities discovered 2e1e21d1d f21f21

'''

def findVul(data):
    pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
    match = re.finditer(pattern, data)

    for x in match:
        print(x.group())


myDATA = [data1, data2]
count_data = 1

for x in myDATA:
    print('\n--->Reading data{0}\n'.format(count_data))
    count_data+=1
    findVul(x)

OUTPUT:

--->Reading data1

2 vulnerabilities discovered
One vulnerability discovered
12 vulnerabilities discovered

--->Reading data2

Now I want to add an if/else statement to check if there are any matches for the entire named group.

I tried something like this, but it doesn't seem to be working.

CODE:

def findVul(data):
    pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
    match = re.finditer(pattern, data)

    if len(list(match)) != 0:
        print('\nVulnerabilities Found!\n')
        for x in match:
            print(x.group())

    else:
        print('No Vulnerabilities Found!\n')

OUTPUT:

--->Reading data1


Vulnerabilities Found!


--->Reading data2

No Vulnerabilities Found!

As you can see, it does not print the vulnerabilities that it should find in data1.

Could someone please explain the correct way to do this and why my logic is wrong? Thanks so much! :)

Petula asked 12/10, 2018 at 3:29

I did some more research after @AdamKG's response.

I wanted to utilize the re.findall() function.

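re.findall() returns a list of all non-overlapping matches. If the pattern has no capture groups, each item is the matched substring; if it has exactly one group, each item is that group's text; if it has more than one group, each item is a tuple with one string per group. In my case I have two capture groups inside my named capture group (three groups in total), so it returns a list of tuples.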
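To make those three cases concrete, here is a small sketch. The sample string and the variable names are illustrative only; the third pattern is the one from the question, while the first two are variants with fewer capture groups:

```python
import re

text = "2 vulnerabilities discovered, One vulnerability discovered"

# No capture groups: each item is the whole matched substring
no_groups = re.findall(r"(?:\d{1,2}|One)\s+(?:vulnerabilities|vulnerability)\s+discovered", text)

# Exactly one capture group: each item is that group's text
one_group = re.findall(r"(\d{1,2}|One)\s+(?:vulnerabilities|vulnerability)\s+discovered", text)

# Several capture groups: each item is a tuple with one entry per group
many_groups = re.findall(r"(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)", text)

print(no_groups)    # ['2 vulnerabilities discovered', 'One vulnerability discovered']
print(one_group)    # ['2', 'One']
print(many_groups)  # [('2 vulnerabilities discovered', '2', 'vulnerabilities'), ('One vulnerability discovered', 'One', 'vulnerability')]
```

Making the inner groups non-capturing with (?:...) is one way to get plain strings back from re.findall() without indexing into tuples.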

For example, the following regex applied to data1:

pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')

match = re.findall(pattern, data)

This returns a list of tuples:

[('2 vulnerabilities discovered', '2', 'vulnerabilities'),
 ('One vulnerability discovered', 'One', 'vulnerability'),
 ('12 vulnerabilities discovered', '12', 'vulnerabilities')]

My final code for findVul():

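import re

def findVul(data):
    pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
    # findall() returns a list, so it can be checked and then iterated safely
    match = re.findall(pattern, data)

    if match:
        print('Vulnerabilities Found!\n')
        for x in match:
            # x is a tuple of all groups; x[0] is the outer named group VUL
            print('--> {0}'.format(x[0]))
    else:
        print('No Vulnerabilities Found!\n')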
Petula answered 12/10, 2018 at 5:11
Just a side note: you can shorten your regex a bit with a (vulnerabilit(?:y|ies)) part :) On the other hand, it becomes a bit less legible. Demo on Regex101 - Barcellona
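A small sketch checking that suggestion against the question's data1 line (copied here without the surrounding newlines). The shorter alternation matches the same text, and because the parentheses around it still capture, re.findall() returns the same three-element tuples:

```python
import re

data1 = ("dwadawa231d .2 vulnerabilities discovered dasdfadfad .One vulnerability "
         "discovered 123e2121d21 .12 vulnerabilities discovered sgwegew342 dawdwadasf")

original = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
# (?:y|ies) is non-capturing, but the surrounding (...) still forms group 3
shorter = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilit(?:y|ies))\s+discovered)')

original_matches = original.findall(data1)
shorter_matches = shorter.findall(data1)

print(original_matches == shorter_matches)  # True
```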

The problem is that re.finditer() returns an iterator, and the list(match) call inside your len(list(match)) != 0 test consumes it. When the for-loop then tries to iterate over match again, the iterator is already exhausted and there are no items left. The simple fix is to add a match = list(match) line right after the finditer() call.
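A minimal sketch of that fix, reusing the question's pattern. The list is built once, so both the emptiness check and the loop see the same matches. Returning the list is an addition (the original function only prints), and the short sample strings are illustrative:

```python
import re

pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')

def findVul(data):
    # Turn the iterator into a list once, so it can be both tested and looped over
    matches = list(pattern.finditer(data))

    if matches:
        print('\nVulnerabilities Found!\n')
        for m in matches:
            print(m.group())
    else:
        print('No Vulnerabilities Found!\n')
    # Returning the list is an addition for reuse; the original only printed
    return matches

# Why the original printed nothing: an iterator can only be consumed once
it = pattern.finditer(".2 vulnerabilities discovered")
first_pass = list(it)   # consumes the iterator (one match)
second_pass = list(it)  # already exhausted, so this is an empty list
```

With this version, the loop over myDATA from the question prints the three matches under the "Vulnerabilities Found!" header for data1 and "No Vulnerabilities Found!" for data2.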

Allogamy answered 12/10, 2018 at 3:37
Thanks so much, AdamKG! This is my first time using the re.finditer() function; I usually use re.findall(). Would re.finditer() be the best way to find my named capture group, or is there a better way? - Petula
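On that follow-up question, one practical difference is that re.finditer() yields Match objects, so a named group can be read by its name with m.group('VUL') instead of by tuple position as with re.findall(). A short sketch, assuming the question's pattern and a trimmed copy of data1:

```python
import re

pattern = re.compile(r'(?P<VUL>(\d{1,2}|One)\s+(vulnerabilities|vulnerability)\s+discovered)')
data = "dwadawa231d .2 vulnerabilities discovered dasdfadfad .One vulnerability discovered"

# finditer() yields Match objects, so groups can be looked up by name
found = [m.group('VUL') for m in pattern.finditer(data)]
print(found)  # ['2 vulnerabilities discovered', 'One vulnerability discovered']

# groupdict() returns only the named groups, keyed by name
first = pattern.search(data)
print(first.groupdict())  # {'VUL': '2 vulnerabilities discovered'}
```

Whether this is "better" than re.findall() is a matter of taste; reading groups by name mainly avoids depending on group order if the pattern changes later.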
