Extract IBAN from text with Python

I want to extract IBAN numbers from text with Python. The challenge is that an IBAN can be written in many ways, with spaces between the digits in different places, so I find it difficult to translate this into a useful regex pattern.

I have written a demo version which tries to match all German and Austrian IBAN numbers from text.

^DE([0-9a-zA-Z]\s?){20}$

I have seen similar questions on Stack Overflow. However, the combination of the different ways of writing IBAN numbers and the need to extract them from running text makes my problem very difficult to solve.

Hope you can help me with that!

Bookrack answered 15/1, 2021 at 11:9 Comment(4)
Does \b(?:DE|AT)(?:\s?[0-9a-zA-Z]){20}\b work? See regex101.com/r/PRDDaT/2 - Adagietto
Wow, this looks like a perfect match!!! Awesome! - Bookrack
German IBAN numbers are 22 chars long, Austrian are 20. So you cannot treat them the same. - Lubin
Interesting, it looks like that's correct, so it should be \b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b - Adagietto

In general, to match German and Austrian IBAN codes, you can use

codes = re.findall(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])', text)

Details:

  • \b - word boundary
  • (DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18}) - Group 1: DE followed by 20 digits, or AT followed by 18 digits, where each digit may be preceded by any amount of whitespace
  • \b(?!\s*[0-9]) - word boundary that is NOT immediately followed by zero or more whitespace characters and an ASCII digit.

See this regex demo.
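
As a minimal sketch of how the pattern above might be used on running text, the matches can be normalized by stripping their internal whitespace. The function name extract_ibans and the sample sentence are assumptions made for illustration; the two IBANs are the commonly published example numbers for Germany and Austria.

```python
import re

# Same pattern as in the answer: DE + 20 digits or AT + 18 digits,
# each digit optionally preceded by whitespace.
IBAN_RE = re.compile(r'\b(DE(?:\s*[0-9]){20}|AT(?:\s*[0-9]){18})\b(?!\s*[0-9])')


def extract_ibans(text):
    """Return every matched IBAN with its internal whitespace removed."""
    return [re.sub(r'\s+', '', match) for match in IBAN_RE.findall(text)]


sample = "Please pay to DE89 3704 0044 0532 0130 00 or AT61 1904 3002 3457 3201 by Friday."
print(extract_ibans(sample))
# ['DE89370400440532013000', 'AT611904300234573201']
```

An Austrian number followed by extra digits, such as "AT61 1904 3002 3457 3201 22", is rejected by the trailing lookahead and yields an empty list.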

For the data you showed in the question, which includes malformed IBAN codes, you can use

\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b

See the regex demo. Details:

  • \b - word boundary
  • (?:DE|AT) - DE or AT
  • (?:\s?[0-9a-zA-Z]){18} - eighteen occurrences of an optional whitespace and then an alphanumeric char
  • (?:(?:\s?[0-9a-zA-Z]){2})? - an optional occurrence of two sequences of an optional whitespace and an alphanumeric char
  • \b - word boundary.
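
The lenient pattern accepts letters as well as digits, so it can pick up more than the IBAN itself. The following sketch (the sample sentences are made up for illustration) shows the optional two-character tail swallowing a short word that follows an Austrian number:

```python
import re

# Lenient pattern from the answer: 18 alphanumerics, optionally 2 more.
LOOSE_RE = re.compile(r'\b(?:DE|AT)(?:\s?[0-9a-zA-Z]){18}(?:(?:\s?[0-9a-zA-Z]){2})?\b')

# Punctuation after the number stops the match cleanly.
print(LOOSE_RE.findall("IBAN: AT61 1904 3002 3457 3201."))
# ['AT61 1904 3002 3457 3201']

# A following two-letter word satisfies the optional tail and is included.
print(LOOSE_RE.findall("Send it to AT61 1904 3002 3457 3201 by Friday"))
# ['AT61 1904 3002 3457 3201 by']
```

If the input may contain such text, it is probably worth checking each match afterwards, for example by removing whitespace and comparing the length with the one expected for the country.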
Speight answered 15/1, 2021 at 11:20 Comment(2)
Just a friendly note: these IBAN numbers contain only digits after the ISO country code. You might get false positives here, including Austrian IBAN numbers with more than 18 digits. - Lubin
@Lubin Thanks, noted. However, it appears to me that the OP does not have pure IBAN numbers in their data, so proper IBAN details might not be relevant. - Adagietto
Country   ISO country code   Check digits   Bank code   Account number
Germany   2a                 2n             8n          10n
Austria   2a                 2n             5n          11n

Note: a - letters only, n - digits only

So the main difference is really the length in digits. That means you could try:

\b(?:DE(?:\s*\d){20}|AT(?:\s*\d){18})\b(?!\s*\d)

See the online demo.


  • \b - Word-boundary.
  • (?: - Open 1st non-capturing group.
    • DE - Match uppercase "DE" literally.
    • (?: - Open 2nd non-capturing group.
      • \s*\d - Zero or more whitespace characters followed by a single digit.
      • ){20} - Close 2nd non-capturing group and match it 20 times.
    • | - Or:
    • AT - Match uppercase "AT" literally.
    • (?: - Open 3rd non-capturing group.
      • \s*\d - Zero or more whitespace characters followed by a single digit.
      • ){18} - Close 3rd non-capturing group and match it 18 times.
    • ) - Close 1st non-capturing group.
  • \b - Word-boundary.
  • (?!\s*\d) - Negative lookahead to prevent any trailing digits.

This does show that your Austrian IBAN numbers are invalid. If you wish to extract them only up to the point where they would still be valid, I guess you can remove the trailing \b(?!\s*\d).
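
To illustrate the difference between keeping and removing the trailing check, here is a rough sketch that builds the pattern from the digit counts in the table above. The names DIGITS_AFTER_CODE and build_pattern and the strict parameter are assumptions introduced for this example only.

```python
import re

# Digits after the two-letter country code, taken from the table above
# (check digits + bank code + account number).
DIGITS_AFTER_CODE = {"DE": 20, "AT": 18}


def build_pattern(strict=True):
    """Build the IBAN regex; strict=False drops the trailing \\b(?!\\s*\\d)."""
    alternatives = "|".join(
        rf"{code}(?:\s*\d){{{count}}}" for code, count in DIGITS_AFTER_CODE.items()
    )
    tail = r"\b(?!\s*\d)" if strict else ""
    return re.compile(rf"\b(?:{alternatives}){tail}")


too_long = "AT61 1904 3002 3457 3201 22"  # Austrian prefix with 20 digits
print(build_pattern(strict=True).findall(too_long))   # []
print(build_pattern(strict=False).findall(too_long))  # ['AT61 1904 3002 3457 3201']
```

With strict=True the over-long number is rejected entirely; with strict=False only the first 18 digits are returned, which may or may not be the number the writer intended.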

Lubin answered 15/1, 2021 at 11:28 Comment(0)

Suppose you're using this validation in a class with self.input as the input string; then you can use the following code. If you only want to validate German and Austrian IBANs, I'd suggest deleting all the other countries from the dictionary:

import string

# Dictionary with the IBAN length per country code
country_dic = {
    "AL": [28, "Albania"],
    "AD": [24, "Andorra"],
    "AT": [20, "Austria"],
    "BE": [16, "Belgium"],
    "BA": [20, "Bosnia"],
    "BG": [22, "Bulgaria"],
    "HR": [21, "Croatia"],
    "CY": [28, "Cyprus"],
    "CZ": [24, "Czech Republic"],
    "DK": [18, "Denmark"],
    "EE": [20, "Estonia"],
    "FO": [18, "Faroe Islands"],
    "FI": [18, "Finland"],
    "FR": [27, "France"],
    "DE": [22, "Germany"],
    "GI": [23, "Gibraltar"],
    "GR": [27, "Greece"],
    "GL": [18, "Greenland"],
    "HU": [28, "Hungary"],
    "IS": [26, "Iceland"],
    "IE": [22, "Ireland"],
    "IL": [23, "Israel"],
    "IT": [27, "Italy"],
    "LV": [21, "Latvia"],
    "LI": [21, "Liechtenstein"],
    "LT": [20, "Lithuania"],
    "LU": [20, "Luxembourg"],
    "MK": [19, "Macedonia"],
    "MT": [31, "Malta"],
    "MU": [30, "Mauritius"],
    "MC": [27, "Monaco"],
    "ME": [22, "Montenegro"],
    "NL": [18, "Netherlands"],
    "NO": [15, "Norway"],
    "PL": [28, "Poland"],
    "PT": [25, "Portugal"],
    "RO": [24, "Romania"],
    "SM": [27, "San Marino"],
    "SA": [24, "Saudi Arabia"],
    "RS": [22, "Serbia"],
    "SK": [24, "Slovakia"],
    "SI": [19, "Slovenia"],
    "ES": [24, "Spain"],
    "SE": [24, "Sweden"],
    "CH": [21, "Switzerland"],
    "TR": [26, "Turkey"],
    "TN": [24, "Tunisia"],
    "GB": [22, "United Kingdom"],
}

# Maps each character to its number for the mod-97 check:
# '0'-'9' keep their value, 'A' -> 10, ..., 'Z' -> 35
letter_dic = {ord(d): str(i) for i, d in enumerate(string.digits + string.ascii_uppercase)}


class IbanEvaluator:  # hypothetical wrapper class; the answer only assumes self.input exists
    def __init__(self, text):
        self.input = text

    def eval_iban(self):
        # Counts the valid IBANs among the whitespace-separated words of the input,
        # so an IBAN written with spaces inside it is not recognised.
        if not self.input:
            return 0
        hits = 0
        for iban in self.input.upper().split():
            entry = country_dic.get(iban[:2])
            if entry is None or len(iban) != entry[0]:
                continue  # unknown country code, or wrong length for that country
            if not (iban.isascii() and iban.isalnum()):
                continue  # punctuation would break the int() conversion below
            # Move the first four characters to the end and convert letters to numbers;
            # the IBAN is valid when the result leaves remainder 1 on division by 97
            if int((iban[4:] + iban[:4]).translate(letter_dic)) % 97 == 1:
                hits += 1
        return hits
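
Since the method above splits the input on whitespace, it will not see an IBAN that is written with spaces. One possible way around this, sketched here under the assumption that candidates have already been extracted (for instance with a regex), is a standalone check that removes whitespace first. The names IBAN_LENGTHS and is_valid_iban are made up for this example, and only the two countries from the question are included.

```python
import string

# Total IBAN length per country code (only the two countries from the question).
IBAN_LENGTHS = {"DE": 22, "AT": 20}

# '0'-'9' keep their value, 'A' -> 10, ..., 'Z' -> 35
_CHAR_TO_NUMBER = {
    ord(c): str(i) for i, c in enumerate(string.digits + string.ascii_uppercase)
}


def is_valid_iban(candidate):
    """Check length and the mod-97 checksum of a single IBAN candidate."""
    iban = "".join(candidate.split()).upper()  # drop all whitespace
    if not (iban.isascii() and iban.isalnum()):
        return False
    if len(iban) != IBAN_LENGTHS.get(iban[:2]):
        return False
    # Move country code and check digits to the end, then convert to an integer.
    rearranged = iban[4:] + iban[:4]
    return int(rearranged.translate(_CHAR_TO_NUMBER)) % 97 == 1


print(is_valid_iban("DE89 3704 0044 0532 0130 00"))  # True
print(is_valid_iban("DE89 3704 0044 0532 0130 01"))  # False
```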
Astonied answered 22/6, 2021 at 12:41 Comment(0)

If the string contains the label "IBAN" before the number, use (?<=(?i)IBAN.)CH\w{19}; if it does not, use CH\w{19}
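
In Python, a global inline flag such as (?i) is only accepted at the very start of a pattern (Python 3.11 and later raise an error otherwise), so the first pattern needs adjusting. A possible adaptation, offered only as a sketch, scopes the flag to the label with (?i:...). The names LABELLED_RE and UNLABELLED_RE are assumptions, and the IBAN is the commonly published Swiss example number.

```python
import re

# "IBAN" in any letter case, then exactly one separator character,
# must precede the number; the label itself is not part of the match.
LABELLED_RE = re.compile(r'(?<=(?i:IBAN).)CH\w{19}')

# No label required.
UNLABELLED_RE = re.compile(r'CH\w{19}')

print(LABELLED_RE.findall("iban:CH9300762011623852957"))
# ['CH9300762011623852957']
print(LABELLED_RE.findall("Pay CH9300762011623852957"))
# []
print(UNLABELLED_RE.findall("Pay CH9300762011623852957"))
# ['CH9300762011623852957']
```

Note that \w also matches letters and underscores and that these patterns do not allow spaces inside the number, so they are considerably looser and narrower at the same time than the DE/AT patterns in the other answers.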

Lutes answered 30/6, 2022 at 23:47 Comment(1)
I suspect that this is incorrect syntax for a Python solution. Try stackoverflow.com/editing-help - Sunrise