RegEx to match Bitcoin addresses?
K

10

50

I am trying to come up with a regular expression to match Bitcoin addresses according to these specs:

A Bitcoin address, or simply address, is an identifier of 27-34 alphanumeric characters, beginning with the number 1 or 3 [...]

I figured it would look something like this

/^[13][a-zA-Z0-9]{27,34}/

Thing is, I'm not good with regular expressions and I haven't found a single source to confirm this would not create false negatives.

I've found one online that's ^1[1-9A-Za-z][^OIl]{20,40}, but I don't even know what the [^OIl] part means and it doesn't seem to match the 3 a Bitcoin address could start with.

Kaffir answered 10/2, 2014 at 17:15 Comment(7)
Your referenced page has a section "Address validation". Why not use the technique provided in the link over there? (Quote: "[...] it is advisable to use a method from this thread rather than to just check for string length, allowed characters, or that the address starts with a 1 or 3")Ecumenicalism
@Ecumenicalism If all bitcoin addresses have that format, then I don't see a reason why this wouldn't work. Besides, I'm not looking for a rigorous validation (after all, it could be a valid address and not yet exist) but rather something that discards addresses that are clearly invalid.Kaffir
@fedeetz: bitcoin addresses do contain a checksum. You can't validate a bitcoin address using a regexp because all bitcoin addresses have that checksum. It is true that your regexp will discard many addresses which are clearly invalid... But your regexp will also accept an insane number of invalid ones. The very purpose of that checksum is to prevent people from using invalid addresses and I'd tend to think that the author(s) of bitcoin are very smart people and knew what they were doing. Doing "validation" without verifying the checksum whose very purpose is validation makes no sense.Malacca
@Malacca That's not a problem for me, as I said, as long as it discards clearly invalid addresses and it doesn't generate false negatives, it's enough. This is not for an application open to the public, only to a couple developers. The whole point is that if they have a typo or copy only half of the address, for the app to warn them.Kaffir
@fedeetz your regex will match invalid Bitcoin addresses, as the characters O, I and l are not valid characters in a Bitcoin address.Bell
To testnet: /^[mn2][a-zA-Z0-9]{27,34}/Luz
rosettacode.org/wiki/Bitcoin/address_validationDaffi
E
13

[^OIl] matches any character that's not O, I or l. The problems in your regex are:

  • You don't have a $ at the end, so it'd match any string beginning with a BC address.
  • You didn't count the first character in your {27,34} - that should be {26,33}
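Both points can be checked quickly. The following is a minimal Python sketch; the two sample strings are made up for illustration and are not real addresses.

```python
import re

# The pattern from the question: no end anchor, and the leading [13] is not counted.
original = re.compile(r"^[13][a-zA-Z0-9]{27,34}")
# Same character class with both problems fixed.
fixed = re.compile(r"^[13][a-zA-Z0-9]{26,33}$")

too_long = "1" + "a" * 60   # 61 characters, far beyond the 34-character limit
shortest = "1" + "a" * 26   # 27 characters, the minimum length in the quoted spec

print(bool(original.search(too_long)))   # True: only the first 35 characters are examined
print(bool(fixed.search(too_long)))      # False: $ rejects the trailing characters
print(bool(original.search(shortest)))   # False: {27,34} demands 28 characters in total
print(bool(fixed.search(shortest)))      # True
```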

However, as mentioned in a comment, a regex is not a good way to validate a bitcoin address.
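
For comparison, the checksum test the comments refer to is not much code. Below is a rough Python sketch of a Base58Check verification for legacy (1... / 3...) addresses. The function name `is_valid_base58check` is made up for this example. It deliberately skips some checks: it does not inspect the version byte and does not verify that the number of leading 1 characters matches the number of leading zero bytes.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def is_valid_base58check(address: str) -> bool:
    """Return True if `address` decodes to 25 bytes ending in a correct 4-byte checksum."""
    value = 0
    for char in address:
        index = BASE58_ALPHABET.find(char)
        if index == -1:
            return False  # character outside the Base58 alphabet (0, O, I, l, ...)
        value = value * 58 + index
    try:
        raw = value.to_bytes(25, "big")  # 1 version byte + 20 hash bytes + 4 checksum bytes
    except OverflowError:
        return False  # decodes to more than 25 bytes
    payload, checksum = raw[:21], raw[21:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum

# The well-known genesis block address, and the same string with its last character changed.
print(is_valid_base58check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))  # True
print(is_valid_base58check("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))  # False
```

A regex can still serve as a cheap first filter before a check like this one.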

Energetic answered 10/2, 2014 at 17:17 Comment(3)
It seems to me the purpose of the regex is finding potential bitcoin addresses, not necessarily valid ones.Bell
A regex would be a good fit for lightweight uses such as a browser plugin or a web crawler.Bangor
Or find valid addresses, not necessarily existing addresses. Whether or not addresses exist in your block chain depends on when and how often you synced. Figuring out if the address is valid or not is a completely different exercise.Digraph
B
64
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$

matches a string that starts with either 1 or 3, followed by 25 to 34 characters from a-z, A-Z and 0-9, excluding l, I, O and 0 (which are not valid characters in a Bitcoin address). The total length is therefore 26 to 35 characters.
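
A quick way to try the pattern is a short Python sketch like the one below. The first sample is a legacy address quoted elsewhere in this thread; the second and third are deliberately damaged copies of it, and the last is a Bech32 address, which this pattern is not meant to cover.

```python
import re

LEGACY = re.compile(r"^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$")

samples = [
    "1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4",          # legacy address
    "1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X0",          # last character replaced by a 0
    "1JHwenDp9A98Xdjf",                            # truncated copy
    "bc1q5lm8v27uf9v8nz6yczg3gxraflxlas4jvr0zuf",  # Bech32, starts with bc1
]

results = [bool(LEGACY.match(s)) for s in samples]
for sample, ok in zip(samples, results):
    print(ok, sample)
```

Only the first sample matches; the pattern checks shape only and says nothing about the checksum.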

Bell answered 13/6, 2014 at 12:45 Comment(3)
Since a valid Bitcoin address candidate must be between 26 and 35 characters long, the interval should be {25,34}, because the ^[13] at the start already accounts for one character. See specs: en.bitcoin.it/wiki/AddressPatiencepatient
exception that the uppercase letter "O", uppercase letter "I", lowercase letter "l", and the number "0" are never used to prevent visual ambiguity.Harberd
bc1q5lm8v27uf9v8nz6yczg3gxraflxlas4jvr0zuf comes out as invalid - but it is a valid address...Electro
A
15
^(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$

Based on the new address type Bech32

Automotive answered 6/2, 2018 at 13:13 Comment(3)
The valid address bc1q4r8h8vqk02gnvlus758qmpk8jmajpy2ld23xtr73a39ps0r9z82qq0qqye does not work.Luz
changing last number to 59 catches Felipe's exampleQuickstep
so if address is inside text do I need to remove ^ and $ ?Angloindian
V
14
^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$

A Bitcoin address is

  • an identifier of 26-35 alphanumeric characters
  • beginning with the number 1 or 3
  • made up of random digits and uppercase and lowercase letters
  • with the exception that the uppercase letter O, uppercase letter I, lowercase letter l, and the number 0 are never used, to prevent visual ambiguity.
Venipuncture answered 22/6, 2015 at 10:30 Comment(0)
L
7

Based on the answers of runeks and Erhard Dinhobl, I arrived at this pattern, which accepts both Bech32 and legacy addresses:

\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b

Including testnet address:

\b((bc|tb)(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|([13]|[mn2])[a-km-zA-HJ-NP-Z1-9]{25,39})\b

Only testnet:

\b(tb(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})|[mn2][a-km-zA-HJ-NP-Z1-9]{25,39})\b
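
Because these patterns use \b instead of ^ and $, they can pick addresses out of running text. Here is a small Python sketch using the mainnet pattern; the sample sentence is assembled from strings quoted elsewhere in this thread.

```python
import re

# Mainnet pattern quoted from this answer (Bech32 or legacy, delimited by \b).
ADDRESS = re.compile(
    r"\b(bc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})"
    r"|[13][a-km-zA-HJ-NP-Z1-9]{25,35})\b"
)

text = (
    "Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 or "
    "bc1q5lm8v27uf9v8nz6yczg3gxraflxlas4jvr0zuf to keep your secrets safe."
)

# The pattern contains capture groups, so finditer + group(0) is used instead of
# findall, which would return one tuple of groups per match.
found = [m.group(0) for m in ADDRESS.finditer(text)]
print(found)
```

Both addresses are found. Note that \b also lets the pattern match long hexadecimal or Base58-looking tokens inside URLs and filenames.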
Luz answered 15/1, 2020 at 17:46 Comment(0)
P
2

Based on the description here: https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki I would say the regex for a Bech32 bitcoin address for Version 1 and Version 0 (only for mainnet) is:

\bbc(0([ac-hj-np-z02-9]{39}|[ac-hj-np-z02-9]{59})|1[ac-hj-np-z02-9]{8,87})\b
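
The regex only checks the shape of the string. BIP-173 also defines a checksum, and verifying it is short. The Python sketch below is adapted from the reference code in that document; the wrapper `has_valid_bech32_checksum` and its `expected_hrp` parameter are made up for this example. It covers the original Bech32 checksum only, which is used for witness version 0. It does not decode the witness program or check its length, and later witness versions use a different checksum constant.

```python
BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bech32_polymod(values):
    """Checksum polynomial from BIP-173."""
    generator = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= generator[i]
    return chk

def bech32_hrp_expand(hrp):
    """Expand the human-readable part (e.g. "bc") for the checksum computation."""
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def has_valid_bech32_checksum(address: str, expected_hrp: str = "bc") -> bool:
    if address != address.lower() and address != address.upper():
        return False  # mixed case is not allowed
    address = address.lower()
    hrp, separator, data_part = address.rpartition("1")
    if not separator or hrp != expected_hrp or len(data_part) < 6:
        return False
    if any(c not in BECH32_CHARSET for c in data_part):
        return False
    data = [BECH32_CHARSET.find(c) for c in data_part]
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1

# Example address from BIP-173, and the same string with its last character changed.
print(has_valid_bech32_checksum("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"))  # True
print(has_valid_bech32_checksum("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t5"))  # False
```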

Here are some other links where I found infos:

Punctate answered 17/5, 2018 at 12:43 Comment(0)
B
2

for mainnet bitcoin

/^([13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/

if you don't want to understand the above regex you can skip the detail below

breaking it down

For regular addresses

/[13]{1}/

The address starts with 1 or 3. {1} means that exactly one character from the square brackets is matched (this is the default, so {1} could be omitted).

/[13]{1}[a-km-zA-HJ-NP-Z1-9]/

cannot have l (small el), I (capital eye), O (capital O) and 0 (zero)

/[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}/

The address can be 27 to 34 characters long. Since the first character was already matched as 1 or 3, the rest of the address is 26 to 33 characters long.

For segwit

/bc1/

starts with bc1

/bc1[a-z0-9]/

can only contain lower case letters and numbers

/bc1[a-z0-9]{39,59}/

The address can be 42 to 62 characters long. Since the first three characters were already matched as bc1, the rest of the address is 39 to 59 characters long.
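
The same breakdown can be kept next to the pattern itself. As a sketch, here is the regex written with Python's `re.VERBOSE` flag, which allows whitespace and comments inside the pattern. A non-capturing group `(?:...)` is used in place of the plain parentheses, and the redundant {1} is dropped. The sample strings are taken from elsewhere in this thread.

```python
import re

MAINNET = re.compile(
    r"""
    ^(?:
        [13]                         # regular address: first character is 1 or 3
        [a-km-zA-HJ-NP-Z1-9]{26,33}  # then 26 to 33 characters without l, I, O and 0
      |
        bc1                          # segwit address: literal prefix
        [a-z0-9]{39,59}              # then 39 to 59 lowercase letters or digits
    )$
    """,
    re.VERBOSE,
)

samples = [
    "1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5",                              # 34 characters
    "bc1q5lm8v27uf9v8nz6yczg3gxraflxlas4jvr0zuf",                      # 42 characters
    "bc1q4r8h8vqk02gnvlus758qmpk8jmajpy2ld23xtr73a39ps0r9z82qq0qqye",  # 62 characters
    "1MAFzYQhm6msF2Dxo3Nbox7i61",                                      # truncated to 26 characters
]

results = [bool(MAINNET.match(s)) for s in samples]
for sample, ok in zip(samples, results):
    print(ok, len(sample), sample)
```

The first three samples match and the truncated one does not.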

Brierroot answered 4/2, 2022 at 8:6 Comment(0)
Z
0

As the OP didn't provide a specific use case (only matching criteria) and I came across this in researching methods to detect BitCoin addresses, wanted to post back and share with the community.

The RegExes provided in the other answers will only find BitCoin addresses at the start and/or end of a line. My use case was to find BitCoin addresses in the body of an email, given the rise of blackmail/sextortion (Reference: https://krebsonsecurity.com/2018/07/sextortion-scam-uses-recipients-hacked-passwords/), so these weren't effective solutions (as outlined later). The proposed RegExes also catch many false positives (FPs) in email, due to filenames and other identifiers within URLs. I am not knocking the solutions, as they work for certain use cases, but they simply don't work for mine. One variation caught many spam emails within a short timeframe of passive alerting (examples follow).

Here are my test cases:

--------------------------------------------------------
BitCoin blackmail formats observed (my org and online):
--------------------------------------------------------
BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4 
BTC Address: 1Unoc4af6gCq3xzdDFmGLpq18jbTW1nZD
BTC Address: 1A8Ad7VbWDqwmRY6nSHtFcTqfW2XioXNmj
BTC Address: 12CZYvgNZ2ze3fGPFzgbSCELBJ6zzp2cWc
BTC Address: 17drmHLZMsCRWz48RchWfrz9Chx1osLe67

Receiving Bitcoin Address: 15LZALXitpbkK6m2QcbeQp6McqMvgeTnY8
Receiving Bitcoin Address: 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5

--------------------------------------------------------
Other possible BitCoin test cases I added:
--------------------------------------------------------
- What if text comes before and/or after on same line?  Or doesn't contain BitCoin/BTC/etc. anywhere (or anywhere close to the address)?
    Send BitCoin payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5
    1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.
    Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.

- Standalone address:
    1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72

--------------------------------------------------------
Redacted Body content generating FPs from spam emails:
--------------------------------------------------------
src=3D"https://example.com/blah=3D2159024400&t=3DXWP9YVkAYwkmif9RgKeoPhw2b1zdMnMzXZSGRD_Oxkk"

"cursor:pointer;color:#6A6C6D;-webkit-text-size-blahutm_campaign%253Drdboards%2526e_t%253Dd5c2deeaae5c4a8b8d2bff4d0f87ecdd%2526utm_cont=blah

src=3D"https://example.com/blah/74/328e74997261d5228886aab1a2da6874.jpg" 

src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"

href=3D"https://example.com/blah-0ff3169b28a6e17ae8a369a3161734c1?alert_=id=blah

Some RegEx samples I tested (I won't list those I'd knock for greedy globbing with backtracking):

^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
[13][a-km-zA-HJ-NP-Z1-9]{25,34}$
    (Too narrow and misses BitCoin addresses within a paragraph)

(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}$
    (Still misses text after BTC on same line and triples execution time)

\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W
    (Too broad and catches URL formats)

The current RegEx I am evaluating which catches all my known/crafted sample cases and eliminates known FPs (specifically avoiding end of sentence period for URL filename FPs):

[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s
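
To make the comparison reproducible, here is a small Python sketch that runs this pattern and the rejected \W...\W variant over a few of the test lines listed above. The lines are joined into one multi-line string, so each one ends in a newline that \s can match; an address at the very end of the input with no trailing whitespace would be missed.

```python
import re

# Pattern under evaluation in this answer; the trailing \s consumes one whitespace character.
CANDIDATE = re.compile(r"[13][a-km-zA-HJ-NP-Z1-9]{25,34}\s")
# The \W...\W variant rejected above as too broad.
BROAD = re.compile(r"\W[13][a-km-zA-HJ-NP-Z1-9]{25,34}\W")

# A few of the test lines from above, joined into one message body.
body = (
    "BTC Address: 1JHwenDp9A98XdjfYkHKyiE3R99Q72K9X4 \n"
    "Send payments here 1MAFzYQhm6msF2Dxo3Nbox7i61XvgQ7og5 to keep your secrets safe.\n"
    "1Dvd7Wb72JBTbAcfTrxSJCZZuf4tsT8V72\n"
    'src=3D"https://example.com/blah-1c779f59948fc5be8a461a4da8d938aa.jpg"\n'
)

# Strip the whitespace (or the surrounding non-word characters) that each match includes.
candidate_hits = [m.group().strip() for m in CANDIDATE.finditer(body)]
broad_hits = [m.group()[1:-1] for m in BROAD.finditer(body)]

print(candidate_hits)  # the three addresses
print(broad_hits)      # the three addresses plus the hash from the image filename
```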

One reference point for execution times (shows cost in steps and time): https://regex101.com/

Please feel free to weigh in or provide suggestions on improvements (I am by no means a RegEx master). As I further vet it against email detection of Body content, I will update if other FP cases are observed or more efficient RegEx is derived.

Seth

Zoarah answered 19/7, 2018 at 1:9 Comment(0)
P
-1

I am not into complicated solutions, and this regex served the purpose for the simplest validation, when you just don't want to receive complete nonsense.

\w{25,}
Poikilothermic answered 28/7, 2020 at 18:8 Comment(0)
D
-1

For matching legacy, nested SegWit, and native SegWit addresses:

/^(?:[13]{1}[a-km-zA-HJ-NP-Z1-9]{26,33}|bc1[a-z0-9]{39,59})$/

Source: Regex for Bitcoin Addresses.

Dogmatist answered 30/12, 2021 at 12:6 Comment(0)
