How to determine if a number is a prime with regex?
Asked Answered
C

4

138

I found the following code example for Java on RosettaCode:

public static boolean prime(int n) {
  return !new String(new char[n]).matches(".?|(..+?)\\1+");
}
  • I don't know Java in particular but understand all aspects of this snippet except for the regex itself
  • I have basic to basic-advanced knowledge of Regex as you find it in the built-in PHP functions

How does .?|(..+?)\\1+ match prime numbers?

Caduceus answered 8/5, 2010 at 17:58 Comment(10)
What does return !new does?Jacquelynejacquelynn
@Amir Rachum: !new String(new char[n]).matches(".?|(..+?)\\1+") is equivalent to !((new String(new char[n])).matches(".?|(..+?)\\1+")).Guesswarp
@kitlite: Interesting subject!Corpus
This is not only computationally expensive but it's also potentially devastatingly memory-expensive. If anyone chooses to use this approach, which i'd advise against since the algorithm for finding primes is so simple (why in the world complicate it and make it so wasteful), a check should be conducted prior to the "new char[n]" to ensure it's below a reasonable threshold. E.g. Call "prime(Integer.MAX_VALUE)" and then file a bug when it throws OutOfMemoryError.Ampere
@nicerobot: Lighten up?Stigmatize
@nicerobot: actually, I take that back. I originally figured the academic nature of this question implied its use only for learning purposes, and that you were being an obnoxious twat. However on second thought that's not the case; it's never mentioned or even implied in the question that the regex is only for learning purposes. In fact my first impression of it is that it's very simple-looking as far as code snippets go, so a beginner might indeed assume it could be used in practice. +1.Stigmatize
@incrediman no worries. I can see how you might think that. It was only my intention to warn of the consequences of using this, not to discourage learning how it works. A simple "Please don't deploy this." prior to the rest of my comment might have made it less condescending sounding from your initial perspective.Ampere
Title & question are really misleading here, regex is complex enough for people who are not using complex regexes everyday. Calling it "determine if number is prime with regex" or saying that "regex match prime numbers" complicates it even more. Here we simply determine whether string of same characters can be presented as combination of equal substrings with length >1 and <n or not. And yes if that is true, then string length isnt prime.Unknown
so, in words, ".?|(..+?)\\1+" matches a string containing "zero or 1 of some character, or two or more characters, repeated twice or more". That's just equivalent to isNotPrime n = n<2 || any (\i-> rem n i==0) [2..div n 2].Gratitude
This question has been added to the Stack Overflow Regular Expression FAQ, under "Advanced Regex-Fu".Virgievirgil
H
123

You said you understand this part, but just to emphasize, the String generated has a length equal to the number supplied. So the string has three characters if and only if n == 3.

.?

The first part of the regex says, "any character, zero or one times". So basically, is there zero or one character-- or, per what I mentioned above, n == 0 || n == 1. If we have the match, then return the negation of that. This corresponds with the fact that zero and one are NOT prime.

(..+?)\\1+

The second part of the regex is a little trickier, relying on groups and backreferences. A group is anything in parentheses, which will then be captured and stored by the regex engine for later use. A backreference is a matched group that is used later on in the same regex.

The group captures 1 character, then 1 or more of any character. (The + character means one or more, but ONLY of the previous character or group. So this is not "two or four or six etc. characters", but rather "two or three etc." The +? is like +, but it tries to match as few characters as possible. + normally tries to gobble the whole string if it can, which is bad in this case because it prevents the backreference part from working.)

The next part is the backreference: That same set of characters (two or more), appearing again. Said backreference appears one or more times.

So. The captured group corresponds to a natural number of characters (from 2 onward) captured. Said group then appears some natural number of times (also from 2 onward). If there IS a match, this implies that it's possible to find a product of two numbers greater than or equal to 2 that match the n-length string... meaning you have a composite n. So again, return the negation of the successful match: n is NOT prime.

If no match can be found, then you can't come up with a your product of two natural numbers greater than or equal to 2... and you have both a non-match and a prime, hence again the returning of the negation of the match result.

Do you see it now? It's unbelievably tricky (and computationally expensive!) but it's also kind of simple at the same time, once you get it. :-)

I can elaborate if you have further questions, like on how regex parsing actually works. But I'm trying to keep this answer simple for now (or as simple as it can ever be).

Homosexuality answered 8/5, 2010 at 18:10 Comment(4)
I tried this logic with JS in the chrome dev console. on the web page. and just passed 5 to check. The page crashed!Strychnine
The comment below gives better explanation. Please read it before you move on!Familiarity
"Better" is subjective- I would say it approaches the problem from a different angle and is a wonderful complement to this answer. :-)Homosexuality
I actually wrote a blog post explaining this with more detail: Demystifying The Regular Expression That Checks If A Number Is Prime.Ferdinand
E
72

I will explain the regex part outside of primality testing: the following regex, given a String s which consists of repeating String t, finds t.

    System.out.println(
        "MamamiaMamamiaMamamia".replaceAll("^(.*)\\1+$", "$1")
    ); // prints "Mamamia"

The way it works is that the regex captures (.*) into \1, and then sees if there's \1+ following it. Using the ^ and $ ensures that a match must be of the whole string.

So, in a way, we're given String s, which is a "multiple" of String t, and the regex will find such t (the longest possible, since \1 is greedy).

Once you understand why this regex works, then (ignoring the first alternate in OP's regex for now) explaining how it's used for primality testing is simple.

  • To test primality of n, first generate a String of length n (filled with the same char)
  • The regex captures a String of some length (say k) into \1, and tries to match \1+ to the rest of the String
    • If there is a match, then n is a proper multiple of k, and therefore n is not prime.
    • If there's no match, then no such k exists that divides n, and n is therefore a prime

How does .?|(..+?)\1+ match prime numbers?

Actually, it doesn't! It matches String whose length is NOT prime!

  • .? : The first part of the alternation matches String of length 0 or 1 (NOT prime by definition)
  • (..+?)\1+ : The second part of the alternation, a variation of the regex explained above, matches String of length n that is "a multiple" of a String of length k >= 2 (i.e. n is a composite, NOT a prime).
    • Note that the reluctant modifier ? is actually not needed for correctness, but it may help speed up the process by trying smaller k first

Note the ! boolean complement operator in the return statement: it negates the matches. It's when the regex DOESN'T match, n is prime! It's a double-negative logic, so no wonder it's kind of confusing!!


Simplification

Here's a simple rewriting of the code to make it more readable:

public static boolean isPrime(int n) {
    String lengthN = new String(new char[n]);
    boolean isNotPrimeN = lengthN.matches(".?|(..+?)\\1+");
    return !isNotPrimeN;
}

The above is essentially the same as the original Java code, but broken apart into multiple statements with assignments to local variables to make the logic easier to understand.

We can also simplify the regex, using finite repetition, as follows:

boolean isNotPrimeN = lengthN.matches(".{0,1}|(.{2,})\\1+");

Again, given a String of length n, filled with the same char,

  • .{0,1} checks if n = 0,1, NOT prime
  • (.{2,})\1+ checks if n is a proper multiple of k >= 2, NOT prime

With the exception of the reluctant modifier ? on \1 (omitted for clarity), the above regex is identical to the original.


More fun regex

The following regex uses similar technique; it should be educational:

System.out.println(
    "OhMyGod=MyMyMyOhGodOhGodOhGod"
        .replaceAll("^(.+)(.+)(.+)=(\\1|\\2|\\3)+$", "$1! $2! $3!")
); // prints "Oh! My! God!"

See also

Elane answered 8/5, 2010 at 18:19 Comment(3)
+1: I think your approach is probably better than mine was. No idea why I got so many upvotes or the check mark... you deserve it more, I think. :-( SorryHomosexuality
@Platinum: Wow, I'd never thought you'd go around saying that publicly! Thanks for the support. Maybe I'll get a [Populist] some day from this.Elane
Well, it's just the truth (as I perceive it)... not a huge deal really. I'm not here for rep (though it's always a bonus and a pleasant surprise)... I'm here to try to answer questions when I can. Thus it should come as no surprise that I can admit when someone has done it better than I have in a particular question.Homosexuality
L
27

Nice regex trick (though very inefficient)... :)

The regex defines non-primes as follows:

N is not prime if and only if N<=1 OR N is divisible by some K>1.

Instead of passing the simple digital representation of N to the regex engine, it is fed with a sequence of length N, composed of a repeating character. The first part of the disjunction checks for N=0 or N=1, and the second one looks for a divisor K>1, using backreferences. It forces the regex engine to find some non empty sub-sequence that can be repeated at least twice in order to form the sequence. If such a subsequence exists, it means that its length divides N, hence N is not prime.

Law answered 8/5, 2010 at 18:48 Comment(1)
Oddly enough, even after repeatedly reading the other lengthier and more technical explanations, I found this explanation to be the one that made it 'click' in my head.Ondrej
W
2
/^1?$|^(11+?)\1+$/

Apply to numbers after conversion to base 1 (1=1, 2=11, 3=111, ...). Non-primes will match this. If it doesn't match, it is prime.

Explanation here.

Walley answered 21/7, 2010 at 17:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.