Is Java Regex Thread Safe?

I have a function that uses Pattern#compile and a Matcher to search a list of strings for a pattern.

This function is used in multiple threads. Each thread will have a unique pattern passed to Pattern#compile when the thread is created. The number of threads and patterns is dynamic, meaning that I can add more Patterns and threads during configuration.

Do I need to make this function synchronized if it uses regex? Is regex in Java thread safe?

Crackle answered 1/9, 2009 at 1:4 Comment(0)

Yes, for Pattern. From the Java API documentation for the Pattern class:

Instances of this (Pattern) class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.

If you are writing performance-centric code, try reusing a Matcher instance by calling its reset() method instead of creating new instances. This resets the state of the Matcher, making it usable for the next match operation. In fact, it is precisely the state maintained in the Matcher instance that makes it unsafe for concurrent access.
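A minimal sketch of the reset() approach, assuming the question's scenario of one thread searching a list of strings with its own pattern. The class and method names (MatcherReuse, countMatches) are hypothetical. It uses the Matcher.reset(CharSequence) overload, which points the existing Matcher at new input and discards its previous state:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherReuse {

    // Counts how many strings in the list contain at least one match.
    // A single Matcher is created up front and re-targeted with reset(input),
    // so it must stay confined to the thread that calls this method.
    static int countMatches(Pattern pattern, List<String> inputs) {
        Matcher matcher = pattern.matcher("");
        int count = 0;
        for (String input : inputs) {
            matcher.reset(input); // clears prior match state and sets new input
            if (matcher.find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a+b");
        List<String> inputs = Arrays.asList("xaab", "b", "ab", "aaa");
        System.out.println(countMatches(p, inputs)); // prints 2
    }
}
```

The Pattern here could be shared with other threads; only the Matcher is private to this loop.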

Okajima answered 1/9, 2009 at 1:14 Comment(4)
Pattern objects are thread safe, but the compile() method might not be. There have been two or three bugs over the years that caused compilation to fail in multithreaded environments. I would recommend doing the compilation in a synchronized block.Primp
Yes, there have been concurrency bugs raised in the Pattern class, and your advice of synchronized access is appreciated. However, the original developers of the Pattern class intended it to be thread safe, and that is the contract any Java programmer should be able to rely on. To be frank, I'd rather have thread-local variables and accept the minimal performance hit than rely on thread-safe behavior by contract (unless I've seen the code). As they say, "Threading is easy, correct synchronization is hard".Okajima
Note that the source of "Pattern" is in the Oracle JDK distribution (According to oracle.com/technetwork/java/faq-141681.html#A14 : "The Java 2 SDK, Standard Edition itself contains a file called src.zip that contains the source code for the public classes in the java package") so one can take a quick peek oneself.Exsanguinate
@DavidTonhofer I think our latest JDK may have the correct bug-free code, but since Java's intermediate .class files can be interpreted on any platform by any compatible VM, you can't be sure those fixes exist in that runtime. Of course most of the time you know which version the server is running, but it's tedious to check every single version.Objectivism

Thread-safety with regular expressions in Java

SUMMARY:

The Java regular expression API has been designed to allow a single compiled pattern to be shared across multiple match operations.

You can safely call Pattern.matcher() on the same pattern from different threads, and each thread can safely use the Matcher it obtains concurrently with the others. No synchronization is needed to construct matchers this way: although Pattern.matcher() isn't synchronized, the Pattern class internally uses a volatile variable called compiled that is always set after a pattern is constructed and read at the start of every call to matcher(). This forces any thread referring to the Pattern to correctly "see" the contents of that object.

On the other hand, you shouldn't share a Matcher between different threads. Or at least, if you ever do, you should use explicit synchronization.
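A short sketch of that division of labour: one shared, immutable Pattern and a fresh Matcher per use. The names (SharedPatternDemo, DIGITS, allDigits) are hypothetical, and a parallel stream stands in for "multiple threads" here; elements may be processed on different ForkJoinPool threads, but no Matcher ever crosses from one thread to another:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SharedPatternDemo {

    // A single compiled Pattern, shared by every thread that runs allDigits.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Each lambda invocation asks the shared Pattern for its own Matcher,
    // so each Matcher is created, used and discarded on one thread.
    static List<Boolean> allDigits(List<String> inputs) {
        return inputs.parallelStream()
                .map(s -> DIGITS.matcher(s).matches())
                .collect(Collectors.toList()); // keeps the input order
    }

    public static void main(String[] args) {
        System.out.println(allDigits(Arrays.asList("123", "12a", "", "007")));
        // prints [true, false, false, true]
    }
}
```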

Ninny answered 1/9, 2009 at 1:14 Comment(1)
@akf, BTW, you should note that that's a discussion site (much like this one). I'd consider anything you find there no better or worse than information that you'd find here (i.e., it isn't The One True Word From James Gosling).Corinacorine

While you need to remember that thread safety has to take into account the surrounding code as well, you appear to be in luck. The fact that Matchers are created using the Pattern's matcher factory method and lack public constructors is a positive sign. Likewise, you use the compile static method to create the encompassing Pattern.

So, in short, if you do something like the example:

Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();

you should be doing pretty well.

Follow-up to the code example for clarity: this example strongly implies that the Matcher thus created stays confined to the same thread that runs the test. That is, you should not expose the Matcher to any other threads.
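To make the confinement point concrete, here is a small sketch contrasting a method that keeps its Matcher local with one that hands the Matcher out. The class and method names (Confinement, matchesAStarB, leakMatcher) are hypothetical; the second method is not broken by itself, but once a caller holds the Matcher nothing stops it from passing it to another thread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Confinement {

    private static final Pattern P = Pattern.compile("a*b");

    // Safe: the Matcher is created, used and discarded inside this call,
    // so it can never be seen by another thread.
    static boolean matchesAStarB(String input) {
        Matcher m = P.matcher(input);
        return m.matches();
    }

    // Riskier: the mutable Matcher escapes to the caller, who could share it
    // with other threads and reintroduce the concurrency problem.
    static Matcher leakMatcher(String input) {
        return P.matcher(input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAStarB("aaaaab")); // true
        System.out.println(matchesAStarB("aaaaa"));  // false
    }
}
```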

Frankly, that's the risk of any thread-safety question. The reality is that any code can be made thread-unsafe if you try hard enough. Fortunately, there are wonderful books that teach us a whole bunch of ways that we could ruin our code. If we stay away from those mistakes, we greatly reduce our own probability of threading problems.

Corinacorine answered 1/9, 2009 at 1:11 Comment(3)
@Jason S: thread locality is one very straightforward way to achieve thread safety even if the internal code isn't thread safe. If only one thread could ever possibly access a particular object at a time, you've enforced thread safety externally.Corinacorine
ok, so you are just saying that re-creating a pattern from a string at the point of use, is better than storing it to be efficient, at the risk of dealing with concurrency issues? i'll grant you that. I was confused with that sentence about factory methods and public constructors, that seems like a red herring w/r/t this topic.Verbality
@Jason S, no, the factory methods and lack of constructors are some of the ways that you can reduce the threat of coupling with other threads. If the only way you can get the Matcher that goes with my Pattern is via p.matcher(), nobody else can side-effect my Matcher. However, I can still cause trouble for myself: if I have a public method that returns that Matcher, another thread could get at it and side-effect it. In short, concurrency is hard (in ANY language).Corinacorine

A quick look at the code for Matcher.java shows a bunch of member variables, including the text being matched, arrays for groups, a few indexes for maintaining location, and a few booleans for other state. This all points to a stateful Matcher that would not behave well if accessed by multiple Threads. So does the JavaDoc:
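One way to see that state in action (a sketch; MatcherState and findAll are hypothetical names): successive calls to find() on one Matcher return successive matches, because the Matcher remembers where the previous match ended. If two threads called find() on the same Matcher, each call would advance that shared position, and one thread could overwrite the group()/start() results the other was about to read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherState {

    // Each find() resumes where the previous match ended; that position,
    // along with the current group bounds, is stored inside the Matcher.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group() + "@" + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\d+", "a1b22c333"));
        // prints [1@1, 22@3, 333@6]
    }
}
```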

Instances of this class are not safe for use by multiple concurrent threads.

This is only an issue if, as @Bob Cross points out, you go out of your way to allow use of your Matcher in separate Threads. If you need to do this and you think synchronization will be a problem for your code, one option is to use a ThreadLocal storage object to maintain one Matcher per worker thread.
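A minimal sketch of the ThreadLocal option, assuming Java 8 or later for ThreadLocal.withInitial. The names (ThreadLocalMatcher, WORD, isWord) are hypothetical. Each thread lazily gets its own Matcher, which is then re-targeted with reset(CharSequence) on every call:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreadLocalMatcher {

    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // One Matcher per thread, created the first time that thread calls get().
    private static final ThreadLocal<Matcher> MATCHER =
            ThreadLocal.withInitial(() -> WORD.matcher(""));

    static boolean isWord(String input) {
        // reset(input) re-targets this thread's own Matcher and returns it;
        // no other thread can reach it, so no synchronization is needed.
        return MATCHER.get().reset(input).matches();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + ": " + isWord("hello") + " " + isWord("h3llo"));
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

One trade-off to keep in mind: in a long-lived thread pool, each thread's Matcher (and the last input it was reset to) stays reachable for as long as the thread lives, so for cheap patterns simply calling WORD.matcher(input) per call is often just as good.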

Hayleyhayloft answered 1/9, 2009 at 2:1 Comment(0)

To sum up, you can reuse the compiled Pattern(s) (keeping them in static variables) and ask them for new Matchers whenever you need to validate those regex patterns against some string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Validation helpers
 */
public final class Validators {

  private static final String EMAIL_PATTERN = "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";

  private static Pattern email_pattern;

  static {
    email_pattern = Pattern.compile(EMAIL_PATTERN);
  }

  /**
   * Check if e-mail is valid
   */
  public static boolean isValidEmail(String email) { 
    Matcher matcher = email_pattern.matcher(email);
    return matcher.matches();
  }

}

See http://zoomicon.wordpress.com/2012/06/01/validating-e-mails-using-regular-expressions-in-java/ (near the end) for a discussion of the regex used above for validating e-mails, in case it doesn't fit one's needs for e-mail validation as posted here.
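For comparison, a sketch of the same helper with the Pattern initialized inline and declared final, as suggested in the comments on this answer. The class name EmailValidator is hypothetical; the regex is copied unchanged. Because the field is set during class initialization, which the JVM completes before any thread can use the class, every thread sees the fully compiled Pattern without extra synchronization:

```java
import java.util.regex.Pattern;

final class EmailValidator {

    // Compiled once during class initialization and never reassigned.
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
            "^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    private EmailValidator() {
        // static helpers only; no instances
    }

    // Each call obtains its own Matcher from the shared Pattern.
    static boolean isValidEmail(String email) {
        return EMAIL_PATTERN.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("john.doe@example.com")); // true
        System.out.println(isValidEmail("not-an-email"));         // false
    }
}
```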

Kevel answered 1/6, 2012 at 12:41 Comment(3)
Thanks for posting your answer! Please be sure to read the FAQ on Self-Promotion carefully. Someone might see this answer and the linked-to blog post and think you posted the blog post merely so you could link to it from here.Tidbit
Why bother with static {}? You can inline that variable initialization and make the Pattern final as well.Objectivism
I second the opinion of TWiStErRob: private static final Pattern emailPattern = Pattern.compile(EMAIL_PATTERN); is better.Loring
