How to cache the compiled regex in Go

Below is my Go code. Each time the validate method is called, the regexp is compiled again. I want to compile it only once, not on every call to validate.

1) How can I do that? 2) My idea was to create an instance variable that is nil at the start and is lazily initialized in validate.

if (a != nil) {
  a, err := regexp.Compile(rras.Cfg.WhiteList)
}

However if I declare a variable as an instance variable,

var a *Regexp; // regexp.Compile returns *Regexp

my compiler underlines it in red. How can I fix it?

type RRAS struct {
    Cfg       *RRAPIConfig
}

type RRAPIConfig struct {
    WhiteList               string
}

func (rras *RRAS) validate(ctx context.Context) error {
        a, err := regexp.Compile(rras.Cfg.WhiteList)
}
Ensoll answered 20/6, 2019 at 2:17 Comment(7)
golang.org/pkg/regexp/#MustCompile (Slaughter)
I am still not clear how to cache it in variable a? (Ensoll)
var a = regexp.MustCompile(rras.Cfg.WhiteList) (Slaughter)
The qualified type is *regexp.Regexp, of course. Just like with every other type you have to add the package name. (Heptarchy)
1) As Peter points out, the type is *regexp.Regexp, not *Regexp. 2) If you want to lazily initialize an already declared variable, then do not use := as that declares a new variable in the current scope and the old variable outside of that scope will remain uninitialized, instead use = to only assign a value to the already existing variable. (Denney)
3) When lazily initializing a variable that can be accessed by multiple goroutines you must also use some type of synchronization to ensure that no two goroutines attempt to initialize the variable at the same time or else your program is bound to experience a data race. (Denney)
4) As pointed out by peterSO, just use var a = regexp.MustCompile(...) at the package level to initialize the variable once at program start up and be done with it. This also automatically avoids the data-race issue mentioned above. Here are some examples from the standard library. (Denney)
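
To make the := versus = point from the comments concrete, here is a minimal sketch. The function names initShadowed and initAssigned and the pattern are hypothetical, chosen only for illustration. Neither function is safe for concurrent use; the sketch only shows the scoping issue.

```go
package main

import (
	"fmt"
	"regexp"
)

// a is the package-level variable we want to cache the regexp in.
// Note the qualified type *regexp.Regexp (not *Regexp).
var a *regexp.Regexp

// initShadowed (hypothetical name) uses := inside the if block, which
// declares a NEW local a; the package-level a stays nil.
func initShadowed(pattern string) error {
	if a == nil {
		a, err := regexp.Compile(pattern)
		if err != nil {
			return err
		}
		_ = a // the local a is discarded when the block ends
	}
	return nil
}

// initAssigned (hypothetical name) uses = so the package-level a is set.
func initAssigned(pattern string) error {
	if a == nil {
		var err error
		a, err = regexp.Compile(pattern)
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = initShadowed(`^\w+$`)
	fmt.Println("after := version, a is nil:", a == nil)
	_ = initAssigned(`^\w+$`)
	fmt.Println("after = version, a is nil:", a == nil)
}
```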

Static initialization

var whitelistRegexp = regexp.MustCompile(Cfg.WhiteList)

func (rras *RRAS) validate(ctx context.Context) error {
  if !whitelistRegexp.Match(...) {...}
}

This will compile the Regexp during package initialization, which happens at the startup of the program, before any code in the main function executes.

Benefits

  • Your program will crash immediately if the regex is broken, which helps to find bugs very quickly.
  • Very small and clean code, without any pitfalls
  • No need to worry about goroutines

Drawbacks

  • Potentially slow compilation may slow down the startup of the whole program (or server)
  • Only works if the regex is static and present at startup
  • Only works if a single regex (or a few static regexes) is used for all cases
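
As a runnable illustration of the static variant, here is a minimal sketch. It assumes the pattern is known at compile time: the constant whiteList and its value are hypothetical stand-ins for Cfg.WhiteList, and validate is simplified to take the string to check.

```go
package main

import (
	"fmt"
	"regexp"
)

// whiteList is a hypothetical constant pattern standing in for Cfg.WhiteList.
const whiteList = `^[a-z0-9_]+$`

// whitelistRegexp is compiled once during package initialization.
// MustCompile panics if the pattern is invalid, so a broken regex
// stops the program at startup.
var whitelistRegexp = regexp.MustCompile(whiteList)

// validate returns an error if name does not match the whitelist.
// (Simplified signature; the original method takes a context.Context.)
func validate(name string) error {
	if !whitelistRegexp.MatchString(name) {
		return fmt.Errorf("%q is not whitelisted", name)
	}
	return nil
}

func main() {
	fmt.Println(validate("user_42"))
	fmt.Println(validate("bad name!"))
}
```

A compiled *regexp.Regexp is documented as safe for concurrent use by multiple goroutines, which is why no further locking is needed here.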

Synchronization and Caching

var whitelistR struct{
  rex *regexp.Regexp
  once sync.Once
  err error
}

func (rras *RRAS) validate(ctx context.Context) error {
  whitelistR.once.Do(func() {
    whitelistR.rex, whitelistR.err = regexp.Compile(rras.Cfg.WhiteList)
  })

  if whitelistR.err != nil {
    return fmt.Errorf("could not compile regex: %w", whitelistR.err)
  }

  if !whitelistR.rex.Match(...) {...}
}

This will lazily compile the Regexp on the first call to the method. The sync.Once is very important because it is a synchronization point: it guarantees that access to the regexp is free of data races. Every call to the method has to wait until the Regexp has been compiled for the first time. After that, the synchronization is very fast, because it only uses an atomic load.

You can also call go once.Do(...) in your main function to initialize the regexp in the background, which speeds up the first call without blocking other methods.
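
Here is a self-contained sketch of the lazy variant. It is a variation on the snippet above, not the same code: the sync.Once and the cached result live on the RRAS struct instead of in a package-level variable, which is closer to the "instance variable" idea from the question. The helper method whitelist and the string parameter of validate are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

type RRAPIConfig struct {
	WhiteList string
}

type RRAS struct {
	Cfg *RRAPIConfig

	once sync.Once      // guards the one-time compilation
	rex  *regexp.Regexp // cached compiled regexp
	err  error          // cached compile error, if any
}

// whitelist (hypothetical helper) compiles the pattern on first use and
// returns the cached result on every later call.
func (rras *RRAS) whitelist() (*regexp.Regexp, error) {
	rras.once.Do(func() {
		rras.rex, rras.err = regexp.Compile(rras.Cfg.WhiteList)
	})
	return rras.rex, rras.err
}

// validate checks input against the whitelist. The string parameter is a
// simplification; the original method takes a context.Context.
func (rras *RRAS) validate(input string) error {
	rex, err := rras.whitelist()
	if err != nil {
		return fmt.Errorf("could not compile regex: %w", err)
	}
	if !rex.MatchString(input) {
		return fmt.Errorf("%q is not whitelisted", input)
	}
	return nil
}

func main() {
	rras := &RRAS{Cfg: &RRAPIConfig{WhiteList: `^[a-z]+$`}}

	// Several goroutines may call validate at once; the regexp is
	// still compiled only one time.
	var wg sync.WaitGroup
	for _, s := range []string{"abc", "ABC", "xyz"} {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			fmt.Println(s, rras.validate(s))
		}(s)
	}
	wg.Wait()
}
```

Because the struct now contains a sync.Once, it should be passed around by pointer and not copied. Every access goes through whitelist(), which keeps the once.Do call in a single place.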

Benefits

  • Program (or server) startup is not impacted by the compilation time
  • Compilation is only done if it is actually needed
  • You can create the String for the Regexp dynamically on demand, which can reduce binary file size and speed up your program
  • Possible to cache many different Regexes in a Caching-Map

Drawbacks

  • Errors in the Regexp will only show up in tests which actually use this method, not on startup
  • Code is more complex (10 lines instead of one)
  • Another developer might forget the call to sync.Once in another method and introduce a hard-to-catch race condition
  • Someone might try to be clever and wrap the sync.Once call in an if, which also introduces a hard-to-catch race condition
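
The caching map mentioned under Benefits is not shown in this answer. One possible shape is sketched below using a sync.RWMutex instead of sync.Once; the type regexCache and its methods are hypothetical names, and the cache has no size limit or eviction.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// regexCache (hypothetical) maps pattern strings to compiled regexps.
type regexCache struct {
	mu sync.RWMutex
	m  map[string]*regexp.Regexp
}

func newRegexCache() *regexCache {
	return &regexCache{m: make(map[string]*regexp.Regexp)}
}

// get returns the cached regexp for pattern, compiling it on first use.
func (c *regexCache) get(pattern string) (*regexp.Regexp, error) {
	// Fast path: read lock only.
	c.mu.RLock()
	rex, ok := c.m[pattern]
	c.mu.RUnlock()
	if ok {
		return rex, nil
	}

	// Compile outside the lock. Two goroutines may compile the same
	// pattern concurrently; that wastes a little work but is harmless.
	rex, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Another goroutine may have stored it in the meantime; keep that one.
	if existing, ok := c.m[pattern]; ok {
		return existing, nil
	}
	c.m[pattern] = rex
	return rex, nil
}

func main() {
	cache := newRegexCache()
	a, _ := cache.get(`^\d+$`)
	b, _ := cache.get(`^\d+$`)
	fmt.Println(a == b)               // true: the second call hits the cache
	fmt.Println(a.MatchString("123")) // true
}
```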

Conclusion

Almost always use the easy static initialization. Use the synchronized initialization only if benchmarking shows that you really have a performance impact. When synchronizing access, always try to use the helpers which Go provides (sync.Once, Mutex, RWMutex, ...) because they are optimized and less error-prone.

Recommended Reading:

The Go Memory Model: details about synchronization and best practices

Go Data Race Detector: you should race-test every complex Go program that uses multiple goroutines

Puree answered 10/5, 2022 at 11:2 Comment(0)
