MYSQL error : #3699 - Timeout exceeded in regular expression match
Asked Answered
B

3

9

Running a MYSQL query with REGEX and I get a timeout error.

I have a Bitnami NGINX Wordpress stack on AWS with Ubuntu 16.

Using MYSQL 8.0.16

What settings in MYSQL (or whereever else) do I need to change to stop this error?

help appreciated..

tried playing with the my.cnf file and restarting MYSQL but I dont know which timeout setting, there seems to be several

this is the SQL query

REPLACE INTO wp_term_relationships (term_taxonomy_id, object_id) SELECT '272',  ID
FROM  `wp_posts` 
   WHERE (
   post_title REGEXP 'German|Deutsch|Berlin|Hamburg|Munich|München|Cologne|Köln|Frankfurt|Stuttgart|Düsseldorf|Dortmund|Essen|Leipzig|Bremen|Dresden|Hanover|Hannover|Nuremberg|Nürnberg|Duisburg|Bochum|Wuppertal|Bielefeld|Bonn|Münster|Karlsruhe|Mannheim|Augsburg|Wiesbaden|Gelsenkirchen|Mönchengladbach|Braunschweig|Chemnitz|Kiel|Aachen|Halle|Saale|Magdeburg|Freiburg|Krefeld|Lübeck|Oberhausen|Erfurt|Mainz|Rostock|Kassel|Hagen|Hamm|Saarbrücken|Mülheim|Potsdam|Ludwigshafen|Oldenburg|Leverkusen|Osnabrück|Solingen|Heidelberg|Herne|Neuss|Darmstadt|Paderborn|Regensburg|Ingolstadt|Würzburg|Fürth|Wolfsburg|Offenbach|Ulm|Heilbronn|Pforzheim|Göttingen|Bottrop|Trier|Recklinghausen|Reutlingen|Bremerhaven|Koblenz|Bergisch Gladbach|Jena|Remscheid|Erlangen|Moers|Siegen|Hildesheim|Salzgitter|Baden-Württemberg|Bavaria|Bayern|Berlin|Brandenburg|Bremen|Hesse|Hessen|Lower Saxony|Niedersachsen|Mecklenburg-Vorpommern|North Rhine-|Westphalia|Nordrhein-Westfalen|Rhineland-Palatinate|Rheinland-Pfalz|Saarland|Saxony|Sachsen|Saxony-Anhalt|Sachsen-Anhalt|Schleswig-Holstein|Thuringia|Thüringen' 
   OR post_content REGEXP 'German|Deutsch|Berlin|Hamburg|Munich|München|Cologne|Köln|Frankfurt|Stuttgart|Düsseldorf|Dortmund|Essen|Leipzig|Bremen|Dresden|Hanover|Hannover|Nuremberg|Nürnberg|Duisburg|Bochum|Wuppertal|Bielefeld|Bonn|Münster|Karlsruhe|Mannheim|Augsburg|Wiesbaden|Gelsenkirchen|Mönchengladbach|Braunschweig|Chemnitz|Kiel|Aachen|Halle|Saale|Magdeburg|Freiburg|Krefeld|Lübeck|Oberhausen|Erfurt|Mainz|Rostock|Kassel|Hagen|Hamm|Saarbrücken|Mülheim|Potsdam|Ludwigshafen|Oldenburg|Leverkusen|Osnabrück|Solingen|Heidelberg|Herne|Neuss|Darmstadt|Paderborn|Regensburg|Ingolstadt|Würzburg|Fürth|Wolfsburg|Offenbach|Ulm|Heilbronn|Pforzheim|Göttingen|Bottrop|Trier|Recklinghausen|Reutlingen|Bremerhaven|Koblenz|Bergisch Gladbach|Jena|Remscheid|Erlangen|Moers|Siegen|Hildesheim|Salzgitter|Baden-Württemberg|Bavaria|Bayern|Berlin|Brandenburg|Bremen|Hesse|Hessen|Lower Saxony|Niedersachsen|Mecklenburg-Vorpommern|North Rhine-|Westphalia|Nordrhein-Westfalen|Rhineland-Palatinate|Rheinland-Pfalz|Saarland|Saxony|Sachsen|Saxony-Anhalt|Sachsen-Anhalt|Schleswig-Holstein|Thuringia|Thüringen'
   )
   AND post_type = 'post'

#3699 - Timeout exceeded in regular expression match.

*update - not sure how to add an image to comments, so will add here.

variable settings, cheers @Rick James on how list them

enter image description here

Brand answered 2/7, 2019 at 18:21 Comment(7)
i would suggest using FULLTEXT indexes and using full text search with MATCH(...) AGAINST(...) instead ..Dinette
Regular expression matching can't be indexed, so this will require very expensive full-table scans.Muscadel
"Regular expression matching can't be indexed, so this will require very expensive full-table scans" @Muscadel You mean in general or MySQL only as iam pretty sure PostgreSQL supports indexing for regular expression matching..Dinette
@RaymondNijland I meant MySQL. It won't even use an index for something like REGEXP '^foo.*', even though this is equivalent to LIKE 'foo%', which will use an index.Muscadel
"I meant MySQL." @Muscadel well Oracle Corp has added lots or features/performance/better optimizer to MySQL 8 so who knows maybe indexing for regular expression matching is on the list..Dinette
"It won't even use an index for something like REGEXP '^foo.', even though this is equivalent to LIKE 'foo%', which will use an index"* True @Muscadel the MySQL optimizer now does not even see if a REGEX is really needed or that can be possibly replaced with a LIKE which might can use a index. .Dinette
@RaymondNijland I read through the list of 8.0 changes a week or two ago, I don't recall anything like that. I just tried it: db-fiddle.com/f/wg5Khg8kuTMSzw35wXc7LP/0Muscadel
G
15
SET GLOBAL regexp_time_limit=1024;

(see here for the documentation)

Glori answered 16/10, 2019 at 0:10 Comment(0)
U
1

"Timeout" is presumably due to a slow query against a large table. So, let's address the speed of the query.

  • REPLACE is DELETE + INSERT.
  • REPLACE is slow due to theDELETE. And it will loseterm_order` if you have already set it.
  • Consider using INSERT IGNORE instead.
  • The WHERE clause, due to the OR and the REGEXP, will need to scan the entire table. A FULLTEXT index could avoid that; see below.
  • Refactoring the REGEXP won't help much; changing to LIKE is likely to be slower; etc.
  • If the timeout is truely inside the REGEXP engine, then you may need to run the query a few cities at a time.

FULLTEXT would greatly improve the speed. But, if you don't want to go that direction, the next best is changing REPLACE to INSERT IGNORE.

The FULLTEXT approach is:

ALTER TABLE wp_posts ADD FULLTEXT(post_title, post_content);  -- one-time task

INSERT IGNORE INTO ... (...)
    SELECT 272, ID
        FROM wp_posts
        WHERE MATCH(post_title, post_content)
              AGAINST('German Deutsch Berlin Hamburg Munich München ...' IN BOOLEAN MODE)

(There may be an issue with handling of accents, depending on the version of MySQL and which ENGINE is used.)

Uncouple answered 2/7, 2019 at 20:35 Comment(10)
HI thanks, is it possible to do your statement above and still use regex? I managed to get the regex working by using the answer above (by reformatting the regex with a trie). I also found an online tool to make a trie. kemio.com.ar/tools/lst-trie-re.php however my code is still clumsy because I am repeating the regex, once for post_title and once for post_content, your code is clean and neat. How can I use INSERT IGNORE INTO and WHERE MATCH(post_title, post_content) like you did but with REGEX? sorry I am just a script kiddy. help appreciated :)Brand
also.. I am still wondering though - why don't I get a timeout on my local stack (with my old code)? I only get a timeout on my AWS version. What is the mysql setting to increase timeout limit?Brand
@BenedictHarris - SHOW VARIABLES LIKE '%timeout%'; -- do on both servers, then compare. Are they both 8.0.16?Uncouple
thanks Rick, that isolated the difference. Turns out my local stack just misses these settings mysqlx_connect_timeout 30 mysqlx_idle_worker_thread_timeout 60 mysqlx_interactive_timeout 28800 mysqlx_port_open_timeout 0 mysqlx_read_timeout 30 mysqlx_wait_timeout 28800 mysqlx_write_timeout 60Brand
out of interest, can WHERE MATCH do word boundaries? is it like LIKE?Brand
MATCH, together with FULLTEXT does do word boundaries. It is very different than LIKE, which has no concept of words unless you search for exactly what is before and after the 'word'.Uncouple
@BenedictHarris - and FULLTEXT has limitations -- no "short" words, no words in the "stoplist" (which is aimed at common English words).Uncouple
@BenedictHarris - Where does "mysqlx" come from??Uncouple
so MATCH would be no good for country codes of two letters, such as DE? too short? No idea where mysqlx is from. using My SQL 8.0.16 see screenshot i.sstatic.net/65zhQ.pngBrand
@BenedictHarris - Might there be country_codes in the text string? You could set the "min_word_len = 2".Uncouple
D
0

You should make the regex into a trie.
This is less work for the engine, it only has to do at most 16 steps to fail
instead of the current 110 steps it's doing now.

Try

A(?:achen|ugsburg)|B(?:a(?:den-Württemberg|varia|yern)|er(?:gisch[ ]Gladbach|lin)|ielefeld|o(?:chum|nn|ttrop)|r(?:a(?:ndenbur|unschwei)g|eme(?:rhave)?n))|C(?:hemnitz|ologne)|D(?:armstadt|eutsch|ortmund|resden|uisburg|üsseldorf)|E(?:r(?:furt|langen)|ssen)|F(?:r(?:ankfurt|eiburg)|ürth)|G(?:e(?:lsenkirche|rma)n|öttingen)|H(?:a(?:gen|lle|m(?:burg|m)|n(?:n)?over)|e(?:i(?:delberg|lbronn)|rne|ssen?)|ildesheim)|Ingolstadt|Jena|K(?:a(?:rlsruhe|ssel)|iel|oblenz|refeld|öln)|L(?:e(?:ipzig|verkusen)|ower[ ]Saxony|udwigshafen|übeck)|M(?:a(?:gdeburg|inz|nnheim)|ecklenburg-Vorpommern|oers|unich|önchengladbach|ü(?:lheim|n(?:chen|ster)))|N(?:euss|iedersachsen|or(?:drhein-Westfalen|th[ ]Rhine-)|uremberg|ürnberg)|O(?:berhausen|ffenbach|ldenburg|snabrück)|P(?:aderborn|forzheim|otsdam)|R(?:e(?:cklinghausen|gensburg|mscheid|utlingen)|h(?:einland-Pfalz|ineland-Palatinate)|ostock)|S(?:a(?:a(?:le|r(?:brücken|land))|chsen(?:-Anhalt)?|lzgitter|xony(?:-Anhalt)?)|chleswig-Holstein|iegen|olingen|tuttgart)|T(?:h(?:uringia|üringen)|rier)|Ulm|W(?:estphalia|iesbaden|olfsburg|uppertal|ürzburg)

Change the (?: to ( if your engine doesn't support non-capture syntax.
Also, you can change [ ] to just space if you want to on the compressed
regex.

https://regex101.com/r/iEzw33/1

Expanded

    A
    (?: achen | ugsburg )
 |  B
    (?:
         a
         (?: den-Württemberg | varia | yern )
      |  er
         (?: gisch [ ] Gladbach | lin )
      |  ielefeld
      |  o
         (?: chum | nn | ttrop )
      |  r
         (?:
              a
              (?: ndenbur | unschwei )
              g
           |  eme
              (?: rhave )?
              n
         )
    )
 |  C
    (?: hemnitz | ologne )
 |  D
    (?:
         armstadt
      |  eutsch
      |  ortmund
      |  resden
      |  uisburg
      |  üsseldorf
    )
 |  E
    (?:
         r
         (?: furt | langen )
      |  ssen
    )
 |  F
    (?:
         r
         (?: ankfurt | eiburg )
      |  ürth
    )
 |  G
    (?:
         e
         (?: lsenkirche | rma )
         n
      |  öttingen
    )
 |  H
    (?:
         a
         (?:
              gen
           |  lle
           |  m
              (?: burg | m )
           |  n
              (?: n )?
              over
         )
      |  e
         (?:
              i
              (?: delberg | lbronn )
           |  rne
           |  ssen?
         )
      |  ildesheim
    )
 |  Ingolstadt
 |  Jena
 |  K
    (?:
         a
         (?: rlsruhe | ssel )
      |  iel
      |  oblenz
      |  refeld
      |  öln
    )
 |  L
    (?:
         e
         (?: ipzig | verkusen )
      |  ower [ ] Saxony
      |  udwigshafen
      |  übeck
    )
 |  M
    (?:
         a
         (?: gdeburg | inz | nnheim )
      |  ecklenburg-Vorpommern
      |  oers
      |  unich
      |  önchengladbach
      |  ü
         (?:
              lheim
           |  n
              (?: chen | ster )
         )
    )
 |  N
    (?:
         euss
      |  iedersachsen
      |  or
         (?: drhein-Westfalen | th [ ] Rhine- )
      |  uremberg
      |  ürnberg
    )
 |  O
    (?:
         berhausen
      |  ffenbach
      |  ldenburg
      |  snabrück 
    )
 |  P
    (?: aderborn | forzheim | otsdam )
 |  R
    (?:
         e
         (?:
              cklinghausen
           |  gensburg
           |  mscheid
           |  utlingen 
         )
      |  h
         (?: einland-Pfalz | ineland-Palatinate )
      |  ostock
    )
 |  S
    (?:
         a
         (?:
              a
              (?:
                   le
                |  r
                   (?: brücken | land )
              )
           |  chsen
              (?: -Anhalt )?
           |  lzgitter
           |  xony
              (?: -Anhalt )?
         )
      |  chleswig-Holstein
      |  iegen
      |  olingen
      |  tuttgart
    )
 |  T
    (?:
         h
         (?: uringia | üringen )
      |  rier
    )
 |  Ulm
 |  W
    (?:
         estphalia
      |  iesbaden
      |  olfsburg
      |  uppertal
      |  ürzburg
    )

Lets check the speed:

Regex1:   A(?:achen|ugsburg)|B(?:a(?:den-Württemberg|varia|yern)|er(?:gisch[ ]Gladbach|lin)|ielefeld|o(?:chum|nn|ttrop)|r(?:a(?:ndenbur|unschwei)g|eme(?:rhave)?n))|C(?:hemnitz|ologne)|D(?:armstadt|eutsch|ortmund|resden|uisburg|üsseldorf)|E(?:r(?:furt|langen)|ssen)|F(?:r(?:ankfurt|eiburg)|ürth)|G(?:e(?:lsenkirche|rma)n|öttingen)|H(?:a(?:gen|lle|m(?:burg|m)|n(?:n)?over)|e(?:i(?:delberg|lbronn)|rne|ssen?)|ildesheim)|Ingolstadt|Jena|K(?:a(?:rlsruhe|ssel)|iel|oblenz|refeld|öln)|L(?:e(?:ipzig|verkusen)|ower[ ]Saxony|udwigshafen|übeck)|M(?:a(?:gdeburg|inz|nnheim)|ecklenburg-Vorpommern|oers|unich|önchengladbach|ü(?:lheim|n(?:chen|ster)))|N(?:euss|iedersachsen|or(?:drhein-Westfalen|th[ ]Rhine-)|uremberg|ürnberg)|O(?:berhausen|ffenbach|ldenburg|snabrück)|P(?:aderborn|forzheim|otsdam)|R(?:e(?:cklinghausen|gensburg|mscheid|utlingen)|h(?:einland-Pfalz|ineland-Palatinate)|ostock)|S(?:a(?:a(?:le|r(?:brücken|land))|chsen(?:-Anhalt)?|lzgitter|xony(?:-Anhalt)?)|chleswig-Holstein|iegen|olingen|tuttgart)|T(?:h(?:uringia|üringen)|rier)|Ulm|W(?:estphalia|iesbaden|olfsburg|uppertal|ürzburg)
Completed iterations:   50  /  50     ( x 1000 )
Matches found per iteration:   110
Elapsed Time:    5.74 s,   5736.58 ms,   5736578 µs
Matches per sec:   958,759

Wow, a million words matched per second !

Dimity answered 2/7, 2019 at 19:11 Comment(4)
oh wow that worked. I also found an online tool to make a trie. kemio.com.ar/tools/lst-trie-re.php I am still wondering though - why don't I get a timeout on my local stack? I only get it on my AWS version. what is the mysql setting to increase timeout limit?Brand
another Q..does: (?: as a non-capturing group save processor power or is it faster than "(" ?Brand
The online trie tool is riddled with errors. It reminds be of where I was about 5 years ago when I designed mine Tern Tool. Now its a mature tool with multiple uses, like creating Emoji Regex and stuff like a 175,000 Word Dictionary. Recommend you use the tool in the link.Dimity
About the grouping performance. Capture groups require overhead to maintain them. Mega-Bench 8: Regex1: Using Non-Capturing group notation (?: ) Completed iterations: 30 / 30 ( x 1000 ) Matches found per iteration: 110 Elapsed Time: 3.43 s, 3434.41 ms, 3434411 µs Matches per sec: 960,863 Regex2: Using Capturing group notation ( ) Completed iterations: 30 / 30 ( x 1000 ) Matches found per iteration: 110 Elapsed Time: 10.33 s, 10333.74 ms, 10333744 µs Matches per sec: 319,342 .Dimity

© 2022 - 2024 — McMap. All rights reserved.