How to match the text between opening and closing curly-brace tags, along with the tag name and its attributes
I am implementing plugin code for my CMS, something like a shortcode but applicable in many more scenarios. I want an admin to be able to write code like this:

Example 1:

{COMMAND_NAME}Strings of text that contain HTML tags, symbols, just anything{/COMMAND_NAME}

Example 2

{COMMAND_NAME}

Example 3

{COMMAND_NAME{attribute1=value attribute2=value}}

Example 4

{COMMAND_NAME{attribute1=value attribute2=value}}Strings of anything, including text, HTML tags and anything at all {/COMMAND_NAME}

I need a single regex pattern that matches the strings above and gets the COMMAND_NAME, the text in between, and the closing {/COMMAND_NAME}.

In the regex, I want to capture the COMMAND_NAME, the attributes if provided, the text in between if the {COMMAND_NAME} has a closing {/COMMAND_NAME}, and the closing {/COMMAND_NAME} if provided.

Here is what I've done so far; it gives an incomplete result.

$regex = '#\{(RAW|ACCESS|DWNLINK|MODL)[\{]{0,1}([\w\W\s]*?)\}{0}\}([\w\s]+)([\{/RAW|ACCESS|DWNLINK|MODL]*)\}#i';

$strings = '<div class="blog-list-item blog"><header class="entry-title">
        <h1>Welcome to our website</h1>
    </header><article id="entry-72" class="entry post-72 page et-bg-layout-dark et-white-bg"><div class="jumbotron row">
<div class="col-md-8">
<ul>
<li>You have a pending job on your neck?&hellip;</li>
<li>Do your company need a website makeover ?&hellip;</li>
<li>Or a competitive web application ? ?&hellip;</li>
<li>Do you need a customized plugin, or a tweak ?&hellip;</li>
<li>Maybe you want a personal website ?&hellip;</li>
<li>Or a graphic for your new project ?&hellip;</li>
</ul>
<div class="bg-primary well">
<h4 class="text-center text-white shadow">Track your project as we work it         to perfection...</h4>
</div>
</div>
<div class="pull-right col-md-4">
<h4 class="bg-primary text-white well">Other services we offer</h4>
{ACCESS{type=500}}
<ul>
<li>SEO work for an existing website or new</li>
<li>Bulk SMS</li>
<li>E-currency exchange</li>
<li>Facebook AD</li>
<li>Google AD</li>
</ul>
{/ACCESS}</div>
{RAW{say=email,access=500}} {RAW} <a class="btn button large tall green"     href="client-area">Place new Job now as we deliver at the quickest   <em>reasonable time</em></a>{/RAW}</div></article></div>';

Doing a PHP var_dump of the matches gives the following result:
array(5) {
  [0]=>
  array(1) {
    [0]=>
    string(224) "{ACCESS{type=500}}
<ul>
<li>SEO work for an existing website or new</li>
<li>Bulk SMS</li>
<li>E-currency exchange</li>
<li>Facebook AD</li>
<li>Google AD</li>
</ul>
{/ACCESS}</div>
{RAW{say=email,access=500}} {RAW}"
  }
  [1]=>
  array(1) {
    [0]=>
    string(6) "ACCESS"
  }
  [2]=>
  array(1) {
    [0]=>
    string(209) "type=500}}
<ul>
<li>SEO work for an existing website or new</li>
<li>Bulk SMS</li>
<li>E-currency exchange</li>
<li>Facebook AD</li>
<li>Google AD</li>
</ul>
{/ACCESS}</div>
{RAW{say=email,access=500}"
  }
  [3]=>
  array(1) {
    [0]=>
    string(1) " "
  }
  [4]=>
  array(1) {
    [0]=>
    string(4) "{RAW"
  }
}

This is not what I need to retrieve. Once again, I want to capture the COMMAND_NAME, the attributes only if provided, the text in between if the {COMMAND_NAME} has a closing {/COMMAND_NAME}, and the closing {/COMMAND_NAME} if provided. That means the command can be inline, as in {COMMAND_NAME}, or a block, as in {COMMAND_NAME} some strings {/COMMAND_NAME}, and it may or may not have attributes, as in {COMMAND_NAME{attr1=value attr2=value2}}.

Hudibrastic answered 21/11, 2015 at 8:13 Comment(2)
While you don't need something as complex as the tags used with Wikipedia, you could also have a look over the MediaWiki Parser.php source. - Likeminded
@Likeminded Good point, I had no idea it follows wiki tags syntax. - Seismo

This regex will work as you specified:

$regex = '~

#opening tag
\{(RAW|ACCESS|DWNLINK|MODL|\w+)
 #optional attributes
 (?>
     \{   ([^}]*)   }
 )?

}


#optional text and closing tag
(?:
    (   #text:= any char except "{", or a "{" not followed by /commandname
        [^{]*+
        (?>\{(?!/?\1[{}])[^{]*)*+
    )

    #closing tag
    (   \{/\1}   )
)?

~ix';

regex101 demo
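
As a minimal sketch of how the four groups come back in PHP, the snippet below runs the same pattern (condensed onto fewer lines) through preg_match_all. The input string is a made-up, shortened stand-in for the page in the question. Trailing groups that did not take part in a match can be absent from the result array, so the code falls back with ??.

```php
<?php
// The answer's pattern, condensed; group numbers are noted in the comments.
$regex = '~
    \{(RAW|ACCESS|DWNLINK|MODL|\w+)             # 1: command name
    (?> \{ ([^}]*) } )?                         # 2: optional attributes
    }
    (?:
        ( [^{]*+ (?>\{(?!/?\1[{}])[^{]*)*+ )    # 3: text in between
        ( \{/\1} )                              # 4: closing tag
    )?
~ix';

// Hypothetical input, much shorter than the page in the question.
$html = '<p>{ACCESS{type=500}}<b>members only</b>{/ACCESS}</p> '
      . '{MODL{name=menu}} {RAW}plain{/RAW}';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // Groups 2-4 may be empty or missing, hence the ?? fallbacks.
    printf(
        "name=%s attrs=%s text=%s closing=%s\n",
        $m[1],
        var_export($m[2] ?? '', true),
        var_export($m[3] ?? '', true),
        var_export($m[4] ?? '', true)
    );
}
```

With this input, the inline {MODL{name=menu}} tag matches on its own, while ACCESS and RAW each match together with their text and closing tag.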


Compared to what you had:

First of all, I used the x modifier (at the end), which makes the engine ignore whitespace in the pattern and treat # as the start of a comment.

In the opening tag, I kept your options and added \w+ as a last alternative, which matches any command name (so \w+ alone would do as well):

\{(RAW|ACCESS|DWNLINK|MODL|\w+)

For the optional attributes, you had [\{]{0,1}([\w\W\s]*?)\}{0}, which was a valid attempt to make every part optional. Instead, I'm using a (?> group )? (see non-capturing groups and atomic groups) to make the whole subpattern optional (with the ? quantifier).

 (?>
     \{   ([^}]*)   }
 )?
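
To illustrate the optional group in isolation, here is a small sketch that applies only the opening-tag part of the pattern to a tag with attributes and to one without. The parseAttributes helper is hypothetical and not part of the answer; it assumes comma-separated key=value pairs as in {RAW{say=email,access=500}}, so space-separated attributes would need a different split.

```php
<?php
// Opening-tag part only: command name plus the optional {attributes} block.
$opening = '~\{(\w+)(?>\{([^}]*)})?}~';

preg_match($opening, '{MODL{name=menu,access=500}}', $with);
preg_match($opening, '{MODL}', $without);

// Hypothetical helper: turn "k=v,k2=v2" into an associative array.
// Assumes commas separate the pairs and values contain no commas.
function parseAttributes(string $raw): array
{
    $attributes = [];
    foreach (explode(',', $raw) as $pair) {
        if (strpos($pair, '=') === false) {
            continue; // skip fragments that are not key=value
        }
        [$key, $value] = explode('=', $pair, 2);
        $attributes[trim($key)] = trim($value);
    }
    return $attributes;
}

// Group 2 exists only when the attribute block was present.
$attrsWith    = parseAttributes($with[2] ?? '');
$attrsWithout = parseAttributes($without[2] ?? '');

print_r($attrsWith);
print_r($attrsWithout);
```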

The same logic is applied to the text and closing tag, to make them optional.

You were using [\w\s]+ to match the text, which matches word characters and whitespace but fails on punctuation and other characters. I could have used .*? (with the s modifier, so that the dot also matches newlines) and it would have worked just as well. However, I used the following construct, which matches the same text but performs better:

    (   #text:= any char except "{", or a "{" not followed by /commandname
        [^{]*+
        (?>\{(?!/?\1[{}])[^{]*)*+
    )
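
As a rough check of that claim, the sketch below compares a lazy .*? version (with the s modifier) against the unrolled construct on one made-up input. Both patterns are simplified to a tag without attributes, and the comparison only shows that they capture the same text here; they can differ on other inputs, for example when a tag of the same name is nested inside the text.

```php
<?php
// Lazy version: needs the s modifier so that . also matches newlines.
$lazy = '~\{(\w+)}(.*?)\{/\1}~is';

// Unrolled version from the answer, without the attribute part.
$unrolled = '~\{(\w+)}([^{]*+(?>\{(?!/?\1[{}])[^{]*)*+)\{/\1}~i';

// Hypothetical input with a newline and a stray brace pair in the text.
$input = "{RAW}line one\n<b>line {two}</b>{/RAW}";

preg_match($lazy, $input, $a);
preg_match($unrolled, $input, $b);

var_dump($a[2] === $b[2]); // both capture the text between the tags
```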

And finally, I'm matching the closing tag using \1, which is a backreference to the text matched in group 1 (the opening tag name):

\{/\1}
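
Since the closing tag is captured only when it is present, a replacement callback can tell block commands from inline ones. The following is a sketch under assumptions: the pattern is the one above with whitespace and comments removed and only \w+ for the name, and the $render callback with its bracketed output is invented purely for illustration.

```php
<?php
// Same structure as the answer's pattern, written on one line (no x modifier).
$regex = '~\{(\w+)(?>\{([^}]*)})?}(?:([^{]*+(?>\{(?!/?\1[{}])[^{]*)*+)(\{/\1}))?~i';

// Hypothetical renderer: the output format is made up for this example.
$render = function (array $m): string {
    $name = strtoupper($m[1]);
    // Group 4 (closing tag) is missing from $m when the command is inline.
    $isBlock = isset($m[4]) && $m[4] !== '';
    if ($isBlock) {
        return '[' . $name . ' block: ' . trim($m[3]) . ']';
    }
    return '[' . $name . ' inline]';
};

$input = 'A {MODL{name=menu}} B {ACCESS{type=500}}secret{/ACCESS} C';
$out = preg_replace_callback($regex, $render, $input);

echo $out, "\n";
```

A block command replaces everything from its opening tag to its closing tag, while an inline command replaces only the declaration itself.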

Assumptions:

  • An attribute value never contains a closing brace, even inside quotes such as "te}xt". The attribute group stops at the first }, so such a value would break the match.
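
A short sketch of that limitation, using only the opening-tag part of the pattern on two made-up tags. The second tag has a } inside a quoted value and is not matched at all.

```php
<?php
// Opening-tag part only: command name plus the optional {attributes} block.
$regex = '~\{(\w+)(?>\{([^}]*)})?}~';

// A plain quoted value is fine.
$ok = preg_match($regex, '{RAW{say="text"}}', $good);

// A closing brace inside the quotes ends the attribute group too early.
$broken = preg_match($regex, '{RAW{say="te}xt"}}', $bad);

var_dump($ok, $broken);
```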
Seismo answered 21/11, 2015 at 8:49 Comment(5)
Mariano, you saved my day! Believe me, you saved me. Regex is one thing I always stumble on whenever I am handling a project, and I can't help the frustration it gives me each time. Thanks once again. - Hudibrastic
Please can you modify your solution to also capture the closing {/COMMAND_NAME} tag? The reason is that some of the commands may be inline, e.g. {MODL{name=module_name,title=title of the module,access=access_rule}}, which is a module replacement, unlike ACCESS, which depends on the text in between. Thanks once again. - Hudibrastic
I'm glad it helped you. I thought there wasn't a reason to capture the closing tag because you'd only have one if you have a match for the text (group 2). - Seismo
Okay, but I need the closing tag group, i.e. {/COMMAND_NAME}, if it exists, meaning it is optional. Some commands are inline, so a block command's replacement will happen on the text in between, while an inline tag will just replace the command declaration. Hope you understand. - Hudibrastic
I still think you don't need it, but I've already edited the code before your comment. Enjoy :) - Seismo
