Regular expression to replace special characters except the first and last ones found

I'd like to replace every special character in a string identifier with a hyphen so that it is URL-friendly.

This is part of the Sitefinity CMS URL configuration: every time I create an item, Sitefinity takes its title and generates a URL slug based on the regex I provide.

So I can only use ONE regex and ONE substitution string, since both are entered in Sitefinity's URL configuration fields.

I can't use code or apply regexes in multiple steps.

So, for example, if I have the following title string: Infographic phishing's awareness and $prevention (updated)

I'd like it to transform to: infographic-phishing-awareness-and-prevention-updated

In Settings / Advanced / System / Site URL Settings / URLRulesClient, the default regex is set to: [^\p{L}\-\!\$\(\)\=\@\d_\'\.]+|\.+$

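The problem is that when content is created, only the spaces in this title get replaced with hyphens; other special characters such as the apostrophe, $ and the parentheses are kept, because the default pattern lists them as allowed characters.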
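To see why, here is a minimal TypeScript sketch of the single-pass model the question describes: one global regex replace with one substitution string. The function name generateSlug and the final lowercasing step are assumptions for illustration (the expected slug is lowercase), not Sitefinity's API, and JavaScript's regex engine only stands in for Sitefinity's .NET one. JavaScript needs the u flag for \p{L}, and in that mode it rejects needless escapes such as \! and \=, so the default filter is written without them; the set of allowed characters is the same.

```typescript
// Hypothetical model of a single-pass slug generator:
// one global regex replace, then lowercasing (lowercasing is an assumption).
function generateSlug(title: string, pattern: string, replacement: string): string {
  // "g" replaces every match; "u" is required for \p{L} in JavaScript.
  const regex = new RegExp(pattern, "gu");
  return title.replace(regex, replacement).toLowerCase();
}

// The default filter with JavaScript-friendly escaping (backslashes doubled for a
// string literal). Characters inside the negated class are treated as allowed.
const defaultFilter = "[^\\p{L}\\-!$()=@\\d_'.]+|\\.+$";

const exampleTitle = "Infographic phishing's awareness and $prevention (updated)";
console.log(generateSlug(exampleTitle, defaultFilter, "-"));
// infographic-phishing's-awareness-and-$prevention-(updated)
```

Only the spaces fall outside the allowed set here, so they are the only characters turned into hyphens.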

Is there a way to replace the special characters at the end of the string with nothing (an empty string) instead of a hyphen?
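The difficulty is that one pattern with one fixed substitution applies the same replacement to every match. The TypeScript sketch below (the helper applyRule and the sample string are hypothetical, and JavaScript's engine stands in for Sitefinity's .NET one) compares replacing every run of non-letters/non-digits with replacing only the runs followed by a letter or digit, using a lookahead.

```typescript
// Hypothetical helper: apply one pattern globally with one fixed replacement.
function applyRule(text: string, pattern: string, replacement: string): string {
  return text.replace(new RegExp(pattern, "gu"), replacement);
}

const sample = "(updated) news!!";

// Every run of non-letters/non-digits becomes "-", including those at the ends.
const everyRun = applyRule(sample, "[^\\p{L}\\d]+", "-");
// -updated-news-

// Only runs followed by a letter or digit are replaced. The trailing "!!" is no
// longer turned into "-", but it is not removed either: it stays in the output.
const innerRuns = applyRule(sample, "[^\\p{L}\\d]+(?=[\\p{L}\\d])", "-");
// -updated-news!!

console.log(everyRun, innerRuns);
```

Neither variant deletes the trailing characters. Without backreferences, replacement conditionals, or a post-processing step outside the regex, a single fixed substitution cannot treat middle and trailing runs differently.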

Femur asked 10/5, 2023 at 18:51 · Comments (12)
Pyrope: If you want to make text URL-friendly, the right way to do that is to URL-encode it. See https://mcmap.net/q/40441/-encode-url-in-javascript
Femur: @PaulDempsey my bad for not explaining better that this NEEDS to be a regex, because I'm configuring Sitefinity. I updated my question.
Pyrope: Understood, that's why it's a comment. Perhaps there is a more suitable place to transform this data than the Sitefinity configuration. So, if your first and last characters are not URL-safe, what do you expect to do with them? To handle the first and last characters, define subpatterns anchored at the start (^) and end ($) (assuming typical regex syntax -- I'm not familiar with Sitefinity).
Trotyl: Is it absolutely necessary to remove parentheses?
Spindlelegs: Is there support for backreferences in the replacement string? If so, then maybe (^\w+)?(?:'s)?\W+(\w+)(?:\W+$)? replace with $1-$2
Panelboard: @Spindlelegs I tried that too, but it fails for (lorem) ipsum.
Spindlelegs: So then maybe (?:^\W*(\w+))?(?:'s)?\W+(\w+)(?:\W+$)? replace with $1-$2 ?
Panelboard: I found another tricky case: foobar+.
Benison: Why not use encodeURIComponent?
Chronological: Regex has tools such as captures, which can capture text and also act as flags that tell you which part of a multi-branch regex matched. This is in lieu of running multiple regex operations. The flags can be checked after the match, or inline using a callback. The bottom line is that you can't blindly replace what the regex matches if you have two conditions, middle or not middle. So without callback capability, what you're after is 100% guaranteed to be impossible. End of story.
Duckboard: Btw, if you use the PCRE2 regex engine, you could try conditionals in the replacement string; see these examples: regex101.com/r/YEeaYd/1 and regex101.com/r/WEKIe3/1
Femur: @Chronological yeah, I also hadn't realized that Sitefinity's regex needed to be different, my bad. Saying "truly a tragedy" seems a bit dramatic, though. Also, if you look closely, you'll see the question was answered and solved.
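If the replacement field does accept backreferences, the second pattern suggested in the comments can be checked outside Sitefinity first. This TypeScript sketch applies it with $1-$2 to the example title and to the two tricky inputs raised in the comments. Assumptions: JavaScript's engine stands in for .NET (\w and \W are ASCII-only in JavaScript but Unicode-aware in .NET by default), and the toLowerCase() call models an assumed lowercasing step; slugWithBackrefs is a hypothetical name.

```typescript
// Second pattern suggested in the comments, used with the replacement "$1-$2".
// In JavaScript, a group that did not participate is substituted as "".
const backrefPattern = /(?:^\W*(\w+))?(?:'s)?\W+(\w+)(?:\W+$)?/g;

function slugWithBackrefs(title: string): string {
  // toLowerCase() models the assumed lowercasing step, not part of the comment.
  return title.replace(backrefPattern, "$1-$2").toLowerCase();
}

console.log(slugWithBackrefs("Infographic phishing's awareness and $prevention (updated)"));
// infographic-phishing-awareness-and-prevention-updated
console.log(slugWithBackrefs("(lorem) ipsum"));
// lorem-ipsum
console.log(slugWithBackrefs("foobar+"));
// foobar+  (no match: the trailing "+" is not followed by a word)
```

The foobar+ input comes back unchanged, which is the tricky case reported in the comments: a trailing run that is not followed by another word is never matched.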

You can try this regex. It matches any run of characters that are not letters (from any language), digits, hyphens or underscores, and it also swallows a preceding 's so that possessives disappear:

(?:'s)?[^\p{L}\-\d_]+|\.+$

If your title in Sitefinity is: Infographic phishing's awareness and $prevention (updated).+!@=¨$'^^;,:

the URL that Sitefinity generates with the custom regex is: infographic-phishing-awareness-and-prevention-updated
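The answer's pattern can be reproduced with a short TypeScript sketch. Applied as a plain global replace, it leaves a trailing hyphen, because the whole ).+!@=¨$'^^;,: run becomes a single "-". That is likely why the regex tester linked in the comments shows a different result than the CMS. The trimHyphens step below is an assumption inferred from the output reported in this answer, not documented Sitefinity behavior; the lowercasing and the helper names applyFilter and trimHyphens are likewise assumptions.

```typescript
// The answer's pattern, with backslashes doubled for a JavaScript string literal.
const answerFilter = "(?:'s)?[^\\p{L}\\-\\d_]+|\\.+$";

// Hypothetical model: one global replace with "-", then (assumed) lowercasing.
function applyFilter(title: string, pattern: string): string {
  return title.replace(new RegExp(pattern, "gu"), "-").toLowerCase();
}

// Assumed post-processing: strip hyphens at either end of the slug.
function trimHyphens(slug: string): string {
  return slug.replace(/^-+|-+$/g, "");
}

const answerTitle = "Infographic phishing's awareness and $prevention (updated).+!@=¨$'^^;,:";
const rawSlug = applyFilter(answerTitle, answerFilter);
console.log(rawSlug);              // infographic-phishing-awareness-and-prevention-updated-
console.log(trimHyphens(rawSlug)); // infographic-phishing-awareness-and-prevention-updated
```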

If you want to keep the dot (.) in the URL, add \. inside the square brackets:

(?:'s)?[^\p{L}\-\d_\.]+|\.+$

In general, to keep any character in the URL instead of replacing it with a hyphen, add it inside the square brackets. For example, here is how to keep the parentheses, \( and \) (I know you want them replaced with a hyphen; this is just a sample for reference):

(?:'s)?[^\p{L}\-\d_\(\)]+|\.+$
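Both variants can be checked with the same kind of TypeScript sketch. The helper applyVariant and the sample title "Sitefinity 14.4 release." are illustrative, and lowercasing is assumed. Keeping the dot interacts with the \.+$ alternative: a dot at the very end of the title still matches that alternative and is replaced with a hyphen.

```typescript
// Variants from the answer: characters added inside the negated class are kept.
const keepDot = "(?:'s)?[^\\p{L}\\-\\d_\\.]+|\\.+$";
const keepParens = "(?:'s)?[^\\p{L}\\-\\d_\\(\\)]+|\\.+$";

// Same hypothetical model: one global replace with "-", then (assumed) lowercasing.
function applyVariant(title: string, pattern: string): string {
  return title.replace(new RegExp(pattern, "gu"), "-").toLowerCase();
}

console.log(applyVariant("Sitefinity 14.4 release.", keepDot));
// sitefinity-14.4-release-  (the final dot is caught by \.+$ and becomes "-")
console.log(applyVariant("Infographic phishing's awareness and $prevention (updated)", keepParens));
// infographic-phishing-awareness-and-prevention-(updated)
```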

I tested the suggestions from the comments in Sitefinity, but they didn't work for me. Did you test them in Sitefinity?

Scrivings answered 11/5, 2023 at 9:25 · Comments (6)
Femur: I tested it in Sitefinity, but it didn't work. Where exactly in Advanced Settings did you put the regex? I might be setting it in the wrong place.
Scrivings: You need to make the change in Administration -> Settings -> Advanced -> System -> Site URL Settings -> URLRulesClient -> RegularExpressionFilter field. You can restart the app pool to make sure the config changes get applied. When you create items from the backend interface, the system generates the URLs based on the regex specified in this field. This is global for all backend screens. If you want the same rules to apply when creating or importing content using the Fluent API, you also need to update the URLRulesServer -> RegularExpressionFilter field.
Femur: Yeah, that's where I was changing it. I tried again now, and it still doesn't remove the special characters when creating a content page.
Scrivings: I am testing locally using Sitefinity 14.4 (I think older versions work as well). Can you check SystemConfig.config: find the <siteUrlSettings> tag and see whether the changes are reflected there. Which Sitefinity version are you using? If I have it on my side, I can test on that version as well.
Femur: I actually hadn't restarted the app pool after changing the regex. Your regex worked! Thank you so much!
Chronological: The regex fails in a regex tester (regex101.com/r/AXxJOy/1); maybe Sitefinity fixes up the result of a failed regex.

© 2022 - 2024 — McMap. All rights reserved.