Regexp-replace: Multiple replacements within a match

I'm converting our MVC3 project to use T4MVC, and I would like the JavaScript includes to work with T4MVC as well. So I need to replace

"~/Scripts/DataTables/TableTools/TableTools.min.js"
"~/Scripts/jquery-ui-1.8.24.min.js"

with

Scripts.DataTables.TableTools.TableTools_min_js
Scripts.jquery_ui_1_8_24_min_js

I'm using Notepad++ as my regexp tool at the moment, and it uses POSIX regexps. I can find the script name and replace it with these regexps:

Find: \("~/Scripts/(.*)"\)

Replace with \(Scripts.\1\)

But I can't figure out how to replace dots and dashes in the file names with underscores, and forward slashes with dots.

I can check that a JS file name has a dot or dash in it with this:

 \("~/Scripts/(?=\.*)(?=\-*).*"\)

But how do I replace groups within a group?

I need a non-greedy replacement within the group, and the replacements have to run in order, so that a forward slash converted into a dot is not converted to an underscore afterwards.

This is a non-critical problem; I've already done all the replacements manually. But I thought I was good with regexps, so this problem bugs me!

P.S. The preferred tool is Notepad++, but any POSIX regexp solution would do -)

P.P.S. Here you can get a sample of the text to be replaced, and here is the target text.

Kalinda answered 8/10, 2012 at 12:0 Comment(4)
Is your goal to copy some text to an editor, have it do the replacements, then copy back?Reamonn
Yeah, pretty much it. Copy from Visual Studio somewhere, do replacements, copy back to VS. I know for sure VS can't handle that, so must be done somewhere else.Kalinda
Not sure which VS you're using, but the PowerGUI console gives you access to PowerShell inside Visual Studio to manipulate the editor environment, so you'd get the nice regex lookarounds right in VS. Might be worth a look.Kazue
oh, nice! I'll give it a go! I'm using VS2012. The built-in regex search-replace is very strange and non-compliant.Kalinda

I would just use a site like RegexHero

  1. You can paste the code into the target string box, then place (?<=(~/Script).*)[.-](?=(.*"[)]")) into the Regular Expression box, with _ in the Replacement String box.

  2. Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.

  3. From there, Paste (?<=(<script).*)("~/)(?=(.*[)]" ))|(?<=(Url.).*)(")(?=(.*(\)" ))) into the Regular Expression box and leave the Replacement String box empty.

  4. Once the replace is done, click on Final String at the bottom, and select Move to target string and start a new expression.

  5. From there paste (?<=(Script).*)[/](?=(.*[)]")) into the Regular Expression box and . into the Replacement String box.

After that, the Final String box will have what you are looking for. I'm not sure about the upper limit on how much text you can parse, but the input could be broken up if that's an issue. I'm sure there might be better ways to do it, but this tends to be the way I go about things like this. One reason I like this site is that I don't have to install anything, so I can do it anywhere quickly.

Edit 1: Per the comments, I have moved step 3 to Step 5 and added new steps 3 and 4. I had to do it this way, because new Step 5 would have replaced the / in "~/Scripts with a ., breaking the removal of "~/. I also had to change Step 5's code to account for the changed beginning of Script

Reamonn answered 11/10, 2012 at 0:23 Comment(7)
Absolute magic! Thanks for the link - never seen it before, but I can see it'll be used a lot in the future! And solution is the most simple so far with only 2 passes. One for the win!Kalinda
@Kalinda Did I miss something in your question? The example lines provided in your question don't match the output presented by this solution. Perhaps you could edit your question example with the full lines?Kazue
@Kazue Ah! you are right. I did miss the quotes and ~ before Scripts. That's not a big deal to replace. But adds another pass with regexp, making it three passes.Kalinda
@Kazue I've added the target text to the question. Nick, perhaps you would like to modify your answer and add another regexp pass to strip extra quotes and tilde. Just to be absolute hero -)Kalinda
@Kalinda I have adjusted the steps to account for the "~/ and the last ".Reamonn
Not a very versatile solution, since it's Windows or IE 9+ (Silverlight) only. Does not work for Mac or Linux users. (Although maybe that's ok for this particular question, since the OP specifically said they were using Notepad++, which is Windows-only.)Zonda
@SeantheBean You are correct, it's a Windows/IE 9+ solution. Considering the question was in 2012 regarding ASP.NET MVC 3, it's a pretty fair assumption the OP is using Windows. If you need a solution that is not Windows based, you can ask a new question. Also, please read up on when to cast downvotes. The answer here is not Sloppy, no-effort, nor clearly incorrect. For more information on proper downvoting, see here: stackoverflow.com/help/privileges/vote-downReamonn

Here is a vanilla Notepad++ solution, but it's certainly not the most elegant one. I managed to do the transformation with several passes over the file.

First pass

Replace . and - with _.

Find: ("~/Scripts[^"]*?)[.-]

Replace With: \1_

Unfortunately, I could not find a way to match only the . or -, because it would require a lookbehind, which is apparently not supported by Notepad++. Due to this, every time you execute the replacement only the first . or - in a script name will be replaced (because matches cannot overlap). Hence, you have to run this replacement multiple times until no more replacements are done (in your example input, that would be 8 times).

Second pass

Replace / with ..

Find: ("~/Scripts[^"]*?)/

Replace with: \1.

This is basically the same thing as the first pass, just with different characters (you will have to do this 3 times for the example file). Doing the passes in this order ensures that no slashes will end up as underscores.

Third pass

Remove the surrounding characters.

Find: "~/(Scripts[^"]*?)"

Replace with: \1

This will now match all the script names that are still surrounded by "~/ and ", capturing what is in between and just outputting that.

Note that by including those surrounding characters in the find patterns of the first two passes, you can avoid converting the . in strings that are already of the new format.

As I said, this is not the most convenient way to do it, especially since passes one and two have to be executed manually multiple times. But it would still save a lot of time for large files, and I cannot think of a way to get all of them, and only in the correct strings, in one pass without lookbehind capabilities. Of course, I would very much welcome suggestions to improve this solution :). I hope I could at least give you (and anyone with a similar problem) a starting point.
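The three passes can also be tried outside the editor, which makes it easy to count how many runs each pattern needs. This is only a rough sketch in Python using the standard `re` module: the patterns are the ones from the passes above, while `run_until_stable`, `convert` and the sample line are made up for illustration and are not anything Notepad++ provides.

```python
import re

def run_until_stable(pattern, replacement, text):
    """Apply one find/replace repeatedly until it no longer matches.

    Returns the new text and the number of runs that changed something.
    """
    runs = 0
    while True:
        text, count = re.subn(pattern, replacement, text)
        if count == 0:
            return text, runs
        runs += 1

def convert(text):
    # First pass: '.' and '-' become '_' (one character per script name per run).
    text, dot_dash_runs = run_until_stable(r'("~/Scripts[^"]*?)[.-]', r'\1_', text)
    # Second pass: '/' becomes '.' (again one character per script name per run).
    text, slash_runs = run_until_stable(r'("~/Scripts[^"]*?)/', r'\1.', text)
    # Third pass: strip the surrounding "~/ and ".
    text = re.sub(r'"~/(Scripts[^"]*?)"', r'\1', text)
    return text, dot_dash_runs, slash_runs

# Hypothetical sample line, shaped like the examples in the question.
sample = '("~/Scripts/jquery-ui-1.8.24.min.js")'
text, dot_dash_runs, slash_runs = convert(sample)
print(text)           # (Scripts.jquery_ui_1_8_24_min_js)
print(dot_dash_runs)  # 6
print(slash_runs)     # 1
```

The run counts depend on the script name with the most characters to replace, which is why the numbers differ from file to file.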

Chefoo answered 10/10, 2012 at 21:28 Comment(4)
good effort! thanks. I was hoping to avoid the multiple passes, at least with the same regexp. But judging by the lack of responses, it seems impossible.Kalinda
well, if you could name a POSIX regex engine you could use, that supports positive, variable-length lookbehinds, I could try to find a solution that uses each pass only once (although even then, I am not sure, whether overlaps of the lookbehinds would be allowed)Chefoo
yeah, I did not realise Notepad++ does not support all of the required regexp functionality. But as Nick has said, RegexHero does all the lookaheads and lookbehinds, and it allows doing the transformation in 2 passes.Kalinda
yup, that's the lookbehind solution I had in mind. I just didn't know an engine available to you that supports them :). I really like the python script one, too... although it requires an additional plug-inChefoo

If, as your question indicates, you'd like to use N++, then use the N++ Python Script plugin. Set up the script and assign a shortcut key, and you have a single-pass solution that only requires you to open, modify, and save... can't get much simpler than that.

I think part of the problem is that N++ is not a regex tool, and the use of a dedicated regex tool, or even a search/replace solution, is sometimes warranted. You may be better off, both in speed and in time value, using a tool made for text processing rather than editing.

[Script Edit]:: Altered to match the modified in/out expectations.

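# Substitute & Replace within matched group.
from Npp import *
import re

def repl(m):
    # m.group(1) is the path after "Scripts/": turn '-' and '.' into '_'
    # first, then '/' into '.', so the new dots are never touched again.
    return "(Scripts." + re.sub( "[-.]", "_", m.group(1) ).replace( "/", "." ) + ")"

# Run the callback on every parenthesised script reference in the current tab.
editor.pyreplace( '(?:[(].*?Scripts.)(.*?)(?:"?[)])',  repl )

  1. Install:: Plugins -> Plugin Manager -> Python Script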
  2. New Script:: Plugins -> Python Script -> script-name.py
  3. Select target tab.
  4. Run:: Plugins -> Python Script -> Scripts -> script-name
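The `Npp` module only exists inside the plugin, so the callback can't be tried from a regular Python prompt. Here is a rough standalone sketch of the same idea using `re.sub` with a function as the replacement. The pattern and the inner replacements are taken from the script; `convert` and the sample line are illustrative names, and the sample format is assumed from the question.

```python
import re

# Same pattern as in the plugin script: group 1 is everything between
# "Scripts/" and the closing quote and parenthesis.
PATTERN = re.compile(r'(?:[(].*?Scripts.)(.*?)(?:"?[)])')

def repl(m):
    # The inner replacements only see the captured path, and run in a
    # fixed order: '-' and '.' become '_' first, then '/' becomes '.'.
    path = re.sub(r"[-.]", "_", m.group(1)).replace("/", ".")
    return "(Scripts." + path + ")"

def convert(text):
    # re.sub calls repl once per match and inserts its return value.
    return PATTERN.sub(repl, text)

print(convert('("~/Scripts/DataTables/TableTools/TableTools.min.js")'))
# (Scripts.DataTables.TableTools.TableTools_min_js)
```

Because the whole match is rebuilt by the callback, the order of the inner replacements is under your control, which is what a plain find/replace box cannot give you.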

[Edit: An extended one-liner PythonScript command]

Having need for the new regex module for Python (that I hope replaces re) I played around and compiled it for use with the N++ PythonScript plugin and decided to test it on your sample set.

Two commands on the console ended up with the correct results in the editor.

import regex as re
editor.setText( (re.compile( r'(?<=.*Content[(].*)((?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+))+(?=.*[)]".*)' ) ).sub(lambda m: {'omit':'','toDot':'.','toUnderscore':'_'}[[ key for key, value in m.groupdict().items() if value != None ][0]], editor.getText() ) )

Very sweet!

What else is really cool about using regex instead of re is that I was able to build the expression in Expresso and use it as is! That allows for a verbose explanation of it, just by copy-pasting the r'' string portion into Expresso.

The abbreviated text of which is::

Match a prefix but exclude it from the capture. [.*Content[(].*]
[1]: A numbered capture group. [(?<omit>["~]+?([~])[/]|["])|(?<toUnderscore>[-.]+)|(?<toDot>[/]+)], one or more repetitions
    Select from 3 alternatives
         [omit]: A named capture group. [["~]+?([~])[/]|["]]
             Select from 2 alternatives
                 ["~]+?([~])[/]
                 Any character in this class: ["]
         [toUnderscore]: A named capture group. [[-.]+]
         [toDot]: A named capture group. [[/]+]
Match a suffix but exclude it from the capture. [.*[)]".*]

The command breakdown is fairly nifty: we tell Scintilla to set the full buffer contents to the result of a compiled regex substitution, essentially using a 'switch' on the name of the group that isn't empty.
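The 'switch' on the group name can also be sketched with the standard `re` module. Since `re` only accepts fixed-width lookbehinds, this is a variant under my own assumptions and not the command above: an outer pattern finds the quoted path, and an inner pattern with named alternatives is dispatched through `m.lastgroup`. `QUOTED_PATH`, `TOKEN`, `REPLACEMENT` and the sample line are illustrative names only.

```python
import re

# Outer pattern: a quoted script path such as "~/Scripts/foo-bar.js".
# Only group 1 is kept, so the quotes and the "~/" prefix are dropped.
QUOTED_PATH = re.compile(r'"~/(Scripts/[^"]*)"')

# Inner pattern: every alternative is a named group, so the name of the
# group that matched tells us which replacement to use.
TOKEN = re.compile(r'(?P<toUnderscore>[-.])|(?P<toDot>/)')
REPLACEMENT = {'toUnderscore': '_', 'toDot': '.'}

def convert_path(m):
    # t.lastgroup is the name of the alternative that matched this token.
    return TOKEN.sub(lambda t: REPLACEMENT[t.lastgroup], m.group(1))

def convert(text):
    return QUOTED_PATH.sub(convert_path, text)

print(convert('Content("~/Scripts/jquery-ui-1.8.24.min.js")'))
# Content(Scripts.jquery_ui_1_8_24_min_js)
```

Each character is visited once, so a slash that has become a dot is never looked at again and the ordering problem from the question does not come up.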

Hopefully Dave (the PythonScript Author) will add the regex module to the ExtraPythonLibs part of the project.

Kazue answered 10/10, 2012 at 23:24 Comment(1)
yeah... N++ Python Script rocks!Kazue

Alternatively you could use a script that would do it and avoid copy pasting and the rest of the manual labor altogether. Consider using the following script:

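# Run with `ruby -p`: $_ holds the current input line.
$_.gsub!(%r{(?:"~/)?Scripts/([a-z0-9./-]+)"?}i) do
    # Split on "/", turn "." and "-" into "_" in each segment, rejoin with ".".
    'Scripts.' + $1.split('/').map { |part| part.gsub(/[.-]/, '_') }.join('.')
end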

And run it like this:

$ ruby -pi.bak script.rb *.ext

All the files with extension .ext will be edited in-place and the original files will be saved with .ext.bak extension. If you use revision control (and you should) then you can easily review changes with some visual diff tool, correct them if necessary and commit them afterwards.
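For anyone without a Ruby environment, roughly the same in-place edit with a backup can be sketched in Python with the standard `fileinput` module. This is my own translation, not something the script above depends on: `convert_line` and `rewrite_in_place` are hypothetical helper names, and the pattern is carried over from the Ruby version.

```python
import fileinput
import re

# Same pattern as the Ruby script, case-insensitive.
PATTERN = re.compile(r'(?:"~/)?Scripts/([a-z0-9./-]+)"?', re.IGNORECASE)

def convert_line(line):
    def repl(m):
        # Split on "/", turn "." and "-" into "_" in each segment,
        # then rejoin the segments with ".".
        parts = m.group(1).split('/')
        return 'Scripts.' + '.'.join(re.sub(r'[.-]', '_', p) for p in parts)
    return PATTERN.sub(repl, line)

def rewrite_in_place(paths):
    # inplace=True redirects print() into the file currently being read;
    # backup='.bak' keeps the original next to it as <name>.bak.
    with fileinput.input(files=paths, inplace=True, backup='.bak') as lines:
        for line in lines:
            print(convert_line(line), end='')

print(convert_line('("~/Scripts/jquery-ui-1.8.24.min.js")'))
# (Scripts.jquery_ui_1_8_24_min_js)
```

Calling `rewrite_in_place` with a non-empty list of file paths edits those files and leaves a `.bak` copy of each, much like `ruby -pi.bak` does.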

Chlor answered 12/10, 2012 at 14:30 Comment(3)
It was only a one-time change in one file, so a bit of overkill. But +1 for introducing new tools. Unfortunately I won't be able to test it - I don't have a Ruby environment set up anywhere.Kalinda
Sure. I think it's good to learn to use something like Ruby, Python, or Perl for tasks like this. Even for one-time tasks like this you will be faster than with the editor. Editors usually offer only simple regexp substitutions, and anything a bit different from that turns into a problem. With scripts like this one it's only one step to complete automation for multiple files or other repetitive scenarios.Chlor
Yeah, I used Bash before moving to Windows completely. I have been meaning to look at Ruby for a while - this could be the kick-start -)Kalinda
