Using regular expressions to remove all the GOs in a sql script file
Asked Answered
H

6

8

I'm trying to parse a sort of big SQL script containing commands like create functions and store procedures. I want to split the file in strings whenever I find a GO statement (I want to execute it using ordinary ado.net instead of SMO).

The problem is that I haven't find a suitable regular expression for that so far. Using a simple \bGO\b ignoring case will split it. But will also split all go(s) inside a comment like

-- this go will also be split

Using this expression ^\bGO\b[^--]$ almost solve my problem but I get some error when I have two consecutive GOs (that for whatever reason are there and are behind my domain).

end
go 
GO 

This is how the end of a SP creation looks like in my script.

I'm doing it in C#

Thanks a lot

** EDIT **

A co-worker came up with a solution that, for now, worked for all my scripts

^\s*go\s*\r\n

Hydrocellulose answered 23/2, 2012 at 17:22 Comment(4)
More information will be needed in order for us to give you a "do-all" regex pattern for this. Just in general, I would set up a "sandbox" unit test for this; set up a literal containing a pattern of GOs you either do or don't want removed, then apply a "candidate" regex algorithm to each of those patterns, and assert that the resulting text is what you want it to be. The tests that pass or fail given your particular regex should be clues as to how to make modifications to the regex. if you give us more examples of patterns you do or don't want removed, it'll help us help you.Northnorthwest
@Hydrocellulose A little off-topic: does ^\bGO\b[^--]$ really work, or is there a typo? I think you may want to avoid -- before GO, not after.Marashio
@Northnorthwest its hard to give an example because this will be applied to all database sources that create SPs, Functions and Views. But basically it's a SQL file source that can be very diverse. I had some problems with commented code between /* ... */ for example, as well as 'go' verb inside line comments like -- this should go ...Hydrocellulose
If you want this to be perfect, with no possibility of things like GO inside a string literal, /* quotes, etc, you will need to do the equivalent of building a complete SQL parser.Transfinite
L
4

You can try this:

(?i-msnx:\b(?<!-{2,}.*)go[^a-zA-Z])

meaning, mach the string go if it is not preceded by 2 or more dashes followed by anything.

This should do the trick!

Edited to force checking only at word boundaries

Edited to ignore 'go' followed by letter/digits (last attempt :)) and added link to regular expression tool

PS: In case you haven't found this is a great resource about RE.

PS2: This is a great tool for RE authoring/test

Liberalize answered 24/2, 2012 at 0:57 Comment(3)
Cheers Adriano, but the regex fails in places like this: GRANT ALL ON C_MAIL_CAT_CONV_TYPES TO IPSAdmins GRANT ALL ON C_MAIL_CATEGORIES TO IPSAdmins I get GRANT ALL ON C_MAIL_CAT_CONV_TYPES TO IPSAdmins GRANT ALL ON C_MAIL_CATEHydrocellulose
I know this web site, I think, as our friend Reginaldo uses to say, "do a favour to yourself, learn regular expressions" :-)Hydrocellulose
Almost, but fails to process the follow code (it interprets the go of nago.OPERATOR_CD as a go: laa.PERIOD_CD as [AccPeriodType], nago.OPERATOR_CD as [Partner], OFFICE_FCD as [PartnerOffice],Hydrocellulose
B
3

Well, you probably already heard "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems." If performance is not your bigger concern, you could simply check if the trimmed line is equal to "go", ignoring the case. You won't waste more time dealing with regex.

Beseem answered 24/2, 2012 at 1:31 Comment(0)
F
1

I know this is a really old question but hopefully this helps somebody else.

This is the regular expression I use:

(^(go)[\s,;])|(^(go)$)|(;go)|(;)\s*(go)

I have not been able to break it yet.

Here is the test case I used:

use SCRATCH  
GO;  
--go  
set nocount ON ;go;  
go  
' go '   
'go'  
'go  
go'  
create table gonogo ( go int null, nogo int null)   
GO  

insert INTO mungbean VALUES (1); go  
GO  
;  
select * from mungbean  
go  

 go  
 go;  
 '  
go  
Flat answered 18/1, 2017 at 0:54 Comment(1)
Me neither until today. GOTO someAnchorTwin
M
0

It could be easier to handle this last case without regex.

After you split the script, remove the parts for which String.IsNullOrWhitespace(part) == true

Marashio answered 23/2, 2012 at 17:36 Comment(0)
B
0

For me it worked:

(?<!.)([\s]*)(go|GO)\s*([\r\n])?

https://regex101.com/r/V5DgQN/1

Beanfeast answered 29/11, 2021 at 3:40 Comment(0)
O
0

The following selects the part between the GO's and ommits GO's in literals like 'GO':

(?<=(?:^|(?:^|[ \t;,\n\r])(?<!--[^\n]*)GO(?=[ \t;,\n\r])[^\n]*\n))([^'"]*?('[^']*?')?("[^"]*?")?)*?(?=(?:(?:^|[ \t;,\n\r])(?<!--[^\n]*)GO(?:(?=[ \t;,\n\r]|$))|$))

Ovi answered 11/10 at 12:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.