SQL Server Regular expressions in T-SQL
Asked Answered
D

6

143

Is there any regular expression library written in T-SQL (no CLR, no extended SP, pure T-SQL) for SQL Server, and that should work with shared hosting?

Edit:

  • Thanks, I know about PATINDEX, LIKE, xp_ sps and CLR solutions
  • I also know it is not the best place for regex, the question is theoretical :)
  • Reduced functionality is also accepted
Donner answered 11/10, 2008 at 21:10 Comment(8)
I too have this question. I know that a database is not the best place to have this, but the reality is that other solutions require SQL admin permissions to reconfigure the server. Unfortunately, some of our clients will not choose to enable CLR, etc, and we are stuck to database-only solutions.Electioneer
@PaulDraper and xnagyg: why rule out SQLCLR? It is the most appropriate means of getting Regular Expressions in queries. And why would some of your clients choose to not enable CLR? I have yet to come across a valid reason. Sure, I hear "security" and "performance", but those are bogus reasons that are a result of not understanding how SQLCLR works and how it can be restricted.Dudleyduds
@srutzky: most shared hosting provider does not allow CLR. You should ask them about "security" and "performance":)Donner
@Donner Sure, I can ask a few. However, pointing to the behavior of a group does not in any way address the question of "is there a valid reason" for that behavior. It could just as easily be that all of those shared hosting providers set their policy based on the same misunderstanding. And, if nothing else, the simple fact that not all of them disallow SQLCLR actually supports the idea of there not being a problem more than the idea of there being a problem since if those problems did exist, the providers that allow SQLCLR would be experiencing those problems and would stop allowing it.Dudleyduds
@Donner Also, I should clarify that I am speaking in terms of Assemblies marked as SAFE and not marked as either EXTERNAL_ACCESS or UNSAFE (as I do understand why those 2 latter Permission Sets would be problematic for a shared hosting environment). Microsoft Azure SQL Database V12 (i.e. the new version as of late 2014), which is a shared environment, allows for Assemblies marked as SAFE (and loaded via FROM 0x... instead of from a DLL since you can't upload a DLL). But SAFE is all that is needed for Regular Expressions and LOTS of other very useful functions.Dudleyduds
@srutzky: unfortunatelly I did not find any shared hosting with CLR support at that time (in 2011). I managed to use full text search instead.Donner
Just in case someone is still looking for a solution, here's another option: red-gate.com/simple-talk/sql/t-sql-programming/…, which uses the VBScript.RegExp procedure via OLE Automation from T-SQLLiverwort
(similar to James Poulose's answer below)Liverwort
M
86

How about the PATINDEX function?

The pattern matching in TSQL is not a complete regex library, but it gives you the basics.

(From Books Online)

Wildcard  Meaning  
% Any string of zero or more characters.

_ Any single character.

[ ] Any single character within the specified range 
    (for example, [a-f]) or set (for example, [abcdef]).

[^] Any single character not within the specified range 
    (for example, [^a - f]) or set (for example, [^abcdef]).
Marja answered 13/10, 2008 at 20:49 Comment(3)
For at least a decade (SQL Server 2005+), LIKE has supported everything PATINDEX does. Don't know about before that...Smitty
Yet this doesn't let me specify a pattern that matches, say, a variable number of ascii letters. % matches 0 or more characters (regardless), [...] matches just one, and there is nothing in between.Negate
LIKE is the same as PATINDEX > 0Liverwort
W
21

If anybody is interested in using regex with CLR here is a solution. The function below (C# .net 4.5) returns a 1 if the pattern is matched and a 0 if the pattern is not matched. I use it to tag lines in sub queries. The SQLfunction attribute tells sql server that this method is the actual UDF that SQL server will use. Save the file as a dll in a place where you can access it from management studio.

// default using statements above
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;

namespace CLR_Functions
{   
    public class myFunctions
    {
        [SqlFunction]
        public static SqlInt16 RegexContain(SqlString text, SqlString pattern)
        {            
            SqlInt16 returnVal = 0;
            try
            {
                string myText = text.ToString();
                string myPattern = pattern.ToString();
                MatchCollection mc = Regex.Matches(myText, myPattern);
                if (mc.Count > 0)
                {
                    returnVal = 1;
                }
            }
            catch
            {
                returnVal = 0;
            }

            return returnVal;
        }
    }
}

In management studio import the dll file via programability -- assemblies -- new assembly

Then run this query:

CREATE FUNCTION RegexContain(@text NVARCHAR(50), @pattern NVARCHAR(50))
RETURNS smallint 
AS
EXTERNAL NAME CLR_Functions.[CLR_Functions.myFunctions].RegexContain

Then you should have complete access to the function via the database you stored the assembly in.

Then use in queries like so:

SELECT * 
FROM 
(
    SELECT
        DailyLog.Date,
        DailyLog.Researcher,
        DailyLog.team,
        DailyLog.field,
        DailyLog.EntityID,
        DailyLog.[From],
        DailyLog.[To],
        dbo.RegexContain(Researcher, '[\p{L}\s]+') as 'is null values'
    FROM [DailyOps].[dbo].[DailyLog]
) AS a
WHERE a.[is null values] = 0
Wingless answered 16/6, 2015 at 20:13 Comment(0)
C
16

There is some basic pattern matching available through using LIKE, where % matches any number and combination of characters, _ matches any one character, and [abc] could match a, b, or c... There is more info on the MSDN site.

Composer answered 11/10, 2008 at 22:13 Comment(0)
U
6

In case anyone else is still looking at this question, http://www.sqlsharp.com/ is a free, easy way to add regular expression CLR functions into your database.

Uneducated answered 3/6, 2015 at 18:43 Comment(6)
Once again, i'ts a CLR solution - not what the OP asked forLiverwort
@DaveBoltman: He asked the question in 2008. People search for this sometimes and run across this question without wanting to avoid CLR. This helped me and might help them.Uneducated
Sure, I do agree with you @JohnFisher - it is a helpful answer for someone using CLR. But in 2015, we'd still like a SQL only solution in our SQL project (no CLR) for various reasons, just like the OP did in 2008. The year doesn't matter :) E.g. the battery in your car was released in 1859. But you'd still like to avoid using more modern batteries such as NiMH batteries released more than 100 years later, for various reasons (such as being able to afford a car at all :)Liverwort
@DaveBoltman: You missed the part where "People search for this sometimes and run across this question without wanting to avoid CLR". It was the key point.Uneducated
sure - you're right @JohnFisher, you did say that. Glad it helped you, and I'm sure it will help others tooLiverwort
I agree with @JohnFisher, I need other solutions that are easy and free. I have no budget and I'm no SQL expert. I.e. I don't know how to add procedures or anything else like that.Bellebelleek
S
5

If you are using SQL Server 2016 or above, you can use sp_execute_external_script along with R. It has functions for Regular Expression searches, such as grep and grepl.

Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People] table in the [WideWorldImporters] sample database. They get passed to the R engine as a dataframe named InputDataSet. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.

EXEC sp_execute_external_script 
 @language = N'R',
 @script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
 @input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
 WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))

Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".

Closing Thoughts Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.

Stanwin answered 3/10, 2017 at 3:11 Comment(0)
D
2

You can use VBScript regular expression features using OLE Automation. This is way better than the overhead of creating and maintaining an assembly. Please make sure you go through the comments section to get a better modified version of the main one.

http://blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx

DECLARE @obj INT, @res INT, @match BIT;
DECLARE @pattern varchar(255) = '<your regex pattern goes here>';
DECLARE @matchstring varchar(8000) = '<string to search goes here>';
SET @match = 0;

-- Create a VB script component object
EXEC @res = sp_OACreate 'VBScript.RegExp', @obj OUT;

-- Apply/set the pattern to the RegEx object
EXEC @res = sp_OASetProperty @obj, 'Pattern', @pattern;

-- Set any other settings/properties here
EXEC @res = sp_OASetProperty @obj, 'IgnoreCase', 1;

-- Call the method 'Test' to find a match
EXEC @res = sp_OAMethod @obj, 'Test', @match OUT, @matchstring;

-- Don't forget to clean-up
EXEC @res = sp_OADestroy @obj;

If you get SQL Server blocked access to procedure 'sys.sp_OACreate'... error, use sp_reconfigure to enable Ole Automation Procedures. (Yes, unfortunately that is a server level change!)

More information about the Test method is available here

Happy coding

Doralyn answered 15/10, 2012 at 19:57 Comment(3)
sry, i know this is old, BUT: Why is VBScript through OLE "way better" than CLR? If you ONLY think about maintainance, you COULD be right, BUT what about performance?Demotic
@Demotic By 'way better', i was referring to the time saved due to the overhead of creating and maintaining a .NET assembly just for this purpose.Doralyn
I'm getting "403 forbidden" using your link blogs.msdn.com/b/khen1234/archive/2005/05/11/416392.aspx. Do you think the link moved? Any idea where? Is OLE deprecated now? I want to use regex on an Azure SQL Server.Bellebelleek

© 2022 - 2025 — McMap. All rights reserved.