If you are using SQL Server 2016 or above, you can use sp_execute_external_script
along with R. It has functions for Regular Expression searches, such as grep
and grepl
.
Here's an example for email addresses. I'll query some "people" via the SQL Server database engine, pass the data for those people to R, let R decide which people have invalid email addresses, and have R pass back that subset of people to SQL Server. The "people" are from the [Application].[People]
table in the [WideWorldImporters]
sample database. They get passed to the R engine as a dataframe named InputDataSet
. R uses the grepl function with the "not" operator (exclamation point!) to find which people have email addresses that don't match the RegEx string search pattern.
EXEC sp_execute_external_script
@language = N'R',
@script = N' RegexWithR <- InputDataSet;
OutputDataSet <- RegexWithR[!grepl("([_a-z0-9-]+(\\.[_a-z0-9-]+)*@[a-z0-9-]+(\\.[a-z0-9-]+)*(\\.[a-z]{2,4}))", RegexWithR$EmailAddress), ];',
@input_data_1 = N'SELECT PersonID, FullName, EmailAddress FROM Application.People'
WITH RESULT SETS (([PersonID] INT, [FullName] NVARCHAR(50), [EmailAddress] NVARCHAR(256)))
Note that the appropriate features must be installed on the SQL Server host. For SQL Server 2016, it is called "SQL Server R Services". For SQL Server 2017, it was renamed to "SQL Server Machine Learning Services".
Closing Thoughts
Microsoft's implementation of SQL (T-SQL) doesn't have native support for RegEx. This proposed solution may not be any more desirable to the OP than the use of a CLR stored procedure. But it does offer an additional way to approach the problem.
SAFE
and not marked as eitherEXTERNAL_ACCESS
orUNSAFE
(as I do understand why those 2 latter Permission Sets would be problematic for a shared hosting environment). Microsoft Azure SQL Database V12 (i.e. the new version as of late 2014), which is a shared environment, allows for Assemblies marked asSAFE
(and loaded viaFROM 0x...
instead of from a DLL since you can't upload a DLL). ButSAFE
is all that is needed for Regular Expressions and LOTS of other very useful functions. – Dudleyduds