C# parse SQL statement to find all INSERT/UPDATE/DELETE tables used in stored procedures
Asked Answered
H

5

7

As the title says my intention is to find all tables participating in either INSERT/UPDATE/DELETE statements and produce a structured format. So far this is what I've come up with -

void Main()
{
    string DBName = "Blah";
    string ServerName = @"(localdb)\MSSQLLocalDB";

    Server s = new Server(ServerName);
    Database db = s.Databases[DBName];

    ConcurrentDictionary<string, SPAudit> list = new ConcurrentDictionary<string, SPAudit>();

    var sps = db.StoredProcedures.Cast<StoredProcedure>()
    .Where(x => x.ImplementationType == ImplementationType.TransactSql  && x.Schema == "dbo")
    .Select(x => new
    {
        x.Name,
        Body = x.TextBody
    }).ToList();

    Parallel.ForEach(sps, item =>
    {
        try
        {
            ParseResult p = Parser.Parse(item.Body);
            IEnumerable<SqlInsertStatement> insStats = null;
            IEnumerable<SqlUpdateStatement> updStats = null;
            IEnumerable<SqlDeleteStatement> delStats = null;
            var listTask = new List<Task>();
            listTask.Add(Task.Run(() =>
            {
                insStats = FindBatchCollection<SqlInsertStatement>(p.Script.Batches);
            }));
            listTask.Add(Task.Run(() =>
            {
                updStats = FindBatchCollection<SqlUpdateStatement>(p.Script.Batches);
            }));
            listTask.Add(Task.Run(() =>
            {
                delStats = FindBatchCollection<SqlDeleteStatement>(p.Script.Batches);
            }));
            Task.WaitAll(listTask.ToArray());
            foreach (var ins in insStats)
            {
                var table = ins?.InsertSpecification?.Children?.FirstOrDefault();
                if (table != null)
                {
                    var tableName = table.Sql.Replace("dbo.", "").Replace("[", "").Replace("]", "");
                    if (!tableName.StartsWith("@"))
                    {
                        var ll = list.ContainsKey(item.Name) ? list[item.Name] : null;
                        if (ll == null)
                        {
                            ll = new SPAudit();
                        }
                        ll.InsertTable.Add(tableName);
                        list.AddOrUpdate(item.Name, ll, (key, old) => ll);
                    }
                }
            }
            foreach (var ins in updStats)
            {
                var table = ins?.UpdateSpecification?.Children?.FirstOrDefault();
                if (table != null)
                {
                    var tableName = table.Sql.Replace("dbo.", "").Replace("[", "").Replace("]", "");
                    if (!tableName.StartsWith("@"))
                    {
                        var ll = list.ContainsKey(item.Name) ? list[item.Name] : null;
                        if (ll == null)
                        {
                            ll = new SPAudit();
                        }
                        ll.UpdateTable.Add(tableName);
                        list.AddOrUpdate(item.Name, ll, (key, old) => ll);
                    }
                }
            }
            foreach (var ins in delStats)
            {
                var table = ins?.DeleteSpecification?.Children?.FirstOrDefault();
                if (table != null)
                {
                    var tableName = table.Sql.Replace("dbo.", "").Replace("[", "").Replace("]", "");
                    if (!tableName.StartsWith("@"))
                    {
                        var ll = list.ContainsKey(item.Name) ? list[item.Name] : null;
                        if (ll == null)
                        {
                            ll = new SPAudit();
                        }
                        ll.DeleteTable.Add(tableName);
                        list.AddOrUpdate(item.Name, ll, (key, old) => ll);
                    }
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    });
}

IEnumerable<T> FindBatchCollection<T>(SqlBatchCollection coll) where T : SqlStatement
{
    List<T> sts = new List<T>();
    foreach (var item in coll)
    {
        sts.AddRange(FindStatement<T>(item.Children));
    }
    return sts;
}


IEnumerable<T> FindStatement<T>(IEnumerable<SqlCodeObject> objs) where T : SqlStatement
{
    List<T> sts = new List<T>();
    foreach (var item in objs)
    {
        if (item.GetType() == typeof(T))
        {
            sts.Add(item as T);
        }
        else
        {
            foreach (var sub in item.Children)
            {
                sts.AddRange(FindStatement<T>(item.Children));
            }
        }
    }
    return sts;
}

public class SPAudit
{
    public HashSet<string> InsertTable { get; set; }
    public HashSet<string> UpdateTable { get; set; }
    public HashSet<string> DeleteTable { get; set; }

    public SPAudit()
    {
        InsertTable = new HashSet<string>();
        UpdateTable = new HashSet<string>();
        DeleteTable = new HashSet<string>();
    }
}

Now I'm facing two problems

  • First, its is taking hell lot of a time to complete, given that there are around 841 stored procedures in the database.
  • Second, if there are statements like the following the table name is not being captured properly, meaning that the table is being captured as w instead of SomeTable_1 or SomeTable_2.
CREATE PROCEDURE [dbo].[sp_blah]
    @t SomeTableType READONLY
AS  

    DELETE w
    FROM SomeTable_2 w
    INNER JOIN (Select * from @t) t
    ON w.SomeID = t.SomeID

    DELETE w
    FROM SomeTable_1 w
    INNER JOIN (Select * from @t) t
    ON w.SomeID = t.SomeID


RETURN 0

Any help would be greatly appreciated.

Edit

Using the following dll from this location C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Tasks-

  • Microsoft.SqlServer.ConnectionInfo.dll
  • Microsoft.SqlServer.Management.SqlParser.dll
  • Microsoft.SqlServer.Smo.dll
  • Microsoft.SqlServer.SqlEnum.dll
Halfhardy answered 21/8, 2018 at 6:59 Comment(5)
1) Have you tried do the search directly on SQL Server through SQL querying?Maximin
2) Also consider the reverse. Start from table and look for all procedures that use that table.Maximin
Any pointers on how I can start with a table list?Halfhardy
3) Do you need to cater for tables in dynamic queries or parameters?Maximin
Yes if we can, surely.Halfhardy
B
4

The SMO model exposes elements of the syntax tree. So instead of assuming a token by position, as in

UpdateSpecification?.Children?.FirstOrDefault();

look up the corresponding property in the documentation. For the update clause, the target table (or updatable view) can occur in different positions. Take this syntax:

UPDATE tablename SET column=value WHERE conditions

which is represented as

var targettable = ins?.UpdateSpecification?.Target?.ScriptTokenStream?.FirstOrDefault()?.Text;

in the SMO model. Whereas, a syntax unique to tsql,

UPDATE t SET t.columnname=value FROM tablename t WHERE conditions 

will have its list of tables in the FROM clause.

Regarding the other two DML statements you mentioned: DELETE is the same because they share a common base class, DeleteInsertSpecification (Target).

For INSERT, there is the Target as well, and if its InsertSource is of type SelectInsertSource, this may be based on any number of tables and views too.

Bruner answered 21/8, 2018 at 7:21 Comment(4)
Is it the same for INSERT/DELETE as well? Also your code is not compiling for me as it can't find ScriptTokenStream.Halfhardy
Can't seem to find the Microsoft.SqlServer.TransactSql.ScriptDom.dll in the location that I mentioned in the question(edited).Halfhardy
that's an entirely different problem. so did the original code compile at all? I refer to the 140.17283.0 version of the SMO componentsBruner
Had to change my code to use this TSqlParser parser = new TSql120Parser(true); IList<ParseError> parseErrors; TSqlFragment sqlFragment = parser.Parse(new StringReader(sql), out parseErrors);Halfhardy
H
5

Finally I got it to work like I wanted the output to look like using @dlatikay answer. I'm posting this here more for documentation purposes than anything else.

I'm using the following nuget packages -

and removed all other local dependencies. I hope this helps someone out there.

void Main()
{
    string DatabaseName = "Blah";
    string ServerIP = @"(localdb)\MSSQLLocalDB";
    List<string> ExcludeList = new List<string>()
    {
        "sp_upgraddiagrams",
        "sp_helpdiagrams",
        "sp_helpdiagramdefinition",
        "sp_creatediagram",
        "sp_renamediagram",
        "sp_alterdiagram",
        "sp_dropdiagram"
    };

    List<string> StringDataTypes = new List<string>()
    {
        "nvarchar",
        "varchar",
        "nchar",
        "char",
    };

    Server s = new Server(ServerIP);
    s.SetDefaultInitFields(typeof(StoredProcedure), "IsSystemObject");
    Database db = s.Databases[DatabaseName];

    Dictionary<string, SPAudit> AuditList = new Dictionary<string, SPAudit>();

    var sps = db.StoredProcedures.Cast<StoredProcedure>()
    .Where(x => x.ImplementationType == ImplementationType.TransactSql && x.Schema == "dbo" && !x.IsSystemObject)
    .Select(x => new
    {
        x.Name,
        Body = x.TextBody,
        Parameters = x.Parameters.Cast<StoredProcedureParameter>().Select(t =>
        new SPParam()
        {
            Name = t.Name,
            DefaultValue = t.DefaultValue,
            DataType = $"{t.DataType.Name}{(StringDataTypes.Contains(t.DataType.Name) ? $"({(t.DataType.MaximumLength > 0 ? Convert.ToString(t.DataType.MaximumLength) : "MAX")})" : "")}"
        })
    }).ToList();

    foreach (var item in sps)
    {
        try
        {
            TSqlParser parser = new TSql140Parser(true, SqlEngineType.Standalone);
            IList<ParseError> parseErrors;
            TSqlFragment sqlFragment = parser.Parse(new StringReader(item.Body), out parseErrors);
            sqlFragment.Accept(new OwnVisitor(ref AuditList, item.Name, item.Parameters));
        }
        catch (Exception ex)
        {
            //Handle exception
        }
    }
}

public class OwnVisitor : TSqlFragmentVisitor
{
    private string spname;
    private IEnumerable<SPParam> parameters;
    private Dictionary<string, SPAudit> list;

    public OwnVisitor(ref Dictionary<string, SPAudit> _list, string _name, IEnumerable<SPParam> _parameters)
    {
        list = _list;
        spname = _name;
        parameters = _parameters;
    }

    public override void ExplicitVisit(InsertStatement node)
    {
        NamedTableReference namedTableReference = node?.InsertSpecification?.Target as NamedTableReference;
        if (namedTableReference != null)
        {
            string table = namedTableReference?.SchemaObject.BaseIdentifier?.Value;
            if (!string.IsNullOrWhiteSpace(table) && !table.StartsWith("#"))
            {
                if (!list.ContainsKey(spname))
                {
                    SPAudit ll = new SPAudit();
                    ll.InsertTable.Add(table);
                    ll.Parameters.AddRange(parameters);
                    list.Add(spname, ll);
                }
                else
                {
                    SPAudit ll = list[spname];
                    ll.InsertTable.Add(table);
                }
            }
        }
        base.ExplicitVisit(node);
    }

    public override void ExplicitVisit(UpdateStatement node)
    {
        NamedTableReference namedTableReference;
        if (node?.UpdateSpecification?.FromClause != null)
        {
            namedTableReference = node?.UpdateSpecification?.FromClause?.TableReferences[0] as NamedTableReference;
        }
        else
        {
            namedTableReference = node?.UpdateSpecification?.Target as NamedTableReference;
        }
        string table = namedTableReference?.SchemaObject.BaseIdentifier?.Value;
        if (!string.IsNullOrWhiteSpace(table) && !table.StartsWith("#"))
        {
            if (!list.ContainsKey(spname))
            {
                SPAudit ll = new SPAudit();
                ll.UpdateTable.Add(table);
                ll.Parameters.AddRange(parameters);
                list.Add(spname, ll);
            }
            else
            {
                SPAudit ll = list[spname];
                ll.UpdateTable.Add(table);
            }
        }
        base.ExplicitVisit(node);
    }

    public override void ExplicitVisit(DeleteStatement node)
    {
        NamedTableReference namedTableReference;
        if (node?.DeleteSpecification?.FromClause != null)
        {
            namedTableReference = node?.DeleteSpecification?.FromClause?.TableReferences[0] as NamedTableReference;
        }
        else
        {
            namedTableReference = node?.DeleteSpecification?.Target as NamedTableReference;
        }
        if (namedTableReference != null)
        {
            string table = namedTableReference?.SchemaObject.BaseIdentifier?.Value;
            if (!string.IsNullOrWhiteSpace(table) && !table.StartsWith("#"))
            {
                if (!list.ContainsKey(spname))
                {
                    SPAudit ll = new SPAudit();
                    ll.DeleteTable.Add(table);
                    ll.Parameters.AddRange(parameters);
                    list.Add(spname, ll);
                }
                else
                {
                    SPAudit ll = list[spname];
                    ll.DeleteTable.Add(table);
                }
            }
        }
        base.ExplicitVisit(node);
    }
}

public class SPAudit
{
    public HashSet<string> InsertTable { get; set; }
    public HashSet<string> UpdateTable { get; set; }
    public HashSet<string> DeleteTable { get; set; }
    public List<SPParam> Parameters { get; set; }

    public SPAudit()
    {
        InsertTable = new HashSet<string>();
        UpdateTable = new HashSet<string>();
        DeleteTable = new HashSet<string>();
        Parameters = new List<SPParam>();
    }
}

public class SPParam
{
    public string Name { get; set; }
    public string DefaultValue { get; set; }
    public string DataType { get; set; }
}
Halfhardy answered 22/8, 2018 at 3:53 Comment(0)
B
4

The SMO model exposes elements of the syntax tree. So instead of assuming a token by position, as in

UpdateSpecification?.Children?.FirstOrDefault();

look up the corresponding property in the documentation. For the update clause, the target table (or updatable view) can occur in different positions. Take this syntax:

UPDATE tablename SET column=value WHERE conditions

which is represented as

var targettable = ins?.UpdateSpecification?.Target?.ScriptTokenStream?.FirstOrDefault()?.Text;

in the SMO model. Whereas, a syntax unique to tsql,

UPDATE t SET t.columnname=value FROM tablename t WHERE conditions 

will have its list of tables in the FROM clause.

Regarding the other two DML statements you mentioned: DELETE is the same because they share a common base class, DeleteInsertSpecification (Target).

For INSERT, there is the Target as well, and if its InsertSource is of type SelectInsertSource, this may be based on any number of tables and views too.

Bruner answered 21/8, 2018 at 7:21 Comment(4)
Is it the same for INSERT/DELETE as well? Also your code is not compiling for me as it can't find ScriptTokenStream.Halfhardy
Can't seem to find the Microsoft.SqlServer.TransactSql.ScriptDom.dll in the location that I mentioned in the question(edited).Halfhardy
that's an entirely different problem. so did the original code compile at all? I refer to the 140.17283.0 version of the SMO componentsBruner
Had to change my code to use this TSqlParser parser = new TSql120Parser(true); IList<ParseError> parseErrors; TSqlFragment sqlFragment = parser.Parse(new StringReader(sql), out parseErrors);Halfhardy
L
2

You can use following SQL Query:

SELECT *
FROM sys.dm_sql_referenced_entities ('dbo.APSP_MySP', 'OBJECT');  

It gives you all the tables, views, SPs impacted in the stored procedure. is_selected or is_select_all are set to 1 for selected references is_updated is set to 1 for updated references

As query is reading from pre-defined system tables, it runs fast

If you need information about the referred object use the referenced_id column value to find details

You can use it in 2 ways:

  1. Call the above query in parallel for each stored procedure
  2. Create another query/SP which will loop and run it for every stored procedure
Lyrism answered 21/8, 2018 at 8:5 Comment(4)
This will not get me the dataset I'm trying to get.Halfhardy
@SohamDasgupta: What values you are looking for? I may suggest other methodsLyrism
The dataset I'm looking for is a table which will have the following columns - SPName,InsertTable,UpdateTable,DeleteTable, not sure how I can arrive at this from your solution.Halfhardy
@SohamDasgupta: Table name is in 'referenced_entity_name' column. If action is Select then 'is_selected' or ' is_select_all' columns will have value '1'. If action is Update or Delete then 'is_updated' column will have value '1'. Please check here --------> learn.microsoft.com/en-us/sql/relational-databases/…Lyrism
M
1
  1. Change Proc_1 to your procedure name
  2. Refine PATINDEX matching to cater for the different possibilites
  3. Modify to look at all procedures
  4. Does not cater for tables in dynamic sql or passed as parameters
  5. Look out for any issues with dm_sql_referenced_entities
SELECT
    e.TableName,
    p.name,
    PATINDEX('%DELETE '+e.TableName+'%', p.definition) AS is_delete,
    PATINDEX('%INSERT INTO '+e.TableName+'%', p.definition) AS is_insert,
    PATINDEX('%UPDATE '+e.TableName+'%', p.definition) AS is_update
FROM
    (
        SELECT distinct referenced_entity_name AS TableName
        FROM sys.dm_sql_referenced_entities ('dbo.Proc_1', 'OBJECT')
    ) e,
    (
        SELECT o.name, m.object_id, definition
        FROM sys.objects o, sys.sql_modules m
        WHERE o.name = 'Proc_1'
        AND o.type='P'
        AND m.object_id = o.object_id
    ) p
Maximin answered 21/8, 2018 at 14:17 Comment(0)
G
0

I would recommend you querying the syscomments SQL view. The performance will be much better.

select text from sys.syscomments where text like '%DELETE%'

You can work with the results in the SQL Query or fetch all the results and filter the data in C#.

Gypsy answered 21/8, 2018 at 7:36 Comment(2)
Tried that, too many false positives when you've columns like Date_Update.Halfhardy
How about adding a space character before the update keyword? where text like '% UPDATE%'Gypsy

© 2022 - 2024 — McMap. All rights reserved.