SQL Server ScriptDom Parsing

The team of developers I work with are using SQL Data Projects for a large piece of work we have to do against an existing database. We are a few weeks in and there have been a few gotchas, but the experience has been generally good.

However, when we get to deploy to production, the DBA team have refused to accept DACPACs as a deployment method. Instead, they want to see a traditional script per DML or DDL statement.

The current thinking is to create a difference script between the finished SQL project and the production environment, and then parse that into individual scripts. Not nice, I know.

To parse the difference script there seem to be two options:

  1. Parse the script based on the batch separator command, GO. A rather basic solution, but it shows promise.
  2. Or, use Microsoft.SqlServer.TransactSql.ScriptDom. This looks more future-proof but seems far more complex.

I'm trialling the ScriptDom at the moment but am having trouble understanding it. My current, but not only, issue is as follows.

I'm trying to parse the following SQL using the ScriptDom in C#:

CREATE TABLE dbo.MyTable
(
    MyColumn VARCHAR(255)
)

But I cannot see how to access the VARCHAR size, in this case 255.

The code I'm using is as follows:

TSqlFragment sqlFragment = parser.Parse(textReader, out errors);

SQLVisitor myVisitor = new SQLVisitor();
sqlFragment.Accept(myVisitor);

public class SQLVisitor : TSqlFragmentVisitor
{
    public override void ExplicitVisit(CreateTableStatement node)
    {
        // node.SchemaObjectName.Identifiers to access the table name
        // node.Definition.ColumnDefinitions to access the column attributes
    }
}

From each column definition I expected to find a length property or similar. I also have a sneaking suspicion that the Visitor pattern, which I struggle with, can be used to re-parse each column definition. Any ideas?
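
It looks like the length may not be a flat property: VARCHAR(255) appears to come back as a SqlDataTypeReference whose Parameters collection holds the length as a Literal node. An untested sketch of what I have in mind:

public override void ExplicitVisit(ColumnDefinition node)
{
    // For VARCHAR(255), node.DataType should be a SqlDataTypeReference whose
    // Parameters collection contains the length as a Literal ("255").
    SqlDataTypeReference dataType = node.DataType as SqlDataTypeReference;
    if (dataType != null)
    {
        foreach (Literal parameter in dataType.Parameters)
        {
            Console.WriteLine("{0} {1}({2})",
                node.ColumnIdentifier.Value,
                dataType.SqlDataTypeOption,
                parameter.Value);
        }
    }
}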

Banquet answered 11/9, 2015 at 12:47 Comment(0)

Great that you are using SSDT!

The easiest way to handle this when you have DBAs who don't want to work with DACPACs is to pre-generate the deployment script using sqlpackage.exe.

The way I do it is...

  • Check T-SQL code into the project
  • Build server builds the SSDT project
  • Deploy and run tests on the CI server
  • Use sqlpackage.exe /action:script to compare the dacpac to QA, PROD, etc. and generate a deployment script (see the example after this list)
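
For example, the compare-and-script step might look like this (the server, database, and file names are placeholders):

sqlpackage.exe /Action:Script ^
    /SourceFile:"MyDatabase.dacpac" ^
    /TargetServerName:"PRODSQL01" ^
    /TargetDatabaseName:"MyDatabase" ^
    /OutputPath:"MyDatabase_deploy.sql"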

The DBAs then take that script (or, when we are ready, we tell them the build number to grab) - they can then peruse and deploy that script.

Things to note:

If you don't have CI set up, you can just use sqlpackage.exe to generate the script without the automatic bits :)

Hope it helps!

ed

Cimino answered 11/9, 2015 at 13:8 Comment(10)
We do have all the steps that you mentioned above, although there are 'discussions' over using a SQL Test Project, but I digress. The issue I have is that the DBAs want one script file per change, not one script with all the changes. Amongst the myriad of parameters that you can pass to SQLPackage, I could not see one to specify multiple output files. Hence the need to parse the output of SQLPackage. Thanks anyway. – Banquet
1 script per change? That doesn't even make sense - if you have a hundred changes, they want to run 100 scripts? What order would they run them in, and what would you do with post-deploy scripts? I would try to figure out a way to get them to accept a single deploy script, otherwise you are asking for a world of pain (for them and you). – Cimino
I can only agree that it does not make much sense. The DBA team have a home-grown tool that they use to run the scripts. Developers have to complete an Excel spreadsheet that specifies the location in the source control system, the order in which the scripts are run, and the destination server and database. DBAs then use this to deploy changes. Output scripts are then made available to the developers. It is fine for the DBA team as it makes their life simple, but it is a pain for the developers. Hence the DBA team are in no rush to look for a better solution such as DACPACs. – Banquet
I feel your pain! If you generate your single deploy script you can parse it, and instead of looking for specific statement types (like CreateTable) you can use ParseStatementList (there is one per version of the parser, so TSql100Parser.ParseStatementList) - that will get you a list of all the individual statements, and then you can put them into a file of their own. – Cimino
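
For reference, a ParseStatementList call looks roughly like this (a sketch; the inline SQL is only an example, and note that GO is a batch separator rather than a T-SQL statement, so it is not valid input here):

TSql120Parser parser = new TSql120Parser(false);
IList<ParseError> errors;
StatementList statementList = parser.ParseStatementList(
    new StringReader("CREATE TABLE t1 (c1 INT); CREATE TABLE t2 (c1 INT);"),
    out errors);
foreach (TSqlStatement statement in statementList.Statements)
{
    // Each parsed statement can now be written to its own file.
}
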
Ah ha, the fog is clearing. This is starting to make sense. I will investigate further. Thanks. Looks like you have an interesting blog in this area. – Banquet
I find it fascinating :) - if you get stuck, ping me and I'll do a sample for you. – Cimino
Just curious - does their system use SQLCMD to run the scripts, and can it handle "GO" within the scripts? I don't get "one script per change" - that's insanity, no matter how much they like their system. If they can handle running a script regardless of what's in it, you may get a manager to talk about time spent tweaking scripts and billing their team for those hours to see what their reaction is. Besides, sometimes scripts really need to run together - all or nothing. How do they handle that in their "one change per script" model? Sounds like an opportunity for a manager to help out. :) – Mineraloid
I doubt it uses SQLCMD, but I've never seen the code so cannot be sure. What I did not mention in my original question is that there are also rather onerous standards around DML statements, where each data change must be run in rollback mode first and then, on changing a flag, in commit mode. Hence, each set of DML statements must be in a separate batch that can be run independently. The reason is that in the past a change went live which caused data corruption, and a whole day of data input was lost. Now the DBA team act as testers of last resort. – Banquet
The DBAs should ensure they have a valid log backup system that means they never lose more than the business allows - then if they lose data, either through a release or an actual disaster, they can recover it. That is what I did as a DBA. – Cimino
I understand that the DML script was run during working hours, and the error was not discovered until some hours later. Hence, the backups did not prevent the loss of that day's data input. The practice of running DML scripts first in ROLLBACK and then in COMMIT mode, to get the DBAs to act as testers of last resort, is in my view flawed, as they are rarely involved enough in the change to add value. This process causes the scripts to become more complex and hence more likely to contain errors. One of my own scripts recently had one line of 'doing code' and about 50 lines of boilerplate. – Banquet

I don't think you need a visitor here at all. If I understand your goal correctly, you'd like to take the T-SQL generated by SSDT, parse it using the ScriptDom, and then print the batches individually. The code to do that would look something like this:

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.SqlServer.TransactSql.ScriptDom;

namespace ScriptDomDemo
{
    class Program
    {
        static void Main(string[] args)
        {
            TSql120Parser parser = new TSql120Parser(false);
            IList<ParseError> errors;
            using (StringReader sr = new StringReader(@"create table t1 (c1 int primary key)
GO
create table t2 (c1 int primary key)"))
            {
                TSqlFragment fragment = parser.Parse(sr, out errors);
                IEnumerable<string> batches = GetBatches(fragment);
                foreach (var batch in batches)
                {
                    Console.WriteLine(batch);
                }
            }
        }

        private static IEnumerable<string> GetBatches(TSqlFragment fragment)
        {
            Sql120ScriptGenerator sg = new Sql120ScriptGenerator();
            TSqlScript script = fragment as TSqlScript;
            if (script != null)
            {
                foreach (var batch in script.Batches)
                {
                    yield return ScriptFragment(sg, batch);
                }
            }
            else
            {
                // The fragment is not a full TSqlScript - for example, a bare
                // TSqlBatch or TSqlStatement passed in directly - so script it as-is.
                yield return ScriptFragment(sg, fragment);
            }
        }

        private static string ScriptFragment(SqlScriptGenerator sg, TSqlFragment fragment)
        {
            string resultString;
            sg.GenerateScript(fragment, out resultString);
            return resultString;
        }
    }
}
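
For what it's worth, running this prints each batch as its own script; note that the script generator regenerates the T-SQL, so the output's casing and whitespace may differ from the input.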

As for how to work with these ASTs, I find it easiest to use Visual Studio's debugger to visualize the tree, because you can see the actual type of each node and all of its properties. It takes just a little bit of code to parse the T-SQL, as you can see.

Ohalloran answered 11/9, 2015 at 17:53 Comment(1)
I did a demo using your code, and it is a better way of parsing the script than writing my own code to look for the batch terminator, GO. One part I did not understand: in your GetBatches method you have an IF/ELSE statement, and I can't see when the ELSE part would be executed. Thanks for your response. – Banquet
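
An alternative that avoids the ScriptDom entirely: the batch parser that ships with the SQL Server management assemblies (SMO) can split a script on its GO separators. A minimal example, assuming the BatchParser and BatchParserClient assemblies are available:
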
// Requires references to the Microsoft.SqlServer.BatchParser and
// Microsoft.SqlServer.BatchParserClient assemblies.

using System;
using System.Collections.Specialized;
using System.IO;
using System.Text;
using Microsoft.SqlServer.Management.Common;

namespace ScriptParser
{
   class Program
   {
      static void Main(string[] args)
      {
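         // ExecuteBatch comes from Microsoft.SqlServer.Management.Common in the
         // BatchParserClient assembly; GetStatements splits the script text
         // into its GO-delimited batches.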
         ExecuteBatch batcher = new ExecuteBatch();
         string text = File.ReadAllText("ASqlFile.sql");
         StringCollection statements = batcher.GetStatements(text);
         foreach (string statement in statements)
         {
            Console.WriteLine(statement);
         }
      }
   }
}
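
Note that this depends on the BatchParser assemblies that ship with SMO, so it assumes the SQL Server management assemblies are installed wherever the code runs.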
Caramel answered 25/5, 2017 at 17:35 Comment(0)
