Parsing C# code (as string) and inserting additional methods

Asked 14/2, 2011 at 23:39 Answered 27/2, 2011 at 20:38

Solved c# parsing reflection code-generation codedom

I have a C# app I'm working on that loads its code remotely, and then runs it (for the sake of argument, you can assume the app is secure).

The code is C#, but it is sent as an XML document, parsed out as a string, and then compiled and executed.

Now, what I'd like to do, and am having a bit more difficulty with than I expected, is parse the entire document and, before compiling, insert additional commands after every executed line.

For example, consider the code:

using System;
using System.Collections.Generic;
using System.Linq;

namespace MyCode
{
    static class MyProg
    {
        static void Run()
        {
            int i = 0;
            i++;

            Log(i);
        }
    }
}

What I'd like after parsing is something more like:

using System;
using System.Collections.Generic;
using System.Linq;

namespace MyCode
{
    static class MyProg
    {
        static void Run()
        {
            int i = 0;
            MyAdditionalMethod();
            i++;
            MyAdditionalMethod();

            Log(i);
            MyAdditionalMethod();
        }
    }
}

Keep in mind the obvious pitfalls - I can't just have it after every semi-colon, because this would not work in a getter/setter, i.e.:

Converting:

public string MyString { get; set; }

To:

public string MyString { get; MyAdditionalMethod(); set; MyAdditionalMethod(); }

would fail. As would class-level declarations, using statements, etc. Also, there are a number of cases where I could also add in MyAdditionalMethod() after curly braces - like in delegates, immediately after if statements, or method declarations, etc.
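To make the pitfall concrete, here is a minimal sketch of the naive text-based approach. The `NaiveInstrumenter` class and its `InsertAfterSemicolons` method are hypothetical names used only for illustration; the sketch simply shows that a blind replacement produces the broken property above and also breaks a `for` header.

```csharp
using System;

static class NaiveInstrumenter
{
    // Hypothetical helper: blindly appends a call after every semicolon,
    // with no knowledge of the C# grammar.
    public static string InsertAfterSemicolons(string source)
    {
        return source.Replace(";", "; MyAdditionalMethod();");
    }
}

static class NaiveDemo
{
    static void Main()
    {
        // An auto-property: the accessors are not statements, so the result does not compile.
        Console.WriteLine(NaiveInstrumenter.InsertAfterSemicolons(
            "public string MyString { get; set; }"));

        // A for header: its semicolons are separators, so the result does not compile either.
        Console.WriteLine(NaiveInstrumenter.InsertAfterSemicolons(
            "for (int i = 0; i < 3; i++) { Log(i); }"));
    }
}
```

Any workable approach therefore has to know which semicolons actually terminate statements, which in practice means working on a parse tree rather than on raw text.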

So, I've been looking into CodeDOM, and it looks like it could be a solution, but it's tough to figure out where to start. Otherwise, I'm trying to parse the entire thing myself and build a tree I can walk through, though that's a little tough, considering the number of cases I need to handle.

Does anyone know any other solutions that are out there?

Drescher answered 14/2, 2011 at 23:39 Comment(2)

"The code is C#, but it is sent as an XML document, parse out as a string" -- Setting aside how weird this all sounds, can we see a sample of the XML? Perhaps you can inject your methods before it's parsed into a string. – Smukler 14/2, 2011 at 23:41

I don't control the XML document, so I can't inject anything until it arrives and is parsed out. The XML just contains the code as the inner text of an element - ie. <Code>using System.IO; namespace MyApp { public class myClass {... </Code> – Drescher 15/2, 2011 at 16:14

There are a few C# parsers out there. I'd recommend using something from Mono or SharpDevelop, as they should be up to date. I had a go using NRefactory from SharpDevelop; if you download the source for SharpDevelop, there is a demo and some unit tests that are a good intro to its usage.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using ICSharpCode.NRefactory;
using System.IO;
using ICSharpCode.NRefactory.Ast;
using ICSharpCode.NRefactory.Visitors;
using ICSharpCode.NRefactory.PrettyPrinter;

namespace Parse
{
    class Program
    {
        static void Main(string[] args)
        {
            string code = @"using System;
            using System.Collections.Generic;
            using System.Linq;

            namespace MyCode
            {
                static class MyProg
                {
                    static void Run()
                    {
                        int i = 0;
                        i++;

                        Log(i);
                    }
                }
            }
            ";

            IParser p = ParserFactory.CreateParser(SupportedLanguage.CSharp, new StringReader(code));
            p.Parse();

            //Output Original
            CSharpOutputVisitor output = new CSharpOutputVisitor();
            output.VisitCompilationUnit(p.CompilationUnit, null);
            Console.Write(output.Text);

            //Add custom method calls
            AddMethodVisitor v = new AddMethodVisitor();
            v.VisitCompilationUnit(p.CompilationUnit, null);
            v.AddMethodCalls();
            output = new CSharpOutputVisitor();
            output.VisitCompilationUnit(p.CompilationUnit, null);

            //Output result
            Console.Write(output.Text);
            Console.ReadLine();
        }


    }

    //The visitor adds method calls after visiting by storing the nodes in a dictionary.
    public class AddMethodVisitor : ConvertVisitorBase
    {
        private IdentifierExpression member = new IdentifierExpression("MyAdditionalMethod");

        private Dictionary<INode, INode> expressions = new Dictionary<INode, INode>();

        private void AddNode(INode original)
        {
            expressions.Add(original, new ExpressionStatement(new InvocationExpression(member)));
        }

        public override object VisitExpressionStatement(ExpressionStatement expressionStatement, object data)
        {
            AddNode(expressionStatement);
            return base.VisitExpressionStatement(expressionStatement, data);
        }

        public override object VisitLocalVariableDeclaration(LocalVariableDeclaration localVariableDeclaration, object data)
        {
            AddNode(localVariableDeclaration);
            return base.VisitLocalVariableDeclaration(localVariableDeclaration, data);
        }

        public void AddMethodCalls()
        {
            foreach (var e in expressions)
            {
                InsertAfterSibling(e.Key, e.Value);
            }
        }

    }
}

You will need to improve the visitor to handle more cases but it's a good start.

Alternatively, you could compile the original and do some IL manipulation using Cecil, or try an AOP library like PostSharp. Finally, you could look into the .NET Profiling API.

Surrender answered 27/2, 2011 at 20:38 Comment(0)

You could use a source-to-source program transformation system. Such a tool parses the code, builds an AST, lets you apply transformations, and then regenerates text from the AST. What makes a source-to-source system nice is that you can write transformations in terms of the source language syntax rather than the fractal detail of the AST, which makes them far easier to write and understand later.

What you want to do would be modelled by a pretty simple program transformation using our DMS Software Reengineering Toolkit:

rule insert_post_statement_call(s: stmt): stmt -> stmt =
   " \s " -> " { \s ; MyAdditionalMethod();   }";

This rule isn't a "text" substitution; rather, it is parsed by the parser that processes the target code, and so in fact it represents two ASTs, a left- and a right-hand side (separated by the "->" syntax). The quotes aren't string quotes; they are quotes around the target language syntax to differentiate it from the syntax of the rule language itself. What is inside the quotes is target language (e.g., C#) text with escapes like \s, which represent entire language elements (in this case, a stmt according to the target language (e.g., C#) grammar). The left-hand side says, "match any statement s", because s is defined to be a "stmt" in the grammar. The right-hand side says, "replace the statement with a block containing the original statement \s and the new code you want inserted". This is all done in terms of syntax trees using the grammar as a guide; it can't apply the transform to anything that isn't a statement. [The reason for rewriting the statement as a block is that this way the right side is valid wherever statements are valid; go check your grammar.]
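One way to read the bracketed remark about blocks is through the unbraced body of an `if`. The sketch below is plain C#, not DMS output, and the `BlockWrapDemo` class and its method names are hypothetical. It compares appending the call as a sibling statement with wrapping the original statement and the call in a block, and counts how often the inserted call runs when the condition is false.

```csharp
using System;

static class BlockWrapDemo
{
    static int calls;

    static void MyAdditionalMethod() { calls++; }
    static void Log(int i) { Console.WriteLine(i); }

    // Original statement: if (i > 0) Log(i);
    // The call is appended as a sibling, so it is no longer governed by the if.
    public static int AppendedAsSibling(int i)
    {
        calls = 0;
        if (i > 0) Log(i); MyAdditionalMethod();
        return calls;
    }

    // The statement is replaced by a block, so the call runs only when Log(i) runs.
    public static int WrappedInBlock(int i)
    {
        calls = 0;
        if (i > 0) { Log(i); MyAdditionalMethod(); }
        return calls;
    }

    static void Main()
    {
        Console.WriteLine(AppendedAsSibling(0)); // prints 1: the call ran although Log(i) did not
        Console.WriteLine(WrappedInBlock(0));    // prints 0: the call is tied to the statement
    }
}
```

Because a block is itself a single statement, the wrapped form can stand anywhere the original statement could, including positions where only one statement is allowed.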

As a practical matter, you'll need to write rules to handle other special cases, but this is mostly writing more rules. You also need to package the parser/transformer/prettyprinter as a bundle, which requires some procedural glue. This is still far easier than trying to write code to reliably climb up and down the tree, matching the nodes and then smashing those nodes to get what you want. Better, when your grammar (invariably) has to be adjusted, the rewrite rules are reparsed according to the revised grammar and still work; whatever procedural tree climbing you might be doing is almost certainly guaranteed to break.

As you write more and more transformations, this capability becomes more and more valuable. And when you are successful with a small number of transformations, adding more becomes attractive quickly.

See this technical paper for a more thorough discussion of how DMS works, and how it is used to apply instrumentation transformations, like you want to do, in real tools. This paper describes the basic ideas behind the test coverage tools sold by Semantic Designs.

Tarton answered 26/2, 2011 at 2:57 Comment(2)

This seems to be just an ad for a product you probably don't need. The .Net framework provides simple classes for on the fly compilation – Gabrielegabriell 28/2, 2011 at 20:58

@TFD: from the OP: "Does anyone know any other solutions that are out there?" Thanks for the ding. – Tarton 28/2, 2011 at 21:32

What you need is to use Expression Trees. MSDN has some useful information to start with.

Melissa answered 14/2, 2011 at 23:55 Comment(0)

For parsing, you could use the CSharpCodeProvider class's Parse() method.

Colon answered 25/2, 2011 at 21:39 Comment(3)

I think this throws a NotImplementedException. – Lens 26/2, 2011 at 11:1

@Simon: Incorrect. The base class (CodeDomProvider) throws NotImplementedException. The method is overridden by derived classes (such as CSharpCodeProvider) and therefore it throws no such exception. – Thorson 1/3, 2011 at 12:40

@Mike - Where have you seen that CSharpCodeProvider overrides CreateParser or Parse? – Lens 1/3, 2011 at 12:43

Once you parse the text, this page has some great info on compiling and executing the code dynamically: http://www.west-wind.com/presentations/dynamiccode/dynamiccode.htm
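As a rough sketch of the compile-and-run step (not taken from the linked page), the CodeDOM compiler can build the instrumented source in memory. This assumes the classic .NET Framework, where `CSharpCodeProvider.CompileAssemblyFromSource` is supported; the `DynamicRunner` class, the sample source, and the stub `Log` and `MyAdditionalMethod` bodies are hypothetical and only there to make the example self-contained.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

static class DynamicRunner
{
    // Compiles C# source text into an in-memory assembly (assumes .NET Framework).
    public static Assembly Compile(string source)
    {
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters { GenerateInMemory = true, GenerateExecutable = false };
            options.ReferencedAssemblies.Add("System.dll");
            options.ReferencedAssemblies.Add("System.Core.dll"); // needed if the source uses System.Linq

            CompilerResults results = provider.CompileAssemblyFromSource(options, source);
            if (results.Errors.HasErrors)
            {
                foreach (CompilerError error in results.Errors)
                    Console.Error.WriteLine(error);
                throw new InvalidOperationException("Compilation failed.");
            }
            return results.CompiledAssembly;
        }
    }

    static void Main()
    {
        // Hypothetical instrumented source; Log and MyAdditionalMethod are stubs.
        string source = @"
            using System;
            namespace MyCode
            {
                static class MyProg
                {
                    static void MyAdditionalMethod() { Console.WriteLine(""hook""); }
                    static void Log(int i) { Console.WriteLine(i); }
                    static void Run()
                    {
                        int i = 0;
                        MyAdditionalMethod();
                        i++;
                        MyAdditionalMethod();
                        Log(i);
                        MyAdditionalMethod();
                    }
                }
            }";

        Assembly assembly = Compile(source);
        // Run() is private in the question's sample, so NonPublic binding flags are required.
        MethodInfo run = assembly.GetType("MyCode.MyProg")
            .GetMethod("Run", BindingFlags.Static | BindingFlags.NonPublic);
        run.Invoke(null, null);
    }
}
```

Reporting the compiler errors before throwing matters here, since any mistake made by the instrumentation step will first show up as a compile error in the generated source.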

Erose answered 25/2, 2011 at 23:13 Comment(0)
