Roslyn - get grouped single line comments

I am writing a C# program that extracts comments from code, and I am using the Roslyn compiler to do it. It works well: I visit the whole abstract syntax tree and fetch SingleLineComment trivia, MultiLineComment trivia and DocumentationComment trivia syntax from each file in the solution. The problem is that programmers often write comments like this:

// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.

You can see that these are three single-line comments, but I want them to be fetched from the code as one comment. Can I achieve that with Roslyn, or is there another way? This is a frequent situation, because programmers often write multi-line comments using single-line comment syntax.

My code for extracting comments looks like this:

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using System.Collections.Generic;

namespace RoslynPlay
{
    public class CommentStore
    {
        public List<Comment> Comments { get; } = new List<Comment>();

        public void AddCommentTrivia(SyntaxTrivia trivia,
            LocationStore commentLocationstore, string fileName)
        {
            if (trivia.Kind() == SyntaxKind.SingleLineCommentTrivia)
            {
                Comments.Add(new SingleLineComment(trivia.ToString(),
                    trivia.GetLocation().GetLineSpan().EndLinePosition.Line + 1, commentLocationstore)
                {
                    FileName = fileName,
                });
            }
            else if (trivia.Kind() == SyntaxKind.MultiLineCommentTrivia)
            {
                Comments.Add(new MultiLineComment(trivia.ToString(),
                    trivia.GetLocation().GetLineSpan().StartLinePosition.Line + 1,
                    trivia.GetLocation().GetLineSpan().EndLinePosition.Line + 1, commentLocationstore)
                {
                    FileName = fileName,
                });
            }
        }

        public void AddCommentNode(DocumentationCommentTriviaSyntax node,
            LocationStore commentLocationstore, string fileName)
        {
            Comments.Add(new DocComment(node.ToString(),
                node.GetLocation().GetLineSpan().StartLinePosition.Line + 1,
                node.GetLocation().GetLineSpan().EndLinePosition.Line,
                commentLocationstore)
            {
                FileName = fileName,
            });
        }
    }
}

and in the main file (Program.cs) I launch the comment extraction like this:

    string fileContent;
    SyntaxTree tree;
    SyntaxNode root;
    CommentsWalker commentWalker;
    MethodsAndClassesWalker methodWalker;
    string[] files = Directory.GetFiles(projectPath, $"*.cs", SearchOption.AllDirectories);
    var commentStore = new CommentStore();

    Console.WriteLine("Reading files...");
    ProgressBar progressBar = new ProgressBar(files.Length);

    foreach (var file in files)
    {
        fileContent = File.ReadAllText(file);
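        // Escape the path so that backslashes and dots in it are matched literally by the regex
        string filePath = new Regex($@"{Regex.Escape(projectPath)}\\(.*)$").Match(file).Groups[1].ToString();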
        tree = CSharpSyntaxTree.ParseText(fileContent);
        root = tree.GetRoot();
        commentWalker = new CommentsWalker(filePath, commentStore);
        commentWalker.Visit(root);

        progressBar.UpdateAndDisplay();
    }

and here is also the comment walker:

using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

namespace RoslynPlay
{
    public class CommentsWalker : CSharpSyntaxWalker
    {
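        private string _fileName;
        private CommentStore _commentStore;
        // Declaration missing from the original post; assumed to be assigned elsewhere.
        private LocationStore _commentLocationStore;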

        public CommentsWalker(string fileName,
            CommentStore commentStore)
            : base(SyntaxWalkerDepth.StructuredTrivia)
        {
            _fileName = fileName;
            _commentStore = commentStore;
        }

        public override void VisitTrivia(SyntaxTrivia trivia)
        {
            if (trivia.Kind() == SyntaxKind.SingleLineCommentTrivia
                || trivia.Kind() == SyntaxKind.MultiLineCommentTrivia)
            {
                _commentStore.AddCommentTrivia(trivia, _commentLocationStore, _fileName);
            }
            base.VisitTrivia(trivia);
        }

        public override void VisitDocumentationCommentTrivia(DocumentationCommentTriviaSyntax node)
        {
            _commentStore.AddCommentNode(node, _commentLocationStore, _fileName);
            base.VisitDocumentationCommentTrivia(node);
        }
    }
}

The problem is that trivia.Kind() == SyntaxKind.SingleLineCommentTrivia matches only one comment line at a time, but I want to extract blocks of single-line comments as one comment.

Honorific answered 15/4, 2018 at 15:54 Comment(2)
What does your code look like to retrieve the comments? Adding that information gives you a better chance of getting a good answer. – Foulk
I added more information, as you asked. – Honorific

I omitted the part with documentation comments for brevity.

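You can do this by tracking the line on which the previous comment ended: a single-line comment that starts on the very next line continues the same block.

The code below does not handle a few edge cases, such as a single-line comment on the line right after a multi-line comment (the two end up merged into one block), but these are quite uncommon.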

using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

internal class Program
{
    private static void Main(string[] args)
    {
        // example code to check for comments
        const string programText =
            @"
            using System;
        using System.Collections;
        using System.Linq;
        using System.Text;

            /* Block start 1
             *
             *
             *
             *
             * Block End 2
             */


        // single line comment 0

        // single line comment 1
        // single line comment 2
        // single line comment 3
        namespace HelloWorld
        {
            /* Block start 3
             *
             *
             *
             *
             * Block End 4
             */
            class Program
            {
                /// documentation 1
                /// documentation 2
                /// documentation 3
                /// documentation 4
                static void Main(string[] args)
                {
                    Console.WriteLine(""Hello, World!"");
                }
            }

            // another comment 1


            // another comment 2
            // another comment 3


            /* Adjacent
             *
             */

            // another comment 4
            // another comment 5

            /* Adjacent
             *
             *
             */
        }";


        SyntaxTree tree = CSharpSyntaxTree.ParseText(programText);
        var root = tree.GetRoot();

        var walker = new CommentsWalker();
        walker.Visit(root);

        // printing out all comments in groups that we found
        foreach (var (start, end) in walker.GetCommentBlocks())
        {
            Console.WriteLine($"Comment block: {start} - {end}");
        }
    }
}

public class CommentsWalker : CSharpSyntaxWalker
{
    // keeps track of all comment blocks in code
    private List<(int start, int end)> commentBlocks;

    // keeps track of current continuous comment block
    // using linked list so it is easy to access first and last
    // line in a comment block
    private LinkedList<(int start, int end)> currentBlock;

    public CommentsWalker()
        : base(SyntaxWalkerDepth.StructuredTrivia)
    {
        commentBlocks = new();
        currentBlock = new();
    }

    public List<(int start, int end)> GetCommentBlocks()
    {
        // before returning all comments, flush the block that is still open;
        // resetting it keeps repeated calls from adding the same block twice
        if (currentBlock.First != null)
        {
            commentBlocks.Add((currentBlock.First.Value.start, currentBlock.Last.Value.end));
            currentBlock = new();
        }

        return commentBlocks;
    }

    public override void VisitTrivia(SyntaxTrivia trivia)
    {
        var span = trivia.GetLocation()
            .GetLineSpan();

        // get starting and ending line numbers of a current comment 
        // it can be a multi-line and a single-line comment
        (int start, int end) = (span.StartLinePosition.Line, span.EndLinePosition.Line);

        switch (trivia.Kind())
        {
            case SyntaxKind.SingleLineCommentTrivia:
            {
                // if this is a first comment in a block
                // or if this is a continuation of a previous comment
                if (currentBlock.First == null || currentBlock.Last.Value.end + 1 == start)
                    currentBlock.AddLast((start, end));
                else
                {
                    // this is a new single line comment that isn't adjacent to previous one
                    // add previous block and start a new one with current comment
                    commentBlocks.Add((currentBlock.First.Value.start, currentBlock.Last.Value.end));
                    currentBlock = new();
                    currentBlock.AddFirst((start, end));
                }
            }
            break;

            case SyntaxKind.MultiLineCommentTrivia:
            {
                // treat whole comment block at once
                if (currentBlock.First != null)
                {
                    commentBlocks.Add((currentBlock.First.Value.start, currentBlock.Last.Value.end));
                    currentBlock = new();
                }

                currentBlock.AddFirst((start, end));
            }
            break;
        }

        base.VisitTrivia(trivia);
    }
}

Example output (line numbers are zero-based, as Roslyn reports them):

Comment block: 6 - 12
Comment block: 15 - 15
Comment block: 17 - 19
Comment block: 22 - 28
Comment block: 41 - 41
Comment block: 44 - 45
Comment block: 48 - 50
Comment block: 52 - 53
Comment block: 55 - 58
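
The walker above returns only line ranges. Since the question asks for each run of single-line comments as one comment, here is a minimal sketch that also keeps the text. CommentGrouper and GroupSingleLineComments are hypothetical names, not part of Roslyn. The sketch looks only at SingleLineCommentTrivia, so multi-line comments are never merged into a group, and it assumes that two comments on consecutive lines belong together (a trailing comment after a statement followed by a comment on the next line would therefore be merged too).

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

public static class CommentGrouper
{
    // Hypothetical helper (not a Roslyn API): merges single-line comments
    // that sit on consecutive lines and returns each run with its text.
    public static List<(int Start, int End, string Text)> GroupSingleLineComments(SyntaxNode root)
    {
        var groups = new List<(int Start, int End, string Text)>();

        // DescendantTrivia yields the trivia in document order
        foreach (var trivia in root.DescendantTrivia())
        {
            if (trivia.Kind() != SyntaxKind.SingleLineCommentTrivia)
                continue;

            // Roslyn line numbers are zero-based
            int line = trivia.GetLocation().GetLineSpan().StartLinePosition.Line;
            string text = trivia.ToString();

            if (groups.Count > 0 && groups[groups.Count - 1].End + 1 == line)
            {
                // directly below the previous comment: extend that group
                var last = groups[groups.Count - 1];
                groups[groups.Count - 1] = (last.Start, line, last.Text + "\n" + text);
            }
            else
            {
                // first comment, or a gap since the previous one: start a new group
                groups.Add((line, line, text));
            }
        }

        return groups;
    }
}
```

To use it, pass the root you already get from CSharpSyntaxTree.ParseText(...).GetRoot() and read Start, End and Text from each returned tuple. Add 1 to the line numbers if you want them one-based, as in the question's code.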

Aleenaleetha answered 19/2, 2024 at 13:56 Comment(0)
