Java: Removing comments from string

I'd like to write a function that takes a string and, if it contains an inline comment, removes it. I know it sounds pretty simple, but I want to make sure I'm doing this right. For example:

private String filterString(String code) {
  // let's say code = "some code //comment inside"

  // return the string "some code" (without the comment)
}

I thought of two approaches (feel free to suggest others):

  1. Iterating over the string, finding the double slash (//), and using the substring() method.
  2. Using a regex (I'm not so sure about this one).

Can you tell me which is the best way and show me how it should be done? (Please don't suggest overly advanced solutions.)

Edit: can this be done somehow with a Scanner object? (I'm using one anyway.)

Bousquet answered 24/12, 2011 at 21:3 Comment(0)

Using a regular-expression replacement just to find the substring before a fixed marker is overkill.

You can do it using indexOf() to check for the position of the comment start and substring() to get the first part, something like:

String code = "some code // comment";
int    offset = code.indexOf("//");

if (-1 != offset) {
    code = code.substring(0, offset);
}
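Applied to the question's own example, the indexOf()/substring() approach might look like the sketch below. The trailing-whitespace trim with String.stripTrailing() (Java 11+) is an assumption added so that the result is exactly "some code"; the second call in main shows the limitation raised in the comments, where a // inside a string literal is cut off as well.

```java
class FilterStringExample {

    // Cuts a single line at the first "//" and drops the whitespace left before it.
    // Assumption: "//" never appears inside a string literal on that line.
    static String filterString(String code) {
        int offset = code.indexOf("//");
        if (offset == -1) {
            return code; // no comment marker: return the input unchanged
        }
        return code.substring(0, offset).stripTrailing();
    }

    public static void main(String[] args) {
        System.out.println(filterString("some code //comment inside")); // some code
        // Limitation: the "//" inside the URL literal is treated as a comment start.
        System.out.println(filterString("String url = \"http://www.google.com\";")); // String url = "http:
    }
}
```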
Uncork answered 24/12, 2011 at 21:12 Comment(5)
This won't work for your own code; it's going to remove the "// comment" inside the string. – Tracheotomy
I don't need to handle /** comments :) I checked this solution and it works fine! – Bousquet
Way too simplistic: it will mangle something like String url="http://www.google.com"; – Loveless
I was looking for a way to remove all comment lines in a string. For /* */ and // style comments, check this answer, it helped me: https://mcmap.net/q/349584/-remove-source-file-comments-using-intellij – Bucaramanga
This will break source code that contains the comment start character sequences in String literals. – Snapper

If you want a more thorough regex that really matches both types of comments (/* ... */ block comments, including multi-line ones, and // line comments), use this one:

replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","");

Source: http://ostermiller.org/findcomment.html
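A quick way to try that pattern is sketched below (the class and method names are illustrative only). Because the block-comment part uses [^*] rather than ., it also matches comments that span several lines; like the other regex answers, it still treats // or /* inside a string literal as a comment.

```java
class RegexCommentStripper {

    // Block comments: "/*", then non-'*' characters or runs of '*' not followed by '/', then "*/".
    // Line comments: "//" up to, but not including, the line terminator ('.' does not match '\n').
    private static final String COMMENT_REGEX =
            "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)";

    static String stripComments(String code) {
        return code.replaceAll(COMMENT_REGEX, "");
    }

    public static void main(String[] args) {
        String code = "int a; /* block */ int b; // line\n/* multi\n * line */int c;";
        // The block comment, the line comment and the two-line comment are all removed.
        System.out.println(stripComments(code));
    }
}
```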

EDIT:

Another solution, if you're not comfortable with regex, is to write a small automaton (state machine) like the following:

import java.util.Scanner;

public static String removeComments(String code){
    final int outsideComment=0;
    final int insideLineComment=1;
    final int insideblockComment=2;
    final int insideblockComment_noNewLineYet=3; // keep at least one newline in the result if the block comment is not inline

    int currentState=outsideComment;
    StringBuilder endResult=new StringBuilder();
    Scanner s= new Scanner(code);
    s.useDelimiter(""); // empty delimiter: each token is a single character
    while(s.hasNext()){
        String c=s.next();
        switch(currentState){
            case outsideComment:
                if(c.equals("/") && s.hasNext()){
                    String c2=s.next();
                    if(c2.equals("/"))
                        currentState=insideLineComment;
                    else if(c2.equals("*")){
                        currentState=insideblockComment_noNewLineYet;
                    }
                    else
                        endResult.append(c).append(c2);
                }
                else
                    endResult.append(c);
                break;
            case insideLineComment:
                if(c.equals("\n")){
                    currentState=outsideComment;
                    endResult.append("\n");
                }
                break;
            case insideblockComment_noNewLineYet:
                if(c.equals("\n")){
                    endResult.append("\n");
                    currentState=insideblockComment;
                }
                // fall through: both block-comment states look for the closing "*/"
            case insideblockComment:
                // peek at the next character without consuming it, so "**/" and "* /" are handled correctly
                if(c.equals("*") && s.hasNext("/")){
                    s.next(); // consume the '/'
                    currentState=outsideComment;
                }
                break;
        }
    }
    s.close();
    return endResult.toString();
}
Septa answered 6/12, 2013 at 11:26 Comment(3)
The regular expression solution and the solution that you've given will destroy source code that contains the comment start character sequences inside String literals. – Snapper
True, thanks for noticing. I didn't pay much attention to those cases, as they were irrelevant to me when I had this problem (and posted this answer). Adapting the solution to keep comment markers inside string literals shouldn't be too hard, though, especially for the second solution. – Purdum
Confirmed... It gets completely messed up by quotes. – Nation

The best way to do this is to use regular expressions: first remove the /* */ comments, then remove all // comments. For example:

private String filterString(String code) {
  // remove /* ... */ comments, including ones that span several lines
  String partialFiltered = code.replaceAll("(?s)/\\*.*?\\*/", "");
  // then remove // comments up to the end of each line
  String fullFiltered = partialFiltered.replaceAll("//.*", "");
  return fullFiltered;
}
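Two regex details matter here. A greedy .* between /* and */ runs to the last */ on the line, swallowing any code between two comments, and . does not match a newline unless the DOTALL flag (?s) is set, so multi-line comments are left in place. A small sketch comparing the variants (the class and method names are made up for illustration):

```java
class GreedyVsReluctant {

    // Greedy: one match from the first "/*" to the last "*/" on the line.
    static String greedy(String code) {
        return code.replaceAll("/\\*.*\\*/", "");
    }

    // Reluctant: each comment is matched separately, but '.' still stops at '\n'.
    static String reluctant(String code) {
        return code.replaceAll("/\\*.*?\\*/", "");
    }

    // Reluctant with DOTALL: '.' also matches '\n', so multi-line comments are removed.
    static String dotAll(String code) {
        return code.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    public static void main(String[] args) {
        String twoComments = "a /* x */ b /* y */ c";
        System.out.println(greedy(twoComments));    // a  c
        System.out.println(reluctant(twoComments)); // a  b  c
        System.out.println(dotAll("a /* x\n y */ b")); // a  b
    }
}
```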
Ottoman answered 24/12, 2011 at 22:33 Comment(1)
This breaks source code that contains the comment start character sequence inside String literals. – Snapper

Just use the replaceAll method from the String class, combined with a simple regular expression. Here's how to do it:

import java.util.*;
import java.lang.*;

class Main
{
        public static void main (String[] args) throws java.lang.Exception
        {
                String s = "private String filterString(String code) {\n" +
"  // lets say code = \"some code //comment inside\"\n" +
"  // return the string \"some code\" (without the comment)\n}";

                s = s.replaceAll("//.*?\n","\n");
                System.out.println("s=" + s);

        }
}

The key is the line:

s = s.replaceAll("//.*?\n","\n");

The regex //.*?\n matches everything from // up to and including the next newline, and the replacement puts that newline back.

And if you want to see this code in action, go here: http://www.ideone.com/e26Ve

Hope it helps!
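One caveat with //.*?\n is that it needs a newline after the comment, so a comment on the last line of a string with no trailing newline stays in place. Since . does not match line terminators by default, //.* on its own already stops at the end of each line. A small comparison (the class and method names are illustrative):

```java
class LineCommentRegex {

    // Requires a '\n' after the comment; the replacement puts the newline back.
    static String withNewline(String code) {
        return code.replaceAll("//.*?\n", "\n");
    }

    // '.' stops at the line terminator, so no newline is needed after the comment.
    static String toEndOfLine(String code) {
        return code.replaceAll("//.*", "");
    }

    public static void main(String[] args) {
        String code = "a // x\nb // y";
        System.out.println(withNewline(code)); // the comment on the last line survives
        System.out.println(toEndOfLine(code)); // both comments are removed
    }
}
```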

Nadabb answered 24/12, 2011 at 21:15 Comment(3)
Can you please explain this regex? I only need to remove "//some text", and it looks like it's affecting more characters, such as "\n". What should the exact regex be? – Bousquet
The line should read s = s.replaceAll("//.*?\n","\n"); I'll edit the post and correct it. The solution you "picked" wouldn't work properly on multi-line strings, such as the example you gave. – Nadabb
The regular expression solution and the solution that you've given will destroy source code that contains the comment start character sequences inside String literals. – Snapper

@Christian Hujer has correctly pointed out that many or all of the solutions posted fail if the comment markers occur within a string literal.

@Loïc Gammaitoni suggests that his automaton approach could easily be extended to handle that case. Here is that extension.

import java.util.Scanner;

enum State { outsideComment, insideLineComment, insideblockComment, insideblockComment_noNewLineYet, insideString };

public static String removeComments(String code) {
  State state = State.outsideComment;
  StringBuilder result = new StringBuilder();
  Scanner s = new Scanner(code);
  s.useDelimiter(""); // empty delimiter: each token is a single character
  while (s.hasNext()) {
    String c = s.next();
    switch (state) {
      case outsideComment:
        if (c.equals("/") && s.hasNext()) {
          String c2 = s.next();
          if (c2.equals("/"))
            state = State.insideLineComment;
          else if (c2.equals("*")) {
            state = State.insideblockComment_noNewLineYet;
          } else {
            result.append(c).append(c2);
          }
        } else {
          result.append(c);
          if (c.equals("\"")) {
            state = State.insideString;
          }
        }
        break;
      case insideString:
        result.append(c);
        if (c.equals("\"")) {
          state = State.outsideComment;
        } else if (c.equals("\\") && s.hasNext()) {
          result.append(s.next()); // copy the escaped character, e.g. \" does not end the string
        }
        break;
      case insideLineComment:
        if (c.equals("\n")) {
          state = State.outsideComment;
          result.append("\n");
        }
        break;
      case insideblockComment_noNewLineYet:
        if (c.equals("\n")) {
          result.append("\n");
          state = State.insideblockComment;
        }
        // fall through: both block-comment states look for the closing "*/"
      case insideblockComment:
        // peek at the next character without consuming it, so "**/" and "* /" are handled correctly
        if (c.equals("*") && s.hasNext("/")) {
          s.next(); // consume the '/'
          state = State.outsideComment;
        }
        break;
    }
  }
  s.close();
  return result.toString();
}
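For comparison, here is a sketch of the same state machine written over String.charAt() instead of a Scanner, with one extra state for char literals, since '"' would otherwise be read as the start of a string. It is an illustration under simplifying assumptions, not a full Java lexer: it does not handle text blocks ("""), Unicode escapes, or the rule that a comment separates two tokens (so a/**/b becomes ab). The class and enum names are hypothetical.

```java
class CharCommentStripper {

    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR }

    static String removeComments(String code) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        int i = 0;
        while (i < code.length()) {
            char c = code.charAt(i);
            boolean hasNext = i + 1 < code.length();
            char next = hasNext ? code.charAt(i + 1) : '\0'; // one character of look-ahead
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') { state = State.LINE_COMMENT; i += 2; continue; }
                    if (c == '/' && next == '*') { state = State.BLOCK_COMMENT; i += 2; continue; }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    out.append(c); // ordinary code, including a '/' used for division
                    break;
                case LINE_COMMENT:
                    if (c == '\n') { // the comment ends at the newline, which is kept
                        out.append(c);
                        state = State.CODE;
                    }
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') { state = State.CODE; i += 2; continue; }
                    if (c == '\n') out.append(c); // keep the line structure
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    if (c == '\\' && hasNext) { // copy an escaped character verbatim
                        out.append(next);
                        i += 2;
                        continue;
                    }
                    if ((state == State.STRING && c == '"') || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
            }
            i++;
        }
        return out.toString();
    }
}
```

Tracking the literal states separately is what keeps "http://..." or '"' from being mistaken for a comment or a string start.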
Antinomian answered 12/1, 2017 at 12:24 Comment(0)

I made an open-source library (on GitHub) for this purpose, called CommentRemover. It can remove single-line and multi-line Java comments.

It can either remove or keep TODO comments.
It also supports JavaScript, HTML, CSS, Properties, JSP, and XML comments.

Here is a short snippet showing how to use it (there are two ways):

First way: internal path

 public static void main(String[] args) throws CommentRemoverException {

     // root dir is: /Users/user/Projects/MyProject
     // example for startInternalPath

     CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
            .removeJava(true) // Remove Java file comments
            .removeJavaScript(true) // Remove JavaScript file comments
            .removeJSP(true) // etc.
            .removeTodos(false) // Do not touch TODOs (leave them alone)
            .removeSingleLines(true) // Remove single-line comments
            .removeMultiLines(true) // Remove multi-line comments
            .startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app; leave it as an empty string to start from the root dir
            .setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
            .build();

     CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
     commentProcessor.start();
 }

Second way: external path

 public static void main(String[] args) throws CommentRemoverException {

     // example for externalPath

     CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
            .removeJava(true) // Remove Java file comments
            .removeJavaScript(true) // Remove JavaScript file comments
            .removeJSP(true) // etc.
            .removeTodos(true) // Remove TODOs
            .removeSingleLines(false) // Do not remove single-line comments
            .removeMultiLines(true) // Remove multi-line comments
            .startExternalPath("/Users/user/Projects/MyOtherProject") // Give it the full path for external directories
            .setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory
            .build();

     CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
     commentProcessor.start();
 }
Idiosyncrasy answered 10/7, 2015 at 8:34 Comment(3)
How do I get the result? It's not returned and not written back to the source file... – Exclaim
@Exclaim If you'd like to get a list of the classes whose comments were removed, there is no feature like that. But if something goes wrong, the library shows a list of the classes whose comments couldn't be removed. – Herodotus
This works really well. If you simply want to run it for an external path, you don't even need the setExcludePackages setter. I cloned this and was able to run the external-path example without any issues after removing the setExcludePackages call. – Family

With a Scanner, use a delimiter. Delimiter example:

import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;

public class MainClass {
  public static void main(String[] args) throws IOException {
    FileWriter fout = new FileWriter("test.txt");
    fout.write("2, 3.4,    5,6, 7.4, 9.1, 10.5, done");
    fout.close();

    // Same file name as above ("Test.txt" would fail on case-sensitive file systems).
    FileReader fin = new FileReader("test.txt");
    Scanner src = new Scanner(fin);
    // Set delimiters to space and comma.
    // ", *" tells Scanner to match a comma and zero or more spaces as
    // delimiters.
    src.useDelimiter(", *");

    // Read and print numbers until a non-number token ("done") is reached.
    while (src.hasNext()) {
      if (src.hasNextDouble()) {
        System.out.println(src.nextDouble());
      } else {
        break;
      }
    }
    fin.close();
  }
}

For a plain string, use a StringTokenizer:

import java.util.StringTokenizer;

// start with a String of space-separated words
String tags = "pizza pepperoni food cheese";

// convert each tag to a token
StringTokenizer st = new StringTokenizer(tags, " ");

while (st.hasMoreTokens())
{
  String token = st.nextToken(); // nextToken() already returns a String, so no cast is needed
  System.out.println(token);
}

http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example
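Since the question asks whether Scanner can do this: Scanner.hasNextLine() and nextLine() can split a multi-line string into lines, and each line can then be cut at "//" with indexOf()/substring(), as in the indexOf() answer above. A sketch that assumes the same limitation as that answer (no // inside string literals); the class and method names are hypothetical:

```java
import java.util.Scanner;
import java.util.StringJoiner;

class ScannerCommentFilter {

    // Reads the input line by line and drops everything from "//" to the end of each line.
    // Assumption: "//" never occurs inside a string literal.
    static String filterLines(String code) {
        StringJoiner result = new StringJoiner("\n");
        try (Scanner scanner = new Scanner(code)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                int offset = line.indexOf("//");
                result.add(offset == -1 ? line : line.substring(0, offset));
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(filterLines("int a = 1; // one\nint b = 2;\n// whole line"));
    }
}
```

A line that is entirely a comment becomes an empty line, so the line count of the input is preserved.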
Cleocleobulus answered 24/12, 2011 at 21:8 Comment(2)
Thanks, but I don't see how it's relevant to my problem; your example doesn't consider the string I gave. Also, sorry, but I'm trying not to use overly advanced solutions. – Bousquet
I saw you just added another part to your suggestion. Thanks, but this still doesn't answer my problem: I wanted to write a clean function, and I don't see how this helps. – Bousquet

It is better if the code handles single-line comments and multi-line comments separately, as below. Any suggestions?

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class RemovingCommentsFromFile {

    public static void main(String[] args) throws IOException {
        // try-with-resources closes both files even if an exception is thrown
        try (BufferedReader fin = new BufferedReader(new FileReader("/home/pathtofilewithcomments/File"));
             BufferedWriter fout = new BufferedWriter(new FileWriter("/home/result/File1"))) {

            boolean multilinecomment = false; // stays true across lines until "*/" is found

            String s;
            while ((s = fin.readLine()) != null) {
                StringBuilder obj = new StringBuilder();
                int len = s.length();

                for (int i = 0; i < len; i++) {
                    char ch = s.charAt(i);
                    // check the bounds before looking at the next character
                    char next = (i + 1 < len) ? s.charAt(i + 1) : '\0';

                    if (multilinecomment) {
                        if (ch == '*' && next == '/') {
                            multilinecomment = false;
                            i++; // skip the '/' of "*/"
                        }
                    } else if (ch == '/' && next == '*') {
                        multilinecomment = true;
                        i++; // skip the '*' of "/*"
                    } else if (ch == '/' && next == '/') {
                        break; // single-line comment: ignore the rest of this line
                    } else {
                        obj.append(ch);
                    }
                }

                System.out.println(obj);
                fout.write(obj.toString());
                fout.write('\n');
            }
        }
    }
}
Bibliomania answered 4/9, 2019 at 10:31 Comment(0)

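A simple solution that doesn't remove extra parts of the code (as some of those above do). It works with any reader; you can also iterate over a list of strings instead:

        // reader is any BufferedReader (anything with a readLine() method)
        StringBuilder sb = new StringBuilder();
        String s;
        while ((s = reader.readLine()) != null)
        {
            s = s.replaceAll("//.*", ""); // drop a // comment up to the end of the line
            sb.append(s).append('\n');    // restore the line break that readLine() strips
        }
        // (?s) lets '.' match newlines; the reluctant .*? stops at the first "*/"
        String str = sb.toString().replaceAll("(?s)/\\*.*?\\*/", " ");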
Arlina answered 28/11, 2022 at 1:41 Comment(0)

I'm not sure this covers every case, but it seems to preserve String literals (it passed all 7 of my tests):

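public static String removeJavaComments(String line) {
    StringBuilder builder = new StringBuilder();
    char[] lineChars = line.toCharArray();

    boolean quoted = false;         // inside a "..." string literal
    boolean commented = false;      // inside a /* ... */ block comment
    boolean line_commented = false; // inside a // comment, until the next '\n'
    boolean escaped = false;        // the previous character inside the string was a backslash

    for(int pos = 0; pos < lineChars.length; ++pos) {
        // one character of look-ahead; '\0' at the last character (avoids running past the array)
        char next = pos + 1 < lineChars.length ? lineChars[pos + 1] : '\0';
        switch(lineChars[pos]) {
        case '"':
            if(!(commented || line_commented)) {
                if(!escaped) {
                    quoted = !quoted; // an escaped \" neither opens nor closes a string
                }
                builder.append(lineChars[pos]);
            }
            break;
        case '/':
            if(quoted) {
                builder.append(lineChars[pos]);
            } else if(commented) {
                // "*/" closes the block comment; pos >= 2 here because "/*" was skipped
                if(lineChars[pos - 1] == '*') {
                    commented = false;
                }
            } else if(line_commented) {
                // ignore everything until the end of the line
            } else if(next == '/') {
                line_commented = true;
            } else if(next == '*') {
                commented = true;
                ++pos; // skip the '*' so that "/*/" is not mistaken for "*/"
            } else {
                builder.append(lineChars[pos]); // a plain '/', e.g. division
            }
            break;
        case '\n':
            line_commented = false;
            builder.append(lineChars[pos]);
            break;
        default:
            if(!(commented || line_commented)) {
                builder.append(lineChars[pos]);
            }
        }
        // a backslash inside a string escapes the next character, unless it was itself escaped
        escaped = quoted && !escaped && lineChars[pos] == '\\';
    }

    return builder.toString();
}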
Nation answered 17/7 at 0:2 Comment(0)
