Java regex match markdown syntax for headings

2

6

I have a string with Markdown syntax in it, and I want to find the Markdown syntax for headings, i.e. h1 = #, h2 = ##, and so on.

I know that whenever I find a heading, it is at the start of the line. I also know there can only be one heading per line. So for example, "###This is a heading" would match true for my h3 pattern, but not for my h2 or h1 patterns. This is my code so far:

h1 = Pattern.compile("(?<!\\#)^\\#(\\b)*");
h2 = Pattern.compile("(?<!\\#)^\\#{2}(\\b)*");
h3 = Pattern.compile("(?<!\\#)^\\#{3}(\\b)*");
h4 = Pattern.compile("(?<!\\#)^\\#{4}(\\b)*");
h5 = Pattern.compile("(?<!\\#)^\\#{5}(\\b)*");
h6 = Pattern.compile("(?<!\\#)^\\#{6}(\\b)*");

Whenever I use \\#, my IDE (IntelliJ) warns: "Redundant character escape". As far as I know, # should not be a special character in regex, so escaping it with two backslashes should still allow me to use it.

When I find a match, I want to surround the entire match with bold HTML tags, like this: "<b>###Heading</b>", but for some reason it's not working:

//check for heading 6
Matcher match = h6.matcher(tmp);
StringBuffer sb = new StringBuffer();
while (match.find()) {
    match.appendReplacement(sb, "<b>" + match.group(0) + "</b>");
}
match.appendTail(sb);
tmp = sb.toString();

EDIT

I have to look at each heading level separately; I can't handle headings 1-6 in the same pattern (this has to do with other parts of my program that use the same patterns). What I know so far:

  • If there is a heading in the string, it is at the start.
  • If it starts with a heading, the entire string that follows is considered a heading, until the user presses Enter.
  • If I have "## This a heading", then it must match true for h2, but false for h1.
  • When I find my match, "## This a heading" becomes "<b>## This a heading</b>".
Smarmy answered 22/5, 2017 at 8:56 Comment(5)
You do not have to escape #. You do not even need the Matcher#appendReplacement here. You may use "(?<!#)#{6}\\b", and then use a simple tmp = tmp.replaceAll("(?<!#)#{6}\\b", "<b>$0</b>")Prenatal
@WiktorStribiżew I tried your solution, but the problem is that the match only returns the #:s, and not the text that follows afterSmarmy
If you need to match lines starting with those # sequences, see my updated answer. Always add new details to the question itself, and not to just comments.Prenatal
@WiktorStribiżew Sorry, kinda new to this. Taking a look at your answer now. Also, question has been updated :)Smarmy
Good, I upvoted it because it is a good question showing effort. And now, it is really much clearer.Prenatal
6

There is no need to escape # since it is not a special regex metacharacter. Also, the ^ is the string start anchor, so all the lookbehinds in your patterns are redundant as they always return true (since there is no character before the beginning of a string).
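
A minimal sketch of the points above, assuming a hypothetical helper named finds that is not part of the original code. It is meant only to illustrate how ^ behaves with and without (?m), that \\# and # are the same literal, and that the lookbehind does not stop an h2 pattern from matching an h3 line:

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "intro\n## Second line heading";

        // Without (?m), ^ anchors only at the very start of the input.
        System.out.println(finds("^#{2}", text));      // false
        // With (?m), ^ also matches right after a line terminator.
        System.out.println(finds("(?m)^#{2}", text));  // true

        // Escaped and unescaped # are the same literal character here.
        System.out.println(finds("^\\#{2}", "## x") == finds("^#{2}", "## x")); // true

        // The lookbehind adds nothing next to ^: both patterns match "###x".
        System.out.println(finds("(?<!#)^#{2}", "###x")); // true
        System.out.println(finds("^#{2}", "###x"));       // true
    }
}
```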

You seem to want to match a line that starts with an exact number of # characters. Use:

String s = "###### Heading6 Something here\r\n" +
           "###### More text \r\n" +
           "###Heading 3 text";
Matcher m = Pattern.compile("(?m)^#{6}(?!#)(.*)").matcher(s);
String result = m.replaceAll("<b>$1</b>");
System.out.println(result);

Result:

<b> Heading6 Something here</b>
<b> More text </b>
###Heading 3 text

Details:

  • (?m) - enables multiline mode, so ^ matches at the start of every line
  • ^ - start of a line
  • #{6}(?!#) - exactly 6 # symbols, not followed by a 7th
  • (.*) - Group 1: zero or more characters other than line break characters, up to the end of the line.

Thus, your regex definitions will look like

h1 = Pattern.compile("(?m)^#(?!#)(.*)");
h2 = Pattern.compile("(?m)^#{2}(?!#)(.*)");
h3 = Pattern.compile("(?m)^#{3}(?!#)(.*)");
h4 = Pattern.compile("(?m)^#{4}(?!#)(.*)");
h5 = Pattern.compile("(?m)^#{5}(?!#)(.*)");
h6 = Pattern.compile("(?m)^#{6}(?!#)(.*)");
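
If the # characters themselves should stay inside the bold tags, as in the question's "## This a heading" example, one option is to replace with $0 (the whole match) instead of $1. The sketch below is only an illustration under that assumption; headingPattern and boldHeadings are hypothetical helper names, not part of the original code:

```java
import java.util.regex.Pattern;

class HeadingBold {
    // Hypothetical helper: pattern for a line with exactly `level` leading # characters.
    static Pattern headingPattern(int level) {
        return Pattern.compile("(?m)^#{" + level + "}(?!#).*");
    }

    // Wraps every matching line, hashes included, in <b> tags.
    static String boldHeadings(String text, int level) {
        return headingPattern(level).matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String text = "# One\n## This a heading\n### Three";
        // Only the h2 line is wrapped; the h1 and h3 lines are left alone.
        System.out.println(boldHeadings(text, 2));
    }
}
```

Each level still gets its own pattern, so the separate h1 to h6 fields from the question can be kept.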
Polito answered 22/5, 2017 at 9:7 Comment(1)
I tried your solution, it works like a charm! Thanks a lot man, really appreciated. :-)Smarmy
5

You can try this:

^(#{1,6}\s*[\S]+)

As you mentioned, a heading only appears at the start of a line, so you don't need a lookbehind.

UPDATE: If you want to bold the full line that starts with a heading, you can try this:

^(#{1,6}.*)

And replace with:

<b>$1</b>
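
A short Java sketch of this full-line variant, assuming Pattern.MULTILINE so that ^ matches at every line start. The names ANY_HEADING and boldAllHeadings are made up for the example:

```java
import java.util.regex.Pattern;

class FullLineBold {
    // One pattern for all heading levels; group 1 is the whole heading line.
    static final Pattern ANY_HEADING = Pattern.compile("^(#{1,6}.*)", Pattern.MULTILINE);

    // Wraps every line that starts with # in <b> tags.
    static String boldAllHeadings(String text) {
        return ANY_HEADING.matcher(text).replaceAll("<b>$1</b>");
    }

    public static void main(String[] args) {
        String text = "#heading 1\nbla bla bla\n### heading 3 djdjdj";
        System.out.println(boldAllHeadings(text));
    }
}
```

Note that this single pattern does not tell the levels apart, and because .* also consumes any further # characters, a line starting with seven or more # symbols is matched as well.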

Sample Java source:

final String regex = "^(#{1,6}\\s*[\\S]+)";
final String string = "#heading 1 \n"
     + "bla bla bla\n"
     + "### heading 3 djdjdj\n"
     + "bla bla bla\n"
     + "## heading 2 bal;kasddfas\n"
     + "fbla bla bla";
final String subst = "<b>$1</b>";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final String result = matcher.replaceAll(subst);
System.out.println(result);

Goddamn answered 22/5, 2017 at 9:9 Comment(5)
Thank you! The only problem is that a heading could be more than one word. I guess I would want to check for the #:s and then get all the text that follows, until the end. Do you have a suggestion for how to tweak your solution? I thought maybe \\b, but that only gave me the #:sSmarmy
so you want the full line ?Goddamn
Yes, if a # is written, then everything that follows will be included as the heading, until the user presses enter.Smarmy
Your solution works excellently. My problem is that I have to do a separate check for every heading (h1, h2...), because I use the patterns in other parts of my program, so it's easier that way. Right now, if I have "## Some text here", it matches true for both h1 and h2, but only h2 should be true. I'm building off of your solution, but haven't gotten it to work as I want yet.Smarmy
@Kaffemakarn: Is "If you want to bold the full line" true? Please add the details to the question.Prenatal