Which is the right regular expression to use for Numbers and Strings?
C

5

6

I am trying to create a simple IDE and color my JTextPane based on:

  • Strings (" ")
  • Comments (// and /* */)
  • Keywords (public, int ...)
  • Numbers (integers like 69 and floats like 1.5)

I color my source code by overriding the insertString and removeString methods inside the StyledDocument.

After much testing, I have completed comments and keywords.

Q1: As for my Strings coloring, I color my strings based on this regular expression:

Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);

while (matcherS.find()) {
    setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}

This works 99% of the time, except when a string contains a backslash next to a quote (a "\ sequence, such as an escaped quote). That throws off the color coding for everything after it. Can anyone correct my regular expression to fix this?

Q2: As for Integers and Decimal coloring, numbers are detected based on this regular expression:

Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
    setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}

By using the regular expression "\d+", I am only handling integers and not floats. Also, integers that are part of another string are matched, which is not what I want inside an IDE. Which is the correct expression to use for integer color coding?

Below is a screenshot of the output: [image]

Thank you for any help in advance!

Carmarthenshire answered 8/7, 2015 at 17:14 Comment(1)
M
3

For the strings, this is probably the fastest regex -

"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""

Formatted:

 " [^"\\]* 
 (?: \\ . [^"\\]* )*
 "

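As a quick check of the pattern above, here is a minimal Java sketch that runs it over a line containing escaped quotes. The class name StringLiteralDemo and the helper findStrings are made up for illustration; in the asker's document the loop body would call setCharacterAttributes instead of collecting the matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringLiteralDemo {
    // Pattern from the answer: an opening quote, runs of ordinary characters
    // separated by backslash escapes, then the closing quote.
    static final Pattern STRINGS =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Hypothetical helper: returns every string literal found, in order.
    static List<String> findStrings(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = STRINGS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Source text: String s = "a \"quoted\" word"; String t = "b";
        String source = "String s = \"a \\\"quoted\\\" word\"; String t = \"b\";";
        for (String literal : findStrings(source)) {
            System.out.println(literal);
        }
        // prints "a \"quoted\" word" and then "b"
    }
}
```

The escaped quotes stay inside the first match because a backslash always consumes the character that follows it.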
For integers and decimal numbers, the only foolproof expression I know of is
this -

"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"

Formatted:

 (?:
      \d+ 
      (?: \. \d* )?
   |  \. \d+ 
 )
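The number pattern can be tried the same way. This is only a sketch; NumberDemo and findNumbers are illustrative names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberDemo {
    // Pattern from the answer: digits with an optional fraction,
    // or a fraction that starts with the decimal point.
    static final Pattern NUMBERS =
            Pattern.compile("(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Hypothetical helper: returns every number found, in order.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBERS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("x = 69 + 1.5 + .25 + 3."));
        // prints [69, 1.5, .25, 3.]
    }
}
```

Note that the pattern has no boundary check, so the digit in an identifier such as var1 is matched as well.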

As a side note, if you run each pattern independently from the start of the text, the highlights may overlap.
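One possible way to avoid overlapping highlights is to scan the text once with a single alternation, so that whichever token matches first consumes its characters. The sketch below is an assumption about how that could look, not code from the answer; CombinedHighlighter, classify and the group names str and num are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedHighlighter {
    // Strings are listed first, so digits inside a string literal are
    // consumed by the string branch and never reach the number branch.
    static final Pattern TOKENS = Pattern.compile(
            "(?<str>\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"
            + "|(?<num>\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Hypothetical helper: describes each token as KIND@start-end.
    static List<String> classify(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKENS.matcher(text);
        while (m.find()) {
            String kind = m.group("str") != null ? "STRING" : "NUMBER";
            out.add(kind + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(classify("a = \"v2\" + 42;"));
        // prints [STRING@4-8, NUMBER@11-13]
    }
}
```

In a real highlighter the loop would pick the red or magenta attribute set from the matched group and pass m.start() and m.end() - m.start() to setCharacterAttributes.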

Mccafferty answered 8/7, 2015 at 17:48 Comment(0)
B
2

Try with:

  1. \\b\\d+(\\.\\d+)?\\b for int, float and double
  2. "(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for Strings
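To see what the word boundaries in the first pattern buy you, here is a small sketch. BoundaryNumberDemo and findNumbers are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryNumberDemo {
    // \b on both sides keeps digits that touch letters or underscores
    // (for example inside identifiers) from matching.
    static final Pattern NUMBERS = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    // Hypothetical helper: returns every number found, in order.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBERS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("x1 = 10+2.5; abc123"));
        // prints [10, 2.5]
    }
}
```

One limitation to be aware of: a suffixed literal such as 1.5f is only partly matched (just the 1), because there is no word boundary between 5 and f.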
Buffum answered 8/7, 2015 at 17:34 Comment(0)
I
1
  1. Match a String ignoring the \" situations

    ".*?(?<!\\)"

The above will start a match once it sees a " and it will continue matching on anything until it gets to the next " which is not preceded by a \. This is achieved using the lookbehind feature explained very well at http://www.regular-expressions.info/lookaround.html

  2. Match all numbers with & without decimal points

(\d+)(\.\d+)? will match one or more digits, optionally followed by a point and one or more further digits.

  3. The question of matching numbers inside strings can be handled in 2 ways:

    • a Modifying the above so that the number must have a non-word character on either side, \W(\d+)(\.\d+)?\W, which I don't think will be satisfactory: the neighbouring characters become part of the match, so it misbehaves in mathematical situations (ie 10+10, where the shared + can only be consumed once) or at the end of an expression (ie 10;, where the ; ends up coloured too).

    • b Making this a matter of precedence. If the String colouring is checked after the numbers then that part of the string will be coloured pink at first but then immediately overwritten with red. String colouring takes precedence.
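Option b can be sketched without Swing by painting into a character array, one mark per position. This is an illustration under assumptions: PrecedenceDemo, paint and apply are invented names, and the marks N and S stand in for the magenta and red attribute sets.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecedenceDemo {
    // The two patterns from this answer.
    static final Pattern NUMBERS = Pattern.compile("(\\d+)(\\.\\d+)?");
    static final Pattern STRINGS = Pattern.compile("\".*?(?<!\\\\)\"");

    // Hypothetical helper: '.' = default, 'N' = number, 'S' = string.
    static String paint(String text) {
        char[] styles = new char[text.length()];
        Arrays.fill(styles, '.');
        apply(NUMBERS, text, styles, 'N'); // first pass: numbers
        apply(STRINGS, text, styles, 'S'); // second pass overwrites numbers inside strings
        return new String(styles);
    }

    private static void apply(Pattern p, String text, char[] styles, char mark) {
        Matcher m = p.matcher(text);
        while (m.find()) {
            Arrays.fill(styles, m.start(), m.end(), mark);
        }
    }

    public static void main(String[] args) {
        System.out.println(paint("s = \"a1\" + 2;"));
        // prints ....SSSS...N.
    }
}
```

The 1 inside "a1" is first marked as a number and then overwritten by the string pass. The lookbehind string pattern still has a gap: a literal that ends in an escaped backslash, such as "C:\\", is not closed where it should be, because its closing quote is preceded by a backslash.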

Impetrate answered 8/7, 2015 at 17:49 Comment(1)
I think I have a solution for your String question as well now. Take a look in the edited answer ".*?(?<!\\)" - Impetrate
G
1

For Integer go with

(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))
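A short sketch of how this pattern behaves, assuming it is used with Matcher.find as in the question. SignedNumberDemo and findNumbers are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedNumberDemo {
    // Pattern from the answer: an optional sign and a number that is not
    // preceded by ^, a digit or a dot, and not followed by x, a digit or a dot.
    static final Pattern NUMBERS = Pattern.compile(
            "(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))");

    // Hypothetical helper: returns every number found, in order.
    static List<String> findNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBERS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("a = -12.5 + 7")); // prints [-12.5, 7]
        System.out.println(findNumbers("10-3"));          // prints [10, 3]
        System.out.println(findNumbers("x^2"));           // prints []
    }
}
```

The lookbehind is what keeps the minus in 10-3 from being read as a sign. Digits at the end of an identifier, such as abc123, are still matched, since letters are not in the excluded set.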
Gravettian answered 8/7, 2015 at 17:51 Comment(0)
L
0

R1: I believe there is no regex-based answer to non-escaped " characters in the middle of an ongoing string. You'd need to actively process the text to eliminate or circumvent the false-positives for characters that are not meant to be matched, based on your specific syntax rules (which you didn't specify).

However: If you mean to simply ignore escaped ones, \", like Java does, then I believe you can simply include the escape+quote pair in the center as a group, and the greedy * will take care of the rest: \"((\\\\\")|[^\"])*\"

R2: I believe the following regex would work for finding both integers and fractions: \\d+(\\.\\d+)?

You can expand it to find other kinds of numerals too. For example, \\d+([./]\\d+)? would additionally match numerals like "1/4".
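A minimal sketch of the expanded pattern, written as a Java string literal. FractionDemo and findNumerals are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FractionDemo {
    // Digits, optionally followed by a dot or slash and more digits.
    static final Pattern NUMERALS = Pattern.compile("\\d+([./]\\d+)?");

    // Hypothetical helper: returns every numeral found, in order.
    static List<String> findNumerals(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMERALS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumerals("1/4 and 2.5 and 3"));
        // prints [1/4, 2.5, 3]
    }
}
```

In source code the slash variant also treats a division such as 10/2 as one numeral, which may or may not be what a highlighter wants.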

Leviathan answered 8/7, 2015 at 18:4 Comment(0)
