Priority in regex alternation

I wrote some Java code to split a string into an array of strings. First I split the string using the regex pattern "\\,\\,|\\," and then using the pattern "\\,|\\,\\,". Why is the first output different from the second?

public class Test2 {
    public static void main(String[] args) {

        String regex1 = "\\,\\,|\\,"; // try ",," first, then ","
        String regex2 = "\\,|\\,\\,"; // try "," first, then ",,"

        String a = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";
        // Replace regex1 with regex2 here to get the second output
        String[] ss = a.split(regex1);

        int index = 0;
        for (String m : ss) {
            // The trailing "|" makes empty tokens visible
            System.out.println((index++) + ": " + m + "|");
        }
    }
}

Output when using regex1:

0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: 3000054789|

And when using regex2:

0: 20140608|
1: FT141590Z0LL|
2: 0608103611018634TCKJ3301000000018667|
3: 3000054789|
4: IDR1742630000001|
5: 80507|
6: 1000|
7: 6012|
8: TCKJ3301|
9: 6.00E+12|
10: ID0010015|
11: WADORI PURWANTO|
12: |
13: 3000054789|

I need an explanation of how the regex engine works in this situation.

Crypto asked 7/8, 2014 at 9:51
You don't have to quote (escape) the comma. - Barrens
@MarounMaroun Can you give a specific answer based on my question? - Crypto
MarounMaroun's comment was not intended to be an answer, just some additional info that could improve the readability of your question. In short: you don't need to write "\\,\\,|\\," when you can simply write ",,|,". - Injudicious
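To check the comments' point, here is a minimal sketch (the class name EscapeCheck is only illustrative) that splits part of the question's data with both spellings and compares the arrays. In java.util.regex a backslash before a non-alphabetic character such as a comma just matches that character literally, so both patterns should behave identically.

```java
import java.util.Arrays;

public class EscapeCheck {
    // The escaped form from the question and the plain form from the comment
    static final String ESCAPED = "\\,\\,|\\,";
    static final String PLAIN = ",,|,";

    // True when both patterns split the input into the same tokens
    static boolean sameSplit(String input) {
        return Arrays.equals(input.split(ESCAPED), input.split(PLAIN));
    }

    public static void main(String[] args) {
        String sample = "ID0010015,WADORI PURWANTO,,3000054789";
        System.out.println(Arrays.toString(sample.split(PLAIN))); // [ID0010015, WADORI PURWANTO, 3000054789]
        System.out.println(sameSplit(sample));                     // true
    }
}
```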

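How regex works: the engine scans the input from left to right, and at each position it tries the alternatives in the order they are written, taking the first one that matches. So ,|,, == , because the first alternative always matches before the second one is ever tried: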

[Diagram of ,|,, (source: gyazo.com)]
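A small sketch of that ordering, using a hypothetical helper spans() that records each match found by Matcher.find() as start-end (end exclusive). On the input A,,B the pattern ,|,, produces exactly the same matches as a plain comma, while ,,|, consumes the double comma as one match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrder {
    // Hypothetical helper: every match as "start-end", scanning left to right
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "A,,B";
        // "," is tried first and always succeeds, so ",," is never reached
        System.out.println(spans(",|,,", input)); // [1-2, 2-3]
        System.out.println(spans(",", input));    // [1-2, 2-3]
        // ",," is tried first, so the double comma is a single match
        System.out.println(spans(",,|,", input)); // [1-3]
    }
}
```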

,,|, == ,,? (a comma followed by an optional second comma):

[Diagram of ,,|, (source: gyazo.com)]


However, you should use ,,? instead, so the engine does not have to backtrack into a separate alternative:

[Diagram of ,,? (source: gyazo.com)]
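A quick check, assuming the sample data from the question, that ,,? splits the string exactly like ,,|, and yields the 13 tokens of the regex1 output. This sketch only confirms equal results; it does not measure whether avoiding the alternation is faster.

```java
import java.util.Arrays;

public class OptionalComma {
    static final String DATA = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";

    public static void main(String[] args) {
        // ",,?": one comma, then greedily an optional second comma
        String[] viaOptional = DATA.split(",,?");
        // ",,|,": regex1 without the unnecessary escapes
        String[] viaAlternation = DATA.split(",,|,");
        System.out.println(viaOptional.length);                         // 13
        System.out.println(Arrays.equals(viaOptional, viaAlternation)); // true
    }
}
```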

Corvine answered 7/8, 2014 at 10:42

Judging from the two results, it seems that the split method first looks for the first expression ("," for regex2, ",," for regex1) and splits the string, and then looks for the second one; but after the first pass with regex2 there isn't a single "," left in the strings. That's why an empty string appears where ",," occurs when using regex2.

So for your regex to be useful, you need to write the more complex (longer) expression first.
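One way to apply that advice mechanically, sketched with a hypothetical helper longestFirst() that is not part of any library: sort the literal delimiters by length, longest first, quote each with Pattern.quote, and join them with |. The caller can then list the delimiters in any order.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LongestFirst {
    // Hypothetical helper: alternation of literal delimiters, longest first,
    // so a shorter delimiter can never pre-empt a longer one
    static String longestFirst(String... literals) {
        return Arrays.stream(literals)
                .sorted((x, y) -> Integer.compare(y.length(), x.length()))
                .map(Pattern::quote) // treat each delimiter as plain text
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        String regex = longestFirst(",", ",,"); // argument order no longer matters
        String[] parts = "WADORI PURWANTO,,3000054789".split(regex);
        System.out.println(Arrays.toString(parts)); // [WADORI PURWANTO, 3000054789]
    }
}
```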

Callao answered 7/8, 2014 at 10:25

The alternatives are evaluated from left to right. In regex1, \\,\\, is tried first, and only if it fails is \\, tried. That's why element 12 is not empty: \\,\\, matches the double comma there. In regex2, every comma is matched by \\, on its own, hence the empty string between the two commas.
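To see this on the question's own data, a sketch with a hypothetical helper delimiters() that collects the text of every match the way split() finds them (repeated Matcher.find() calls). The string contains 13 commas: regex1 consumes the pair as one ",," match (12 matches, 13 tokens), while regex2 matches each comma separately (13 matches, 14 tokens, one of them empty).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTrace {
    static final String DATA = "20140608,FT141590Z0LL,0608103611018634TCKJ3301000000018667,3000054789,IDR1742630000001,80507,1000,6012,TCKJ3301,6.00E+12,ID0010015,WADORI PURWANTO,,3000054789";

    // Hypothetical helper: the text of every delimiter match, left to right
    static List<String> delimiters(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> d1 = delimiters("\\,\\,|\\,", DATA); // regex1
        List<String> d2 = delimiters("\\,|\\,\\,", DATA); // regex2
        System.out.println(d1.size() + " " + d1.contains(",,")); // 12 true
        System.out.println(d2.size() + " " + d2.contains(",,")); // 13 false
    }
}
```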

Mar answered 7/8, 2014 at 10:25

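Case 1: split by ,, otherwise by ,
The ,, alternative matches only at the one double comma; the rest of the string is split by ,.

Case 2: split by , otherwise by ,,
The , alternative matches every comma, so ,, is never used and word,,word is split into word and ,word.
Then ,word is split into "" (an empty string, not a space) and word.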
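Both cases applied to the tail of the question's string, as a minimal sketch (the class name TwoCases is only illustrative). Case 2 leaves an empty string between the two commas, which is what the 12: | line of the regex2 output shows.

```java
import java.util.Arrays;

public class TwoCases {
    static final String TAIL = "WADORI PURWANTO,,3000054789";

    public static void main(String[] args) {
        // Case 1: ",," is tried first, so the double comma is one delimiter
        String[] case1 = TAIL.split(",,|,");
        // Case 2: "," is tried first, so each comma is its own delimiter
        String[] case2 = TAIL.split(",|,,");
        System.out.println(Arrays.toString(case1)); // [WADORI PURWANTO, 3000054789]
        System.out.println(case2.length);           // 3
        System.out.println(case2[1].isEmpty());     // true
    }
}
```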

Flaky answered 7/8, 2014 at 10:58
