converting the data with regexp in oracle sql
Asked Answered
C

5

6

I have a data like below with tab limited among them. I have represented them with a view here

with t_view as (select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
union
select '6-20  6-20  6-20  6-20  6-20  ' from dual
union
select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' from dual)

My expected output is

Mon: 6-21, Tue: 6-21, Wed: 6-21, Thu: 6-21, Fri: 6-21, Sat: 6-21, Sun: 6-21
Mon: 6-20, Tue: 6-20, Wed: 6-20, Thu: 6-20, Fri: 6-20
Mon: 6-9, Tue: 6-9, Wed: 6-9, Thu: 6-9, Fri: 6-9, Sat: 6-9, Sun: 6-9

I thought of replacing all those horizontal table with some unique patterns like this and then replace that pattern with Mon, Tue based on indexing

$1(6-20 )$2(6-20 )$3(6-20 )$4(6-20 )$5(6-20 )

I have tried the below query but could not complete it

select regexp_replace(col, '([[:digit:]]-[[:digit:]]{2}[[:space:]]+)','$(\1)') from t_view;
Calen answered 7/9, 2015 at 8:27 Comment(1)
You cannot use one regex replacement here since it requires 2 different replacement strings (as the number of days is not constant).Episcopal
C
3

You need a combination of CASE expression, REGEXP_COUNT and REGEXP_REPLACE since you do not have the same expression for all the rows. Depending on the data, you could have as many conditions in the case expression.

The regular expression pattern is (\d-\d+ ).

For example,

SQL> WITH t_view(col) AS
  2    ( SELECT '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' FROM dual
  3    UNION
  4    SELECT '6-20  6-20  6-20  6-20  6-20  ' FROM dual
  5    UNION
  6    SELECT '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' FROM dual
  7    )
  8  SELECT REPLACE(new_col, '  ','') new_col
  9  FROM (
 10    SELECT
 11      CASE
 12        WHEN regexp_count(col, '\d+\-\d+') = 5
 13        THEN regexp_replace(col,
 14                            '(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )',
 15                            'Mon: \1, Tue: \2,Wed: \3,Thu: \4,Fri: \5')
 16        WHEN regexp_count(col, '\d+\-\d+') = 7
 17        THEN regexp_replace(col,
 18                            '(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )',
 19                            'Mon: \1, Tue: \2,Wed: \3,Thu: \4,Fri: \5,Sat: \6,Sun: \7')
 20      END new_col
 21    FROM t_view
 22  );

NEW_COL
----------------------------------------------------------------------------------------------------
Mon: 6-20, Tue: 6-20,Wed: 6-20,Thu: 6-20,Fri: 6-20
Mon: 6-21, Tue: 6-21,Wed: 6-21,Thu: 6-21,Fri: 6-21,Sat: 6-21,Sun: 6-21
Mon: 6-9, Tue: 6-9,Wed: 6-9,Thu: 6-9,Fri: 6-9,Sat: 6-9,Sun: 6-9

SQL>
Coincident answered 7/9, 2015 at 8:58 Comment(7)
What happens if there are 1, 2 or 20 values in an input?Celesta
@Celesta I already explained that in my answer. The expression is not same for all rows, so CASE could be used. But tell me, which calendar has more than 7 days in a week? Logically, OP's question is concatenating the Day name to each pattern in the regular expression. I know it could also be done using row generator method and string split. But I focused on regexp_replace as OP mentioned it in the question.Coincident
If there are 10 days in the input then you want a full week plus 3 additional days of the next week - a point you seem to have missed when you bring up different calendars. With this method you are going to have to hand code a CASE for each potential length of input - and if you miss one then you will get NULL instead of the expected output. Also - you have whitespace before the commas.Celesta
@MTO The whitespace before the commas...hmm...replaced ;-)Coincident
This technique won't scale to more than 9 days - as Oracle can only do single digit references to regular expression capture groups. Also, while you've edited the output to fix the whitespace the SQL in the REGEXP_REPLACE has not changed so does not produce your current output.Celesta
@MTO Oh come on! You just thought that I simply manipulated the output? I used REPLACE function, re-executed the query and edited my answer. You are free to copy the code and verify at your end.Coincident
My apologies - you have a nested REPLACE which I hadn't seen - you can eliminate it by moving the brackets on the capture groups from (\d-\d+ ) (which will include the trailing space in the capture group) to (\d-\d+) .Celesta
C
2

SQL Fiddle

Oracle 11g R2 Schema Setup:

Query 1:

with t_view ( col ) as (
      select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' from dual
union select '6-20  6-20  6-20  6-20  6-20  ' from dual
union select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  6-9  6-9' from dual
union select '6-1' from dual
union select '6-1 6-2' from dual
),
days ( id, day ) AS (
            SELECT 1, 'Mon' FROM DUAL
  UNION ALL SELECT 2, 'Tue' FROM DUAL
  UNION ALL SELECT 3, 'Wed' FROM DUAL
  UNION ALL SELECT 4, 'Thu' FROM DUAL
  UNION ALL SELECT 5, 'Fri' FROM DUAL
  UNION ALL SELECT 6, 'Sat' FROM DUAL
  UNION ALL SELECT 0, 'Sun' FROM DUAL
),
matches ( col, idx, day ) AS (
  SELECT col,
         COLUMN_VALUE,
         day || ': ' || REGEXP_SUBSTR( t.col, '\d+-\d+', 1, COLUMN_VALUE )
  FROM   t_view t,
         TABLE(
           CAST(
             MULTISET(
               SELECT LEVEL
               FROM   DUAL
               CONNECT BY LEVEL <= REGEXP_COUNT( t.col, '\d+-\d+' )
             )
             AS SYS.ODCINUMBERLIST
           )
         ) l
         INNER JOIN days d
         ON ( MOD( l.COLUMN_VALUE, 7 ) = d.id )
)
SELECT LISTAGG( day, ', ' ) WITHIN GROUP ( ORDER BY IDX ) AS col
FROM   matches
GROUP BY col

Results:

|                                                                                      COL |
|------------------------------------------------------------------------------------------|
|                                                                                 Mon: 6-1 |
|                                                                       Mon: 6-1, Tue: 6-2 |
|                                    Mon: 6-20, Tue: 6-20, Wed: 6-20, Thu: 6-20, Fri: 6-20 |
|              Mon: 6-21, Tue: 6-21, Wed: 6-21, Thu: 6-21, Fri: 6-21, Sat: 6-21, Sun: 6-21 |
| Mon: 6-9, Tue: 6-9, Wed: 6-9, Thu: 6-9, Fri: 6-9, Sat: 6-9, Sun: 6-9, Mon: 6-9, Tue: 6-9 |
Celesta answered 7/9, 2015 at 9:2 Comment(8)
LISTAGG is only supported on 11gR2 and up.Coincident
There are multiple other string aggregation techniques that can be used in earlier versions (if LISTAGG is not available).Celesta
@MTO From performance point of view, your solution will be extremely slow. Compare the explain plans.Coincident
@LalitKumarB This will handle ANY length of input - if you need flexibility then you will need to do something like this (or a custom function) - if you only want a fixed set of inputs then you can go with a more inflexible solution which will, of course, be more performant as it has to cope with less.Celesta
@MTO a hundred lines of more code is better than spending a 100 seconds more :-) Given that the data scalability is known in advance.Coincident
@LalitKumarB If you run the SQLFIDDLE linked in the question you will see that it returns in (for me) between 1 and 5ms.Celesta
@MTO And you mean that performance could be measured with 5 rows of input data?Coincident
@MTO In any case, it is up to the OP to chose what he wants. We have helped with different approaches and techniques.Coincident
E
1

Why can't we use this simple way? Looks good as for me

SELECT 'Mon: '||regexp_substr(col,'\d+\-\d+',1,1) ||
       ', Tue: '||regexp_substr(col,'\d+\-\d+',1,2) ||
       ', Wed: '||regexp_substr(col,'\d+\-\d+',1,3) ||
       ', Thu: '||regexp_substr(col,'\d+\-\d+',1,4) ||
       ', Fri: '||regexp_substr(col,'\d+\-\d+',1,5) ||  
       ', Sat: '||regexp_substr(col,'\d+\-\d+',1,6) ||
       ', Sun: '||regexp_substr(col,'\d+\-\d+',1,7)
 FROM t_view

Obviously it's easy to eliminate empty Sat Sun, for example with nvl2:

SELECT 'Mon: '||regexp_substr(col,'\d+\-\d+',1,1) ||
       ', Tue: '||regexp_substr(col,'\d+\-\d+',1,2) ||
       ', Wed: '||regexp_substr(col,'\d+\-\d+',1,3) ||
       ', Thu: '||regexp_substr(col,'\d+\-\d+',1,4) ||
       ', Fri: '||regexp_substr(col,'\d+\-\d+',1,5) ||
       nvl2(regexp_substr(col,'\d+\-\d+',1,6), 
       ', Sat: '||regexp_substr(col,'\d+\-\d+',1,6) ||
       ', Sun: '||regexp_substr(col,'\d+\-\d+',1,7),null)
 FROM t_view

You should keep in mind, that this is just example, and if you can get data with any number of day missed you should use nvl2 in more places

Enterovirus answered 7/9, 2015 at 8:54 Comment(4)
That will leave the extra concatenated values where the number of values are less than 7. It could be handled using case expression.Coincident
This will include Sat and Sun even if there are not enough values in the input to include them.Celesta
I think we can avoid this by adding the replace function above the result we got and replace the strings that dont have any digits associated with it, in our case it will be Sat and SunCalen
@arunb2w, obviously it's easy to eliminate empty Sat Sun, as you've said with replace or with nvl2 wich simpler as for me, so the rest of query will be (i've added to the answer)Enterovirus
W
1

Considering Space as delimiter, tokenise into rows(using levels) and rejoin using LISTAGG() , using the level as day generator (TO_CHAR(TRUNC(SYSDATE,'D')+level)

with t_view as
(
  select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
  union all
  select '6-20  6-20  6-20  6-20  6-20  ' from dual
  union all
  select '6-9  6-9  6-9  6-9  6-9  6-9  6-9 6-9 ' from dual
)
SELECT LISTAGG(TO_CHAR(TRUNC(SYSDATE,'D')+level1,'Dy')||': '||
               REGEXP_SUBSTR(col,'[^ ]+',1,LEVEL1),', ')
        WITHIN GROUP (ORDER BY level1 )
from
(
SELECT  col,level level1
  FROM t_view
CONNECT BY REGEXP_SUBSTR(col,'[^ ]+',1,LEVEL) IS NOT NULL
   AND PRIOR col = col
   AND PRIOR sys_guid() IS NOT NULL
)
group by col;
Woodberry answered 7/9, 2015 at 10:51 Comment(0)
J
1

This is my try. It is not different too much from MTo's version. The ideea is the same: transform strings into lines, add information about day, then regroup the record.

with week as (
  select 1 day_num, 'Mon' day_name from dual union all
  select 2 day_num, 'Tue' day_name from dual union all
  select 3 day_num, 'Wed' day_name from dual union all
  select 4 day_num, 'Thu' day_name from dual union all
  select 5 day_num, 'Fri' day_name from dual union all
  select 6 day_num, 'Sat' day_name from dual union all
  select 7 day_num, 'Sun' day_name from dual
),
t_view as (select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
                union all
                select '6-20  6-20  6-20  6-20  6-20  ' from dual
                union all
                select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' from dual
                ),
lines as(
  select 
    col, WEEK.DAY_NAME, l, trim(regexp_substr(col, '[^,]+', 1, L)) elem
  from (
      select regexp_replace(col,'([[:digit:]]-[[:digit:]]{1,2}[[:space:]]+)','\1,')  col
      from t_view
      )
      join (select level l from dual connect by level < 10)
  on instr(col, ',', 1, L ) > 0
  join week on WEEK.DAY_NUM = l
  order by col,l
  )
select listagg(day_name||':'||elem,' ')  within group (order by l)
from lines
group by col;

Result:

Mon:6-20 Tue:6-20 Wed:6-20 Thu:6-20 Fri:6-20
Mon:6-21 Tue:6-21 Wed:6-21 Thu:6-21 Fri:6-21 Sat:6-21 Sun:6-21
Mon:6-9 Tue:6-9 Wed:6-9 Thu:6-9 Fri:6-9 Sat:6-9 Sun:6-9
Jordain answered 7/9, 2015 at 11:13 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.