converting the data with regexp in oracle sql

Asked 7/9, 2015 at 8:27 Answered 7/9, 2015 at 11:13

I have a data like below with tab limited among them. I have represented them with a view here

with t_view as (select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
union
select '6-20  6-20  6-20  6-20  6-20  ' from dual
union
select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' from dual)

My expected output is

Mon: 6-21, Tue: 6-21, Wed: 6-21, Thu: 6-21, Fri: 6-21, Sat: 6-21, Sun: 6-21
Mon: 6-20, Tue: 6-20, Wed: 6-20, Thu: 6-20, Fri: 6-20
Mon: 6-9, Tue: 6-9, Wed: 6-9, Thu: 6-9, Fri: 6-9, Sat: 6-9, Sun: 6-9

I thought of replacing all those horizontal table with some unique patterns like this and then replace that pattern with Mon, Tue based on indexing

$1(6-20 )$2(6-20 )$3(6-20 )$4(6-20 )$5(6-20 )

I have tried the below query but could not complete it

select regexp_replace(col, '([[:digit:]]-[[:digit:]]{2}[[:space:]]+)','$(\1)') from t_view;

Calen answered 7/9, 2015 at 8:27 Comment(1)

You cannot use one regex replacement here since it requires 2 different replacement strings (as the number of days is not constant). – Episcopal 7/9, 2015 at 8:48

You need a combination of CASE expression, REGEXP_COUNT and REGEXP_REPLACE since you do not have the same expression for all the rows. Depending on the data, you could have as many conditions in the case expression.

The regular expression pattern is (\d-\d+ ).

For example,

SQL> WITH t_view(col) AS
  2    ( SELECT '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' FROM dual
  3    UNION
  4    SELECT '6-20  6-20  6-20  6-20  6-20  ' FROM dual
  5    UNION
  6    SELECT '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' FROM dual
  7    )
  8  SELECT REPLACE(new_col, '  ','') new_col
  9  FROM (
 10    SELECT
 11      CASE
 12        WHEN regexp_count(col, '\d+\-\d+') = 5
 13        THEN regexp_replace(col,
 14                            '(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )',
 15                            'Mon: \1, Tue: \2,Wed: \3,Thu: \4,Fri: \5')
 16        WHEN regexp_count(col, '\d+\-\d+') = 7
 17        THEN regexp_replace(col,
 18                            '(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )(\d-\d+  )',
 19                            'Mon: \1, Tue: \2,Wed: \3,Thu: \4,Fri: \5,Sat: \6,Sun: \7')
 20      END new_col
 21    FROM t_view
 22  );

NEW_COL
----------------------------------------------------------------------------------------------------
Mon: 6-20, Tue: 6-20,Wed: 6-20,Thu: 6-20,Fri: 6-20
Mon: 6-21, Tue: 6-21,Wed: 6-21,Thu: 6-21,Fri: 6-21,Sat: 6-21,Sun: 6-21
Mon: 6-9, Tue: 6-9,Wed: 6-9,Thu: 6-9,Fri: 6-9,Sat: 6-9,Sun: 6-9

SQL>

Coincident answered 7/9, 2015 at 8:58 Comment(7)

What happens if there are 1, 2 or 20 values in an input? – Celesta 7/9, 2015 at 9:8

@Celesta I already explained that in my answer. The expression is not same for all rows, so CASE could be used. But tell me, which calendar has more than 7 days in a week? Logically, OP's question is concatenating the Day name to each pattern in the regular expression. I know it could also be done using row generator method and string split. But I focused on regexp_replace as OP mentioned it in the question. – Coincident 7/9, 2015 at 9:12

If there are 10 days in the input then you want a full week plus 3 additional days of the next week - a point you seem to have missed when you bring up different calendars. With this method you are going to have to hand code a CASE for each potential length of input - and if you miss one then you will get NULL instead of the expected output. Also - you have whitespace before the commas. – Celesta 7/9, 2015 at 9:18

@MTO The whitespace before the commas...hmm...replaced ;-) – Coincident 7/9, 2015 at 9:20

This technique won't scale to more than 9 days - as Oracle can only do single digit references to regular expression capture groups. Also, while you've edited the output to fix the whitespace the SQL in the REGEXP_REPLACE has not changed so does not produce your current output. – Celesta 7/9, 2015 at 9:50

@MTO Oh come on! You just thought that I simply manipulated the output? I used REPLACE function, re-executed the query and edited my answer. You are free to copy the code and verify at your end. – Coincident 7/9, 2015 at 9:57

My apologies - you have a nested REPLACE which I hadn't seen - you can eliminate it by moving the brackets on the capture groups from (\d-\d+ ) (which will include the trailing space in the capture group) to (\d-\d+) . – Celesta 7/9, 2015 at 10:8

SQL Fiddle

Oracle 11g R2 Schema Setup:

Query 1:

with t_view ( col ) as (
      select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' from dual
union select '6-20  6-20  6-20  6-20  6-20  ' from dual
union select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  6-9  6-9' from dual
union select '6-1' from dual
union select '6-1 6-2' from dual
),
days ( id, day ) AS (
            SELECT 1, 'Mon' FROM DUAL
  UNION ALL SELECT 2, 'Tue' FROM DUAL
  UNION ALL SELECT 3, 'Wed' FROM DUAL
  UNION ALL SELECT 4, 'Thu' FROM DUAL
  UNION ALL SELECT 5, 'Fri' FROM DUAL
  UNION ALL SELECT 6, 'Sat' FROM DUAL
  UNION ALL SELECT 0, 'Sun' FROM DUAL
),
matches ( col, idx, day ) AS (
  SELECT col,
         COLUMN_VALUE,
         day || ': ' || REGEXP_SUBSTR( t.col, '\d+-\d+', 1, COLUMN_VALUE )
  FROM   t_view t,
         TABLE(
           CAST(
             MULTISET(
               SELECT LEVEL
               FROM   DUAL
               CONNECT BY LEVEL <= REGEXP_COUNT( t.col, '\d+-\d+' )
             )
             AS SYS.ODCINUMBERLIST
           )
         ) l
         INNER JOIN days d
         ON ( MOD( l.COLUMN_VALUE, 7 ) = d.id )
)
SELECT LISTAGG( day, ', ' ) WITHIN GROUP ( ORDER BY IDX ) AS col
FROM   matches
GROUP BY col

Results:

|                                                                                      COL |
|------------------------------------------------------------------------------------------|
|                                                                                 Mon: 6-1 |
|                                                                       Mon: 6-1, Tue: 6-2 |
|                                    Mon: 6-20, Tue: 6-20, Wed: 6-20, Thu: 6-20, Fri: 6-20 |
|              Mon: 6-21, Tue: 6-21, Wed: 6-21, Thu: 6-21, Fri: 6-21, Sat: 6-21, Sun: 6-21 |
| Mon: 6-9, Tue: 6-9, Wed: 6-9, Thu: 6-9, Fri: 6-9, Sat: 6-9, Sun: 6-9, Mon: 6-9, Tue: 6-9 |

Celesta answered 7/9, 2015 at 9:2 Comment(8)

LISTAGG is only supported on 11gR2 and up. – Coincident 7/9, 2015 at 9:13

There are multiple other string aggregation techniques that can be used in earlier versions (if LISTAGG is not available). – Celesta 7/9, 2015 at 9:22

@MTO From performance point of view, your solution will be extremely slow. Compare the explain plans. – Coincident 7/9, 2015 at 9:25

@LalitKumarB This will handle ANY length of input - if you need flexibility then you will need to do something like this (or a custom function) - if you only want a fixed set of inputs then you can go with a more inflexible solution which will, of course, be more performant as it has to cope with less. – Celesta 7/9, 2015 at 9:31

@MTO a hundred lines of more code is better than spending a 100 seconds more :-) Given that the data scalability is known in advance. – Coincident 7/9, 2015 at 9:34

@LalitKumarB If you run the SQLFIDDLE linked in the question you will see that it returns in (for me) between 1 and 5ms. – Celesta 7/9, 2015 at 9:54

@MTO And you mean that performance could be measured with 5 rows of input data? – Coincident 7/9, 2015 at 9:59

@MTO In any case, it is up to the OP to chose what he wants. We have helped with different approaches and techniques. – Coincident 7/9, 2015 at 10:0

Why can't we use this simple way? Looks good as for me

SELECT 'Mon: '||regexp_substr(col,'\d+\-\d+',1,1) ||
       ', Tue: '||regexp_substr(col,'\d+\-\d+',1,2) ||
       ', Wed: '||regexp_substr(col,'\d+\-\d+',1,3) ||
       ', Thu: '||regexp_substr(col,'\d+\-\d+',1,4) ||
       ', Fri: '||regexp_substr(col,'\d+\-\d+',1,5) ||  
       ', Sat: '||regexp_substr(col,'\d+\-\d+',1,6) ||
       ', Sun: '||regexp_substr(col,'\d+\-\d+',1,7)
 FROM t_view

Obviously it's easy to eliminate empty Sat Sun, for example with nvl2:

SELECT 'Mon: '||regexp_substr(col,'\d+\-\d+',1,1) ||
       ', Tue: '||regexp_substr(col,'\d+\-\d+',1,2) ||
       ', Wed: '||regexp_substr(col,'\d+\-\d+',1,3) ||
       ', Thu: '||regexp_substr(col,'\d+\-\d+',1,4) ||
       ', Fri: '||regexp_substr(col,'\d+\-\d+',1,5) ||
       nvl2(regexp_substr(col,'\d+\-\d+',1,6), 
       ', Sat: '||regexp_substr(col,'\d+\-\d+',1,6) ||
       ', Sun: '||regexp_substr(col,'\d+\-\d+',1,7),null)
 FROM t_view

You should keep in mind, that this is just example, and if you can get data with any number of day missed you should use nvl2 in more places

Enterovirus answered 7/9, 2015 at 8:54 Comment(4)

That will leave the extra concatenated values where the number of values are less than 7. It could be handled using case expression. – Coincident 7/9, 2015 at 9:5

This will include Sat and Sun even if there are not enough values in the input to include them. – Celesta 7/9, 2015 at 9:7

I think we can avoid this by adding the replace function above the result we got and replace the strings that dont have any digits associated with it, in our case it will be Sat and Sun – Calen 7/9, 2015 at 11:23

@arunb2w, obviously it's easy to eliminate empty Sat Sun, as you've said with replace or with nvl2 wich simpler as for me, so the rest of query will be (i've added to the answer) – Enterovirus 7/9, 2015 at 11:37

Considering Space as delimiter, tokenise into rows(using levels) and rejoin using LISTAGG() , using the level as day generator (TO_CHAR(TRUNC(SYSDATE,'D')+level)

with t_view as
(
  select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
  union all
  select '6-20  6-20  6-20  6-20  6-20  ' from dual
  union all
  select '6-9  6-9  6-9  6-9  6-9  6-9  6-9 6-9 ' from dual
)
SELECT LISTAGG(TO_CHAR(TRUNC(SYSDATE,'D')+level1,'Dy')||': '||
               REGEXP_SUBSTR(col,'[^ ]+',1,LEVEL1),', ')
        WITHIN GROUP (ORDER BY level1 )
from
(
SELECT  col,level level1
  FROM t_view
CONNECT BY REGEXP_SUBSTR(col,'[^ ]+',1,LEVEL) IS NOT NULL
   AND PRIOR col = col
   AND PRIOR sys_guid() IS NOT NULL
)
group by col;

Woodberry answered 7/9, 2015 at 10:51 Comment(0)

This is my try. It is not different too much from MTo's version. The ideea is the same: transform strings into lines, add information about day, then regroup the record.

with week as (
  select 1 day_num, 'Mon' day_name from dual union all
  select 2 day_num, 'Tue' day_name from dual union all
  select 3 day_num, 'Wed' day_name from dual union all
  select 4 day_num, 'Thu' day_name from dual union all
  select 5 day_num, 'Fri' day_name from dual union all
  select 6 day_num, 'Sat' day_name from dual union all
  select 7 day_num, 'Sun' day_name from dual
),
t_view as (select '6-21  6-21  6-21  6-21  6-21  6-21  6-21  ' as col from dual
                union all
                select '6-20  6-20  6-20  6-20  6-20  ' from dual
                union all
                select '6-9  6-9  6-9  6-9  6-9  6-9  6-9  ' from dual
                ),
lines as(
  select 
    col, WEEK.DAY_NAME, l, trim(regexp_substr(col, '[^,]+', 1, L)) elem
  from (
      select regexp_replace(col,'([[:digit:]]-[[:digit:]]{1,2}[[:space:]]+)','\1,')  col
      from t_view
      )
      join (select level l from dual connect by level < 10)
  on instr(col, ',', 1, L ) > 0
  join week on WEEK.DAY_NUM = l
  order by col,l
  )
select listagg(day_name||':'||elem,' ')  within group (order by l)
from lines
group by col;

Result:

Mon:6-20 Tue:6-20 Wed:6-20 Thu:6-20 Fri:6-20
Mon:6-21 Tue:6-21 Wed:6-21 Thu:6-21 Fri:6-21 Sat:6-21 Sun:6-21
Mon:6-9 Tue:6-9 Wed:6-9 Thu:6-9 Fri:6-9 Sat:6-9 Sun:6-9

Jordain answered 7/9, 2015 at 11:13 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags