SQL Regex - Replace with substring from another field
Asked Answered
D

4

6

I have a database table (Oracle 11g) of questionnaire feedback, including multiple choice, multiple answer questions. The Options column has each value the user could choose, and the Answers column has the numerical values of what they chose.

ID_NO     OPTIONS                               ANSWERS
1001      Apple Pie|Banana-Split|Cream Tea      1|2
1002      Apple Pie|Banana-Split|Cream Tea      2|3
1003      Apple Pie|Banana-Split|Cream Tea      1|2|3

I need a query that will decode the answers, with the text versions of the answers as a single string.

ID_NO     ANSWERS     ANSWER_DECODE
1001      1|2         Apple Pie|Banana-Split
1002      2|3         Banana-Split|Cream Tea
1003      1|2|3       Apple Pie|Banana-Split|Cream Tea

I have experimented with regular expressions to replace values and get substrings, but I can't work out a way to properly merge the two.

WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT 
  id_no,
  options,
  REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, 2) second_option,
  answers,
  REGEXP_REPLACE(answers, '(\d)+', ' \1 ') answer_numbers,
  REGEXP_REPLACE(answers, '(\d)+', REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, To_Number('2'))) "???"
FROM feedback

I don't want to have to manually define or decode the answers in the SQL; there are many surveys with different questions (and differing numbers of options), so I'm hoping that there's a solution that will dynamically work for all of them.

I've tried to split the options and answers into separate rows by LEVEL, and re-join them where the codes match, but this runs exceedingly slow with the actual dataset (a 5-option question with 600 rows of responses).

WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
    answer_rows.id_no,
    ListAgg(option_rows.answer) WITHIN GROUP(ORDER BY option_rows.lvl)
FROM
  (SELECT DISTINCT
    LEVEL lvl,
    REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, LEVEL) answer
  FROM
    (SELECT DISTINCT
      options,
      REGEXP_COUNT(options||'|', '(.)+?\|') num_choices
    FROM
      feedback)
  CONNECT BY LEVEL <= num_choices
  ) option_rows
  LEFT OUTER JOIN
  (SELECT DISTINCT
    id_no,
    to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) answer
  FROM
    (SELECT DISTINCT
      id_no,
      answers,
      To_Number(REGEXP_SUBSTR(answers, '(\d)+$')) max_answer
    FROM
      feedback)
  WHERE
    to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) IS NOT NULL
  CONNECT BY LEVEL <= max_answer
  ) answer_rows
    ON option_rows.lvl = answer_rows.answer
GROUP BY
    answer_rows.id_no
ORDER BY
  answer_rows.id_no

If there isn't a solution just using Regex, is there a more efficient way than LEVEL to split the values? Or is there another approach that would work?

Darb answered 25/1, 2016 at 16:41 Comment(2)
Relevant: #26408038Perjure
Why not a function. Should be simpler.Skimmer
B
1

It's slow because you're expanding each row too many times; the connect-by clauses you're using are looking across all the rows, so you're ending up with a huge amount of data to then sort - which is presumably why you ended up with the DISTINCT in there.

You can add two PRIOR clauses to the connect-by, firstly so the ID_NO is preserved, and a second to avoid a loop - any non-deterministic function will do for this, I've picked dbms_random.value but you can use sys_guid if you prefer, or something else. You also don't need to many subqueries, you can do it with two; or as CTEs which I think it s a bit clearer:

WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
),
option_rows AS (
  SELECT
    id_no,
    LEVEL answer,
    REGEXP_SUBSTR(options, '[^|]+', 1, LEVEL) answer_text
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
  AND id_no = PRIOR id_no
  AND PRIOR dbms_random.value IS NOT NULL
),
answer_rows AS (
  SELECT
    id_no,
    REGEXP_SUBSTR(answers, '[^|]+', 1, LEVEL) answer
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(answers, '[^|]+')
  AND PRIOR id_no = id_no
  AND PRIOR dbms_random.value IS NOT NULL
)
SELECT
  option_rows.id_no,
  LISTAGG(option_rows.answer, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answers,
  LISTAGG(option_rows.answer_text, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answer_decode
FROM option_rows
JOIN answer_rows
ON option_rows.id_no = answer_rows.id_no
AND option_rows.answer = answer_rows.answer
GROUP BY option_rows.id_no
ORDER BY option_rows.id_no;

Which gets:

     ID_NO ANSWERS    ANSWER_DECODE                          
---------- ---------- ----------------------------------------
      1001 1|2        Apple Pie|Banana-Split                  
      1002 2|3        Banana-Split|Cream Tea                  
      1003 1|2|3      Apple Pie|Banana-Split|Cream Tea  

I've also changed your regex pattern so you don't have to append or strip the |.

Backandforth answered 25/1, 2016 at 18:34 Comment(0)
M
1

Check out this compact solution:

   with sample_data as
(
  select 'ala|ma|kota' options, '1|2' answers from dual
  union all
  select 'apples|oranges|bacon', '1|2|3' from dual
  union all
  select 'a|b|c|d|e|f|h|i','1|3|4|5|8' from dual
)
select answers, options,
regexp_replace(regexp_replace(options,'([^|]+)\|([^|]+)\|([^|]+)','\' || replace(answers,'|','|\')),'[|]+','|') answer_decode
from sample_data;

Output:

  ANSWERS   OPTIONS              ANSWER_DECODE
--------- -------------------- ---------------------------
1|2       ala|ma|kota          ala|ma
1|2|3     apples|oranges|bacon apples|oranges|bacon
1|3|4|5|8 a|b|c|d|e|f|h|i      a|c|d|f|h|i
Maillot answered 25/1, 2016 at 19:27 Comment(2)
What happens if there are more than 3 possible answers in the survey?Dissatisfaction
Stil works with little extra cleaning - removing unnecessary separatorsMaillot
D
0

I've written a close solution in MySQL (don't have Oracle installed right now) - but I've written what needs to be changed in order for the query to work in Oracle.

Also, the ugliest part of my code will be much better looking in Oracle since it has a much better INSTR function.

The idea is to do a CROSS JOIN with a list of numbers (1 to 10 in order to support up to 10 options per survey), and break-down the OPTIONS field into different rows... (you do this by using both the list of numbers and Oracle's INSTR function, see the comments).

From there you filter out the rows that were not selected and group everything back together.

-- I've used GROUP_CONCAT in MySQL, but in Oracle you'll have to use WM_CONCAT
select ID_NO, ANSWERS, group_concat(broken_down_options,'|') `OPTIONS`
from (
    select your_table.ID_NO, your_table.ANSWERS, 
            -- Luckily, you're using ORACLE so you can use an INSTR function that has the "occurrence" parameter
            -- INSTR(string, substring, [position, [occurrence]])
            -- use the nums.num field as input for the occurrence parameter
            -- and just put '1' under "position"
            case when nums.num = 1 
                then substr(your_table.`OPTIONS`, 1, instr(your_table.`OPTIONS`, '|') - 1)
                when nums.num = 2
                then substr(substr(your_table.`OPTIONS`, instr(your_table.`OPTIONS`, '|') + 1), 1, instr(substr(your_table.`OPTIONS`, instr(your_table.`OPTIONS`, '|') + 1), '|') - 1)
                else substr(your_table.`OPTIONS`,  length(your_table.`OPTIONS`) - instr(reverse(your_table.`OPTIONS`), '|') + 2) end broken_down_options
    from (select 1 num union all
        select 2 num union all
        select 3 num union all
        select 4 num union all
        select 5 num union all
        select 6 num union all
        select 7 num union all
        select 8 num union all
        select 9 num union all
        select 10 num
        ) nums 
        CROSS JOIN
        (select 1001 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '1|2' ANSWERS union
        select 1002 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '2|3' ANSWERS union
        select 1003 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '1|2|3' ANSWERS
        ) your_table
    -- for example: 2|3 matches 2 and 3 but not 1
    where your_table.ANSWERS like concat(concat('%',nums.num),'%')
) some_query
group by ID_NO, ANSWERS
Dissatisfaction answered 25/1, 2016 at 18:18 Comment(0)
C
0

Create a stored predure and do below steps

  • Declare an array of your size.
  • Get option data from first row. Use a regex or level to extract values between pipes and then store them in array. Note: This will be only a one time itwration. So you dont need to repeat it for every row.
  • Now in a loop, for each row, select answers and use array values to assign the values of answers
Correct answered 25/1, 2016 at 18:29 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.