SQL: How would you split 100,000 records from an Oracle table into 5 chunks?

I'm trying to figure out a way to split the first 100,000 records from a table that has 1 million+ records into five chunks of 20,000 records each, with each chunk going into a file. Maybe some SQL that gets the min and max rowid or primary key for each of the 5 chunks, so I can put the min and max values into variables, pass them into the SQL, and use a BETWEEN in the WHERE clause.

Can this be done?

I'm on an Oracle 11g database.

Thanks in advance.

Fainthearted answered 31/3, 2016 at 13:50 Comment(6)
OFFSET 0 FETCH FIRST 20000 ROWS ONLY? - Misapprehend
There is no such thing as "the first 100,000 records [in] a table". SQL tables represent unordered sets. There is no ordering unless a column specifies the ordering. - Contraceptive
Okay, just the first 100,000 records it fetches then. How would I split this into 5 chunks? - Fainthearted
Search for the NTILE function. - Amundson
What do you mean by split? - Contraceptive
I mean grab 100,000 records and split those 100,000 records into 5 chunks, so you're able to get the min and max value for each chunk. I need the min and max value of each chunk to put into the WHERE clause of a piece of SQL. - Fainthearted
C
15

If you just want to assign values 1-5 to basically equal sized groups, then use ntile():

select t.*, ntile(5) over (order by NULL) as num
from (select t.*
      from t
      where rownum <= 100000
     ) t;

If you want to insert into 5 different tables, then use insert all:

insert all
    when num = 1 then into t1
    when num = 2 then into t2
    when num = 3 then into t3
    when num = 4 then into t4
    when num = 5 then into t5
    select t.*, ntile(5) over (order by NULL) as num
    from (select t.*
          from t
          where rownum <= 100000
         ) t;
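
One caveat with the statement above: without a column list, each INTO clause receives every column of the subquery, including num, so t1 to t5 would need a matching num column. A minimal sketch of the explicit form, assuming hypothetical columns id and payload (they are not from the question) that exist in t and in each target table:

```sql
-- id and payload are placeholder column names, not taken from the question
insert all
    when num = 1 then into t1 (id, payload) values (id, payload)
    when num = 2 then into t2 (id, payload) values (id, payload)
    when num = 3 then into t3 (id, payload) values (id, payload)
    when num = 4 then into t4 (id, payload) values (id, payload)
    when num = 5 then into t5 (id, payload) values (id, payload)
    select t.id, t.payload, ntile(5) over (order by NULL) as num
    from (select t.*
          from t
          where rownum <= 100000
         ) t;
```

Listing the columns keeps num available for routing without storing it in the targets.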
Contraceptive answered 31/3, 2016 at 14:2 Comment(1)
@BN . . . This answer is for Oracle. - Contraceptive
H
3

A bit harsh down voting another fair question.

Anyway, NTILE is new to me, so I wouldn't have discovered that were it not for your question.

My way of doing this, the old-school way, would have been to MOD the rownum to get the group number, e.g.

select t.*, mod(rn,5) as num
from (select t.*, rownum rn
      from t
     ) t;
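
Note that mod() deals rows out round-robin (num runs 0 to 4 and neighbouring rows land in different groups), which is fine for writing files but does not give contiguous blocks. If contiguous 20,000-row blocks are wanted, as the question's min/max idea suggests, a rough sketch along the same lines, assuming the 100,000-row limit from the question:

```sql
-- ceil(rn / 20000) maps rows 1..20000 to 1, 20001..40000 to 2, and so on up to 5
select t.*, ceil(rn / 20000) as num
from (select t.*, rownum rn
      from t
      where rownum <= 100000
     ) t;
```

Without an ORDER BY, which rows come back and in what order is still up to the database.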

This solves the SQL part, or rather how to group rows into equal chunks, but that is only half your question. The other half is how to write these to 5 separate files.

You can either have 5 separate queries each spooling to a separate file, e.g.:

spool f1.dat
    select t.*
    from (select t.*, rownum rn
          from t
         ) t
    where mod(t.rn,5) = 0;
spool off

spool f2.dat
    select t.*
    from (select t.*, rownum rn
          from t
         ) t
    where mod(t.rn,5) = 1;
spool off

etc.

Or you could use UTL_FILE. You could try something clever with a single query and an array of UTL_FILE file handles where the array index matches MOD(rn,5); then you wouldn't need logic like "IF num = 0 THEN UTL_FILE.PUT_LINE(f0, ...".

So, something like (not tested, just in a rough form for guidance, never tried this myself):

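DECLARE
   -- collection types, plus variables initialised as empty varrays
   TYPE fname_t IS VARRAY(5) OF VARCHAR2(100);
   TYPE fh_t    IS VARRAY(5) OF UTL_FILE.FILE_TYPE;
   fname fname_t := fname_t();
   fh    fh_t    := fh_t();
   CURSOR c1 IS 
    select t.*, mod(rn,5) as num
    from (select t.*, rownum rn
          from t
         ) t;
BEGIN
  -- open one file per group; THE_DIR must be an existing Oracle directory object
  FOR idx IN 1..5 LOOP
      fname.EXTEND;
      fh.EXTEND;
      fname(idx) := 'data_' || idx || '.dat';
      fh(idx) := UTL_FILE.FOPEN('THE_DIR', fname(idx), 'w');
  END LOOP;
  FOR r1 IN c1 LOOP
     -- num is 0..4 while the varray index is 1..5, hence the +1
     UTL_FILE.PUT_LINE ( fh(r1.num+1), r1.{column value from C1} );
  END LOOP;
  FOR idx IN 1..5 LOOP
      UTL_FILE.FCLOSE (fh(idx));
  END LOOP;
END;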
Hux answered 31/3, 2016 at 14:43 Comment(1)
Thanks TenG. "A bit harsh down voting another fair question"... I'm not too bothered with people not liking my question, so long as the question is answered, and thanks to guys like you questions are answered. - Fainthearted
F
2

Thanks so much to Gordon Linoff for giving me a starter to the code.

Just an update on how to get the min and max values for the 5 chunks:

select num, min(cre_surr_id), max(cre_surr_id)
from
(select p.cre_surr_id, ntile(5) over (order by NULL) as num
from (select p.*
      from productions p
      where rownum <= 100000
 ) p )
group by num
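
Because ntile() is ordered by NULL here, rows are assigned to chunks arbitrarily, so the min/max ranges of different chunks can overlap and a BETWEEN on one range may pick up rows from another chunk. A possible variant, assuming cre_surr_id is a unique, non-null key (that is an assumption, the thread does not say so), orders by the key so each chunk covers its own range:

```sql
-- take the 100,000 lowest keys, then cut them into 5 ordered chunks
select num, min(cre_surr_id) as min_id, max(cre_surr_id) as max_id
from (select p.cre_surr_id,
             ntile(5) over (order by p.cre_surr_id) as num
      from (select cre_surr_id
            from (select cre_surr_id
                  from productions
                  order by cre_surr_id)
            where rownum <= 100000
           ) p
     )
group by num
order by num;
```

Each min_id/max_id pair can then be passed to a query using BETWEEN without the chunks overlapping.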
Fainthearted answered 31/3, 2016 at 14:33 Comment(0)
G
2

You can even try with simple aggregation:

create table test_chunk(val) as
(
    select floor(dbms_random.value(1, level * 10)) from dual
    connect by level <= 100
);

-- groups rows into consecutive pairs: rownum 1-2 -> 1, 3-4 -> 2, ...
select min(val), max(val), floor((num+1)/2)
from (select rownum as num, val from test_chunk)
group by floor((num+1)/2);
Gifferd answered 31/3, 2016 at 14:34 Comment(1)
Another great answer by Aleksej. Thanks guys, you've all been a big help. - Fainthearted
