How to delete a large amount of data from an Oracle table in batches

I'm using Oracle 10g and want to delete approximately 5 million records from the Persons table (15 million records in total) that have no reference in the Orders table (5 million records in total).

Since deleting 5 million records in one go would cause undo log issues, I've decided to delete in batches of 100k. I'm using the query below:

DELETE FROM Persons p
      WHERE     City = 'ABC'
            AND NOT EXISTS
                   (SELECT O_Id
                      FROM Orders o
                     WHERE p.P_Id = o.P_Id)
            AND ROWNUM <= 100000

The problem is that this query takes as long to execute for 100k records as it would for the full 5 million, because full table scans and joins on both tables still happen.

Is there an efficient way to rewrite this query for faster execution? Or to replace the NOT EXISTS clause with a better join condition? Or some better way to limit the records to 100k?

P.S. This is a one-time operation only and I can't use any DDL operations for it; however, PL/SQL is fine.

Figwort answered 14/11, 2014 at 12:50 Comment(9)
Do you have an index on Orders(P_ID) ?Ocotillo
No, there are indexes on primary keys only i.e. Persons(P_ID) and Orders(O_ID)Figwort
@Figwort have you tried creating different partitions on Persons based on the city? Also add indexes on p_id in Orders and (city, p_id) in PersonsHally
I'd like to avoid adding indexes, since it's a one-time operation only.Figwort
Have you tried using CREATE TABLE AS SELECT for the values you want to keep, and then dropping the old table? I think this can be fasterPorte
Yeah, but I can't use any DDL statements (except for the index)Figwort
Oops, yes, I see now. Did you try @Gaurav Soni's solution? I think that is the best wayPorte
Do you have any triggers on the table that would slow down deletions?Ocotillo
I have multiple update triggers on the Orders table but none on PersonsFigwort

If you want this query to run faster, add the following two indexes:

 create index idx_persons_city_pid on persons(city, p_id);
 create index idx_orders_pid on orders(p_id);
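
Since the question says this is a one-time operation, the indexes can simply be dropped again once the cleanup is finished (as noted in the comments below), for example:

 drop index idx_persons_city_pid;
 drop index idx_orders_pid;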
Ocotillo answered 14/11, 2014 at 13:4 Comment(3)
I'd like to avoid adding indexes, since it's a one-time operation only. I know it's really slow without an index, but if the query can be tuned a bit then it'll be goodFigwort
Adding indexes goes a lot faster than you might think. You can add the indexes, do the processing you want, and then remove them.Ocotillo
I tried the indexes. Getting the count earlier took around 300 seconds; now it takes 100. I cancelled the deletion after waiting for 30 minutes. Not much improvement in that respectFigwort

In my experience, the fastest way to delete a lot of rows is:

Solution 1 (recommended by Tom Kyte)

SET TRANSACTION USE ROLLBACK SEGMENT <your_seg>;
DELETE FROM <tab1> WHERE <cond>;
COMMIT;

OR

Solution 2

create table new_table unrecoverable as select * from old_table where ...;
drop table old_table;
rename new_table to old_table;
create index old_table_idx1 on old_table(c1,c2) unrecoverable parallel 5;

I have used the second solution in different contexts; it has always been the fastest way to delete a huge number of rows.

An alternative is to put the data to delete in a partition and then drop the partition (each partition has its own rollback segment, can use parallelism, ...).
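
For the partition variant, a very rough sketch, assuming the rows to be removed had first been isolated into their own partition (this requires DDL and a partitioned Persons table; the partition name persons_abc is hypothetical):

alter table persons drop partition persons_abc update global indexes;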

Abet answered 17/11, 2014 at 10:49 Comment(0)

DECLARE
  v_limit PLS_INTEGER := 100000;

  CURSOR person_deleted_cur IS
    SELECT rowid
      FROM Persons p
     WHERE City = 'ABC'
       AND NOT EXISTS (SELECT O_Id
                         FROM Orders o
                        WHERE p.P_Id = o.P_Id);

  -- collection of rowids, filled v_limit rows at a time
  TYPE person_deleted_nt IS TABLE OF ROWID INDEX BY PLS_INTEGER;
  person_deleted_rows person_deleted_nt;
BEGIN
  OPEN person_deleted_cur;
  LOOP
    FETCH person_deleted_cur
      BULK COLLECT INTO person_deleted_rows LIMIT v_limit;

    -- delete the fetched batch with a single bulk statement
    FORALL indx IN 1 .. person_deleted_rows.COUNT
      DELETE FROM Persons WHERE rowid = person_deleted_rows(indx);

    EXIT WHEN person_deleted_cur%NOTFOUND;
  END LOOP;

  CLOSE person_deleted_cur;
  COMMIT;
END;
/
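
Putting together the suggestions from the comments below (commit each batch inside the loop and make sure the cursor is closed on error), a variant sketch, untested:

DECLARE
  v_limit PLS_INTEGER := 100000;

  CURSOR person_deleted_cur IS
    SELECT rowid
      FROM Persons p
     WHERE City = 'ABC'
       AND NOT EXISTS (SELECT O_Id
                         FROM Orders o
                        WHERE p.P_Id = o.P_Id);

  TYPE person_deleted_nt IS TABLE OF ROWID INDEX BY PLS_INTEGER;
  person_deleted_rows person_deleted_nt;
BEGIN
  OPEN person_deleted_cur;
  LOOP
    FETCH person_deleted_cur
      BULK COLLECT INTO person_deleted_rows LIMIT v_limit;

    FORALL indx IN 1 .. person_deleted_rows.COUNT
      DELETE FROM Persons WHERE rowid = person_deleted_rows(indx);

    COMMIT;  -- commit each batch so undo stays small

    EXIT WHEN person_deleted_cur%NOTFOUND;
  END LOOP;
  CLOSE person_deleted_cur;
EXCEPTION
  WHEN OTHERS THEN
    IF person_deleted_cur%ISOPEN THEN
      CLOSE person_deleted_cur;
    END IF;
    RAISE;
END;
/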
Poulos answered 14/11, 2014 at 13:23 Comment(8)
@dusk7: If you don't want to sit and delete in batches manually, try this PL/SQL code; it will delete the records in batches of 100,000Poulos
yeah, will try in a test environment first :)Figwort
doesn't reduce execution time :(Figwort
@dusk7: Yup, it will not reduce the execution time. You can do the steps mentioned by Gordon Linoff and then run my code. The fundamental problem is how the data is fetched: with proper indexing the data is fetched faster, hence faster execution. My solution is more like: if you don't want to sit in the office and wait for each batch to execute, run my code and go home, and you can see the result later on :)Poulos
Is the loop necessary? I think the fetch alone is enough to put the values into the collectionPorte
@Aramillo: Yes, the loop is necessary because I am fetching with a limit of 100,000; the obvious reason is that fetching everything at once would cause memory issues. Anyway, it does not add to the context-switching overheadPoulos
COMMIT should be inside the loop (before END LOOP) to keep each transaction smaller.Starter
A good idea is to make sure the cursor is closed in the end: EXCEPTION WHEN OTHERS THEN IF person_deleted_cur%ISOPEN THEN CLOSE person_deleted_cur; END IF; RAISE;Starter

Solution 1: The approach below deletes 1,000 records at a time and commits after each batch of deleted records.

-- This schema-level collection type will hold up to 1,000 ids to be deleted
CREATE OR REPLACE TYPE arrayNameType IS VARRAY(1000) OF NUMBER;
/

DECLARE
  idsToDelete         arrayNameType;
  recordsToBeDeleted  NUMBER;
  noOfRecordsDeleted  NUMBER;
  countDeleted        NUMBER;

  CURSOR cursor1 IS
    SELECT id_column FROM table_name WHERE <conditions>;
BEGIN
  noOfRecordsDeleted := 0;
  countDeleted := 0;

  -- Count of the total number of records planned to be deleted
  SELECT COUNT(*) INTO recordsToBeDeleted FROM table_name WHERE <conditions>;

  OPEN cursor1;
  LOOP
    FETCH cursor1 BULK COLLECT INTO idsToDelete LIMIT 1000;
    EXIT WHEN idsToDelete.COUNT = 0;

    DELETE FROM table_name
     WHERE id_column IN (SELECT * FROM TABLE(idsToDelete));

    countDeleted := SQL%ROWCOUNT;
    noOfRecordsDeleted := countDeleted + noOfRecordsDeleted;
    COMMIT;
  END LOOP;
  CLOSE cursor1;
END;
/

Solution 2: If 70% or more of the data has to be deleted, the approach below can be considered (a rough SQL sketch follows the list).

1) Create a temp table and copy into it the data that is not to be deleted.
2) Truncate the original table.
3) Insert the records from the temp table back into the original table.
4) Drop the temp table.
5) Indexes need to be taken care of.
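
A rough SQL sketch of these steps against the question's Persons/Orders tables (persons_keep is a hypothetical temp table name):

create table persons_keep as
  select *
    from persons p
   where not (    p.city = 'ABC'
              and not exists (select 1 from orders o where o.p_id = p.p_id));

-- foreign keys referencing Persons may need to be disabled before the truncate
truncate table persons;

insert into persons select * from persons_keep;
commit;

drop table persons_keep;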

Measures answered 30/6, 2023 at 8:52 Comment(0)

One more way to delete:

begin
  dbms_errlog.create_error_log('PERSONS');
end;
/

-- index on foreign key is useful thing in many cases, not only now
create index idx_orders_pid on orders(p_id); 

declare
  min_id number;
  max_id number;
  i      number;
begin
  select min(p_id), max(p_id)
    into min_id, max_id
    from persons;

  -- walk the id range in steps of 100,000 so each delete is a bounded batch;
  -- rows still referenced by Orders (via the foreign key) fail the delete
  -- and are skipped into err$_persons
  i := min_id;
  while i <= max_id loop
    delete from persons where p_id between i and i + 99999
      log errors into err$_persons reject limit unlimited;
    commit;
    i := i + 100000;
  end loop;
end;
/

drop table err$_persons;
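
Before the final drop table err$_persons, it may be worth inspecting which rows were skipped (for example, those still referenced by Orders); DML error logging stores the error code and message in the ORA_ERR_ columns:

select ora_err_number$, ora_err_mesg$, count(*)
  from err$_persons
 group by ora_err_number$, ora_err_mesg$;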
Aemia answered 14/11, 2014 at 16:37 Comment(1)
This depends on a linearly growing ID sequence; if you have reallocating ID sequences this will break your data consistencyChaunce
