Safely remove items from an array table while iterating
Asked Answered
W

12

52

This question is similar to How can I safely iterate a lua table while keys are being removed but distinctly different.

Summary

Given a Lua array (table with keys that are sequential integers starting at 1), what's the best way to iterate through this array and delete some of the entries as they are seen?

Real World Example

I have an array of timestamped entries in a Lua array table. Entries are always added to the end of the array (using table.insert).

local timestampedEvents = {}
function addEvent( data )
  table.insert( timestampedEvents, {getCurrentTime(),data} )
end

I need to occasionally run through this table (in order) and process-and-remove certain entries:

function processEventsBefore( timestamp )
  for i,stamp in ipairs( timestampedEvents ) do
    if stamp[1] <= timestamp then
      processEventData( stamp[2] )
      table.remove( timestampedEvents, i )
    end
  end
end

Unfortunately, the code above approach breaks iteration, skipping over some entries. Is there any better (less typing, but still safe) way to do this than manually walking the indices:

function processEventsBefore( timestamp )
  local i = 1
  while i <= #timestampedEvents do -- warning: do not cache the table length
    local stamp = timestampedEvents[i]
    if stamp[1] <= timestamp then
      processEventData( stamp[2] )
      table.remove( timestampedEvents, i )
    else
      i = i + 1
    end
  end
end
Wellheeled answered 12/9, 2012 at 19:18 Comment(2)
WARNING FOR ALL READERS: The table.remove() function should be banned from Lua! Scroll down to my answer labeled "Efficiency!" if you want to see benchmarks, showing how dangerous table.remove() is! ;-) Dear readers: Please ignore all answers that suggest table.remove(). It should never be used for anything!Lathi
And just read this to save you some time regarding the comment above: #7129737Nominal
P
21

I'd avoid table.remove and traverse the array once setting the unwanted entries to nil then traverse the array again compacting it if necessary.

Here's the code I have in mind, using the example from Mud's answer:

local input = { 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }
local remove = { f=true, g=true, j=true, n=true, o=true, p=true }

local n=#input

for i=1,n do
        if remove[input[i]] then
                input[i]=nil
        end
end

local j=0
for i=1,n do
        if input[i]~=nil then
                j=j+1
                input[j]=input[i]
        end
end
for i=j+1,n do
        input[i]=nil
end
Pomerania answered 13/9, 2012 at 0:13 Comment(4)
Can you provide any reason for avoiding table.remove, or consequences of using it?Wellheeled
@Wellheeled because it will move items several times when only one is needed. For instance if you remove every other entry in the array then a quadratic number of moves will be performed if you use table.remove.Pomerania
One can do even better: few lines in the first loop make the second one (j+1 to n) obsolete. It can save up to N steps. (Consider case when after the first loop the array contains all nils but the last element). The key is to not just write to input[j] but swap it with input[i] if i ~= jCashmere
Here's a heads up for anyone transcribing this code: You must capture the length of the array at the top (i.e. you cannot just grab it in the bottom for loop). Arrays are just tables in Lua, so they store no metadata for length. Instead, length means "check each index from 1, and the first one that returns nil tells you the length". This causes problems when you set arbitrary elements in the middle to nil in the first for loop.Obtrude
A
52

the general case of iterating over an array and removing random items from the middle while continuing to iterate

If you're iterating front-to-back, when you remove element N, the next element in your iteration (N+1) gets shifted down into that position. If you increment your iteration variable (as ipairs does), you'll skip that element. There are two ways we can deal with this.

Using this sample data:

    input = { 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }
    remove = { f=true, g=true, j=true, n=true, o=true, p=true }

We can remove input elements during iteration by:

  1. Iterating from back to front.

     for i=#input,1,-1 do
         if remove[input[i]] then
             table.remove(input, i)
         end
     end
    
  2. Controlling the loop variable manually, so we can skip incrementing it when removing an element:

     local i=1
     while i <= #input do
         if remove[input[i]] then
             table.remove(input, i)
         else
             i = i + 1
         end
     end
    

For non-array tables, you iterate using next or pairs (which is implemented in terms of next) and set items you want removed to nil.

Note that table.remove shifts all following elements every time it's called, so performance is quadratic (O(N^2)) for N removals. If you're removing a lot of elements, you should shift the items yourself as in LHF or Mitch's answer.

Azelea answered 12/9, 2012 at 23:47 Comment(2)
Performance is most definitely not exponential. You'd expect individual calls to have a performance of O(N), and N removals to have a performance of O(N²).Preselector
You're right. I thought N² was called exponential because of the exponent. Fixed.Azelea
L
33

Efficiency!

WARNING: Do NOT use table.remove(). That function causes all of the subsequent (following) array indices to be re-indexed every time you call it to remove an array entry. It is therefore MUCH faster to just "compact/re-index" the table in a SINGLE passthrough OURSELVES instead!

The best technique is simple: Count upwards (i) through all array entries, while keeping track of the position we should put the next "kept" value into (j). Anything that's not kept (or which is moved from i to j) is set to nil which tells Lua that we've erased that value.

I'm sharing this, since I really don't like the other answers on this page (as of Oct 2018). They're either wrong, bug-ridden, overly simplistic or overly complicated, and most are ultra-slow. So I implemented an efficient, clean, super-fast one-pass algorithm instead. With a SINGLE loop.

Here's a fully commented example (there's a shorter, non-tutorial version at the end of this post):

function ArrayShow(t)
    for i=1,#t do
        print('total:'..#t, 'i:'..i, 'v:'..t[i]);
    end
end

function ArrayRemove(t, fnKeep)
    print('before:');
    ArrayShow(t);
    print('---');
    local j, n = 1, #t;
    for i=1,n do
        print('i:'..i, 'j:'..j);
        if (fnKeep(t, i, j)) then
            if (i ~= j) then
                print('keeping:'..i, 'moving to:'..j);
                -- Keep i's value, move it to j's pos.
                t[j] = t[i];
                t[i] = nil;
            else
                -- Keep i's value, already at j's pos.
                print('keeping:'..i, 'already at:'..j);
            end
            j = j + 1;
        else
            t[i] = nil;
        end
    end
    print('---');
    print('after:');
    ArrayShow(t);
    return t;
end

local t = {
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'
};

ArrayRemove(t, function(t, i, j)
    -- Return true to keep the value, or false to discard it.
    local v = t[i];
    return (v == 'a' or v == 'b' or v == 'f' or v == 'h');
end);

Output, showing its logic along the way, how it's moving things around, etc...

before:
total:9 i:1 v:a
total:9 i:2 v:b
total:9 i:3 v:c
total:9 i:4 v:d
total:9 i:5 v:e
total:9 i:6 v:f
total:9 i:7 v:g
total:9 i:8 v:h
total:9 i:9 v:i
---
i:1 j:1
keeping:1   already at:1
i:2 j:2
keeping:2   already at:2
i:3 j:3
i:4 j:3
i:5 j:3
i:6 j:3
keeping:6   moving to:3
i:7 j:4
i:8 j:4
keeping:8   moving to:4
i:9 j:5
---
after:
total:4 i:1 v:a
total:4 i:2 v:b
total:4 i:3 v:f
total:4 i:4 v:h

Finally, here's the function for use in your own code, without all of the tutorial-printing... and with just a few minimal comments to explain the final algorithm:

function ArrayRemove(t, fnKeep)
    local j, n = 1, #t;

    for i=1,n do
        if (fnKeep(t, i, j)) then
            -- Move i's kept value to j's position, if it's not already there.
            if (i ~= j) then
                t[j] = t[i];
                t[i] = nil;
            end
            j = j + 1; -- Increment position of where we'll place the next kept value.
        else
            t[i] = nil;
        end
    end

    return t;
end

That's it!

And if you don't want to use the whole "re-usable callback/function" design, you can simply copy the inner code of ArrayRemove() into your project, and change the line if (fnKeep(t, i, j)) then to if (t[i] == 'deleteme') then... That way you get rid of the function call/callback overhead too, and speed things up even more!

Personally, I use the re-usable callback system, since it still massively beats table.remove() by factors of 100-1000+ times faster.

Bonus (Advanced Users): Regular users can skip reading this bonus section. It describes how to sync multiple related tables. Note that the 3rd parameter to fnKeep(t, i, j), the j, is a bonus parameter which allows your keep-function to know what index the value will be stored at whenever fnKeep answers true (to keep that value).

Example usage: Let's say you have two "linked" tables, where one is table['Mitch'] = 1; table['Rick'] = 2; (a hash-table for quick array index lookups via named strings) and the other is array[{Mitch Data...}, {Rick Data...}] (an array with numerical indices, where Mitch's data is at pos 1 and Rick's data is at pos 2, exactly as described in the hash-table). Now you decide to loop through the array and remove Mitch Data, which thereby moves Rick Data from position 2 to position 1 instead...

Your fnKeep(t, i, j) function can then easily use the j info to update the hash-table pointers to ensure they always point at the correct array offsets:

local hData = {['Mitch'] = 1, ['Rick'] = 2};
local aData = {
    {['name'] = 'Mitch', ['age'] = 33}, -- [1]
    {['name'] = 'Rick', ['age'] = 45}, -- [2]
};

ArrayRemove(aData, function(t, i, j)
    local v = t[i];
    if (v['name'] == 'Rick') then -- Keep "Rick".
        if (i ~= j) then -- i and j differing means its data offset will be moved if kept.
            hData[v['name']] = j; -- Point Rick's hash table entry at its new array location.
        end
        return true; -- Keep.
    else
        hData[v['name']] = nil; -- Delete this name from the lookup hash-table.
        return false; -- Remove from array.
    end
end);

Thereby removing 'Mitch' from both the lookup hash-table and the array, and moving the 'Rick' hash-table entry to point to 1 (that's the value of j) where its array data is being moved to (since i and j differed, meaning the data was being moved).

This kind of algorithm allows your related tables to stay in perfect sync, always pointing at the correct data position thanks to the j parameter.

It's just an advanced bonus for those who need that feature. Most people can simply ignore the j parameter in their fnKeep() functions!

Well, that's all, folks!

Enjoy! :-)

Benchmarks (aka "Let's have a good laugh...")

I decided to benchmark this algorithm against the standard "loop backwards and use table.remove()" method which 99.9% of all Lua users are using.

To do this test, I used the following test.lua file: https://pastebin.com/aCAdNXVh

Each algorithm being tested is given 10 test-arrays, containing 2 million items per array (a total of 20 million items per algorithm-test). The items in all arrays are identical (to ensure total fairness in testing): Every 5th item is the number "13" (which will be deleted), and all other items are the number "100" (which will be kept).

Well... my ArrayRemove() algorithm's test concluded in 2.8 seconds (to process the 20 million items). I'm now waiting for the table.remove() test to finish... It's been a few minutes so far and I am getting bored........ Update: Still waiting... Update: I am hungry... Update: Hello... today?! Update: Zzz... Update: Still waiting... Update: ............ Update: Okay, the table.remove() code (which is the method that most Lua users are using) is going to take a few days. I'll update the day it finishes.

Note to self: I began running the test at ~04:55 GMT on November 1st, 2018. My ArrayRemove() algorithm finished in 2.8 seconds... The built-in Lua table.remove() algorithm is still running as of now... I'll update this post later... ;-)

Update: It is now 14:55 GMT on November 1st, 2018, and the table.remove() algorithm has STILL NOT FINISHED. I'm going to abort that part of the test, because Lua has been using 100% of my CPU for the past 10 hours, and I need my computer now. And it's hot enough to make coffee on the laptop's aluminum case...

Here's the result:

  • Processing 10 arrays with 2 million items (20 million items total):
  • My ArrayRemove() function: 2.8 seconds.
  • Normal Lua table.remove(): I decided to quit the test after 10 hours of 100% CPU usage by Lua. Because I need to use my laptop now! ;-)

Here's the stack trace when I pressed Ctrl-C... which confirms what Lua function my CPU has been working on for the last 10 hours, haha:

[     mitch] elapsed time: 2.802

^Clua: test.lua:4: interrupted!
stack traceback:
    [C]: in function 'table.remove'
    test.lua:4: in function 'test_tableremove'
    test.lua:43: in function 'time_func'
    test.lua:50: in main chunk
    [C]: in ?

If I had let the table.remove() test run to its completion, it may take a few days... Anyone who doesn't mind wasting a ton of electricity is welcome to re-run this test (file is above at pastebin) and let us all know how long it took.

Why is table.remove() so insanely slow? Simply because every call to that function has to repeatedly re-index every table item that exists after the one we told it to remove! So to delete the 1st item in a 2 million item array, it must move the indices of ALL other 2 million items down by 1 slot to fill the gap caused by the deletion. And then... when you remove another item.. it has to yet again move ALL other 2 million items... It does this over and over...

You should never, EVER use table.remove()! Its performance penalty grows rapidly. Here's an example with smaller array sizes, to demonstrate this:

  • 10 arrays of 1,000 items (10k items total): ArrayRemove(): 0.001 seconds, table.remove(): 0.018 seconds (18x slower).
  • 10 arrays of 10,000 items (100k items total): ArrayRemove(): 0.014 seconds, table.remove(): 1.573 seconds (112.4x slower).
  • 10 arrays of 100,000 items (1m items total): ArrayRemove(): 0.142 seconds, table.remove(): 3 minutes, 48 seconds (1605.6x slower).
  • 10 arrays of 2,000,000 items (20m items total): ArrayRemove(): 2.802 seconds, table.remove(): I decided to abort the test after 10 hours, so we may never now how long it takes. ;-) But at the current timepoint (not even finished), it's taken 12847.9x longer than ArrayRemove()... But the final table.remove() result, if I had let it finish, would probably be around 30-40 thousand times slower.

As you can see, table.remove()'s growth in time is not linear (because if it was, then our 1 million item test would have only taken 10x as long as the 0.1 million (100k) test, but instead we see 1.573s vs 3m48s!). So we cannot take a lower test (such as 10k items) and simply multiply it to 10 million items to know how long the test that I aborted would have taken... So if anyone is truly curious about the final result, you'll have to run the test yourselves and post a comment after a few days when table.remove() finishes...

But what we can do at this point, with the benchmarks we have so far, is say table.remove() sucks! ;-)

There's no reason to ever call that function. EVER. Because if you want to delete items from a table, just use t['something'] = nil;. If you want to delete items from an array (a table with numeric indices), use ArrayRemove().

By the way, the tests above were all executed using Lua 5.3.4, since that's the standard runtime most people use. I decided to do a quick run of the main "20 million items" test using LuaJIT 2.0.5 (JIT: ON CMOV SSE2 SSE3 SSE4.1 fold cse dce fwd dse narrow loop abc sink fuse), which is a faster runtime than the standard Lua. The result for 20 million items with ArrayRemove() was: 2.802 seconds in Lua, and 0.092 seconds in LuaJIT. Which means that if your code/project runs on LuaJIT, you can expect even faster performance from my algorithm! :-)

I also re-ran the "100k items" test one final time using LuaJIT, so that we can see how table.remove() performs in LuaJIT instead, and to see if it's any better than regular Lua:

  • [LUAJIT] 10 arrays of 100,000 items (1m items total): ArrayRemove(): 0.005 seconds, table.remove(): 20.783 seconds (4156.6x slower than ArrayRemove()... but this LuaJIT result is actually a WORSE ratio than regular Lua, whose table.remove() was "only" 1605.6x slower than my algorithm for the same test... So if you're using LuaJIT, the performance ratio is even more in favor of my algorithm!)

Lastly, you may wonder "would table.remove() be faster if we only want to delete one item, since it's a native function?". If you use LuaJIT, the answer to that question is: No. In LuaJIT, ArrayRemove() is faster than table.remove() even for removing ONE ITEM. And who isn't using LuaJIT? With LuaJIT, all Lua code speeds up by easily around 30x compared to regular Lua. Here's the result: [mitch] elapsed time (deleting 1 items): 0.008, [table.remove] elapsed time (deleting 1 items): 0.011. Here's the pastebin for the "just delete 1-6 items" test: https://pastebin.com/wfM7cXtU (with full test results listed at the end of the file).

TL;DR: Don't use table.remove() anywhere, for any reason whatsoever!

Hope you enjoy ArrayRemove()... and have fun, everyone! :-)

Lathi answered 29/10, 2018 at 3:51 Comment(10)
Comments are not for extended discussion; this conversation has been moved to chat.Selfreproach
Beware. This is NOT a good way to iterate and remove tables. It only works for numerical sequenced arrays. Non sequenced arrays will fail as will any other type of table hash - found out the hard way, by using it. I ran this function over a sequenced array and compared with table.remove - it was identical in luajit. This was with over 7 million calls while processing a 300MB file. The claims here are a bit over the top, so be warned, I comment here so that others may not waste too much time on this. Test first.Topsyturvydom
@DavidLannan aren't non numerical sequenced arrays called just tables?Schnitzler
@Schnitzler Yes. Both are tables. I wanted to make clear the usage (as an array).Topsyturvydom
@DavidLannan Excuse me for saying this, but you wasted your own time because you've simply misunderstood what Lua ARRAYS are, so you can't complain that my function does Exactly what it's supposed to do. You say "Warning: This function ONLY works for numerically indexed Lua arrays". Well, sir, that's exactly what a Lua ARRAY is. See the Lua documentation: lua.org/pil/11.1.html (Arrays: Numerical keys ONLY, and all keys MUST be sequential (no gaps)), and lua.org/pil/2.5.html (Tables: Can use any keys, including numbers, and gaps in the numbers are allowed.)Lathi
@DavidLannan That being said, I'm completely fine with you helping to clarify this for other people, so that people who are new to Lua know that this ArrayRemove() function is meant for LUA ARRAYS, not for lua tables. :-) Tables with non-numerical keys are TABLES. Tables with numerical keys but gaps are TABLES. "Tables" with numerical keys in sequential order (no gaps!) are ARRAYS. Hope that clears things up. Otherwise check the Lua documentation I linked to in the previous comment. :-)Lathi
@DavidLannan By the way, I just noticed you saying that performance was identical in LuaJit. That's not true. Check the bottom of my post for the LuaJit table.remove() vs ArrayRemove() benchmarks. There is no situation on Earth where table.remove() will ever be faster than this function when multiple keys need to be removed, since table.remove() does 1 function call AND a FULL array re-indexing for EACH item you want to remove, while my function does the re-indexing and all removals in a single step for a MASSIVE performance boost. The function does exactly what it's supposed to.Lathi
@DavidLannan I guess I'll mention one last thing just to be extra clear about this: Numerically indexed tables with gaps (non-sequential numbers) are NOT ARRAYS and will be rejected by Lua's own built-in library functions for handling arrays too. Lua's own functions start counting at index 1, and as soon as they encounter a nil (missing) numerical key, they stop because they've reached the end of the "array". So a table with keys [1] [2] [4] is NOT an array, and Lua's own array functions will process [1] and [2] and then stop, since they've reached the end of the sequential keys!Lathi
First, nice function - I'm using it even if not on LuaJit or removing multiple items. Secondly, you have a small mistake when you say that one should change the line if (fnKeep(t, i, j)) then to if (t[i] == 'deleteme') then: it should be either if (t[i] == 'keepme') then or if (t[i] ~= 'deleteme') then since what you wrote is a negation of your fnKeep(t, i, j). Third, although not related directly to this question, do you think table.insert() is prone to similar efficiency issues, and if so, would a similar function to the one above be recommended, and how would that look like?Inmate
This is good. Same approach as best practices algorithms from performance oriented languages like c++: std::remove_if refer "possible implementation" here: en.cppreference.com/w/cpp/algorithm/remove I like the optional j parameter to the predicate too.Erythema
P
21

I'd avoid table.remove and traverse the array once setting the unwanted entries to nil then traverse the array again compacting it if necessary.

Here's the code I have in mind, using the example from Mud's answer:

local input = { 'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p' }
local remove = { f=true, g=true, j=true, n=true, o=true, p=true }

local n=#input

for i=1,n do
        if remove[input[i]] then
                input[i]=nil
        end
end

local j=0
for i=1,n do
        if input[i]~=nil then
                j=j+1
                input[j]=input[i]
        end
end
for i=j+1,n do
        input[i]=nil
end
Pomerania answered 13/9, 2012 at 0:13 Comment(4)
Can you provide any reason for avoiding table.remove, or consequences of using it?Wellheeled
@Wellheeled because it will move items several times when only one is needed. For instance if you remove every other entry in the array then a quadratic number of moves will be performed if you use table.remove.Pomerania
One can do even better: few lines in the first loop make the second one (j+1 to n) obsolete. It can save up to N steps. (Consider case when after the first loop the array contains all nils but the last element). The key is to not just write to input[j] but swap it with input[i] if i ~= jCashmere
Here's a heads up for anyone transcribing this code: You must capture the length of the array at the top (i.e. you cannot just grab it in the bottom for loop). Arrays are just tables in Lua, so they store no metadata for length. Instead, length means "check each index from 1, and the first one that returns nil tells you the length". This causes problems when you set arbitrary elements in the middle to nil in the first for loop.Obtrude
S
5

Try this function:

function ripairs(t)
    -- Try not to use break when using this function;
    -- it may cause the array to be left with empty slots
    local ci = 0
    local remove = function()
        t[ci] = nil
    end
    return function(t, i)
        --print("I", table.concat(array, ','))
        i = i+1
        ci = i
        local v = t[i]
        if v == nil then
            local rj = 0
            for ri = 1, i-1 do
                if t[ri] ~= nil then
                    rj = rj+1
                    t[rj] = t[ri]
                    --print("R", table.concat(array, ','))
                end
            end
            for ri = rj+1, i do
                t[ri] = nil
            end
            return
        end
        return i, v, remove
    end, t, ci
end

It doesn't use table.remove, so it should have O(N) complexity. You could move the remove function into the for-generator to remove the need for an upvalue, but that would mean a new closure for every element... and it isn't a practical issue.

Example usage:

function math.isprime(n)
    for i = 2, n^(1/2) do
        if (n % i) == 0 then
            return false
        end
    end
    return true
end

array = {}
for i = 1, 500 do array[i] = i+10 end
print("S", table.concat(array, ','))
for i, v, remove in ripairs(array) do
    if not math.isprime(v) then
        remove()
    end
end
print("E", table.concat(array, ','))

Be careful not to use break (or otherwise exit prematurely from the loop) as it will leave the array with nil elements.

If you want break to mean "abort" (as in, nothing is removed), you could do this:

function rtipairs(t, skip_marked)
    local ci = 0
    local tbr = {} -- "to be removed"
    local remove = function(i)
        tbr[i or ci] = true
    end
    return function(t, i)
        --print("I", table.concat(array, ','))
        local v
        repeat
            i = i+1
            v = t[i]
        until not v or not (skip_marked and tbr[i])
        ci = i
        if v == nil then
            local rj = 0
            for ri = 1, i-1 do
                if not tbr[ri] then
                    rj = rj+1
                    t[rj] = t[ri]
                    --print("R", table.concat(array, ','))
                end
            end
            for ri = rj+1, i do
                t[ri] = nil
            end
            return
        end
        return i, v, remove
    end, t, ci
end

This has the advantage of being able to cancel the entire loop with no elements being removed, as well as provide the option to skip over elements already marked as "to be removed". The disadvantage is the overhead of a new table.

I hope these are helpful to you.

Stannwood answered 15/9, 2012 at 7:11 Comment(0)
H
2

You might consider using a priority queue instead of a sorted array. A priority queue will efficiently compact itself as you remove entries in order.

For an example of a priority queue implementation, see this mailing list thread: http://lua-users.org/lists/lua-l/2007-07/msg00482.html

Heartless answered 13/9, 2012 at 12:26 Comment(0)
P
2

I recommend against using table.remove, for performance reasons (which may be more or less relevant to your particular case).

Here's what that type of loop generally looks like for me:

local mylist_size = #mylist
local i = 1
while i <= mylist_size do
    local value = mylist[i]
    if value == 123 then
        mylist[i] = mylist[mylist_size]
        mylist[mylist_size] = nil
        mylist_size = mylist_size - 1
    else
        i = i + 1
    end
end

Note This is fast BUT with two caveats:

  • It is faster if you need to remove relatively few elements. (It does practically no work for elements that should be kept).
  • It will leave the array UNSORTED. Sometimes you don't care about having a sorted array, and in that case this is a useful "shortcut".

If you want to preserve the order of the elements, or if you expect to not keep most of the elements, then look into Mitch's solution. Here is a rough comparison between mine and his. I ran it on https://www.lua.org/cgi-bin/demo and most results were similar to this:

[    srekel] elapsed time: 0.020
[     mitch] elapsed time: 0.040
[    srekel] elapsed time: 0.020
[     mitch] elapsed time: 0.040

Of course, remember that it varies depending on your particular data.

Here is the code for the test:

function test_srekel(mylist)
    local mylist_size = #mylist
    local i = 1
    while i <= mylist_size do
        local value = mylist[i]
        if value == 13 then
            mylist[i] = mylist[mylist_size]
            mylist[mylist_size] = nil
            mylist_size = mylist_size - 1
        else
            i = i + 1
        end
    end

end -- func

function test_mitch(mylist)
    local j, n = 1, #mylist;

    for i=1,n do
        local value = mylist[i]
        if value ~= 13 then
            -- Move i's kept value to j's position, if it's not already there.
            if (i ~= j) then
                mylist[j] = mylist[i];
                mylist[i] = nil;
            end
            j = j + 1; -- Increment position of where we'll place the next kept value.
        else
            mylist[i] = nil;
        end
    end
end

function build_tables()
    local tables = {}
    for i=1, 10 do
      tables[i] = {}
      for j=1, 100000 do
        tables[i][j] = j % 15373
      end
    end

    return tables
end

function time_func(func, name)
    local tables = build_tables()
    time0 = os.clock()
    for i=1, #tables do
        func(tables[i])
    end
    time1 = os.clock()
    print(string.format("[%10s] elapsed time: %.3f\n", name, time1 - time0))
end

time_func(test_srekel, "srekel")
time_func(test_mitch, "mitch")
time_func(test_srekel, "srekel")
time_func(test_mitch, "mitch")
Parlando answered 9/3, 2015 at 12:28 Comment(3)
Lua.org's demo page is not a benchmarking tool. It's slow and overloaded! I ran your benchmark code locally on LuaJIT 2.0.5. Result: $ luajit test.lua, [srekel] elapsed time: 0.004, [mitch] elapsed time: 0.004, [srekel] elapsed time: 0.003, [mitch] elapsed time: 0.004... And if I raise the number of items per array from 100k to 1 million (for j=1, 1000000 do), to cause the test to run long enough to measure any difference, I get these results: $ luajit test2.lua, [srekel] elapsed time: 0.038, [mitch] elapsed time: 0.041, [srekel] elapsed time: 0.033, [mitch] elapsed time: 0.041...Lathi
Furthermore, your algorithm does its hard work (copying data to different cells) when it actually deletes something, and very little work when it doesn't delete (just increments a loop counter). And the constant that you've used to determine WHEN to randomly place the number "13" into the array (which is the number that will be deleted), via j % 15373, only generates 7 indices that will be deleted... :-O So only 7 items are ever deleted in this test. This is mostly a test of looping from 1 to #n! Either way, nice and interesting work on an algorithm for when sorted order isn't required!Lathi
You're right, mine is faster when you have few elements to remove, and yours is faster when you want to remove many items. I ran the test locally now and tried a few different configurations. For me the cut-off point is around 20-25%, i.e. if you expect to remove about one fifth or fewer of the elements, then mine is faster. But this value is only an estimate and of course will vary from case to case.Parlando
L
2

Simple..

values = {'a', 'b', 'c', 'd', 'e', 'f'}
rem_key = {}

for i,v in pairs(values) do
if remove_value() then
table.insert(rem_key, i)
end
end

for i,v in pairs(rem_key) do
table.remove(values, v)
end
Leatherjacket answered 1/11, 2018 at 11:54 Comment(2)
While I appreciate the performance-focused answers, it is nice to see someone who ACTUALLY ANSWERED the question...Horsefaced
The answer is wrong. The indexes to be removed should be in reverse order : table.insert(rem_key, 1, i)Tellurion
C
1

You can use a functor to check for elements that need to be removed. The additional gain is that it completes in O(n), because it doesn't use table.remove

function table.iremove_if(t, f)
    local j = 0
    local i = 0
    while (i <= #f) do
        if (f(i, t[i])) then
            j = j + 1
        else
            i = i + 1
        end
        if (j > 0) then
            local ij = i + j
            if (ij > #f) then
                t[i] = nil
            else
                t[i] = t[ij]
            end
        end
    end
    return j > 0 and j or nil -- The number of deleted items, nil if 0
end

Usage:

table.iremove_if(myList, function(i,v) return v.name == name end)

In your case:

table.iremove_if(timestampedEvents, function(_,stamp)
    if (stamp[1] <= timestamp) then
        processEventData(stamp[2])
        return true
    end
end)
Capstone answered 28/7, 2017 at 11:14 Comment(0)
D
1

This is basically restating the other solutions in non-functional style; I find this much easier to follow (and harder to get wrong):

for i=#array,1,-1 do
  local element=array[i]
  local remove = false
  -- your code here
  if remove then
    array[i] = array[#array]
    array[#array] = nil
  end
end
Diminuendo answered 27/5, 2020 at 11:41 Comment(1)
This answer does not work! It seemed like a simple solution at first, however have a look at the following example: with the array {"A", "B", "C", "D"} while trying to remove element A and D, this function incorrectly outputs {"C", "B"} instead of {"B", "C"}. Took me a while to figure out where the bug was hidden...Tobin
U
1

Efficiency! Even more! )

Regarding Mitch's variant. It has some waste assignments to nil, here is optimized version with the same idea:

function ArrayRemove(t, fnKeep)
    local j, n = 1, #t;
    for i=1,n do
        if (fnKeep(t, i, j)) then
            -- Move i's kept value to j's position, if it's not already there.
            if (i ~= j) then
                t[j] = t[i];
            end
            j = j + 1; -- Increment position of where we'll place the next kept value.
        end
    end
    table.move(t,n+1,n+n-j+1,j);
    --for i=j,n do t[i]=nil end
    return t;
end

And here is even more optimized version with block moving

For larger arrays and larger keeped blocks

function ArrayRemove(t, fnKeep)
    local i, j, n = 1, 1, #t;
    while i <= n do
        if (fnKeep(t, i, j)) then
            local k = i
            repeat
                i = i + 1;
            until i>n or not fnKeep(t, i, j+i-k)
            --if (k ~= j) then
                table.move(t,k,i-1,j);
            --end
            j = j + i - k;
        end
        i = i + 1;
    end
    table.move(t,n+1,n+n-j+1,j);
    return t;
end

if (k ~= j) is not needed as it is executed many times but "true" after first remove. I think table.move() handles index checks anyway.
table.move(t,n+1,n+n-j+1,j) is equivalent to "for i=j,n do t[i]=nil end".
I'm new to lua and don't know if where is efficient value replication function. Here we would replicate nil n-j+1 times.

And regarding table.remove(). I think it should utilize table.move() that moves elements in one operation. Kind of memcpy in C. So maybe it's not so bad afterall.
@MitchMcMabers, can you update your benchmarks? Did you use lua >= 5.3?

Unrestraint answered 5/6, 2021 at 7:35 Comment(0)
W
0

It occurs to me that—for my special case, where I only ever shift entries from the front of the queue—I can do this far more simply via:

function processEventsBefore( timestamp )
  while timestampedEvents[1] and timestampedEvents[1][1] <= timestamp do
    processEventData( timestampedEvents[1][2] )
    table.remove( timestampedEvents, 1 )
  end
end

However, I'll not accept this as the answer because it does not handle the general case of iterating over an array and removing random items from the middle while continuing to iterate.

Wellheeled answered 12/9, 2012 at 19:22 Comment(2)
If your question really is about "the general case of iterating over an array and removing random items from the middle while continuing to iterate," then you should modify your question to be about that general case. Because your question as stated is really about your specific case, which you just resolved.Boulware
@NicolBolas I've edited the question to be clear that it's the general case I'm looking for an answer to.Wellheeled
A
0

First, definitely read @MitchMcCabers’s post detailing the evils of table.remove().

Now I’m no lua whiz but I tried to combine his approach with @MartinRudat’s, using an assist from an array-detection approach modified from @PiFace’s answer here.

The result, according to my tests, successfully removes an element from either a key-value table or an array.

I hope it’s right, it works for me so far!

--helper function needed for remove(...)
--I’m not super able to explain it, check the link above
function isarray(tableT)
    for k, v in pairs(tableT) do
        if tonumber(k) ~= nil and k ~= #tableT then
            if tableT[k+1] ~= k+1 then
                return false
            end
        end
    end
    return #tableT > 0 and next(tableT, #tableT) == nil
 end

function remove(targetTable, removeMe)
--check if this is an array
if isarray(targetTable) then
    --flag for when a table needs to squish in to fill cleared space
    local shouldMoveDown = false
    --iterate over table in order
    for i = 1, #targetTable do
        --check if the value is found
        if targetTable[i] == removeMe then
            --if so, set flag to start collapsing the table to write over it
            shouldMoveDown = true
        end
        --if collapsing needs to happen...
        if shouldMoveDown then
            --check if we're not at the end
            if i ~= #targetTable then
                --if not, copy the next value over this one
                targetTable[i] = targetTable[i+1]
            else
                --if so, delete the last value
                targetTable[#targetTable] = nil
            end 
        end
    end
else
    --loop over elements
    for k, v in pairs(targetTable) do
        --check for thing to remove
        if (v == removeMe) then
            --if found, nil it
            targetTable[k] = nil
            break
        end
    end
end
return targetTable, removeMe;

end

Athapaskan answered 23/3, 2021 at 17:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.