How to remove a lua table entry by its key?
Asked Answered
T

2

131

I have a lua table that I use as a hashmap, ie with string keys :

local map = { foo = 1, bar = 2 }

I would like to "pop" an element of this table identified by its key. There is a table.remove() method, but it only takes the index of the element to remove (ie a number) and not a generic key. I would like to be able to do table.remove(map, 'foo') and here is how I implemented it :

function table.removekey(table, key)
    local element = table[key]
    table[key] = nil
    return element
end

Is there a better way to do that ?

Tarnation answered 18/11, 2009 at 20:51 Comment(5)
+1 to the very useful question because you gave the "answer" i needed; even if there isn't a "better" way.Iorio
Is it safe todo this in a pairs operator context?Glosseme
@Glosseme depends on what you mean by safe, but you can say t = {1,2,3,4}; for k, v in pairs(t) do t[k] = nil; print(k, v); end and it will work fine.Rebatement
So it does not corrupt the order of execution, leave out elements or execute elements twice ? Sorry to be a stickler about this. Pairs was not really discussed in this thread: #12395341. Found the answer: lua-users.org/lists/lua-l/2012-07/msg00543.html Everything is save, except adding elements to the table in pairs.Glosseme
an important thing to take care is that if your key is a integer, that you want to use as a string, it should be table[tostring(key)] = nilMoisten
M
117

No, setting the key's value to nil is the accepted way of removing an item in the hashmap portion of a table. What you're doing is standard. However, I'd recommend not overriding table.remove() - for the array portion of a table, the default table.remove() functionality includes renumbering the indices, which your override would not do. If you do want to add your function to the table function set, then I'd probably name it something like table.removekey() or some such.

Moniliform answered 18/11, 2009 at 20:53 Comment(10)
Thanks for the feedback about the deletion. About the name of the function, this was mainly for making my point clear. I usually don't override standard functions. I will definitely not use it under that name (table.removekey() would my best choice, too).Tarnation
If you're only using it within a single block, you be even better off performance-wise by simply making it a local function instead (saves the overhead of a global lookup for each call). I quite often import table.insert and table.remove into the local namespace if I'm using them frequently, often as something like tinsert() and tremove().Moniliform
Thanks for the performance tip. I'll try to import the functions I use often into the local namespace to see if this makes a big difference.Tarnation
Note that you should use table.remove(indez, position) instead of setting it to nil so the indexes will be decremented and therefore the elements repositioned.Accompany
@HakanBoztepe That is not relevant for the hashmap portion of the table (string keys).Moniliform
@Amber: The array-part is an implementation-detail, and table.remove might have to (and actually does) re-number even if the key is not in the array part. What you wanted is the sequence part, which is actually well-defined and independent of implementation-details. The sequence-part can be far greater than the array-part, or vice versa.Stenotype
@Stenotype To the best of my understanding, there's no such thing as a "sequence part" of a table either; a table just can be a sequence or not (with some particular content). I elaborated on that in my answer to this question.Exertion
@Exertion A sequence is an uninterrupted run of non-null elements starting at index one with step 1, until the first null. Any other elements are not part of the sequence, or if you want to emphasize that it might not be all there is, the sequence part. As an implementation-detail tables are generally optimized for having a big (or only consisting of) that sequence part, though the optimization tolerates a few holes.Stenotype
@Stenotype Close, but there are some gotchas, like the fact that {1, 2, nil, 4} is not a sequence at all. I have a lengthy answer on that. But yeah, it's about corner cases.Exertion
{1, 2} is the sequence-part, [4]=4 is the rest. A true sequence has no rest.Stenotype
E
15

TLDR

(because you're only damn trying to remove a thing from a map, how hard can that be)

Setting a key to nil (e.g. t[k] = nil) is not only accepted and standard, but it's the only way of removing the entry from the table's "content" in general. And that makes sense. Also, array portion and hashmap portion are implementation details and shouldn't have ever be mentioned in this Q/A.

Understanding Lua's tables

(and why you can't remove from an infinite)

Lua tables don't literally have concept of "removing an entry from a table" and it hardly has a concept of "inserting an entry to a table". This is different from many other data structures from different programming languages.

In Lua, tables are modelling a perfect associative structure of infinite size.

Tables in Lua are inherently mutable and there's only one fundamental way to construct a new table: using the {} expression. Constructing a table with initial content (e.g. t = {10, 20, ["foo"] = 123, 30} or anything alike) is actually a syntactic sugar equivalent to first constructing a new table (e.g. t = {}} and then setting the "initial" entries one by one (e.g. t[1] = 10, t[2] = 20, t["foo"] = 123, t[3] = 30) . The details of how the de-sugaring works doesn't help with understanding the discussed matter, so I will be avoiding the table construction sugar in this answer.

In Lua, a freshly-constructed table initially associates all possible values with nil. That means that for a table t = {}, t[2] will evaluate to nil, t[100] will evaluate to nil, t["foo"] will evaluate to nil, t[{}] will evaluate to nil, etc.

After construction, you can mutate the table by setting a value at some key. Then that key will be now associated with that value. For example, after setting t["foo"] = "bar", the key "foo" will now be associated with the value "bar". In consequence, t["foo"] will now evaluate to "bar".

Setting nil at some key will associate that key to nil. For example, after setting t["foo"] = nil, "foo" will (again) be associated with nil. In consequence, t["foo"] will (again) evaluate to nil.

While any key can be associated to nil (and initially all possible keys are), such entries (key/value pairs) are not considered a part of the table (i.e. aren't considered part of the table content).

Functions pairs and ipairs (and multiple others) operate on table's content, i.e. the of associations in which the value isn't nil. The number of such associations is always finite.

Having everything said above in mind, associating a key with nil will probably do everything you could expect when saying "removing an entry from a table", because t[k] will evaluate to nil (like it did after constructing the table) and functions like pairs and ipairs will ignore such entries, as entries (associations) with value nil aren't considered "a part of the table".

Sequences

(if tables weren't already tricky)

In this answer, I'm talking about tables in general, i.e. without any assumption about their keys. In Lua, tables with a particular set of keys can be called a sequence, and you can use table.remove to remove an integer key from such table. But, first, this function is effectively undefined for non-sequences (i.e. tables in general) and, second, there's no reason to assume that it's more than a util, i.e. something that could be directly implemented in Lua using primitive operations.

Which tables are or aren't a sequence is another hairy topic and I won't get into details here.

Referencing the reference

(I really didn't make up all that)

Everything said above is based on the official language reference manual. The interesting parts are mostly chapter 2.1 – Values and Types...

Tables can be heterogeneous; that is, they can contain values of all types (except nil). Any key with value nil is not considered part of the table. Conversely, any key that is not part of a table has an associated value nil.

This part is not worded perfectly. First, I find the phrase "tables can be heterogeneous" confusing. It's the only use of this term in the reference and the "can be" part makes it non-obvious whether "being heterogeneous" is a possible property of a table, or whether it tables are defined that way. The second sentence make the first explanation more reasonable, because if "any key with value nil is not considered part of the table", then it means that "a key with value nil" is not a contradiction. Also, the specification of the rawset function, which (indirectly) gives semantics to the t[k] = v syntax, in the 6.1 – Basic Functions chapter says...

Sets the real value of table[index] to value, without invoking any metamethod. table must be a table, index any value different from nil and NaN, and value any Lua value.

As nil values are not special-cased here, that means that you can set t[k] to nil. The only way to understand that is to accept that from now on, that key will be "a key with value nil", and in consequence "will not be considered part of the table" (pairs will ignore it, etc.), and as "any key that is not part of a table has an associated value nil", t[k] will evaluate to nil.

The whole reference also doesn't mention "removing" a key or an entry from tables in any other place.

Another perspective on tables

(if you hate infinities)

While I personally find the perspective from the reference elegant, I also understand that the fact that it's different from other popular models might make it more difficult to reason about.

I believe that the following perspective is effectively equivalent to the previous one.

You can think that...

  • {} returns an empty table.
  • t[k] evaluates to v if t contains key k, and nil otherwise
  • Setting t[k] = v inserts a new entry (k, v) to the table if it doesn't contain key k, updates such entry if t already contains key k, and finally, as a special case, removes the entry for the key k if v is nil
  • The content of the table (i.e. what's considered "a part of the table") is the set of all entries from the table

In this model, tables aren't capable of "containing" nil values.

This is not how the language reference defines things, but to the best of my understanding, such model is observably equivalent.

Don't talk implementation details

(unless you're sure that that's what you mean)

The so-called "hashmap portion" of the table (which supplements the so-called "array portion" of the table) are implementation details and talking about them, unless we discuss performance or the explanation of specific undefined or implementation-defined behaviors, is in my opinion confusing in the best case and plain wrong in the worst.

For example, in a table constructed like this... t = {}, t[1] = 10, t[2] = 20, t[3] = 30, the array portion will (probably!) be [10, 20, 30] and setting t[2] = nil will "remove" the entry (2, 20) "from the array part", possibly also resizing it or moving 3 -> 30 to the hashmap part. I'm not really sure. I'm just saying this to prove that discussing implementation details is not what we want to do here.

Exertion answered 22/8, 2022 at 12:38 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.