How to retrieve a list of ets keys without scanning entire table?

I'm using ETS via Elixir as a simple in-memory persistence layer, both to store and retrieve keys and for the occasional foldl that reduces many duplicate keys with different values. I am using the bag option.

Is there a simple, perhaps O(1) way to retrieve a list of just the current keys without having to do a more involved table traversal or match or foldl?

Erlang or Elixir syntax responses welcome.

:ets.new(:cache, [:bag, :named_table, :protected])

I have a static map of atom keys indexed by integer that I am using to assist with insertions. But not all of the keys are used.

chunk_key_map = %{2 => :chunk_key_2, ..... 28 => :chunk_key_28}

If there's no quick way, I am aware I can do an ets:lookup trying each of my static atom key values and testing for != [] and generating my own list, but wanted to see if ets supports such a feature.
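If probing each static key this way, `:ets.member/2` is worth considering instead of `:ets.lookup/2`, since it only tests key presence and never copies the stored objects. A minimal sketch, reusing the table and map names from the question (the inserted sample data is hypothetical):

```elixir
# Probe each candidate key with :ets.member/2, which returns true/false
# without copying any stored objects into the calling process.
:ets.new(:cache, [:bag, :named_table, :protected])
:ets.insert(:cache, {:chunk_key_2, "a"})
:ets.insert(:cache, {:chunk_key_2, "b"})

chunk_key_map = %{2 => :chunk_key_2, 3 => :chunk_key_3}

present = for {index, key} <- chunk_key_map, :ets.member(:cache, key), do: index
# present == [2]
```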

Thanks

Jarvisjary answered 1/2, 2016 at 3:29
So I didn't find an ets feature for this, but since my key map is static I implemented the key-list retrieval in Elixir in constant time relative to table size.

    list =
        Enum.reduce(2..28, [], fn index, acc ->
            case :ets.lookup(:cache, Map.get(chunk_key_map, index)) do
                [] -> acc
                _ -> [index | acc]
            end
        end)

    Enum.reverse(list)

UPDATE: Based on the replies, I took Hamidreza's ets traversal logic and wrapped it into an Elixir Stream using Stream.resource/3.

defp get_ets_keys_lazy(table_name) when is_atom(table_name) do
    eot = :"$end_of_table"

    Stream.resource(
        fn -> [] end,

        fn acc ->
            case acc do
                [] -> 
                    case :ets.first(table_name) do
                        ^eot -> {:halt, acc}
                        first_key -> {[first_key], first_key}                       
                    end

                acc -> 
                    case :ets.next(table_name, acc) do  
                        ^eot -> {:halt, acc}
                        next_key -> {[next_key], next_key}
                    end
            end
        end,

        fn _acc -> :ok end
    )
end

Then I ran the stream through a pipeline:

get_ets_keys_lazy(table_name) 
    |> Stream.map(lambda1) 
    |> Stream.each(lambda2)
    |> Stream.run
Jarvisjary answered 1/2, 2016 at 5:50
The problem with this code is that you end up copying all the elements in the ETS table into your process. Remember that ETS tables are kept "outside" of all processes, so all access to ETS tables is through copying. One major reason for match, match_object and select is that they allow you to be more specific about which elements are copied; testing can be done while the element data is still "in the table". – Provincial
Thank you Robert. I am assuming you are only referring to the ets.lookup code segment, not the ets.first/next snippet above? This is very helpful. What, then, is the appropriate match or select clause for key retrieval? – Jarvisjary
Yes, using first/next means that you will access every element, though it does mean that you may be able to copy only the keys. – Provincial
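On the select question raised in the comments: a select-based sketch (illustrative only, assuming each stored object is a 2-tuple `{key, value}`; the table name `:kv` is hypothetical) that copies only the keys out of the table:

```elixir
:ets.new(:kv, [:bag, :named_table])
:ets.insert(:kv, [{:a, 1}, {:a, 2}, {:b, 3}])

# Match spec: match objects {Key, _}, bind the key to :"$1", and return
# only :"$1" — so just the keys are copied out; values stay in the table.
keys = :ets.select(:kv, [{{:"$1", :_}, [], [:"$1"]}])

# A bag yields the key once per stored object, so deduplicate:
unique_keys = keys |> Enum.uniq() |> Enum.sort()
```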

Thanks, that put me on the right track :)

Same thing but passing the previous key as accumulator:

def key_stream(table_name) do
  Stream.resource(
    fn -> :ets.first(table_name) end,
    fn :"$end_of_table" -> {:halt, nil}
       previous_key -> {[previous_key], :ets.next(table_name, previous_key)} end,
    fn _ -> :ok end)
end
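For completeness, a self-contained sketch of driving the stream above (the module name `EtsKeys`, table name `:demo`, and sample data are hypothetical):

```elixir
defmodule EtsKeys do
  # Lazily walks the table's keys; the previous key is the accumulator.
  def key_stream(table_name) do
    Stream.resource(
      fn -> :ets.first(table_name) end,
      fn
        :"$end_of_table" -> {:halt, nil}
        previous_key -> {[previous_key], :ets.next(table_name, previous_key)}
      end,
      fn _ -> :ok end
    )
  end
end

:ets.new(:demo, [:bag, :named_table])
:ets.insert(:demo, [{:x, 1}, {:x, 2}, {:y, 3}])

# Each distinct key appears once, even in a bag.
keys = EtsKeys.key_stream(:demo) |> Enum.sort()
# keys == [:x, :y]
```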
Tabbi answered 8/5, 2017 at 8:17

To get a list of keys in ETS without touching their values, you can combine the ets:first/1 and ets:next/2 functions this way:

-export([keys/1]).

keys(TableName) ->
    FirstKey = ets:first(TableName),
    keys(TableName, FirstKey, [FirstKey]).

keys(_TableName, '$end_of_table', ['$end_of_table'|Acc]) ->
    Acc;
keys(TableName, CurrentKey, Acc) ->
    NextKey = ets:next(TableName, CurrentKey),
    keys(TableName, NextKey, [NextKey|Acc]).

The exported API is keys/1. It takes the table name, fetches the table's first key, initiates an accumulator as state, and internally calls keys/3, which fetches the remaining keys and accumulates them recursively.

Note that tables of type bag are unordered, so if your table is a bag the return value of keys/1 won't be ordered.

Tarry answered 1/2, 2016 at 8:0
Since bags are allowed to have duplicate keys but with different values, will this return all the duplicate keys? – Jarvisjary
I verified that it does return only the unique keys. – Jarvisjary
@Heron28 Actually there are no duplicate keys in bag and duplicate_bag tables. A bag table can have many unique values per key, and a duplicate_bag table can have many duplicated values per key. – Tarry
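A quick check of the point in these comments, sketched in Elixir (the table name and data are hypothetical): in a bag, `:ets.first/1` and `:ets.next/2` visit each distinct key exactly once, even when a key holds several objects.

```elixir
:ets.new(:bag_demo, [:bag, :named_table])
:ets.insert(:bag_demo, [{:k1, "a"}, {:k1, "b"}, {:k2, "c"}])

# The same first/next walk as keys/1 above, written as an anonymous
# function that passes itself along in order to recurse.
walk = fn
  :"$end_of_table", acc, _walk -> acc
  key, acc, walk -> walk.(:ets.next(:bag_demo, key), [key | acc], walk)
end

keys = walk.(:ets.first(:bag_demo), [], walk)
# :k1 appears once even though it holds two objects
```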
