How can I get the nested keys of a map in clojure?
Asked Answered
K

12

10

if my structure is

{ :a :A
  :b :B
  :c {
       :d :D
     }
  :e {
       :f {
            :g :G
            :h :H
          }
     }
}

I would like to get a function called keys-in that returns something like:

[[:a] [:b] [:c :d] [:e :f :g] [:e :f :h]]

so then I can do something like:

(not-any? nil? (map #(get-in my-other-map %1) (keys-in my-map)))

So I can be sure that my-other-map has the same keys that my-map

Kieserite answered 14/2, 2014 at 0:51 Comment(2)
Really good answers here.Kieserite
In an earlier comment, I reported that several of the functions given here are equally fast on small embedded maps, according to Criterium. I now think that my testing method was flawed, and I've deleted that comment.Magnetics
H
16
(defn keys-in [m]
  (if (map? m)
    (vec 
     (mapcat (fn [[k v]]
               (let [sub (keys-in v)
                     nested (map #(into [k] %) (filter (comp not empty?) sub))]
                 (if (seq nested)
                   nested
                   [[k]])))
             m))
    []))

;; tests
user=> (keys-in nil)
[]
user=> (keys-in {})
[]
user=> (keys-in {:a 1 :b 2}))
[[:a] [:b]]
user=> (keys-in {:a {:b {:c 1}}})
[[:a :b :c]]
user=> (keys-in {:a {:b {:c 1}} :d {:e {:f 2}}})
[[:a :b :c] [:d :e :f]]
Hedva answered 14/2, 2014 at 2:35 Comment(2)
Out of curiosity, how can we change this so that keys that have no nested maps return the key instead of a vector with the key inside? For example, (keys-in {:a 1 :b {:c 3 :d 4}}) => [:a [:b :c] [:b :d]]Upbear
Could this be expanded to have the keys of an array too ?Musso
T
9
(defn keys-in [m]
  (if (or (not (map? m))
          (empty? m))
    '(())
    (for [[k v] m
          subkey (keys-in v)]
      (cons k subkey))))
Temblor answered 14/2, 2014 at 2:19 Comment(1)
@AlexMiller That's not exactly wrong - (get-in {} '()) returns {}, because the empty keyseq is the one and only valid keyseq in {}. But I suppose it is inconsistent: () should either be present as an answer for all maps, or for none.Temblor
S
5

If you don't need a lazy result and just want to be fast, try using reduce-kv.

(defn keypaths
  ([m] (keypaths [] m ()))
  ([prev m result]
   (reduce-kv (fn [res k v] (if (map? v)
                              (keypaths (conj prev k) v res)
                              (conj res (conj prev k))))
              result
              m)))

If you also want to support vector indices (as with get-in or update-in), test with associative? instead of map?. If you want intermediate paths, you can conj those on too. Here's a variant:

(defn kvpaths-all2
  ([m] (kvpaths-all2 [] m ()))
  ([prev m result]
   (reduce-kv (fn [res k v] (if (associative? v)
                              (let [kp (conj prev k)]
                                (kvpaths-all2 kp v (conj res kp)))
                              (conj res (conj prev k))))
              result
              m)))
Skill answered 18/7, 2016 at 18:35 Comment(2)
The original question asked for intermediates, the variant does this, thanks!Semiology
I revised my old answer to use an accumulator, which avoids some copying. This makes it significantly faster.Skill
I
4

Obligatory zippers version

(require '[clojure.zip :as z])

(defn keys-in [m] 
  (letfn [(branch? [[path m]] (map? m)) 
          (children [[path m]] (for [[k v] m] [(conj path k) v]))] 
    (if (empty? m) 
      []
      (loop [t (z/zipper branch? children nil [[] m]), paths []] 
        (cond (z/end? t) paths 
              (z/branch? t) (recur (z/next t), paths) 
              :leaf (recur (z/next t), (conj paths (first (z/node t)))))))))
Isabelisabelita answered 14/2, 2014 at 3:22 Comment(0)
N
2

You can build this with clojure.zip or tree-seq fairly easily though I strongly prefer the prismatic.schema library for verifying the structure of nested maps

user> (def my-data-format                                 
  {:a Keyword                                             
   :b Keyword                                             
   :c {:d Keyword}                                        
   :e {:f {:g Keyword                                     
           :h Keyword}}})                                 
#'user/my-data-format                                     
user> (def some-data                                      
         {:a :A                                            
          :b :B                                            
          :c {:d :D}                                       
          :e {:f {:g :G                                    
                  :h :G}}})                                
#'user/some-data                                          
user> (schema/validate my-data-format some-data)          
{:a :A, :c {:d :D}, :b :B, :e {:f {:g :G, :h :G}}}
user> (def some-wrong-data
        {:a :A
         :b :B
         :c {:wrong :D}
         :e {:f {:g :G
                 :h :G}}})
 #'user/some-wrong-data             

 user> (schema/validate my-data-format some-wrong-data)  

ExceptionInfo Value does not match schema: 
{:c {:d missing-required-key, 
     :wrong disallowed-key}}  
schema.core/validate (core.clj:132)
Nikolia answered 14/2, 2014 at 2:11 Comment(3)
Zippers and tree-seq both seem like pretty bad solutions to this problem. I wonder how you plan to do it.Temblor
user> (filter seq (map #(if (map? %) (keys %)) (tree-seq map? vals some-data))) ((:a :c :b :e) (:d) (:f) (:g :h)) prints the keys breadth first and makes a fine equality function in one line for these purposes. Though all such solutions are less favorable for data validation.Nikolia
@Temblor I've obliged with a zippers solution. It ain't pretty, but it ain't bad per se if you are paranoid about consuming stack for a really nested map.Isabelisabelita
V
2

Got a similar question, wasn't satisfied by current solutions:

"Naive" recursive approach

(require '[clojure.set :as set])

(defn all-paths
  ([m current]
   ;; base case: map empty or not a map
   (if (or (not (map? m)) (empty? m))
     #{current}
   ;; else: recursive call for every (key, value) in the map
     (apply set/union #{current}
            (map (fn [[k v]]
                   (all-paths v (conj current k)))
                 m))))
  ([m]
   (-> m (all-paths []) (disj []))))


(all-paths {:a 1
            :b 2
            :c {:ca 3
                :cb {:cba 4
                     :cbb 5}}
            :d {:da 6
                :db 7}})
=> #{[:a] [:b] [:c] [:d] [:c :ca] [:c :cb] [:d :da] [:d :db] [:c :cb :cba] [:c :cb :cbb]}
Villanelle answered 10/3, 2016 at 21:18 Comment(1)
Naive or not, this solution that actually provides a key for every keypath, the solutions above miss the [:c] [:d] key paths.Semiology
M
2

Here are solutions (without intermediate paths) using Specter. They're by Nathan Marz, Specter's author, from a conversation on the Specter Slack channel (with his permission). I claim no credit for these definitions.

Simple version:

(defn keys-in [m]
  (let [nav (recursive-path [] p
              (if-path map?
                [ALL (collect-one FIRST) LAST p]
                STAY))]
    (map butlast (select nav m))))

More efficient version:

(defn keys-in [m]
  (let [nav (recursive-path [] p
              (if-path map?
                [ALL
                 (if-path [LAST map?]
                  [(collect-one FIRST) LAST p]
                  FIRST)]))]
    (select nav m)))

My informal explanation of what's happening in these definitions:

In the simple version, since the top-level argument is a map, if-path map? passes it to the first collection of navigators in brackets. These begin with ALL, which says here to do the rest for each element in the map. Then for each MapEntry in the map, (collect-one FIRST) says to add its first element (key) to the result of passing its last element (val) to if-path again. p was bound by recursive-path to be a reference to that same recursive-path expression. By this process we eventually get to a non-map. Return it and stop processing on that branch; that's what STAY means. However, this last thing returned is not one of the keys; it's the terminal val. So we end up with the leaf vals in each sequence. To strip them out, map butlast over the entire result.

The second version avoids this last step by only recursing into the val in the MapEntry if that val is itself a map. That's what the inner if-path does: [LAST map?] gets the last element, i.e. the val of the current MapEntry generated by ALL, and passes it to map?.


I used Criterium to test all of the key path functions on this page that don't return intermediate paths, plus one by noisesmith that's part of an answer to another question. For a 3-level, 3 keys per level map and for a 6-level, 6 keys per level map, miner49r's version and the second, faster Specter version have similar speeds, and are much faster than any of the other versions.

Timings on a 3-level, 3 keys per level (27 paths) map, in order:

  • miner49r's: 29.235649 µs
  • N. Marz's second Specter: 30.590085 µs
  • N. Marz's first Specter: 62.840230 µs
  • amalloy's: 75.740468 µs
  • noisesmith's (from the other question): 87.693425 µs
  • AWebb's: 162.281035 µs
  • AlexMiller's (without vec): 243.756275 µs

Timings on a 6-level, 6 keys per level (6^6 = 46656 paths) map, in order:

  • N. Marz's second Specter: 34.435956 ms
  • miner49r's: 37.897345 ms
  • N. Marz's first Specter: 119.600975 ms
  • noisesmith's: 180.448860 ms
  • amalloy's: 191.718783 ms
  • AWebb's: 193.172784 ms
  • AlexMiller's (without vec): 839.266448 ms

All calls were wrapped in doall so that lazy results would be realized. Since I was doalling them, I took out vec wrapper in Alex Miller's definition. Full details about timings can be found here. The test code is here.

(The simple Specter version is slower than the faster version because of the use of map butlast to strip out the leaf values. If this is step is removed, the simple Specter definition's times are similar to those of the second definition.)

Magnetics answered 17/3, 2017 at 20:8 Comment(0)
K
1

This answer of mine is just to illustrate how NOT to do it since it is still procedural.

(defn keys-in [data] (genkeys [] data))

(defn genkeys [parent data]
  (let [mylist (transient [])]
    (doseq [k (keys data)]
      (do
        (if ( = (class (k data)) clojure.lang.PersistentHashMap )
          (#(reduce conj! %1 %2) mylist (genkeys (conj parent  k ) (k data) ))
          (conj! mylist  (conj parent  k ) )
          )))
    (persistent! mylist)))
Kieserite answered 14/2, 2014 at 2:53 Comment(1)
It also doesn't produce the answer you requested, btw: (keys-in dra) ;=> [[:a] [:b] [:c] [:e]], where dra is defined as your original data structure example: (def dra {:a :A :b :B :c {:d :D} :e {:f {:g :G :h :H}}})Magnetics
C
1

Here is an implementation which returns all keys (not just the terminal keys) based on lazy-seq:

(defn keys-in
  ([m] (if (map? m) (keys-in (seq m) [])))
  ([es c]
   (lazy-seq
    (when-let [e (first es)]
      (let [c* (conj c (key e))]
        (cons c* (concat (if (map? (val e)) (keys-in (seq (val e)) c*))
                         (keys-in (rest es) c))))))))
Coarsegrained answered 16/11, 2015 at 5:36 Comment(0)
A
0

Working on something similar for a personal project and this is my naive implementation:

(defn keys-in
  [m parent-keys]
  (mapcat (fn [[k v]]
        (if (map? v)
          (keys-in v (conj parent-keys k))
          (vector (conj parent-keys k v))))
      m))

Use it from the repl:

(keys-in <your-map> [])

Fancy way:

(map (comp vec drop-last) (keys-in <your-map> []))
Agglomeration answered 26/1, 2018 at 22:24 Comment(0)
L
0

Here is a generic solution for known collection types, including maps (look for "Key Paths" on the Readme page for usage examples).

It handles mixed types as well (sequential types, maps and sets), and the API (protocols) can be extended to other types.

Lowell answered 15/7, 2020 at 13:44 Comment(0)
K
0

Here's a tree-seq-based version that's lazy and doesn't blow the stack.

(defn map-paths [m]
  (letfn [(branch? [[path v]] ((some-fn map? nil?) v))
          (children [[path m]] (for [[k v] m]
                                 [(conj path k) v]))]
    (remove branch?
            (tree-seq branch? children [[] m]))))

It produces a lazy sequence of path-value pairs (not just paths). Maps and nil are the only things considered a branch.

Note that branches can be empty: (map-paths {:foo :bar}) is the same as (map-paths {:foo :bar, :baz {}}). This preserves the property that (map-paths {:k foo}) is the same as (map-paths foo) with :k prepended to all of the paths.

(Kudos to @a-webb for demonstrating the applicability of zippers, which inspired this answer.)

Kristianson answered 8/2 at 3:26 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.