How can I merge two sequences in clojure?
Asked Answered
B

5

33

What is an idiomatic way to merge (or retrieve the union of) two lists (or sequences) in Clojure?

(merge l1 l2)

doesn't seem to be the solution:

a=> (merge '(1 2 3) '(2 3 4))
((2 3 4) 1 2 3)
Burks answered 30/9, 2012 at 3:15 Comment(3)
how do you define "merge"? e.g. do duplicates exist, and if so how are duplicates handled? also do you know if the lists are already sorted?Stillwell
FYI. The function name, merge, has already been taken by clojure.core. To avoid confusion, you may choose another name for your merge function. See clojuredocs.org/clojure_core/clojure.core/mergeTipple
The poster was in fact using clojure.core/merge, but not on hash-maps or otherwise associative data, and said function has undefined behavior in that context.Directory
C
30

I think andih's solution works great. Here is an alternate way because hey why not. It uses concat and distinct:

user> (distinct (concat '(1 2 3) '(2 3 4)))
=> (1 2 3 4)
Cooky answered 30/9, 2012 at 6:11 Comment(2)
I would just advise against concat for performance reasons, as it is surprisingly slow. See my answer below for further discussion.Directory
@Directory thanks for the comment (and your answer below). I had not realized that concat has performance problem.Cooky
D
18

If what you want is actually distinct unsorted data (sets), you should be using Clojure's set data structure instead of vectors or lists. And as andih suggested indirectly, there is a core library for set operations: http://clojure.github.com/clojure/clojure.set-api.html

(require '[clojure.set :refer [union]])

(union #{1 2 3} #{3 4 5})
=> #{1 2 3 4 5}

If sets are for whatever reason not what you want, then read on. Careful with concat when you have a significant amount of data in your sequences, and consider using into which is much better optimized as a vector merging algorithm. I don't know why concat is not implemented using into (or better yet-- why does concat even exist? BTW while into is significantly faster than concat, it is still way way slower than conj. Bagwell's RRB trees, compatible with both Clojure and Scala, will solve this problem, but are not yet implemented for Clojure).

To rephrase Omri's non-set solution in terms of 'into':

(distinct (into [1 2 3] [3 4 5]))
=> (1 2 3 4 5)
Directory answered 30/9, 2012 at 21:26 Comment(6)
what's so bad about concat? It's constant-time because it's lazy, not the linear-time implementation you get in strict languages.Barrybarrymore
also, how can into be slower than conj? it's implemented using conj. (into foo bar) is the same as (reduce conj foo bar), except that it will use transients if they are available.Barrybarrymore
No, I said conj is faster! (but if you need to concatenate vectors and not merely conj one onto the other, these are different needs to have.)Directory
Philip: into uses transients, very similar to the transients for vector concatenation example in Joy of Clojure in fact, which is a good place to look for more discussion on this.Directory
@PhilipPotter surely concat cannot be constant time for all inputs? Has to be linear time as size of input lenghts grow?Roughandtumble
@pete sorry for late reply. concat is lazy, so yes it can be constant time because it doesn't really do much work. it returns something which when consumed as a seq, first fetches items from the first arg, then from the second when the first is exhausted.Barrybarrymore
S
15

One way to get the union of two lists is to use union

Clojure> (into #{} (clojure.set/union '(1,2,3) '(3,4,5)))
#{1 2 3 4 5}

or if you want to get a list

(into '() (into #{} (clojure.set/union '(1,2,3) '(3,4,5))))
(5 4 3 2 1)
Saurian answered 30/9, 2012 at 4:1 Comment(2)
(-> #{} (into [1 2 3]) (into [3 4 5]) seq)Tipple
You aren't really achieving anything with union operating on list arguments. All it does is (reduce conj '(1,2,3) '(3,4,5)). clojure.set functions are designed to work with set arguments.Affection
D
10

If you don't mind duplicates, you can try concat :

(concat '(1 2 3 ) '(4 5 6 1) '(2 3)) 
;;==> (1 2 3 4 5 6 1 2 3) 
Doodlebug answered 18/3, 2015 at 5:58 Comment(0)
T
2

One option is flatten:

(def colls '((1 2 3) (2 3 4)))
(flatten colls) ;; => (1 2 3 2 3 4)
(distinct (flatten colls)) ;; => (1 2 3 4)

One thing to be aware of is that it will flatten deeply nested collections:

(flatten [[1 2 [3 4 5]] [1 [2 [3 4]]]]) ;; => (1 2 3 4 5 1 2 3 4)

But works well for maps:

(flatten [[{} {} {}] [{} {} {}]]) ;; => ({} {} {} {} {} {})
Taeniacide answered 28/3, 2017 at 9:45 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.