Clojure sub-sequence position in sequence
Asked Answered
R

2

6

Does Clojure provide any builtin way to find the position of a sub-sequence in a given sequence?

Rinaldo answered 5/3, 2013 at 12:33 Comment(0)
D
7

Clojure provides a builtin way for easy Java Interop.

(java.util.Collections/indexOfSubList '(a b c 5 6 :foo g h) '(5 6 :foo))
;=> 3
Denison answered 5/3, 2013 at 13:5 Comment(5)
Thank you for your answer. That's what I'll use in the end, but I'm usually trying to avoid explicitly calling Java Interop from 'business' code, as I find it to be a little verbose. Thank you nevertheless.Rinaldo
While this may work, be aware that a collection isn't a sequence.Ectype
@Ectype Philosophical notions aside, I think you'll find java.util.List as a superclass of a seq and that the java method is on pair of java.util.Lists. As such, you could use this on lazy sequences (just be careful not to evaluate an infinite one) (java.util.Collections/indexOfSubList (range 10) (range 3 7)) ;=> 3, vectors, sorted-maps, etc.Denison
That's interesting to know. However, it seems it does work on lazy seqs, but the operation itself doesn't seem to be lazy. I tried both methods on (range 10000000) (range 999998 999999), and got a GC overhead limit on the Collections way, and the normal answer with my find-pos. So there must be more than purely philosophical notions.Ectype
@Ectype You are right. That does indeed appear to be trying to realize the entire range into memory when you disable the GC overhead check.Denison
E
4

A sequence is an abstraction, not a concretion. Certain concretions that you can use through the sequence abstraction have a way to find the position of a subsequence (strings and java collections, for instance), but sequences in general don't, because the underlying concretion doesn't have to have an index.

What you can do however, is create a juxt of the element identity and an index function. Have a look at map-indexed.

Here's a naive implementation that will lazily find the position of (all) the subsequence(s) in a sequence. Just use first or take 1 to find only one:

(defn find-pos
  [sq sub]
  (->>
    (partition (count sub) 1 sq)
    (map-indexed vector)
    (filter #(= (second %) sub))
    (map first)))

=> (find-pos  [:a :b \c 5 6 :foo \g :h]
                [\c 5 6 :foo])
(2)

=> (find-pos  "the quick brown fox"
                (seq "quick"))
(4)

Take care that index-based algorithms generally aren't something you would do in a functional language. Unless there are good reasons you need the index in the final result, lavish use of index lookup is considered code smell.

Ectype answered 5/3, 2013 at 13:21 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.