Return all the indexes of a particular substring
Asked Answered
O

2

6

Is there a Scala library API method (and if not, an idiomatic way) to obtain a list of all the indexes for a substring (target) within a larger string (source)? I have tried to look through the ScalaDoc, but was not able to find anything obvious. There are SO many methods doing so many useful things, I am guessing I am just not submitting the right search terms.

For example, if I have a source string of "name:Yo,name:Jim,name:name,name:bozo" and I use a target string of "name:", I would like to get back a List[Int] of List(0, 8, 17, 27).

Here's my quick hack to resolve the problem:

def indexesOf(source: String, target: String, index: Int = 0, withinOverlaps: Boolean = false): List[Int] = {
    def recursive(index: Int, accumulator: List[Int]): List[Int] = {
      if (!(index < source.size)) accumulator
      else {
        val position = source.indexOf(target, index)
        if (position == -1) accumulator
        else {
          recursive(position + (if (withinOverlaps) 1 else target.size), position :: accumulator)
        }
      }
    }

    if (target.size <= source.size) {
      if (!source.equals(target)) {
        recursive(0, Nil).reverse
      }
      else List(0)
    }
    else Nil
  }

Any guidance you can give me replacing this with a proper standard library entry point would be greatly appreciated.

UPDATE 2019/Jun/16:

Further code tightening:

  def indexesOf(source: String, target: String, index: Int = 0, withinOverlaps: Boolean = false): List[Int] = {
    def recursive(indexTarget: Int = index, accumulator: List[Int] = Nil): List[Int] = {
      val position = source.indexOf(target, indexTarget)
      if (position == -1)
        accumulator
      else
        recursive(position + (if (withinOverlaps) 1 else target.size), position :: accumulator)
    }
    recursive().reverse
  }

UPDATE 2014/Jul/22:

Inspired by Siddhartha Dutta's answer, I tighted up my code. It now looks like this:

  def indexesOf(source: String, target: String, index: Int = 0, withinOverlaps: Boolean = false): List[Int] = {
    @tailrec def recursive(indexTarget: Int, accumulator: List[Int]): List[Int] = {
      val position = source.indexOf(target, indexTarget)
      if (position == -1) accumulator
      else
        recursive(position + (if (withinOverlaps) 1 else target.size), position :: accumulator)
    }
    recursive(index, Nil).reverse
  }

Additionally, if I have a source string of "aaaaaaaa" and I use a target string of "aa", I would like by default to get back a List[Int] of List(0, 2, 4, 6) which skips a search starting inside of a found substring. The default can be overridden by passing "true" for the withinOverlaps parameter which in the "aaaaaaaa"/"aa" case would return List(0, 1, 2, 3, 4, 5, 6).

Overmeasure answered 21/7, 2014 at 20:57 Comment(5)
No, there is not "a [standard] method". Also, since this is working code, it might be more fitting for code-review.Semiannual
@Overmeasure Any way you could BSD License that method so the boss man doesn't get mad at me if I copy/adapt it? :)Cyclotron
@Cyclotron It's my understanding that any code snippets posted here on StackOverflow may be assumed to essentially be public domain; i.e. unencumbered by any license constraints limiting your ability to cut/paste/modify/customize the snippet to whatever context you need.Overmeasure
@Overmeasure It's a funny gray area, they don't technically become MIT-ish with attribution until March (meta.stackexchange.com/questions/271080/…)Cyclotron
You should update the answer to use a Vector (and append instead of prepend) instead of a List to avoid the extra .reverse at the end. Probably would be helpful to return a type IndexedSeq[Int] as wellCyclotron
O
6

I am always inclined to reach into the bag of regex tricks with problems like this one. I wouldn't say it is proper, but it's a hell of a lot less code. :)

val r = "\\Qname\\E".r
val ex = "name:Yo,name:Jim,name:name,name:bozo"

val is = r.findAllMatchIn(ex).map(_.start).toList

The quotes \\Q and \\E aren't necessary for this case, but if the string you're looking for has any special characters, then it will be.

Ottavia answered 21/7, 2014 at 21:45 Comment(2)
Very nice. I spent less than two minutes evaluating the regex approach before whipping up my code Scala. It's nice to have more than one way to skin the string search cat.Overmeasure
BTW, you can also change the first line to """\Qname\E""".r if you want to use the pure regex (as an unescaped copy/paste from some other source). The triple quotes option in Scala is awesome!Overmeasure
M
2

A small code to get all the indexes
call the below method as getAllIndexes(source, target)

def getAllIndexes(source: String, target: String, index: Int = 0): List[Int] = {
        val targetIndex = source.indexOf(target, index)
        if(targetIndex != -1)
          List(targetIndex) ++ getAllIndexes(source, target, targetIndex+1)
        else
          List()
      }
Miun answered 22/7, 2014 at 6:27 Comment(5)
This appears to return the list in reverse order, i.e. List(27, 17, 8, 0), right? Additionally, you can optimize the two if pathways. The first replacing "List(targetIndex) ++ get..." with "targetIndex :: get...". And the second replacing "List()" with "Nil".Overmeasure
No the method returns the list in ascending order as per indexes i.e., List(0,8,17,27). The optimizations are correct.Miun
I just tried your call and after adding the @tailrec annotation, I am getting a compiler error stating it is not tail recursive (with either ++ or ::). However, your smaller code inspired me, so I provided an update to show my code tightened up. I also added another test case (the "aaaaaaaa", "aa" example) to show the benefit of the optional withinOverlaps parameter.Overmeasure
Yes the method is not tail recursive as it is first calling itself then adding the result.Miun
Ah. I have a string where I'm getting hundreds to thousands of hits back and cannot afford possibly blowing the call stack.Overmeasure

© 2022 - 2024 — McMap. All rights reserved.