What is most idiomatic way to create an iterator in Go?
Asked Answered
S

13

74

One option is to use channels. Channels are like iterators in a way and you can iterate over them using range keyword. But when you find out you can't break out of this loop without leaking goroutine the usage becomes limited.

What is the idiomatic way to create iterator pattern in go?

Edit:

The fundamental problem with channels is they are a push model. Iterator is is a pull model. You don't have to tell iterator to stop. I'm looking for a way to iterate over collections in a nice expressive way. I would also like to chain iterators (map, filter, fold alternatives).

Sizzle answered 22/12, 2012 at 6:8 Comment(1)
For an up to date solution since Go 1.23 see my post down therePericles
P
84

Channels are useful, but closures are often more suitable.

package main

import "fmt"

func main() {
    gen := newEven()
    fmt.Println(gen())
    fmt.Println(gen())
    fmt.Println(gen())
    gen = nil // release for garbage collection
}

func newEven() func() int {
    n := 0
    // closure captures variable n
    return func() int {
        n += 2
        return n
    }
}

Playground: http://play.golang.org/p/W7pG_HUOzw

Don't like closures either? Use a named type with a method:

package main

import "fmt"

func main() {
    gen := even(0)
    fmt.Println(gen.next())
    fmt.Println(gen.next())
    fmt.Println(gen.next())
}

type even int

func (e *even) next() int {
    *e += 2
    return int(*e)
}

Playground: http://play.golang.org/p/o0lerLcAh3

There are tradeoffs among the three techniques so you can't nominate one as idiomatic. Use whatever best meets your needs.

Chaining is easy because functions are first class objects. Here's an extension of the closure example. I added a type intGen for integer generator which makes it clear where generator functions are used as arguments and return values. mapInt is defined in a general way to map any integer function to an integer generator. Other functions such as filter and fold could be defined similarly.

package main

import "fmt"

func main() {
    gen := mapInt(newEven(), square)
    fmt.Println(gen())
    fmt.Println(gen())
    fmt.Println(gen())
    gen = nil // release for garbage collection
}

type intGen func() int

func newEven() intGen {
    n := 0
    return func() int {
        n += 2
        return n
    }
}

func mapInt(g intGen, f func(int) int) intGen {
    return func() int {
        return f(g())
    }
}

func square(i int) int {
    return i * i
}

Playground: http://play.golang.org/p/L1OFm6JuX0

Primitivism answered 22/12, 2012 at 12:18 Comment(7)
Very nice. But you can't use range keyword with closure. Can you provide an example of for loops? How does iteration stop? You probably need a second return value to indicate that.Sizzle
Range works with built in types, not user defined things like functions. You're right that you could use a second return value to stop iteration. That would be clear and idomatic. You might also just stop iterating when you've found what you are looking for or when you have processed the desired amount of data.Primitivism
Note that if you actually know how much data you will process, it probably makes sense just to store it all in a slice. That should be fine unless you are processing a really huge amount of data that doesn't fit comfortably in memory. If you want, you make sure to release all references to the data when you're done with it and then it can be garbage collected. And of course with a slice, you can use range.Primitivism
So I played fast and loose with the meaning of "iterator" here to try to show some useful techniques. Apologies for chasing wishes and not answering the question as asked. In my second attempt at an answer I tried to follow a more rigorous definition.Primitivism
Thanks for the example. I didn't notice function literals in Go are closures until now. It's wonderful!Muscat
I do this often for slices, but I just realized I don't know how to build one of these iterators over maps. Any suggestions?Moulding
to make it work, need to demonstrate using it with a loop, like range, or whateverAlarise
C
42

TL;DR: Forget closures and channels, too slow. If the individual elements of your collection are accessible by index, go for the classic C iteration over an array-like type. If not, implement a stateful iterator.

I needed to iterate over some collection type for which the exact storage implementation is not set in stone yet. This, plus the zillions other reasons to abstract the implementation details from the client, lead me to do some testing with various iteration methods. Full code here, including some implementations that make use of errors as values. Here are the benchmark results:

  • classic C iteration over an array-like structure. The type provides the methods ValueAt() and Len():

    l := Len(collection)
    for i := 0; i < l; i++ { value := collection.ValueAt(i) }
    // benchmark result: 2492641 ns/op
    
  • Closure style iterator. The collection's Iterator method returns a next() function (a closure over the collection and cursor) and a hasNext boolean. next() returns the next value and a hasNext boolean. Note that this runs much faster than using separate next() and hasNext() closures returning single values:

    for next, hasNext := collection.Iterator(); hasNext; {
        value, hasNext = next()
    }
    // benchmark result: 7966233 ns/op !!!
    
  • Stateful iterator. A simple struct with two data fields, the collection and a cursor, and two methods: Next() and HasNext(). This time the Iterator() method of the collection returns a pointer to a properly initialized iterator structure:

    for iter := collection.Iterator(); iter.HasNext(); {
        value := iter.Next()
    }
    // benchmark result: 4010607 ns/op
    

As much as I like closures, performance wise it's a no-Go. As for design patterns, well, Gophers prefer the term "idiomatic way to do" stuff for good reason. Also grep the go source tree for iterators: with so few files that mention the name, iterators are definitely not a Go thing.

Also check out this page: http://ewencp.org/blog/golang-iterators/

Anyhow, interfaces do not help in any way here, unless you want to define some Iterable interface, but this is a completely different subject.

Carliecarlile answered 15/4, 2015 at 17:49 Comment(2)
"performance wise it's a no-Go" I like the pun, but I'm not sure I agree with the conclusion. Looking at your numbers, I see that all of them are within about a 3x difference of each other, and my first thought is that you should arbitrarily choose whichever you think is most expressive, unless you have actually profiled the code and determined that this iteration is a meaningful bottleneck in your application. In many applications, the work inside the loop will dominate, and this level of iteration difference will be negligible.Patronage
Laughing with iterrr in Nim.github.com/hamidb80/iterrrUnicuspid
P
26

TL;DR: Iterators are not idiomatic in Go. Leave them to other languages.

In depth then, the Wikipedia entry "Iterator pattern" begins, "In object-oriented programming, the iterator pattern is a design pattern..." Two red flags there: First, object oriented programming concepts often don't translate well into Go, and second, many Go programmers don't think much of design patterns. That first paragraph also includes "The iterator pattern decouples algorithms from containers", but only after stating "an iterator [accesses] the container's elements. Well which is it? If an algorithm is accessing the container's elements it can hardly claim to be decoupled. The answer in many languages involves some kind of generics that allows the language to generalize over similar data structures. The answer in Go is interfaces. Interfaces enforce a stricter decoupling of algorithms and objects by denying access to structure and requiring all interaction be based on behavior. Behavior means capabilites expressed through methods on the data.

For a minimal iterator type the needed capability is a Next method. A Go interface can represent an iterator object by simply specifying this single method signature. If you want a container type to be iterable, it must satisfy the iterator interface by implementing all of the methods of the interface. (We only have one here, and in fact it is common for interfaces to have only a single method.)

A minimal working example:

package main

import "fmt"

// IntIterator is an iterator object.
// yes, it's just an interface.
type intIterator interface {
    Next() (value int, ok bool)
}

// IterableSlice is a container data structure
// that supports iteration.
// That is, it satisfies intIterator.
type iterableSlice struct {
    x int
    s []int
}

// iterableSlice.Next implements intIterator.Next,
// satisfying the interface.
func (s *iterableSlice) Next() (value int, ok bool) {
    s.x++
    if s.x >= len(s.s) {
        return 0, false
    }
    return s.s[s.x], true
}

// newSlice is a constructor that constructs an iterable
// container object from the native Go slice type.
func newSlice(s []int) *iterableSlice {
    return &iterableSlice{-1, s}
}

func main() {
    // Ds is just intIterator type.
    // It has no access to any data structure.
    var ds intIterator

    // Construct.  Assign the concrete result from newSlice
    // to the interface ds.  ds has a non-nil value now,
    // but still has no access to the structure of the
    // concrete type.
    ds = newSlice([]int{3, 1, 4})

    // iterate
    for {
        // Use behavior only.  Next returns values
        // but without insight as to how the values
        // might have been represented or might have
        // been computed.
        v, ok := ds.Next()
        if !ok {
            break
        }
        fmt.Println(v)
    }
}

Playground: http://play.golang.org/p/AFZzA7PRDR

This is the basic idea of interfaces, but it's absurd overkill for iterating over a slice. In many cases where you would reach for an iterator in other languages, you write Go code using built in language primitives that iterate directly over basic types. Your code stays clear and concise. Where that gets complicated, consider what features you really need. Do you need to emit results from random places in some function? Channels provide a yield-like capability that allows that. Do you need infinite lists or lazy evaluation? Closures work great. Do you have different data types and you need them to transparently support the same operations? Interfaces deliver. With channels, functions, and interfaces all first class objects, these techniques are all easily composable. So what then is the most idiomatic way? It is to experiment with different techniques, get comfortable with them, and use whatever meets your needs in the simplest way possible. Iterators, in the object oriented sense anyway, are almost never simplest.

Primitivism answered 22/12, 2012 at 20:1 Comment(7)
Well I agree that c++ iterators are not that simple. But compare to Python generators or IEnumerable<T> in C#. Your last paragraph is great, maybe I need to experiment with different techniques in place of traditional iterators.Sizzle
This is an amazing answer with quite a lot of depth to it. @Sonia, have you considered writing a "Way of Go" type of book or blog series?Dodecagon
Iterators are very idiomatic in Go, and there are a number of examples in the standard library including bufio.Scanner and tar.Reader off the top of my head.Moulding
Well done, I think your answer is one of the better one's here because it includes an example of an exhaustible iterator. The idea to return a tuple from the Next method instead of a single value is simple and effective.Waylen
"If an algorithm is accessing the container's elements it can hardly claim to be decoupled." Each iterator implementation is coupled to its container but the user of the iterator decouples their algorithm from the container they're using. Generics are not related to iterators - Java had iterators before generics. "Do you have different data types and you need them to transparently support the same operations? Interfaces deliver" - that is precisely the point of Iterator+Iterable, which are interfaces that are built in to other languages allowing you to swap out iterables universally.Braddock
How to return errors during iteration. Imagine the use case of returning documents from a json file, if a document is not valid.Eponymy
You're taking the definition of the pattern to heart too much. One of the main purposes of an iterator is to not hold the entire thing in memory while still making the generation of values reusable. Anything that does that can be considered an iterator of sorts and Go absolutely supports this.Vahe
M
4

You can break out without leaking by giving your goroutines a second channel for control messages. In the simplest case it's just a chan bool. When you want the goroutine to stop, you send on this channel. Inside the goroutine, you put the iterator's channel send and the listen on the control channel inside a select.

Here's an example.

You can take this further by allowing different control messages, such as "skip".

Your question is pretty abstract, so to say more, a concrete example would be helpful.

Monolayer answered 22/12, 2012 at 8:4 Comment(2)
That is pretty contrived for a simple iteration. How about performance?Sizzle
I use this model a lot because it's otherwise pretty hard to have a paginated, but arbitrarily long data stream you want to consume. -- BTW, just close that channel when you're done with it. close is a persistent send.Thornton
B
4

Looking to the container/list package it looks like there is no way to do it. C-like way should be used if you iterate over object.

Something like this.

type Foo struct {
...
}

func (f *Foo) Next() int {
...
}

foo := Foo(10)

for f := foo.Next(); f >= 0; f = foo.Next() {
...
}
Bendicta answered 22/12, 2012 at 15:51 Comment(0)
M
4

Here's a way I thought of to do it with channels and goroutines:

package main

import (
    "fmt"
)

func main() {
    c := nameIterator(3)
    for batch := range c {
        fmt.Println(batch)
    }
}

func nameIterator(batchSize int) <-chan []string {
    names := []string{"Cherry", "Cami", "Tildy", "Cory", "Ronnie", "Aleksandr", "Billie", "Reine", "Gilbertina", "Dotti"}

    c := make(chan []string)

    go func() {
        defer close(c)
        for i := 0; i < len(names); i++ {
            startIdx := i * batchSize
            endIdx := startIdx + batchSize

            if startIdx > len(names) {
                continue
            }
            if endIdx > len(names) {
                c <- names[startIdx:]
            } else {
                c <- names[startIdx:endIdx]
            }
        }    
    }()

    return c
}

https://play.golang.org/p/M6NPT-hYPNd

I got the idea from Rob Pike's Go Concurrency Patterns talk.

Monda answered 19/5, 2018 at 0:48 Comment(1)
I like the look of this pattern. My one concern is that if for some reason the user does not complete iterating over the generator, then we have a hanging go routine. If this is a long lived application that generates a lot of these objects, it will become a problem.Libertinage
S
2

I published an article on this subject :

https://serge-hulne.medium.com/iterators-map-filter-reduce-and-list-processing-in-go-golang-implementing-python-functional-2d24d780051f

There's an associated Git repo: https://github.com/serge-hulne/iter/tree/main/iterate

The main idea is :

func Fib(n int) chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        for i, j := 0, 1; i < n; i, j = i+j, i {
            out <- i
        }
    }()
    return out
}

Used as:

fibs = Fib(100)
for i := range Map(fibs) {
    fmt.Printf("i = %6v\n", i)
}
Seumas answered 22/8, 2021 at 16:57 Comment(1)
You'll have a memory leak if the user of the iterator breaks in the middle of the loop. The go routine we remail spinning and the chanel will stay open.Apocynthion
A
1

The very fact that there are so many seemingly different solutions here means that the doesn't seem to be an idiomatic way of doing it. I am starting my way in Go, and I thought there would be a way to tap into the power of range thing. Sadly, no.

Here's what I came up with (it is similar to some of the solutions above)

// Node Basically, this is the iterator (or the head of it) 
// and the scaffolding for your itterable type
type Node struct {
    next *Node
}

func (node *Node) Next() (*Node, bool) {
    return node.next, node.next != nil
}

// Add add the next node
func (node *Node) Add(another *Node) {
    node.next = another
}

and here is how I use it:

node := &Node{}
node.Add(&Node{})

for goOn := true; goOn; node, goOn = node.Next() {
    fmt.Println(node)
}

Or probably a more elegant solution:

...
func (node *Node) Next() *Node {
    return node.next
}
...

for ; node != nil; node = node.Next() {
    fmt.Println(node)
}
Adept answered 15/3, 2019 at 21:34 Comment(0)
C
1

Update to the current upvoted answers: as of Go 1.23, there is a standard way to create iterators. They are a function that take a yield(...) callback to push values. When used in for-range, yield is just syntax sugar for running the body of the loop. This makes them as easy to write as the standard approach of pushing into a channel but without needing any concurrency or overhead.

If you have a more complex usecase, there is a new iter.Pull(itr) function that takes an iterator and returns a next(...) , stop(...) pair of functions, which is compiler optimized to be just as efficient as the closure approach outlined in other answers and does not suffer from the overhead of pushing into a channel.

Croaker answered 15/6, 2024 at 15:48 Comment(0)
P
1

Go 1.23 (August 2024)

The loop-over-function feature is now coming to the language as part of the standard library. You will be able to use functions with a specific signature in range clauses and loop over them.

The signatures are conveniently defined in the upcoming iter package:

type Seq[V any] func(yield func(V) bool)

and

type Seq2[K, V any] func(yield func(K,V) bool)

The function func(func() bool) isn't defined in iter but can also be ranged over.

These range-able functions:

  • are generic in V or K, V
  • accept a function parameter called yield — clearly, the argument identifier doesn't have to be "yield" for this to work. It's just a conventional name.
  • the yield function itself accepts one argument of type V for sequences of single values, or two arguments of type K and V for sequences of pairs of values.
  • the yield function returns a bool that means whether to keep looping or not

With this, you can write methods that return iterators for your custom types:

type MyType struct {
    data []float64
}

func (m *MyType) All() iter.Seq[float64] {
    return func(yield func(float64) bool) {
        for _, v := range m.data {
            if !yield(v) {
                return
            }
        }
    }
}

And then you can range over it.

// All returns an iterator
for _, v := range myType.All() {
    // do something with v
}

The loop is written as usual. The compiler will take care of creating and passing an yield function to the iterator. yield returns true as long as the loop can proceed and false when it has to stop — i.e. when the loop ends or when there is a break statement.

More details can be found in the language specification:

For a function f, the iteration proceeds by calling f with a new, synthesized yield function as its argument. If yield is called before f returns, the arguments to yield become the iteration values for executing the loop body once. After each successive loop iteration, yield returns true and may be called again to continue the loop. As long as the loop body does not terminate, the "range" clause will continue to generate iteration values this way for each yield call until f returns. If the loop body terminates (such as by a break statement), yield returns false and must not be called again.

Note that the compiler creates the yield function but your iterator code is responsible for calling it when appropriate.

Pericles answered 23/6, 2024 at 6:35 Comment(0)
H
0

As other mates said you can work with channels to implement Generator design pattern which is what you're looking for.

Generator functions

Channels and goroutines provide a natural substrate for implementing a form of producer/producer pattern using generator functions. In this approach, a goroutine is wrapped in a function which generates values that are sent via a channel returned by the function. The consumer goroutine receives these values as they are generated.

Example extracted from Go Design Patterns For Real-World

package main

import (
    "fmt"
    "strings"
    )

func main() {
    data := []string{"Sphinx of black quartz, judge my vow", 
             "The sky is blue and the water too", 
             "Cozy lummox gives smart squid who asks for job pen",
             "Jackdaws love my big sphinx of quartz",
             "The quick onyx goblin jumps over the lazy dwarf"}
    histogram := make(map[string]int)
    words := words(data) // returns handle to data channel
    for word := range words { // Reads each word from channel every time
        histogram[word]++
    }   
    fmt.Println(histogram)
}

// Generator function that produces data
func words(data []string) <-chan string {
    out := make(chan string)
    // Go Routine
    go func() {
        defer close(out) // closes channel upon fn return
        for _, line := range data {
            words := strings.Split(line, " ")
            for _, word := range words {
                word = strings.ToLower(word)
                out <- word // Send word to channel 
            }
        }
     }()
     return out
}

https://play.golang.org/p/f0nynFWbEam

In this example, the generator function, declared as func words(data []string) <- chan string, returns a receive-only channel of string elements. The consumer function, in this instance main(), receives the data emitted by the generator function, which is processed using a for…range loop.

An improved version of this design pattern:

https://play.golang.org/p/uyUfz3ALO6J

Adding methods like Next and Error:

type iterator struct {
    valueChan   <-chan interface{}
    okChan      <-chan bool
    errChan     <-chan error
    err     error
}

func (i *iterator) next() (interface{}, bool) {
    var (
        value   interface{}
        ok  bool
    )
    value, ok, i.err = <-i.valueChan, <-i.okChan, <-i.errChan
    return value, ok
}

func (i *iterator) error() error {
    return i.err
}

// Generator function that produces data
func NewIterator(data []string) iterator {
    out := make(chan interface{})
    ok := make(chan bool)
    err := make(chan error)
    // Go Routine
    go func() {
        defer close(out) // closes channel upon fn return
        for _, line := range data {
            words := strings.Split(line, " ")
            for _, word := range words {
                word = strings.ToLower(word)
                out <- word // Send word to channel and waits for its reading
                ok <- true
                err <- nil // if there was any error, change its value
            }
        }
        out <- ""
        ok <- false
        err <- nil
     }()

     return iterator{ out, ok, err, nil }
}
Hower answered 1/8, 2019 at 8:0 Comment(0)
K
0

I had the same problem and I wanted a solution that:

  1. Allowed me to use range
  2. Returned a "read only" data structure (i.e. avoid arrays or maps)

I solved it with a buffered channel with enough space for your elements.

type Element struct {
  name string
  whatever int
}

var elements = []*Element{
  {name: "earth"},
  {name: "wind"},
  {name: "water"},
  {name: "fire"},
}

func Iterator() <- chan *Element {
  ch := make(chan *Element, len(elements) +1) // +1 to avoid blocking
  go func() {
    for _, e := range elements {
      ch <- e
    }
    close(ch) // because channel does not block, it's closed right away
  }() // And therefore the goroutine ends, no memory leak
  return ch
}

Note how in this instance I am using pointers to Elements. That is intended because in my problem space I want Elements to be mutable by their callers. But if you don't, you can pass objects.

Here's the full example in the Go playground

Karns answered 22/5, 2024 at 12:28 Comment(0)
A
-2

I created a generic method for this task:

package main

import "fmt"

type HasNext[T any] interface {
    Next() (bool, T)
}

type Nexter[T int] struct {
    Val int
    End int
}

func (n *Nexter[int]) Next() (bool, int) {
    n.Val = n.Val + 1
    if n.Val > n.End {
        return true, int(n.Val)
    }
    return false, int(n.Val)
}

func newEven() HasNext[int] {
    n := 0
    return &Nexter[int]{n, 5}
}

func main() {

    even := newEven()

    for {
        var b, v = even.Next()
        fmt.Println(v)
        if b {
            break
        }
    }
}

an improvement would be to figure out how to do it with range. But this is better than a closure, I think, in terms of perf and garbage collection?

Alarise answered 14/12, 2023 at 17:28 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.