Slice chunking in Go
F

10

45

I have a slice with ~2.1 million log strings in it, and I would like to create a slice of slices with the strings being as evenly distributed as possible.

Here is what I have so far:

// logs is a slice with ~2.1 million strings in it.
var divided = make([][]string, 0)
NumCPU := runtime.NumCPU()
ChunkSize := len(logs) / NumCPU
for i := 0; i < NumCPU; i++ {
    temp := make([]string, 0)
    idx := i * ChunkSize
    end := i * ChunkSize + ChunkSize
    for x := range logs[idx:end] {
        temp = append(temp, logs[x])
    }
    if i == NumCPU {
        for x := range logs[idx:] {
            temp = append(temp, logs[x])
        }
    }
    divided = append(divided, temp)
}

The idx := i * ChunkSize will give me the current "chunk start" for the logs index, and end := i * ChunkSize + ChunkSize will give me the "chunk end", or the end of the range of that chunk. I couldn't find any documentation or examples on how to chunk/split a slice or iterate over a limited range in Go, so this is what I came up with. However, it only copies the first chunk multiple times, so it doesn't work.

How do I (as evenly as possible) chunk a slice in Go?

Faultfinder answered 3/2, 2016 at 14:23 Comment(0)
B
107

You don't need to make new slices, just append slices of logs to the divided slice.

http://play.golang.org/p/vyihJZlDVy

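// logs is the []string from the question.
numCPU := runtime.NumCPU()

var divided [][]string

// Ceiling division, so at most numCPU chunks are produced.
chunkSize := (len(logs) + numCPU - 1) / numCPU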

for i := 0; i < len(logs); i += chunkSize {
    end := i + chunkSize

    if end > len(logs) {
        end = len(logs)
    }

    divided = append(divided, logs[i:end])
}

fmt.Printf("%#v\n", divided)
Bamberger answered 3/2, 2016 at 14:38 Comment(1)
If anyone's wondering what (len(logs) + numCPU - 1) / numCPU is, it's simply the ceiling of len(logs)/numCPU. - Dextrorse
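As a quick check of that identity, the sketch below compares the integer form with math.Ceil for a few lengths. ceilDiv is a hypothetical helper name, the divisor 8 stands in for numCPU, and the identity assumes a >= 0 and b > 0.

```go
package main

import (
    "fmt"
    "math"
)

// ceilDiv returns the ceiling of a/b using only integer arithmetic.
// It assumes a >= 0 and b > 0 (ceilDiv is an illustrative name).
func ceilDiv(a, b int) int {
    return (a + b - 1) / b
}

func main() {
    // Print each length next to both ways of computing the ceiling.
    for _, a := range []int{0, 7, 8, 9, 2100000} {
        viaFloat := int(math.Ceil(float64(a) / 8))
        fmt.Println(a, ceilDiv(a, 8), viaFloat)
    }
}
```

The integer form avoids the round trip through float64, which is why most answers in this thread use it.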
F
20

Using generics (Go version >=1.18):

func chunkBy[T any](items []T, chunkSize int) (chunks [][]T) {
    for chunkSize < len(items) {
        items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
    }
    return append(chunks, items)
}


Or if you want to manually set the capacity:

func chunkBy[T any](items []T, chunkSize int) [][]T {
    chunks := make([][]T, 0, (len(items)/chunkSize)+1)
    for chunkSize < len(items) {
        items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
    }
    return append(chunks, items)
}


Flivver answered 27/5, 2022 at 16:32 Comment(1)
Best one! But the return value could be improved for e.g. a nil input such as var s []int. I can update, shall I? - Dragonfly
J
7

Per Slice Tricks

Batching with minimal allocation

Useful if you want to do batch processing on large slices.

actions := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
batchSize := 3
batches := make([][]int, 0, (len(actions) + batchSize - 1) / batchSize)

for batchSize < len(actions) {
    actions, batches = actions[batchSize:], append(batches, actions[0:batchSize:batchSize])
}
batches = append(batches, actions)

Yields the following:

[[0 1 2] [3 4 5] [6 7 8] [9]]
Jer answered 21/2, 2022 at 10:6 Comment(0)
T
6

Another variant. It works about 2.5 times faster than the one proposed by JimB. The tests and benchmarks are here.

https://play.golang.org/p/WoXHqGjozMI

func chunks(xs []string, chunkSize int) [][]string {
    if len(xs) == 0 {
        return nil
    }
    divided := make([][]string, (len(xs)+chunkSize-1)/chunkSize)
    prev := 0
    i := 0
    till := len(xs) - chunkSize
    for prev < till {
        next := prev + chunkSize
        divided[i] = xs[prev:next]
        prev = next
        i++
    }
    divided[i] = xs[prev:]
    return divided
}
Tebet answered 8/4, 2021 at 20:59 Comment(3)
"It works about 2.5 times faster": this is unexplained, unfortunately. My guess is fewer JT allocations. - Exegetic
@mh-cbon The main reason is the preallocated slice, as we know its exact final size. It gives us 9 allocs/op instead of 53, and most of the speed gain. - Tebet
Also works for chunkSize > len(xs). - Degroot
T
1
func chunkSlice(items []int32, chunkSize int32) (chunks [][]int32) {
    // While more items remain than chunkSize...
    for chunkSize < int32(len(items)) {
        // ...append a sub-slice of chunkSize items to the result,
        chunks = append(chunks, items[0:chunkSize])
        // then drop those elements from the front of items.
        items = items[chunkSize:]
    }
    // Finally, append the remaining items and return the result.
    return append(chunks, items)
}

Visual example

Say we want to split an array into chunks of 3

items:  [1,2,3,4,5,6,7]
chunks: []

items:  [1,2,3,4,5,6,7]
chunks: [[1,2,3]]

items:  [4,5,6,7]
chunks: [[1,2,3]]

items:  [4,5,6,7]
chunks: [[1,2,3],[4,5,6]]

items:  [7]
chunks: [[1,2,3],[4,5,6]]

items:  [7]
chunks: [[1,2,3],[4,5,6],[7]]
return
Tyranny answered 7/10, 2021 at 7:35 Comment(1)
While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. You can find more information on how to write good answers in the help center: stackoverflow.com/help/how-to-answer . Good luck 🙂 - Photogrammetry
G
1

Go 1.23 (August 2024)

Use slices.Chunk. This gives an iterator that you can range over:

package main

import (
    "fmt"
    "slices"
)

func main() {
    vals := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

    // slices.Chunk yields consecutive sub-slices of up to 2 elements.
    for chunk := range slices.Chunk(vals, 2) {
        fmt.Println(chunk)
    }
}

Prints:

[1 2]
[3 4]
[5 6]
[7 8]
[9 10]

Playground: https://go.dev/play/p/NvPQg5CojCb?v=gotip

Gobi answered 6/6 at 8:20 Comment(0)
B
0

Use reflect to handle any []T:

https://github.com/kirito41dd/xslice

package main

import (
    "fmt"
    "github.com/kirito41dd/xslice"
)

func main() {
    s := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
    i := xslice.SplitToChunks(s, 3)
    ss := i.([][]int)
    fmt.Println(ss) // [[0 1 2] [3 4 5] [6 7 8] [9]]
}

https://github.com/kirito41dd/xslice/blob/e50d91fa75241a3a03d262ad51c8e4cb2ea4b995/split.go#L12

func SplitToChunks(slice interface{}, chunkSize int) interface{} {
    sliceType := reflect.TypeOf(slice)
    sliceVal := reflect.ValueOf(slice)
    length := sliceVal.Len()
    if sliceType.Kind() != reflect.Slice {
        panic("parameter must be []T")
    }
    n := 0
    if length%chunkSize > 0 {
        n = 1
    }
    SST := reflect.MakeSlice(reflect.SliceOf(sliceType), 0, length/chunkSize+n)
    st, ed := 0, 0
    for st < length {
        ed = st + chunkSize
        if ed > length {
            ed = length
        }
        SST = reflect.Append(SST, sliceVal.Slice(st, ed))
        st = ed
    }
    return SST.Interface()
}
Beria answered 17/7, 2021 at 11:28 Comment(0)
S
0

To summarize:

// ChunkStringSlice divides []string into chunks of chunkSize.
// It returns no chunks for an empty slice; chunkSize must be at least 1.
func ChunkStringSlice(s []string, chunkSize int) [][]string {
    chunkNum := (len(s) + chunkSize - 1) / chunkSize // ceil(len(s) / chunkSize)
    res := make([][]string, 0, chunkNum)
    for start := 0; start < len(s); start += chunkSize {
        end := start + chunkSize
        if end > len(s) {
            end = len(s) // clamp the last chunk to the slice bound
        }
        res = append(res, s[start:end])
    }
    return res
}

// ChunkStringSlice2 divides []string into at most chunkNum chunks.
// It returns fewer chunks when s is too short to fill all of them;
// chunkNum must be at least 1.
func ChunkStringSlice2(s []string, chunkNum int) [][]string {
    if len(s) == 0 {
        return nil
    }
    chunkSize := (len(s) + chunkNum - 1) / chunkNum // ceil(len(s) / chunkNum)
    return ChunkStringSlice(s, chunkSize)
}
Silica answered 29/4, 2022 at 11:50 Comment(0)
M
0

Go before 1.23

This function is based on the source code of slices.Chunk. It uses generics and the built-in min, so it requires Go 1.21 or later.

func Chunk[T any](slice []T, n uint64) <-chan []T {
    if n == 0 {
        panic("n can't be less than 1")
    }

    channel := make(chan []T, 1)

    go func() {
        defer close(channel)
        for i := uint64(0); i < uint64(len(slice)); i += n {
            // Clamp the last chunk to the slice bound as necessary.
            end := min(n, uint64(len(slice[i:])))

            // Set the capacity of each chunk so that appending to a chunk does
            // not modify the original slice.
            channel <- slice[i : i+end : i+end]
        }
    }()

    return channel
}


Morphology answered 8/7 at 14:33 Comment(0)
O
-1

The go-deeper/chunks module splits a slice of any type (using generics) into chunks whose sums of values are approximately equal.

package main

import (
    "fmt"

    "github.com/go-deeper/chunks"
)

func main() {
    slice := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    sliceChunks := chunks.Split(slice, 7)

    fmt.Println(sliceChunks)
}

Output:

[[1 2 3 4 5] [6 7 8 9 10]]
Ohg answered 25/10, 2022 at 18:59 Comment(0)
