How to split list into evenly sized chunks in Racket (Scheme)?
Asked Answered
C

7

8

Example:
How to convert list:
'(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)

Into list of lists:
'((0 1 2 3) (4 5 6 7) (8 9 10 11) (12 13 14 15))

Based on answers provided here so far, this is what I've come up with:

First define function to take up to 'n' elements from beginning of the list:

(define (take-up-to n xs)
  (define (iter xs n taken)
    (cond
      [(or (zero? n) (empty? xs)) (reverse taken)]
      [else (iter (cdr xs) (- n 1) (cons (car xs) taken))]))
  (iter xs n '()))

Second is similar function for the rest of list:

(define (drop-up-to n xs)
  (define (iter xs n taken)
    (cond
      [(or (zero? n) (empty? xs)) xs]
      [else (iter (cdr xs) (- n 1) (cons (car xs) taken))]))
  (iter xs n '()))

This could have been done as one function that returns two values and Racket has a function 'split-at' that produces same result, but I did this as an exercise.

ps. Is this correct use of tail recursion ?

Than split-into-chunks can be written like this:

(define (split-into-chunks n xs)
  (if (null? xs)
      '()
      (let ((first-chunk (take-up-to n xs))
            (rest-of-list (drop-up-to n xs)))
        (cons first-chunk (split-into-chunks n rest-of-list)))))

pps. Can this one be improved even more or is it 'good enough' ?

Concurrent answered 4/1, 2012 at 11:9 Comment(4)
You'll need to be more specific: how is either the number of lists, or the (approximate) size of each list, determined?Coremaker
As I'm new to Racket, I'm more interested in general principles behind traversing lists and recursion, but what I really need now is this: I want to make list of 4 byte chunks out of large list of bytes whose length is always multiple of 4 (So there are no 'leftovers').Concurrent
Chunks are RGBA values. Number of chunks is ~100.000 If there is a better structure than list of lists for example like this, please feel free to recommend it! Vectors, bytes maybe ?Concurrent
There are likely to be better data structures, yes. Which is best depends exactly on what you hope to do with it, but vectors are one candidate, byte strings are another; you could also use Racket's record type support to represent each pixel as a record, and have list or vector of pixels.Laden
L
8

There's a common utility function in Scheme, in the SRFI-1 library (which Racket offers, but I don't recall how to import it), called take, which takes the initial n elements from a list:

(take 4 '(0 1 2 3 4 5 6 7 8))
=> '(0 1 2 3)

There is also in the same library a function called drop which removes the initial n elements from a list:

(drop 4 '(0 1 2 3 4 5 6 7 8))
=> '(4 5 6 7 8)

You can break down the problem into smaller pieces by using functions like these. So, a first (but incorrect) approximation to solving your problem would be this:

(define (split-into-chunks n xs)
  (if (null? xs)
      '()
      (let ((first-chunk (take n xs))
            (rest (drop n xs)))
        (cons first-chunk (split-into-chunks n rest)))))

As I noted, however, this solution is suboptimal. Why? Because (take n xs) gives you an error when the list xs has fewer than n elements; translated to your problem, if the list has a non-n multiple of elements you get an error. However, you can fix this by writing a pair of functions, take-up-to and drop-up-to that work like take and drop but don't have that problem. So example usage of the functions would look like this:

(take-up-to 4 '(0 1 2))
=> '(0 1 2)

(drop-up-to 4 '(0 1 2))
=> '()

This is as much as I'm going to tell you. I suggest you do these things:

  • Write your own implementations of take, drop, take-up-to and drop-up-to, and use them to write the function you're trying to implement.
  • Skim through the documentation for the SRFI-1 library and familiarize yourself with what the functions there do. A lot of these list problems break down into easy combinations of these functions, so you want to know about them.
  • Learn how to import this library into Racket. (Can't help you there.)
  • As an exercise, try your hand at writing your own implementations of some of the SRFI-1 functions. (It's OK to simplify them a bit for the sake of the exercise; for example, while many of these functions will deal with multiple list arguments, it's OK for exercise purposes to write a version that deals with just one list.)

EDIT: Here's simple implementation of take-up-to:

(define (take-up-to n xs)
  (if (or (zero? n) (null? xs))
      '()
      (cons (car xs) (take-up-to (- n 1) (cdr xs)))))

It's possible to improve on this still some more to use only tail calls (and thus run in constant space). That's left as yet another exercise.

Laden answered 4/1, 2012 at 17:59 Comment(5)
Thank you! Have spent some time making those functions but now I understand much more :) Will add my answer...Concurrent
Here is my take on those functions: ` (define (take-up-to xs n) ` (if (< (length xs) n) xs (take xs n))) (define (drop-up-to xs n) (if (< (length xs) n) '() (drop xs n)))Concurrent
They seem to work, but splitting large lists lasts very long time compared to simple function without checking.Concurrent
@Concurrent Yeah, that's not a good implementation of take-up-to; taking the length of a large list is expensive. I edited the response to add a better one (but still not optimal).Laden
I had a vague idea how to count how many elements more are left for 'grabbing' each cycle, but couldn't figure it out last night. And it is simple as that '(function (- n 1) ...' Good examples, thanks! ps. Have to read much more :)Concurrent
E
6

as for me it's something like

(define (split-by lst n)
   (if (not (empty? lst))
       (cons (take lst n) (split-by (drop lst n) n))
       '() ))

for example

(split-by '(3 2 1 43 54 25 100 -14 -42) 3)

yields

'((3 2 1) (43 54 25) (100 -14 -42))
Encircle answered 30/4, 2014 at 17:19 Comment(2)
Looks nice and works well for multiple-of-n long lists. Thanks!Concurrent
Nice elegant, recursive solution.Allheal
S
3

You ask a nice general-purpose question, but I think that what you want here is something that uses byte-strings, rather than lists. Here's some code (including error checking), along with a test case:

#lang racket

(require rackunit)

;; given a byte string, split it into a vector of byte-strings
;; of length 4
(define (split-bytes bytes)
  (define len (bytes-length bytes))
  (unless (= 0 (modulo len 4))
    (raise-type-error 'split-bytes
                      "byte string of length divisible by 4"
                      0 bytes))
  (for/vector ([i (in-range (/ len 4))])
     (subbytes bytes (* 4 i) (* 4 (add1 i)))))

(check-equal?
 (split-bytes
  #"hhoh.h9ew,n49othsv97")
 (vector #"hhoh"
         #".h9e"
         #"w,n4"
         #"9oth"
         #"sv97"))
Strobilaceous answered 5/1, 2012 at 18:19 Comment(2)
self-comment: indeed, if all you want is to iterate over the sub-byte-strings, you may just want to create a generator, and avoid the overhead of creating the vector. Creating a vector of length 100K probably isn't too big a deal, though.Strobilaceous
Yeah, byte-string is the original type of my data (from bitmap) and I'm familiar with 'subbytes' function (made 'get-pixel' with it). Your approach with 'for' is more familiar to me (I'm so new to Racket that I didn't even know there is a 'add1' function :) ). Will have to check out generators. Thanks!Concurrent
J
2

Another way of doing that is to provide a higher-order function map-n, which takes n values from a list and applies a function to them:

(define (map-n n fn l . lists)
  (if (any (lambda(l)(< (length l) n)) (cons l lists))
      '()
      (cons (apply fn (append-map (lambda(l)(take l n)) (cons l lists)))
            (apply map-n n fn (map (lambda(l)(drop l n)) (cons l lists))))))

(e.g. (map-n 4 + '(1 2 3 4 5 6 7 8)) ===> (10 26))
(e.g. (map-n 3 (lambda (a b c) a) '(1 2 3 4 5 6)) ===> (1 4))

Having this function, one can simply

(define (split-by n l)
  (map-n n list l))

The disadvantage might be that if the length of the list isn't divisible by n, the remainder will be rejected from the result.

Another funky way is to create a function that splits a list into chunks of arbitrary size:

(define (chunk-list l . chunk-sizes)
  (assert (and (list? l)
               (every natual? chunk-sizes)
               (<= (fold + 0 chunk-sizes) (length l))))
  (let loop ((result '())
             (sizes chunk-sizes)
             (rest l))
    (match sizes
      (()
       (reverse result))
      ((size . sizes)
       (let-values (((this rest) (split-at rest size)))
         (loop `(,this ,@result) sizes rest))))))

(e.g. (chunk-list '(1 2 3 4 5 6 7 8 9 10) 0 1 2 3 4)
      ===> (() (1) (2 3) (4 5 6) (7 8 9 10))

and then define split-by using make-list:

(define (split-by n l)
  (let ((size (quotient (length l) n)))
    (apply chunk-list l (make-list size n))))

Note that the definition of map-n uses the any function from srfi-1, and chunk-list uses Alex Shinn's pattern matcher (although it could be easily rewritten using plain if, eq?, car and cdr)

Jericajericho answered 5/8, 2014 at 13:29 Comment(0)
E
2

If you're looking for a tail-recursive solution, one approach is to use the named let:

(define (group n xs)
  (let loop ([grouped '()] [xs xs])
    (if (empty? xs)
        (reverse grouped)
        (loop (cons (take xs n) grouped)
              (drop xs n)))))

However, this fails if xs would have leftover elements, so we'll need to add a case that checks for this:

(define (group n xs)
  (let loop ([grouped '()] [xs xs])
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= (length xs) n)
       (loop (cons xs grouped) '())]
      [else (loop (cons (take xs n) grouped)
                  (drop xs n))])))

This works, but we can do better. The problem here is that calculating (length xs) runs in linear time, because the only way to find the length of a simple list is to walk the entire list. Since this is inside a loop that is run a number of times proportional to the size of xs, this code runs in quadratic time when it should be simple to accomplish it in linear time. We can circumvent this problem by calculating the length of xs once and then subtracting n from it each iteration:

(define (group n xs)
  (let loop ([grouped '()] [xs xs] [l (length xs)])
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= l n)
       (loop (cons xs grouped) '() 0)]
      [else (loop (cons (take xs n) grouped)
                  (drop xs n)
                  (- l n))])))

And we're back to linear time while still preserving tail recursion, and therefore constant space. There is another improvement we can make however. The Racket function split-at combines the functionality of take and drop and returns both lists as two values. To use multiple-value-returning functions, we can use let-values:

(define (group n xs)
  (let loop ([grouped '()] [xs xs] [l (length xs])
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= l n)
       (loop (cons xs grouped) '() 0)]
      [else (let-values ([(taken dropped) (split-at xs n)])
              (loop (cons taken grouped)
                    dropped
                    (- l n)))])))

This is slightly faster because split-at can avoid the repeat work of ignoring the first n elements in the drop portion of it's functionality because those elements will already be consumed by take. This code does however not account for bad user input. If a user supplies an n larger than the length of xs, it will throw an error when it should return (list xs). This is simple enough to check for, but our code is getting awfully extended towards the right with all this nesting. In addition to checking for this, we can split the loop off into an internally defined function:

(define (group n xs)
  (define (loop grouped xs l)
    (cond
      [(empty? xs)
       (reverse grouped)]
      [(<= l n)
       (loop (cons xs grouped) '() 0)]
      [else (let-values ([(taken dropped) (split-at xs n)])
              (loop (cons taken grouped)
                    dropped
                    (- l n)))]))
  (let ([l (length xs)])
    (if (>= n l)
        (list xs)
        (loop '() xs l))))

This function is tail recursive, does not calculate (length xs) more than once, ensures that (group 4 '(1 2 3)) evaluates to '((1 2 3)), and uneven grouping such that (group 2 '(1 2 3) evaluates to '((1 2) (3)), running in linear time and constant space.

Euraeurasia answered 6/8, 2014 at 0:21 Comment(1)
Thanks for detailed examples! Learned something new today :)Concurrent
A
1

If you'd like to get better at solving problems like this, I'd highly recommend The Little Schemer. It will train you to think in terms that will make these problems tractable, and it only takes a few hours to run through, cover-to-cover.

Auroraauroral answered 4/1, 2012 at 20:5 Comment(1)
Thanks for suggestion! Heard about it before, will start reading it tonight. I like how you put it: "train you to think...".Concurrent
I
1

After I've used Racket for a bit longer time, I can't help to notice that the second usage of let documented at its official doc is very prevalently used in its world.

So my method of this question would be:

(let loop ([lst '(1 2 3 4 5 6 7 8 9 10 1 12 13 14)]
           [result '()])
  (if (< (length lst) 3)
      (append result (list lst))
      (loop (drop lst 3)
            (append  result (list (take lst 3))))))

The secret here is, you can set down this code to a format like below:

(let loop ([lst '(...)]
           [result '()])
  (if (empty? lst)
      result
      (loop (... lst ...)
            (append result (... lst ...)))))

and then try to flesh out it whenever you came across a problem that seems like can be resolved by a loop.

And also for your information, somebody already implemented this as a function called slice-at. The implementation code of it can be found at here.

Institutive answered 22/2, 2022 at 15:16 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.