Arrays vs. lists in Lisp: Why are lists so much faster in the code below?
Asked Answered
C

4

5

I got an unexpected result while solving Problem 75 in Project Euler. My code does find the correct solution, but it behaves strangely.

My solution consists of traversing a Pythagorean tree (Barning's matrices) until the perimeter limit is reached, counting the numbers of times the perimeter assumed each value, and, lastly, counting the perimeter lengths that occurred only once. My admittedly untidy but functioning code is:

(defparameter *barning-matrixes*
   '(#(1 -2  2) #(2 -1  2) #(2 -2  3)
     #(1  2  2) #(2  1  2) #(2  2  3)
     #(-1 2  2) #(-2 1  2) #(-2 2  3)))

(defparameter *lengths* (make-array 1500001 :initial-element 0))

(defun expand-node (n)
"Takes a primitive Pythagorean triple in a vector and traverses subsequent nodes in the the tree of primitives until perimeter > 1,500,000"
   (let ((perimeter (reduce #'+ n)))
   (unless (> perimeter 1500000)
     (let ((next-nodes (mapcar #'(lambda (x)
                                   (reduce #'+ (map 'vector #'* n x))) *barning-matrixes*)))
        (loop for i from perimeter to 1500000 by perimeter
              do (incf (aref *lengths* i)))
        (expand-node (subseq next-nodes 0 3))
        (expand-node (subseq next-nodes 3 6))
        (expand-node (subseq next-nodes 6 9))))))

(expand-node #(3 4 5))  ; Takes too darn long :-(
(count 1 *lengths*)

I expected the tree expansion to run in a few milliseconds, but the expand-node function took 8.65 seconds--a lot more than expected--to traverse a not very large tree.

However, I was surprised when I tweaked the code to remove the vectors...

(defparameter *barning-matrixes*
   '((1 -2  2) (2 -1  2) (2 -2  3)
     (1  2  2) (2  1  2) (2  2  3)
     (-1 2  2) (-2 1  2) (-2 2  3)))


(defparameter *lengths* (make-array 1500001 :initial-element 0))

(defun expand-node (n)
"Takes a primitive Pythagorean triple in a list and traverses subsequent nodes in the the tree of primitives until perimeter > 1,500,000"
   (let ((perimeter (reduce #'+ n)))
   (unless (> perimeter 1500000)
     (let ((next-nodes (mapcar #'(lambda (x) (reduce #'+ (mapcar #'* n x))) *barning-matrixes*)))
        (loop for i from perimeter to 1500000 by perimeter
              do (incf (aref *lengths* i)))
        (expand-node (subseq next-nodes 0 3))
        (expand-node (subseq next-nodes 3 6))
        (expand-node (subseq next-nodes 6 9))))))

(expand-node '(3 4 5))  ; Much faster, but why?!
(count 1 *lengths*)

...and the traversing went hugely faster, taking only 35 ms. I'm intrigued by this massive difference and am hoping someone out there can explain why it happened.

Thanks, Paulo

PS: I'm using CCL for all this.

Carlitacarlo answered 20/7, 2016 at 23:42 Comment(3)
One point that might be helpful to note is that many of the sequence functions take start and end arguments. E.g., you can (reduce ... :start s :end e), which means that often times with recursive processing of sequences you can avoid using subseq and can instead just point to a different part of the same sequence. That can help to avoid a lot of memory allocation.Ostracod
@Joshua Indeed the slicing and dicing of lists in my innermost closure is quite messy,. but I it adds a few milliseconds of "drag" to the recursion.Carlitacarlo
I didn't mean in the innermost closure; I meant in the three calls to subseq at the end. You've created a list, and then take three subsequences of it, which means creating that much list again (and traversing the original list multiple times). If expand-node were to take a start and end parameter, (and next-node were an array), you could do much more with much less copying).Ostracod
A
4

You didn't say which implementation you were using.

You would need to find out, where the time is spend.

But for me it looks like the implementation of MAP of a list and a vector of equal length to a new vector in your Common Lisp might be very inefficient. Even when consing a new vector, which has some overhead, the implementation can be much faster.

Try to implement the vector operation as a LOOP and compare:

(loop with v = (make-array (length n))
      for n1 across n
      for x1 across x
      for i from 0
      do (setf (aref v i) (* n1 x1))
      finally (return v))

This faster version conses too, but has replaced the list operations with vector operations:

(defparameter *barning-matrixes*
  #(#(1 -2  2) #(2 -1  2) #(2 -2  3) #(1  2  2) #(2  1  2) #(2  2  3) #(-1 2  2) #(-2 1  2) #(-2 2  3)))

(defparameter *lengths* (make-array 1500001 :initial-element 0))

(defun expand-node (n)
  "Takes a primitive Pythagorean triple in a vector and traverses subsequent nodes in the the tree of primitives until perimeter > 1,500,000"
  (let ((perimeter (reduce #'+ n)))
    (unless (> perimeter 1500000)
      (let ((next-nodes
             (loop with v = (make-array (length *barning-matrixes*))
                   for e across *barning-matrixes*
                   for i from 0
                   do (setf (aref v i)
                            (reduce #'+
                                    (loop with v = (make-array (length n))
                                          for n1 across n
                                          for x1 across e
                                          for i from 0
                                          do (setf (aref v i) (* n1 x1))
                                          finally (return v))))
                   finally (return v))))
        (loop for i from perimeter to 1500000 by perimeter
              do (incf (aref *lengths* i)))
        (expand-node (subseq next-nodes 0 3))
        (expand-node (subseq next-nodes 3 6))
        (expand-node (subseq next-nodes 6 9))))))

(time (expand-node #(3 4 5)))

Let's look at your code:

(defun expand-node (n)

; here we don't know of which type N is. You call it from the toplevel
; with a vector, but recursive calls call it with a list

  "Takes a primitive Pythagorean triple in a vector and traverses
 subsequent nodes in the the tree of primitives until perimeter > 1,500,000"
  (let ((perimeter (reduce #'+ n)))
    (unless (> perimeter 1500000)
      (let ((next-nodes (mapcar #'(lambda (x)    ; this mapcar creates a list
                                    (reduce #'+
                                            (map 'vector
                                                 #'*
                                                 n  ; <- list or vector
                                                 x))) ; <- vector
                                *barning-matrixes*)))
        (loop for i from perimeter to 1500000 by perimeter
              do (incf (aref *lengths* i)))
        (expand-node (subseq next-nodes 0 3))   ; this subseq returns a list most of the times...
        (expand-node (subseq next-nodes 3 6))
        (expand-node (subseq next-nodes 6 9))))))

So you call MAP with a list and a vector most of the times. What is the size of the result vector? MAP has to find out by traversing the list or by some other way. The result vector length is the shortest of the argument sequence lengths. Then it has to iterate over the list and the vector. If MAP now uses generic sequence operations, the element access into the list is always traversing the list. Obviously one can write an optimized version, which does all that faster, but a Common Lisp implementation might choose to provide only a generic implementation of MAP...

Anywise answered 21/7, 2016 at 7:37 Comment(0)
F
4

Welcome to the intricacies of Common Lisp optimization! The first thing to note is about the different program optimization strategies performed by the different implementations: I tried your examples in SBCL, and both of them performed very efficiently with almost the same time, while in CCL the vector version was executed much much slower than the list version. I do not know which implementation you have tried, but you can try to use different implementations to see very different execution times.

From a few tests in CCL it seems to me that the main problem arises from this form:

(map 'vector #'* n x)

which is executed much much slowly than the corresponding list version:

(mapcar #'* n x)

Using time I have seen that the vector version conses a lot.

This first impression has been confirmed by simply changing map with map-into, using an auxiliary vector. Actually the following version is slightly faster in CCL than the list version:

(defun expand-node (n)
"Takes a primitive Pythagorean triple in a vector and traverses subsequent nodes in the the tree of primitives until perimeter > 1,500,000"
   (let ((perimeter (reduce #'+ n))
         (temp-vector (make-array 3 :initial-element 0)))
     (unless (> perimeter 1500000)
       (let ((next-nodes (mapcar #'(lambda (x)
                                   (reduce #'+ (map-into temp-vector #'* n x))) *barning-matrixes*)))
         (loop for i from perimeter to 1500000 by perimeter
               do (incf (aref *lengths* i)))
         (expand-node (subseq next-nodes 0 3))
         (expand-node (subseq next-nodes 3 6))
         (expand-node (subseq next-nodes 6 9))))))
Flunky answered 21/7, 2016 at 7:41 Comment(2)
We don't know from that if the consing is the problem, or if MAP-INTO is more efficient as MAP...Anywise
Actually if you cons the temp-vector all the time, then you see that it does not make much of a difference.Anywise
B
3

Inspecting vector #(1 2 3) on SBCL gives:

Dimensions: (3)
Element type: T
Total size: 3
Adjustable: NIL
Fill pointer: NIL
Contents:
0: 1
1: 2
2: 3

You can see that there are a little more data to store than in a list, even though the exact internal representation of vectors varies among implementations. For small vectors that keep being copied like in your example, you are likely to end up allocating more memory than with lists, which is visible in the bytes consed lines below. Allocating memory contributes to the run time. In my tests, note that the difference in time is not as big as in your tests.

;; VECTORS
(time (expand-node #(3 4 5)))
;; Evaluation took:
;;   2.060 seconds of real time
;;   2.062500 seconds of total run time (1.765625 user, 0.296875 system)
;;   [ Run times consist of 0.186 seconds GC time, and 1.877 seconds non-GC time. ]
;;   100.10% CPU
;;   4,903,137,055 processor cycles
;;   202,276,032 bytes consed

;; LISTS
(time (expand-node* '(3 4 5)))
;; Evaluation took:
;;   0.610 seconds of real time
;;   0.609375 seconds of total run time (0.609375 user, 0.000000 system)
;;   [ Run times consist of 0.016 seconds GC time, and 0.594 seconds non-GC time. ]
;;   99.84% CPU
;;   1,432,603,387 processor cycles
;;   80,902,560 bytes consed
Bookerbookie answered 21/7, 2016 at 7:38 Comment(0)
C
2

Everyone already answered while I was trying to optimize the code, so I'll just put this version here without bothering to explain too much. It should run pretty fast, at least on SBCL.

(declaim (optimize (speed 3) (safety 0) (debug 0)))

(declaim (type (simple-array (simple-array fixnum 1) 1) *barning-matrixes*))
(defparameter *barning-matrixes*
  (map '(simple-array (simple-array fixnum 1) 1)
       (lambda (list)
         (make-array 3 :element-type 'fixnum
                       :initial-contents list))
       '((1 -2 2) (2 -1 2) (2 -2 3)
         (1 2 2) (2 1 2) (2 2 3)
         (-1 2 2) (-2 1 2) (-2 2 3))))

(declaim (type (simple-array fixnum 1) *lengths*))
(defparameter *lengths* (make-array 1500001 :element-type 'fixnum
                                            :initial-element 0))

(declaim (ftype (function ((simple-array fixnum 1))) expand-node))
(defun expand-node (n)
  "Takes a primitive Pythagorean triple in a vector and traverses subsequent nodes in the the tree of primitives until perimeter > 1,500,000"
  (loop with list-of-ns = (list n)
        for n = (pop list-of-ns)
        while n
        do (let ((perimeter (let ((result 0))
                              (declare (type fixnum result))
                              (dotimes (i (length n) result)
                                (incf result (aref n i))))))
             (declare (type fixnum perimeter))
             (unless (> perimeter 1500000)
               (let ((next-nodes
                       (let ((result (list)))
                         (dotimes (matrix 9 (nreverse result))
                           (let ((matrix (aref *barning-matrixes* matrix)))
                             (push (let ((result 0))
                                     (declare (type fixnum result))
                                     (dotimes (i 3 result)
                                       (incf result
                                             (the fixnum
                                                  (* (the fixnum (aref matrix i))
                                                     (the fixnum (aref n i)))))))
                                   result))))))
                 (declare (type list next-nodes))
                 (loop for i from perimeter to 1500000 by perimeter
                       do (incf (aref *lengths* i)))
                 (dotimes (i 3)
                   (push (make-array 3 :element-type 'fixnum
                                       :initial-contents (list (pop next-nodes)
                                                               (pop next-nodes)
                                                               (pop next-nodes)))
                         list-of-ns))))))
  (values))

On my slow laptop,

CL-USER> (load (compile-file #P"e75.lisp"))
; ...compilation notes...
CL-USER> (time (expand-node (make-array 3 :element-type 'fixnum
                                          :initial-contents '(3 4 5))))
Evaluation took:
  0.274 seconds of real time
  0.264000 seconds of total run time (0.264000 user, 0.000000 system)
  96.35% CPU
  382,768,596 processor cycles
  35,413,600 bytes consed

; No values
CL-USER> (count 1 *lengths*)
161667 (18 bits, #x27783)

The original code ran at around ~1.8 seconds with vectors, and 0.8 seconds with lists.

Caudex answered 21/7, 2016 at 8:34 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.