In Racket, what is the advantage of lists over vectors?
Asked Answered
T

4

19

In my experience with Racket so far, I've not given much thought to vectors, because I gathered that their main benefit — constant-time access to elements — was not significant until you're working with lot of elements.

This doesn't seem quite accurate, however. Even with a small number of elements, vectors have a performance advantage. For example, allocating a list is slower than allocating a vector:

#lang racket

(time (for ([i (in-range 1000000)]) (make-list 50 #t)))
(time (for ([i (in-range 1000000)]) (make-vector 50 #t)))

>cpu time: 1337 real time: 1346 gc time: 987
>cpu time: 123 real time: 124 gc time: 39

And retrieving an element is slower too:

#lang racket

(define l (range 50))
(define v (make-vector 50 0))

(time (for ([i (in-range 1000000)]) (list-ref l 49)))
(time (for ([i (in-range 1000000)]) (vector-ref v 49)))

>cpu time: 77 real time: 76 gc time: 0
>cpu time: 15 real time: 15 gc time: 0

BTW this performance ratio holds if we increase to 10 million:

#lang racket

(define l (range 50))
(define v (make-vector 50 0))

(time (for ([i (in-range 10000000)]) (list-ref l 49)))
(time (for ([i (in-range 10000000)]) (vector-ref v 49)))

>cpu time: 710 real time: 709 gc time: 0
>cpu time: 116 real time: 116 gc time: 0

Sure, these are synthetic examples, to the extent that most programs don't allocate structures or use list-ref a million times in a loop. (And yes, I'm deliberately grabbing the 50th element to illustrate the performance difference.)

But they're also not, because across a whole program that relies on lists, you're going to be incurring a little extra overhead every time you touch those lists, and all those little inefficiencies will add up into slower running time for the overall program.

Thus my question: why not just use vectors all the time? In what situations should we expect better performance from lists?

My best guess is that because it's just as fast to get an item off the front of a list, e.g.:

#lang racket

(define l (range 50))
(define v (make-vector 50 0))

(time (for ([i (in-range 1000000)]) (list-ref l 0)))
(time (for ([i (in-range 1000000)]) (vector-ref v 0)))

>cpu time: 15 real time: 16 gc time: 0
>cpu time: 12 real time: 11 gc time: 0

... that lists are preferred in recursion siutations, because you're mostly working with cons and car and cdr, and it saves space to work with a list (vectors cannot be broken and put back together without copying the whole vector, right?)

But in situations where you're storing and retrieving data elements, it seems that vectors have the advantage, no matter the length.

Testaceous answered 20/12, 2014 at 21:17 Comment(8)
I dont think I bothered using list-ref. The lookup is not linear.Norrie
When to use (from a more general sense) an array vs a linked list?Daliladalis
I'm pretty sure that even though this is a C++ video, it explains the issue here: youtube.com/watch?v=YQs6IC-vgmoPhocomelia
Note that length takes linear time too, so if you want to measure list-ref alone, move (length l) outside the for loop.Fecundity
Linked lists are a foundational building block of Racket & Scheme (& Lisp). In other languages, they're just a data structure. So the question cannot quite be answered by analogy to C++ (or JavaScript etc). The list doesn't have the same status in those languages.Testaceous
@Fecundity I removed length, but the relative performance difference of list-ref and vector-ref remains. Moreover, length is an order of magnitude slower than vector-length.Testaceous
@MatthewButterick: In Lisp and Scheme a list is just a singly linked list. I cannot think of any substantial benefits Lisp or Scheme would have over any other language. I know Clojure does things differently. I suspect the difference will be much less there than with a traditional implementation.Norrie
length for lists is probably calculated anew on each call. length of vector is surely maintained internally for each vector.Ingratiating
F
25

Since list-ref uses time linear to the index, it is rarely okay to use unless for short lists. If the access pattern is sequential however and the number of elements can vary, then lists are fine. It would be interesting to see a benchmark for summing the elements of a 50 element long list of fixnums.

The access pattern to a data structure is not always sequential though.

Here is how I choose which data structure to use in Racket:

DATA STRUCTURE   ACCESS       NUMBER     INDICES
List:            sequential   Variable   not used
Struct:          random       Fixed      names
Vector:          random       Fixed      integer
Growable vector: random       Variable   integer
Hash:            random       Variable   hashable
Splay:           random       Variable   non-integer, total order
Fecundity answered 21/12, 2014 at 12:1 Comment(4)
Very clear. This table, along with idiomatic examples, should be in the Racket docs.Testaceous
By the way, although some of these data structures are described in "Datatypes", the others are in "Data: Data Structures" (the data module). For example vector is in the former section but gvector (growable vector) is in the latter.Emelineemelita
The research paper “Functional Data Structures in Typed Racket” discusses alternative functional data structures that perform better than standard lists in certain contexts, based on reference implementations.Testaceous
@GregHendershott Thanks for sharing that info. Ha, interesting, somehow, on all of my visits of the Racket docs, I never found gvector stuff. Now looking at the TOC of neither docs.racket-lang.org/guide/index.html nor docs.racket-lang.org/reference/index.html I see that "Data: Data Structures" popping out. I think this might be a visibility issue.Leora
A
7

Vectors are the same as arrays in most programming languages. As any arrays they have a fixed size they have O(1) access/update. Increasing the size is expensive since you need to copy every element to a new vector of a larger size. If you do a loop across all elements you can do it O(n).

Lists are singly linked lists. They have dynamic size, but random access/update is O(n). Accessing/modifying the head of the list is O(1) so if you iterate from beginning to end or create from end to beginning. Since list iteration do that each step a whole iteration over n elements is still done O(n) as with vectors. Doing list-ref instead would make it O(n^2) so you don't.

The reason why you have both lists and vectors are because they both have strength and weaknesses. Lists are the heart of functional programming languages as they can be used as immutable objects. You chain one and one pair in each iteration and you end up with a list with size determined by the full procedure. Imaging this:

(define odds (filter odd? lst)) 

This takes a list of numbers of any size and creates a new list with all the odd numbers in the the list. In order to do this with vector you need to do two passes. One that checks what size the resulting vector should have and one that copies every odd element from the old to the new. However, if you need to have random access to any element at any time vectors (or hash tables if you're programming in #!racket) are the obvious choice.

Autoroute answered 20/12, 2014 at 22:33 Comment(0)
E
5

In your first example:

(time (for ([i (in-range 1000000)]) (make-list   50 #t))) ;50 million list nodes
(time (for ([i (in-range 1000000)]) (make-vector 50 #t))) ; 1 million vectors

Keep in mind that you're asking for 50x allocations with the list. It's actually not so bad that the GC time is ~20x and the real time is ~10x.

Also there's the initial #t value. Although I don't know if Racket implements it this way, for an array conceptually that requires just one malloc plus one memset -- "give me a range of memory, and bitblast this value across it." Whereas with a list that's 50 million movs to do?

list-ref is IMHO is a "code smell" -- or, at least, something where I'd check that the expected list length will be quite small. If you really need to index a big something, you probably want that something to be a vector (or perhaps a hash-table).

So what are the advantages of lists over vectors? I think basically the same advantages -- and disadvantages -- of linked lists over arrays in other languages.

Also you can build things beyond singly linked lists with cons, car, and cdr (such as trees). Although I'm not an expert on the history of Lisp, I imagine that was partly the motivation for choosing these building blocks?

Finally, I think it's also worth keeping in mind that micro-benchmarks like this are true... so far as they go. What they don't necessarily tell you is the situation in a real/full application. If your application is dominated by the time to allocate a million fixed-length data structures, then you probably do want a vector instead of a list. Otherwise, it's probably pretty far down the list of optimizations to consider.

Emelineemelita answered 21/12, 2014 at 0:0 Comment(0)
B
1

Your question has nothing to do with Racket; it stands as it is for arbitrary programming languages: what are some compelling advantages of lists over vectors? Well, just try to imagine how to insert an element somewhere in the middle of a vector and you'll understand. Or how to remove an element found at the middle of a vector. Both operations are done in O(1) time with lists, while with vectors you have to move plenty of elements around. Even more, with some extra work one can come up with a way of joining two lists (that do not have the same bottom element!) in constant time. Alas, you cannot do that with vectors in O(1) (you have to allocate a new vector large enough to hold the two operands, then copy all their elements into the newly-allocated space).

Finally, as somebody else commented above, for Lisp lists are not just yet another data structure; they are to be found at the very foundational layer of the language.

So yes, do not overlook lists just because you've got vectors. List DO have their fair share of advantages.

Boll answered 1/1, 2015 at 16:28 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.