Why is a string not a list of characters in scheme/racket?

Asked 21/1, 2016 at 15:24 Answered 21/1, 2016 at 18:4

What I'm used to is that a string is just a list or array of characters, like in most C-like languages. However, in the scheme implementations that I use, including Chicken and Racket, a string is not the same as a list of characters. Something like (car "mystring") just won't fly. Instead there are functions to convert from and to lists. Why was this choice made? In my opinion Haskell does it the best way, there is literally no difference in any way between a list of chars and a string. I like this the most because it conveys the meaning of what is meant by a string in the clearest, simplest way. I'm not completely sure, but I'd guess that in the 'background' strings are lists or arrays of chars in almost any language. I'd especially expect a language like scheme with a focus on simplicity to handle strings in this way, or at least make is so you can do with strings what you can do with lists, like take the car or cdr What am I missing?

Thready answered 21/1, 2016 at 15:24 Comment(2)

In C strings are arrays of characters, not linked lists. In many other languages (C++, C#, Java,...) strings are their own type. – Rundell 21/1, 2016 at 15:27

I will note that while the conceptual model Haskell uses is very nice, representing strings as singly-linked lists of characters happens to be notoriously slow. For that reason, nearly everyone uses the Text datatype from the text package for practical work. – Noh 21/1, 2016 at 22:49

It looks like what you are really asking is, Why aren't there generic operations that work on both strings and lists?

Those do exist, in libraries like the generic collections library.

#lang racket/base
(require data/collection)
(first '(my list)) ; 'my
(first "mystring") ; #\m

Also, operations like map from this library can work with multiple different types of collections together.

#lang racket/base
(require data/collection)
(define (digit->char n)
  (first (string->immutable-string (number->string n))))
(first (map string (map digit->char '(1 2 3)) "abc"))
; "1a"

This doesn't mean that strings are lists, but it does mean that both strings and lists are sequences, and operations on sequences can work on both kinds of data types together.

Omentum answered 21/1, 2016 at 18:4 Comment(0)

According to the Racket documentation, strings are arrays of characters:

4.3 Strings

A string is a fixed-length array of characters.

An array, as the term is usually used in programming languages, and especially in C and C++, is a contiguous block of memory with the important property that it supports efficient random access. E.g., you can access the first element (x[0]) just as quickly as the n^th (x[n-1]). Linked lists, the lists you encounter by default in Lisps, don't support efficient random access.

So, since strings are arrays in Racket, you'd expect there to be some counterpart to the x[i] notation (which isn't very Lispy). In Racket, you use string-ref and string-set!, which are documented on that same page. E.g.:

(string-ref "mystring" 1) ;=> #\y

(Now, there are also vector-ref and vector-set! procedures for more generalized vectors. In some Lisps, strings are also vectors, so you can use general vector and array manipulation functions on strings. I'm not much of a Racket user, so I'm not sure whether that applies in Racket as well.)

Euhemerism answered 21/1, 2016 at 16:50 Comment(2)

In C, C++ and similar languages strings are stored in contiguous memory as you say, but they do not these days have efficient random access in the general case. Unix-like systems tend to use UTF-8 and windows and java use UTF-16 to hold unicode characers. Both these are variable length encodings, requiring you to start from the beginning or end to iterate to a particular character. For random access you would need to stick to something like ISO-8859 (for 8-bit encodings) or the basic multilingual plane (for 16-bit encodings). – Oneiric 22/1, 2016 at 0:13

@Chris Vine good point, with variable width characters, they're no longer just byte arrays where character I is at position i-1. – Euhemerism 22/1, 2016 at 0:37

Technically "array" is usually a continuous piece of memory, while "list" is usually understood to be a single- or double-linked list of independently allocated objects. In most common programming languages, including C, and all Lisp and Scheme dialects that I know, for performance reasons string data is stored in an array in the sense that it is stored in a continuous piece of memory.

The confusion is that sometimes they might still be colloquially referred to as lists, which is not correct when understanding "list" as the precise technical term "linked list".

Were a string truly stored as list, including how Scheme and Lisp generally store one, every single character would have the overhead of being part of an object that contains the character and at least one pointer, that to the next character.

Derte answered 21/1, 2016 at 15:32 Comment(0)

4.3 Strings

Recommended topics

Hot tags