What I'm used to is that a string is just a list or array of characters, like in most C-like languages. However, in the scheme implementations that I use, including Chicken and Racket, a string is not the same as a list of characters. Something like (car "mystring")
just won't fly. Instead there are functions to convert from and to lists. Why was this choice made? In my opinion Haskell does it the best way, there is literally no difference in any way between a list of chars and a string. I like this the most because it conveys the meaning of what is meant by a string in the clearest, simplest way. I'm not completely sure, but I'd guess that in the 'background' strings are lists or arrays of chars in almost any language. I'd especially expect a language like scheme with a focus on simplicity to handle strings in this way, or at least make is so you can do with strings what you can do with lists, like take the car
or cdr
What am I missing?
It looks like what you are really asking is, Why aren't there generic operations that work on both strings and lists?
Those do exist, in libraries like the generic collections library.
#lang racket/base
(require data/collection)
(first '(my list)) ; 'my
(first "mystring") ; #\m
Also, operations like map
from this library can work with multiple different types of collections together.
#lang racket/base
(require data/collection)
(define (digit->char n)
(first (string->immutable-string (number->string n))))
(first (map string (map digit->char '(1 2 3)) "abc"))
; "1a"
This doesn't mean that strings are lists, but it does mean that both strings and lists are sequences, and operations on sequences can work on both kinds of data types together.
According to the Racket documentation, strings are arrays of characters:
4.3 Strings
A string is a fixed-length array of characters.
An array, as the term is usually used in programming languages, and especially in C and C++, is a contiguous block of memory with the important property that it supports efficient random access. E.g., you can access the first element (x[0]) just as quickly as the nth (x[n-1]). Linked lists, the lists you encounter by default in Lisps, don't support efficient random access.
So, since strings are arrays in Racket, you'd expect there to be some counterpart to the x[i] notation (which isn't very Lispy). In Racket, you use string-ref and string-set!, which are documented on that same page. E.g.:
(string-ref "mystring" 1) ;=> #\y
(Now, there are also vector-ref and vector-set! procedures for more generalized vectors. In some Lisps, strings are also vectors, so you can use general vector and array manipulation functions on strings. I'm not much of a Racket user, so I'm not sure whether that applies in Racket as well.)
Technically "array" is usually a continuous piece of memory, while "list" is usually understood to be a single- or double-linked list of independently allocated objects. In most common programming languages, including C, and all Lisp and Scheme dialects that I know, for performance reasons string data is stored in an array in the sense that it is stored in a continuous piece of memory.
The confusion is that sometimes they might still be colloquially referred to as lists, which is not correct when understanding "list" as the precise technical term "linked list".
Were a string truly stored as list, including how Scheme and Lisp generally store one, every single character would have the overhead of being part of an object that contains the character and at least one pointer, that to the next character.
© 2022 - 2024 — McMap. All rights reserved.
Text
datatype from the text package for practical work. – Noh