C's aversion to arrays [closed]

T

3

6

In introductory books on C it is often claimed that pointers more or less are arrays. Isn't this a vast simplification, at best?

There is an array type in C and it can behave completely differently from pointers, for example:

#include <stdio.h>

int main(int argc, char *argv[]){
  int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
  int *b = a;
  printf("sizeof(a) = %lu\n", sizeof(a));
  printf("sizeof(b) = %lu\n", sizeof(b));
  return 0;
}

gives the output

sizeof(a) = 40 
sizeof(b) = 8

or as another example a = b would give a compilation error (GCC: "assignment to expression with array type").

Of course there is a close relationship between pointers and arrays, in the sense that yes, the content of an array variable itself is the memory address of the first array element, e.g. int a[10] = {777, 1, 2, 3, 4, 5, 6, 7, 8, 9}; printf("a = %p\n", (void *)a); prints the address containing the 777.

Now, on the one hand, if you 'hide' arrays in structs, you can easily copy large amounts of data (arrays if you ignore the wrapping struct) just by using the = operator (and that's even fast, too):

#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRAY_LENGTH 100000000

typedef struct {int arr[ARRAY_LENGTH];} struct_huge_array;

int main(int argc, char *argv[]){
  struct_huge_array *a = malloc(sizeof(struct_huge_array));
  struct_huge_array *b = malloc(sizeof(struct_huge_array));

  int *x = malloc(sizeof(int)*ARRAY_LENGTH);
  int *y = malloc(sizeof(int)*ARRAY_LENGTH);

  struct timeval start, end, diff;

  gettimeofday(&start, NULL);
  *a = *b;
  gettimeofday(&end, NULL);

  timersub(&end, &start, &diff);
  printf("Copying struct_huge_arrays took %ld sec, %ld µs\n", (long)diff.tv_sec, (long)diff.tv_usec);

  gettimeofday(&start, NULL);
  memcpy(x, y, ARRAY_LENGTH*sizeof(int));
  gettimeofday(&end, NULL);

  timersub(&end, &start, &diff);
  printf("memcpy took %ld sec, %ld µs\n", (long)diff.tv_sec, (long)diff.tv_usec);

  return 0;
}

Output:

Copying struct_huge_arrays took 0 sec, 345581 µs
memcpy took 0 sec, 345912 µs

But you cannot do this with arrays themselves. For arrays x, y (of the same size and of the same type) the expression x = y is illegal.

Also, functions can't return arrays. And if arrays are used as parameters, C adjusts them to pointers; it does not care whether the size is explicitly given, so the following program gives the output sizeof(a) = 8:

#include <stdio.h>

void f(int p[10]){
  printf("sizeof(a) = %zu\n", sizeof(p));
}

int main(int argc, char *argv[]){
  int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

  f(a);

  return 0;
}

Is there any logic behind this aversion to arrays? Why isn't there a true robust array type in C? What bad would happen if there was one? After all, if an array is hidden in a struct, the array does behave as in Go, Rust, ..., i.e. the array is the whole chunk in memory and passing it around will copy its content, not just the memory address of the first element. For example, in Go the following program

package main

import "fmt"

func main() {
    a := [2]int{-777, 777}
    var b [2]int
    b = a
    b[0] = 666

    fmt.Println(a)
    fmt.Println(b)
}

gives the output:

[-777 777]
[666 777]

Transcendence answered 24/2, 2016 at 8:33 Comment(8)

Pointers are dissolved to the index of the array, hence the phrase 'they are more or less like arrays'. In fact, pointers and arrays are two completely different things. – Deranged 24/2, 2016 at 8:36

c-faq.com/aryptr/aryptr2.html – Butane 24/2, 2016 at 8:41

Instead of %lu, you should use %zu for size_t arguments. – Manhour 24/2, 2016 at 8:46

If a book says that "pointers more or less are arrays", it tells me that the author doesn't understand what s/he is saying, or doesn't take the time necessary to explain things properly. Getting the model wrong in the beginning will make things more miserable in the long run than getting it correct immediately. – Sayer 24/2, 2016 at 9:12

The short answer is that it would have worked fine to have arrays as first class objects, but it wasn't done that way in the 1970s and it's too late to change now – Novikoff 24/2, 2016 at 10:1

@Transcendence It contains the memory address of the first char of "hello", This is not correct. a 'contains' the string itself, p 'contains' the address of the string. The same syntax works for both examples, but the underlying assembly is different. – Complementary 24/2, 2016 at 10:40

@Transcendence Because a decays to a pointer if evaluated. Arrays cannot be passed in C, so what does printf receive? A pointer. – Complementary 24/2, 2016 at 11:7

@Transcendence C language is not defined by the output of some debugger. – Complementary 24/2, 2016 at 11:10

W

4

This part of the question...

Is there any logic behind this aversion to arrays? Why isn't there a true robust array type in C? What bad would happen if there was one?

... is not really a code question and is open to speculation, but I think a short answer might be beneficial: when C was created, it was targeted at machines with very little RAM and slow CPUs (measured in kilobytes and megahertz, respectively). It was meant to replace assembler as a systems programming language, but without introducing the overhead that the other existing high-level languages required. For the same reasons, C is still a popular language for microcontrollers, due to the control it gives you over the generated program.

Introducing a 'robust' array type would have had under-the-hood performance and complexity penalties for both the compiler and the runtime, which not all systems could afford. At the same time, C gives programmers the means to create their own 'robust' array type and use it only in those situations where its use is justified.

I found this article interesting in this context: Dennis Ritchie: Development of the C Language (1993)

Warplane answered 24/2, 2016 at 9:57 Comment(1)

The run-time penalties would only exist if you used them inappropriately and passed them by value or assigned them, doing a large copy. You could argue that making it easy to write slow code is a bad thing, though (like passing a std::vector by value in C++). Especially when RAM is tight, as well as CPU time. Early C didn't have struct, so I guess passing an array by value to a function would result in asm that you couldn't get otherwise, depending on the calling convention. (You could manually implement a (non-)hidden pointer, but not a big copy to the stack.) – Cosma 11/6, 2018 at 17:4

F

6

The C language was initially designed in the early 1970s on a PDP minicomputer which reportedly filled up just half a room, despite its huge 24 kB memory. (That's kB, not MB or GB.)

Fitting a compiler at all into that memory was the real challenge. So the C language was designed to allow you to write compact programs, and quite a few special operators (like +=, --, and ?:) were added for manual optimizations.

~~Adding features for copying large arrays as parameters didn't occur to the designers. It wouldn't have been useful anyway.~~

In C's predecessor, the B language, an array was represented as a pointer to storage allocated separately (see the link in Lars' answer). Ritchie wanted to avoid this extra pointer in C and so got the idea that the array name could be turned into a pointer when used in places not expecting an array:

It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.

This invention enabled most existing B code to continue to work, despite the underlying shift in the language's semantics.

And structs didn't get added to the language until later. That you can pass an array inside a struct as a parameter was then a feature that offered another option.

By then it was already too late to change the syntax for arrays. It would have broken too many programs. There were already 100s of users...

Finned answered 24/2, 2016 at 9:53 Comment(2)

Your recent answer on From compiler perspective, how is reference for array dealt with, and, why passing by value(not decay) is not allowed? says that part of the reason is that C got this from B. (Lars's answer on this question has the same link about that history.) – Cosma 11/6, 2018 at 16:55

Oh, I had forgotten about this. Added some revised info pointing to the first hand source. Thanks! – Finned 11/6, 2018 at 21:4

W

1

Arrays are arrays and pointers are pointers; they are not the same.
But to make arrays usable at all, the compiler must use qualified pointers.
By definition an array is a contiguous and homogeneous sequence of elements in memory. So far so good, but how do you interact with it?
To explain the concept, here is an assembly example I have already used on other forums:

;int myarray[10] would be defined as
_myarray:    .resd  10
;now the pointer p (suppose 64 bit machine)
_p:          .resq  1

This is the code emitted by the compiler to reserve an array of 10 int and a pointer to int in global memory.

Now, when you refer to the array, what do you think you can get? Just the address, of course (or rather, the address of the first element). And what is that address? The standard says it has to be called a qualified pointer, but now you can really understand why that is so.
Now look at the pointer. When we refer to it, the compiler emits code to fetch the contents of the location at address p. We can also get the address of the pointer variable itself using &p, but an array has no such separate pointer object: &myarray gives back the same address as the first element again (only the type differs, since it is a pointer to the whole array).
This means that you can assign the address of myarray to p, but not the reverse ;-)

Weasand answered 24/2, 2016 at 9:2 Comment(1)

Actually the question was why C behaves in this manner. What's the logic behind this strange crutch? Is there something bad about a true robust array type? But languages like Go have it, so it can't be too bad...? – Transcendence 24/2, 2016 at 9:29
