Why do literals of the form [...] have meaning that appears to depend upon context?

Asked 29/6, 2015 at 14:30 Answered 29/6, 2015 at 18:12

Consider the following program:

{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);

var
  Arr: TArray<TMyEnum>;
  Enum: TMyEnum;

begin
  Arr := [enum3, enum1]; // <-- this is an array
  for Enum in Arr do
    Writeln(ord(Enum));
  Writeln('---');

  for Enum in [enum3, enum1] do // <-- this looks very much like the array above
    Writeln(ord(Enum));
  Writeln('---');

  Readln;
end.

The output is:

2
0
---
0
2
---

Why do the two loops produce different output?

Lovable asked 29/6, 2015 at 14:30

for Enum in Arr do
  Writeln(ord(Enum));

Here, Arr is an array, and so the items of the array are output in order. The documentation says:

The array is traversed in increasing order.

Hence 2 is output before 0.

for Enum in [enum3, enum1] do
  Writeln(ord(Enum));

Here, [enum3, enum1] is a set and the enumerator for a set happens to enumerate in order of increasing ordinal value. So the output has 0 first.

I don't think the documentation states anywhere that sets are enumerated in that order, but empirically that appears to be the case. However, since sets are an unordered type, one should not rely on their enumeration order anyway.
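To make the observed behaviour concrete, here is a minimal sketch of a loop that produces the same output as for-in over a set. It scans the base type from the lowest to the highest ordinal value and reports the members. This is only an illustration of the observed order. It is not how the compiler actually implements set enumeration, and the type name TMyEnumSet is introduced here for the example.

```pascal
program SetOrderSketch;

{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);
  TMyEnumSet = set of TMyEnum; // hypothetical name, introduced for this example

var
  S: TMyEnumSet;
  Enum: TMyEnum;

begin
  S := [enum3, enum1];
  // Walk the base type in increasing ordinal order and print each member.
  // This reproduces the order seen with "for Enum in S do", but is an
  // illustration only, not the compiler's actual implementation.
  for Enum := Low(TMyEnum) to High(TMyEnum) do
    if Enum in S then
      Writeln(Ord(Enum)); // prints 0, then 2
end.
```

Whatever order the literal was written in, a set only records which values are present, so any enumeration of it has to pick some order of its own.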

So the question becomes: how can [...] be a set at one point in the code and an array at another? This all stems from the new XE7 dynamic array syntax, which introduces (another) syntactical ambiguity. When we write

Arr := [enum3, enum1];

then [enum3, enum1] is an array. The compiler knows that Arr is an array and that information defines the type of the literal.

But when we write

for Enum in [enum3, enum1] do

then [enum3, enum1] is a set. Here the literal could in principle be either array or set. In such situations, I believe that the compiler will always prefer sets.

Again, I can't find any documentation that states that this is so, but empirically this is the case. Presumably, since set enumerators pre-date the new dynamic array syntax, they take precedence when there is ambiguity.

The meaning of a literal of the form [...] depends on its context.
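A minimal sketch of that context dependence, assuming XE7 or later (needed for assigning a [...] literal to a dynamic array). The same literal is assigned once to a dynamic array and once to a set variable; the set type TMyEnumSet is introduced here for illustration. The set order in the second loop is the empirically observed one described above, not a documented guarantee.

```pascal
program LiteralContext;

{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);
  TMyEnumSet = set of TMyEnum; // hypothetical name, introduced for this example

var
  Arr: TArray<TMyEnum>;
  S: TMyEnumSet;
  Enum: TMyEnum;

begin
  Arr := [enum3, enum1]; // target is a dynamic array, so the literal is an array
  S := [enum3, enum1];   // target is a set, so the same literal is a set

  for Enum in Arr do
    Writeln(Ord(Enum));  // array: source order, 2 then 0
  Writeln('---');

  for Enum in S do
    Writeln(Ord(Enum));  // set: observed increasing ordinal order, 0 then 2
end.
```

The right-hand sides are textually identical; only the declared type of the assignment target decides which kind of value is built.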

Lovable answered 29/6, 2015 at 14:34

Generally, until the new syntax for dynarrays came around (XE7 or 8?), [] was always a set. Only after the new syntax was introduced could dynarrays be initialized with [] syntax. So indeed, Arr is an array (what else?), while the general literal [...] has always denoted a set, since the good old Pascal days. So there is nothing weird about it, IMO. Enumerators do not necessarily have a special order, but IIRC it was mentioned somewhere that sets are enumerated in increasing ordinal order. – Inigo 29/6, 2015 at 15:53

@Rudy Can you provide a documentation link for sets being enumerated in increasing ordinal order? – Lovable 29/6, 2015 at 15:57

@Rudy As for literal [...] has always denoted a set, well that's not true any more. It would appear to be a set, unless it is an array. It depends on the context. I cannot find any documentation for that. It would be great if you could dig it out. – Lovable 29/6, 2015 at 15:58

I only vaguely remember it, so I don't really know anymore where that was documented, sorry. When were enumerators for sets introduced? It must have been around that time. – Inigo 29/6, 2015 at 16:0

@Rudy Unless it's in the official documentation, then it's not that relevant, in my view. Documentation by way of dimly remembered articles that may or may not ever have existed isn't much use. – Lovable 29/6, 2015 at 16:1

I said "has always denoted" (except of course in open parameters and in array declarations). And that is true since Wirth's days. The open parameter syntax was a choice made in the Delphi 4 or so days, and I would have liked a different one, actually, especially since [...] was already taken by sets (and array declarations, which will never be confused with set literals anyway, IMO). – Inigo 29/6, 2015 at 16:3

Note that non-dynamic const array literals use (...) for arrays. I would have understood if they had used those for dynarray literals too, again, to avoid the confusion with the existing set syntax. – Inigo 29/6, 2015 at 16:6
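For comparison, a minimal sketch of the (...) syntax mentioned above: a typed constant static array. Because (...) in a typed constant can only be an array initializer, there is no set interpretation to compete with, and for-in traverses it in declared order. The constant name Order is hypothetical, chosen for this example.

```pascal
program ConstArrayOrder;

{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);

const
  // Parenthesised initializer of a typed constant: unambiguously an array.
  // "Order" is a hypothetical name used only for this example.
  Order: array[0..1] of TMyEnum = (enum3, enum1);

var
  Enum: TMyEnum;

begin
  for Enum in Order do
    Writeln(Ord(Enum)); // array traversal in index order: 2, then 0
end.
```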

I do remember it dimly, but that does not mean it was not official documentation. I simply don't know where exactly. – Inigo 29/6, 2015 at 16:7

FWIW, it is not really safe to rely on any order of enumeration, IMO, not even that for arrays. If you want a reliable order, use indexing or some such. – Inigo 29/6, 2015 at 16:9

@RudyVelthuis For a set I think that is a valid statement. For arrays the documentation says otherwise. It says: The array is traversed in increasing order, starting at the lowest array bound and ending at the array size minus one. And of course that is important. It would be utterly useless if the enumerator for an array did not have a defined order. Likewise the enumeration order for ordered collections is well defined and can be relied upon. – Lovable 29/6, 2015 at 16:11

For sets the documentation states A set is a collection of values of the same ordinal type. The values have no inherent order[...] docwiki.embarcadero.com/RADStudio/XE8/en/Structured_Types So it doesn't say that it will traverse a set in reverse order, but it also seems to imply that there is no natural order to expect a set to be traversed in. To expect it to be traversed in reverse order may also be exploiting undefined behaviour. – Deloris 29/6, 2015 at 16:22

@Deloris I'm not expecting a set to be traversed in any order. I was pointing out that naively one might think that if [enum3, enum1] was an array in one place, then it would be an array in another place. – Lovable 29/6, 2015 at 16:25

Yes, I agree - you did ask if there was any documentation to support the notion that sets are enumerated in increasing order, however. This was only to give an example from the documentation that suggests that there is no order to be expected (and that enumeration in increasing order is not necessarily guaranteed). I agree that the dynamic array syntax is an unfortunate and confusing choice. – Deloris 29/6, 2015 at 16:27

@David: it would not surprise me if someone managed to define a type of enumerator on a helper type that made for-in traverse an array in reverse order. – Inigo 29/6, 2015 at 16:27

But why would you expect [a, b] to be an array, unless you knew the type was declared as such? [...] is the classic set syntax. Only after someone introduced square brackets for array literals did the confusion arise. So the safe way is to think of sets first, and then look if it might be an array. If untyped, it is a set. – Inigo 29/6, 2015 at 16:31

@David: is a dictionary an ordered collection? – Inigo 29/6, 2015 at 16:33

@Rudy In the code in the question [enum3, enum1] is an array at one point and later is a set. So it's not so much a matter of expectation, rather one of fact. – Lovable 29/6, 2015 at 16:33

@Rudy A dictionary is an unordered collection. – Lovable 29/6, 2015 at 16:34

If I am reading code, have not seen everything yet, and see a [...] literal, I expect it to be a set. If I know the type beforehand, then that is different of course. – Inigo 29/6, 2015 at 16:35

The term [enum3, enum1] is an array in the first line only because the assignment target is an array. If the target were a set, then the same construct would be interpreted as a set. The precedence of a set over an array in the for loop is just that the set has older rights. You might change that with an explicit type cast. – Wensleydale 29/6, 2015 at 16:40

@Rudy If I know the type beforehand. Well that's it. It's context dependent. Which is unfortunate. In an ideal world that would not be the case. Other languages strive to avoid such scenarios. – Lovable 29/6, 2015 at 16:40

It would also be perfectly plausible for the language designers to have said that, what the hell, we've already got context sensitive typing of literals, so let's decide that [...] in a for in loop is more likely to be used as an array. If they were starting from scratch and had to decide between array and set then array would be a better choice. But it is set to avoid changing the meaning of historical code. – Lovable 29/6, 2015 at 16:43

I fully agree that it is context dependent, but that is not really new. One time, 7 is an integer while another, it is a Single. Or 'A' is either a Char or a WideString or an AnsiString, etc. That is the case with literals. But [...] literals differ a little, because they are only arrays in some very strictly defined syntactical constructs. – Inigo 29/6, 2015 at 16:45

@Rudy In other languages though there is no such ambiguity and the benefit of that is clarity and predictability for the reader of the code. It's a shame that Pascal didn't go that way. – Lovable 29/6, 2015 at 16:46

I am ambivalent. Pascal sets are something very typical of the language. Delphi's designers should not have used a similar syntax for array literals. They already had the (..., ...) syntax for constant array literals. They should have used that for open parameters and dynarray literals too. (..., ...) can otherwise only be a parameter list, and the risk of confusion between the two is more or less zero. – Inigo 29/6, 2015 at 16:49

Pascal existed well before most of those languages, and sets are something none of those languages had. I disagree there is no such ambiguity. Many other languages use {...} for arrays as well as for structs or enums. Not sure if I would have liked that. Actually, I am sure I would not have liked it. – Inigo 29/6, 2015 at 16:52

And I find Pascal actually much more readable than many other languages. Just think of the onion-like peeling/jumping you sometimes have to do to understand a C declaration. I had a few examples on my website, but until I have a new provider, it is offline, for the moment. – Inigo 29/6, 2015 at 16:55

@RudyVelthuis I find Pascal to be more readable than many other languages and did not say otherwise. I don't think it is perfect though. All languages have their strengths and weaknesses. And even a feature that is a strength one day, can be a weakness the next. You don't need to teach me that C declarations are hard to read! – Lovable 29/6, 2015 at 16:56

@RudyVelthuis For literals in C, C++ and C#, I believe that the literal's type is determined entirely by the literal. In other words, the type of the lhs of an assignment cannot influence the type of the rhs of the assignment. Am I wrong? – Lovable 29/6, 2015 at 17:3

@Deloris and FWIW my request for docs concerning set enumeration order was mainly to call Rudy out because I don't believe that any enumeration order guarantee is given. – Lovable 30/6, 2015 at 6:50

@DavidHeffernan That much was clear - thought I'd just give the position some support. – Deloris 30/6, 2015 at 9:14

@Rudy As much as people familiar with "classic" Delphi would instinctively interpret [...] as being a set by default where context is ambiguous, "newcomers" won't have our benefit. This is certainly something that needs to be documented. – Cataract 30/6, 2015 at 11:48

I actually don't know if this is the case, but I guess you are right. In Pascal, however, 'A' can be a Char or any of a range of string types. C does not know the concept of true constants, but Delphi does. And a true constant like 7 does not have any type of its own. The type is assumed when it is assigned (or used as parameter). – Inigo 30/6, 2015 at 12:58

@Rudy A true constant like 7 does not have any type of its own. The documentation states otherwise docwiki.embarcadero.com/RADStudio/en/Declared_Constants – Lovable 30/6, 2015 at 14:30

Many things have changed in the docs, and not always correctly. – Inigo 30/6, 2015 at 14:37

@RudyVelthuis Despite the passage of time, your unshakeable self-belief remains unchallenged, an unassailable Tower of Truth! ;-) And yet the same documentation can be found in the Delphi 5 help..... – Lovable 30/6, 2015 at 15:12

Because an array contains order information and a set does not.

Explanation with the use of documentation:

The internal data format of a static or dynamic array:

is stored as a contiguous sequence of elements of the component type of the array. The components with the lowest indexes are stored at the lowest memory addresses.

Walking over these indices with a for in loop is done in incremental order:

The array is traversed in increasing order, starting at the lowest array bound and ending at the array size minus one.

On the other side, the internal data format of a set:

is a bit array where each bit indicates whether an element is in the set or not.

Thus all these "indexed bits" are stored together in one and the same value. That is why a set can be typecast to an integer type of the same size, and why the order in which the elements were added is lost: [enum3, enum1] = [enum1, enum3].
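A minimal sketch of that bit-array view. A set of a three-value enumeration fits in a single byte, so it can be value-cast to Byte (a same-size typecast); enum1 (ordinal 0) and enum3 (ordinal 2) set bits 0 and 2, giving 1 + 4 = 5. The type name TMyEnumSet is introduced here for illustration.

```pascal
program SetBits;

{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);
  TMyEnumSet = set of TMyEnum; // hypothetical name; 3 members fit in 1 byte

var
  A, B: TMyEnumSet;

begin
  A := [enum3, enum1];
  B := [enum1, enum3];

  // Only membership is stored, so the order in the literal is irrelevant.
  if A = B then
    Writeln('equal')
  else
    Writeln('different');

  Writeln(SizeOf(A)); // 1 byte for this set type
  Writeln(Byte(A));   // bit 0 (enum1) + bit 2 (enum3) = 5
end.
```

An array built from [enum3, enum1] would instead hold two separate elements in index order, which is exactly the information the for-in loop over Arr reports.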

Angelineangelique answered 29/6, 2015 at 17:36

I suppose I was more wondering why [...] was an array sometimes and a set at other times. I should have made that more clear. – Lovable 29/6, 2015 at 17:38

Then the answer would be: because the context is determined by the declaration of the types; that is the language. ;) – Angelineangelique 29/6, 2015 at 17:42

That's what I was really driving at. And of course I knew the answer when I asked the question. If I've missed any documentation that explains this then that would be nice to see. +1 – Lovable 29/6, 2015 at 17:44

Yeah, it appears the documentation simply has to catch up with this new form of constant array declaration. – Angelineangelique 29/6, 2015 at 17:51
