Why do literals of the form [...] have meaning that appears to depend upon context?

Consider the following program:

{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);

var
  Arr: TArray<TMyEnum>;
  Enum: TMyEnum;

begin
  Arr := [enum3, enum1]; // <-- this is an array
  for Enum in Arr do
    Writeln(ord(Enum));
  Writeln('---');

  for Enum in [enum3, enum1] do // <-- this looks very much like the array above
    Writeln(ord(Enum));
  Writeln('---');

  Readln;
end.

The output is:

2
0
---
0
2
---

Why do the two loops produce different output?

Lovable answered 29/6, 2015 at 14:30 Comment(0)
L
3
for Enum in Arr do
  Writeln(ord(Enum));

Here, Arr is an array, and so the items of the array are output in order. The documentation says:

The array is traversed in increasing order.

Hence 2 is output before 0.

for Enum in [enum3, enum1] do
  Writeln(ord(Enum));

Here, [enum3, enum1] is a set and the enumerator for a set happens to enumerate in order of increasing ordinal value. So the output has 0 first.

I don't think it is stated anywhere in the documentation that sets are enumerated in that order, but empirically that appears to be the case. However, since sets are an unordered type, one should not rely on their enumeration order anyway.


So the question then becomes understanding how [...] can be a set or an array at different points in the code. This all stems from the new XE7 dynamic array syntax which introduces (another) syntactical ambiguity. When we write

Arr := [enum3, enum1];

then [enum3, enum1] is an array. The compiler knows that Arr is an array and that information defines the type of the literal.

But when we write

for Enum in [enum3, enum1] do

then [enum3, enum1] is a set. Here the literal could in principle be either array or set. In such situations, I believe that the compiler will always prefer sets.

Again, I can't find any documentation that states that this is so, but empirically this is the case. Presumably since set enumerators pre-date the new dynamic array syntax they take precedence when there is ambiguity.

The meaning of a literal of the form [...] depends on its context.
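
A minimal sketch of how the context can be made explicit, assuming XE7 or later. TMySet and ASet are hypothetical names that do not appear in the question. The set loop's order is the empirically observed one described above, not a documented guarantee.

```pascal
{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);
  TMySet = set of TMyEnum;  // hypothetical name, not in the question

var
  Arr: TArray<TMyEnum>;
  ASet: TMySet;
  Enum: TMyEnum;

begin
  Arr := [enum3, enum1];   // target is a dynamic array, so the literal is an array
  ASet := [enum3, enum1];  // target is a set, so the same literal is a set

  for Enum in Arr do       // array order: documented as increasing index
    Writeln(Ord(Enum));
  Writeln('---');

  for Enum in ASet do      // set order: observed in practice, not documented
    Writeln(Ord(Enum));
  Writeln('---');

  // An array constructor expression states the type in the loop itself,
  // so the bare [...] literal and its ambiguity are avoided.
  for Enum in TArray<TMyEnum>.Create(enum3, enum1) do
    Writeln(Ord(Enum));

  Readln;
end.
```

Giving the literal a typed destination, or replacing it with an array constructor, means the reader does not need to know which interpretation the compiler prefers.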

Lovable answered 29/6, 2015 at 14:34 Comment(36)
Generally, until the new syntax for dynarrays came around (XE7 or 8?), [] was always a set. Only after the new syntax was introduced could dynarrays be initialized with [] syntax. So indeed, Arr is an array (what else?), while the general literal [...] has always denoted a set, since the good old Pascal days. So there is nothing weird about it, IMO. Enumerators do not necessarily have a special order, but IIRC it was mentioned somewhere that sets are enumerated in increasing ordinal order. - Inigo
@Rudy Can you provide a documentation link for sets being enumerated in increasing ordinal order? - Lovable
@Rudy As for "literal [...] has always denoted a set", well, that's not true any more. It would appear to be a set, unless it is an array. It depends on the context. I cannot find any documentation for that. It would be great if you could dig it out. - Lovable
I only vaguely remember it, so I don't really know anymore where that was documented, sorry. When were enumerators for sets introduced? It must have been around that time. - Inigo
@Rudy Unless it's in the official documentation, then it's not that relevant, in my view. Documentation by way of dimly remembered articles that may or may not ever have existed isn't much use. - Lovable
I said "has always denoted" (except of course in open parameters and in array declarations). And that is true since Wirth's days. The open parameter syntax was a choice made in the Delphi 4 or so days, and I would have liked a different one, actually, especially since [...] was already taken by sets (and array declarations, which will never be confused with set literals anyway, IMO). - Inigo
Note that non-dynamic const array literals use (...) for arrays. I would have understood if they had used those for dynarray literals too, again, to avoid the confusion with the existing set syntax. - Inigo
I do remember it dimly, but that does not mean it was not official documentation. I simply don't know where exactly. - Inigo
FWIW, it is not really safe to rely on any order of enumeration, IMO, not even that for arrays. If you want a reliable order, use indexing or some such. - Inigo
@RudyVelthuis For a set I think that is a valid statement. For arrays the documentation says otherwise. It says: "The array is traversed in increasing order, starting at the lowest array bound and ending at the array size minus one." And of course that is important. It would be utterly useless if the enumerator for an array did not have a defined order. Likewise the enumeration order for ordered collections is well defined and can be relied upon. - Lovable
For sets the documentation states: "A set is a collection of values of the same ordinal type. The values have no inherent order[...]" docwiki.embarcadero.com/RADStudio/XE8/en/Structured_Types So it doesn't say that it will traverse a set in reverse order, but it also seems to imply that there is no natural order to expect a set to be traversed in. To expect it to be traversed in reverse order may also be exploiting undefined behaviour. - Deloris
@Deloris I'm not expecting a set to be traversed in any order. I was pointing out that naively one might think that if [enum3, enum1] was an array in one place, then it would be an array in another place. - Lovable
Yes, I agree - you did ask if there was any documentation to support the notion that sets are enumerated in increasing order, however. This was only to give an example from the documentation that suggests that there is no order to be expected (and that enumeration in increasing order is not necessarily guaranteed). I agree that the dynamic array syntax is an unfortunate and confusing choice. - Deloris
@David: it would not surprise me if someone managed to define a type of enumerator on a helper type that made for-in traverse an array in reverse order. - Inigo
But why would you expect [a, b] to be an array, unless you knew the type was declared as such? [...] is the classic set syntax. Only after someone introduced square brackets for array literals did the confusion arise. So the safe way is to think of sets first, and then look if it might be an array. If untyped, it is a set. - Inigo
@David: is a dictionary an ordered collection? - Inigo
@Rudy In the code in the question [enum3, enum1] is an array at one point and later is a set. So it's not so much a matter of expectation, rather one of fact. - Lovable
@Rudy A dictionary is an unordered collection. - Lovable
If I am reading code, and I have not seen everything yet, and I see a [...] literal, I expect it to be a set. If I know the type beforehand, then that is different of course. - Inigo
The term [enum3, enum1] is an array in the first line only because the assignment target is an array. If the target were a set then the same construct would be interpreted as a set. The precedence of a set over an array in the for loop is just that the set has older rights. You might change that with an explicit type cast. - Wensleydale
@Rudy "If I know the type beforehand." Well that's it. It's context dependent. Which is unfortunate. In an ideal world that would not be the case. Other languages strive to avoid such scenarios. - Lovable
It would also be perfectly plausible for the language designers to have said that, what the hell, we've already got context sensitive typing of literals, so let's decide that [...] in a for in loop is more likely to be used as an array. If they were starting from scratch and had to decide between array and set then array would be a better choice. But it is set to avoid changing the meaning of historical code. - Lovable
I fully agree that it is context dependent, but that is not really new. One time, 7 is an integer while another, it is a Single. Or 'A' is either a Char or a WideString or an AnsiString, etc. That is the case with literals. But [...] literals differ a little, because they are only arrays in some very strictly defined syntactical constructs. - Inigo
@Rudy In other languages though there is no such ambiguity and the benefit of that is clarity and predictability for the reader of the code. It's a shame that Pascal didn't go that way. - Lovable
I am ambivalent. Pascal sets are something very typical of the language. Delphi's designers should not have used a similar syntax for array literals. They already had the (..., ...) syntax for constant array literals. They should have used that for open parameters and dynarray literals too. (..., ...) can otherwise only be a parameter list, and the risk of confusion between the two is more or less zero. - Inigo
Pascal existed well before most of those languages, and sets are something none of those languages had. I disagree there is no such ambiguity. Many other languages use {...} for arrays as well as for structs or enums. Not sure if I would have liked that. Actually, I am sure I would not have liked it. - Inigo
And I find Pascal actually much more readable than many other languages. Just think of the onion-like peeling/jumping you sometimes have to do to understand a C declaration. I had a few examples on my website, but until I have a new provider, it is offline, for the moment. - Inigo
@RudyVelthuis I find Pascal to be more readable than many other languages and did not say otherwise. I don't think it is perfect though. All languages have their strengths and weaknesses. And even a feature that is a strength one day can be a weakness the next. You don't need to teach me that C declarations are hard to read! - Lovable
@RudyVelthuis For literals in C, C++ and C#, I believe that the literal's type is determined entirely by the literal. In other words, the type of the lhs of an assignment cannot influence the type of the rhs of the assignment. Am I wrong? - Lovable
@Deloris and FWIW my request for docs concerning set enumeration order was mainly to call Rudy out because I don't believe that any enumeration order guarantee is given. - Lovable
@DavidHeffernan That much was clear - thought I'd just give the position some support. - Deloris
@Rudy As much as people familiar with "classic" Delphi would instinctively interpret [...] as being a set by default where context is ambiguous, "newcomers" won't have our benefit. This is certainly something that needs to be documented. - Cataract
I actually don't know if this is the case, but I guess you are right. In Pascal, however, 'A' can be a Char or any of a range of string types. C does not know the concept of true constants, but Delphi does. And a true constant like 7 does not have any type of its own. The type is assumed when it is assigned (or used as parameter). - Inigo
@Rudy "A true constant like 7 does not have any type of its own." The documentation states otherwise: docwiki.embarcadero.com/RADStudio/en/Declared_Constants - Lovable
Many things have changed in the docs, and not always correctly. - Inigo
@RudyVelthuis Despite the passage of time, your unshakeable self-belief remains unchallenged, an unassailable Tower of Truth! ;-) And yet the same documentation can be found in the Delphi 5 help..... - Lovable
A
4

Because an array contains order information and a set does not.


Explanation with the use of documentation:

The internal data format of a static or dynamic array:

is stored as a contiguous sequence of elements of the component type of the array. The components with the lowest indexes are stored at the lowest memory addresses.

Walking over these indices with a for in loop is done in incremental order:

The array is traversed in increasing order, starting at the lowest array bound and ending at the array size minus one.

On the other hand, the internal data format of a set:

is a bit array where each bit indicates whether an element is in the set or not.

Thus all these "indexed bits" are stored in one and the same "value". That is why a set can be typecast to an integer type, and why the order in which the elements are added is lost: [enum3, enum1] = [enum1, enum3].
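
A small sketch of the point above, assuming a set type small enough to occupy a single byte. TMySet, A and B are hypothetical names that are not part of the question. The Byte cast relies on the set variable being exactly one byte in size, which is expected for a base type with eight or fewer values.

```pascal
{$APPTYPE CONSOLE}

type
  TMyEnum = (enum1, enum2, enum3);
  TMySet = set of TMyEnum;  // hypothetical name; expected to occupy one byte

var
  A, B: TMySet;

begin
  A := [enum3, enum1];
  B := [enum1, enum3];

  // Both literals set the same bits, so the two sets compare equal.
  Writeln(A = B);    // TRUE

  // enum1 is bit 0 and enum3 is bit 2, so the underlying byte is 1 + 4.
  Writeln(Byte(A));  // 5

  Readln;
end.
```

Since only membership bits are stored, there is nothing left in the value from which the enumerator could recover the order in which the elements were written.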

Angelineangelique answered 29/6, 2015 at 17:36 Comment(4)
I suppose I was more wondering why [...] was an array sometimes and a set at other times. I should have made that more clear. - Lovable
Then the answer would be: Because the context is determined by the declaration of the types, it is the language. ;) - Angelineangelique
That's what I was really driving at. And of course I knew the answer when I asked the question. If I've missed any documentation that explains this then that would be nice to see. +1 - Lovable
Yeah, it appears the documentation simply has to catch up with this new form of constant array declaration. - Angelineangelique
H
1

While not always ideal, the compiler is using context to determine the type of the right hand side. You can look at character strings as a good example of this:

If constantExpression is a character string, the declared constant is compatible with any string type. If the character string is of length 1, it is also compatible with any character type.

In the character string case, the compiler will use the left hand side to determine the type of the right hand side. The difference between this and the code in the question is that this case is clearly documented whereas the case in the question is not.

An example using characters:

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  A: Char;
  B: AnsiChar;

begin
  A := 'a';
  B := 'a';

  Writeln(A);
  Writeln(B);

  Readln;
end.

The assembler generated from the two indicates that the right hand side is being treated differently in the two cases:

Project10.dpr.17: A := 'a';
004D6731 66C705C8034E006100 mov word ptr [$004e03c8],$0061
Project10.dpr.18: B := 'a';
004D673A C605CA034E0061   mov byte ptr [$004e03ca],$61

The compiler is using the destination type of the assignment to determine what type the character string (in this case 'a') should be. A similar thing is happening in the question.

Thanks to David for the additional information in the comments.

Haemostatic answered 29/6, 2015 at 18:12 Comment(9)
What you say is true in general but the example is not compelling. It's perfectly possible for 1 to be an integer and have the compiler emit that code in the assignment to floating point type. That's a perfectly acceptable type promotion that the compiler can do at compile time. You would write MyReal := MyInt happily and not assume that MyInt had to be a real type. You'd be quite happy to see the promotion performed at runtime. - Lovable
The docs back this up: docwiki.embarcadero.com/RADStudio/en/… "Numerals with decimal points or exponents denote reals, while other numerals denote integers." - Lovable
@DavidHeffernan Ok, changed to use a better example. I agree with you, Int and Single was a bad example. - Haemostatic
Again, the compiler could be doing the same. It is quite plausible that 'a' is an AnsiChar that is happily promoted. I think that's not so but it would be nice to demonstrate it. It is quite frustrating that a literal's type can be determined by context. - Lovable
@DavidHeffernan If that was the case, wouldn't one be able to assign an ansichar to a char implicitly? - Haemostatic
And character strings are context dependent. The docs say: "If constantExpression is a character string, the declared constant is compatible with any string type. If the character string is of length 1, it is also compatible with any character type." - Lovable
@DavidHeffernan But isn't that exactly the point? In this case they are saying that it's dependent on the context so the RHS (in this case a character string) is dependent on the context. The only difference between this and the case in your question is that this one is documented as such and the other isn't. I agree though, it's frustrating because the compiler can determine an outcome that you don't expect because it's changing the type on the RHS. - Haemostatic
I was agreeing with you but saying that more justification would help in the answer text. - Lovable
@DavidHeffernan Thanks, I have updated the answer. I hope it includes sufficient information now. - Haemostatic
