C++ and PHP vs C# and Java - unequal results

I found something a little strange in C# and Java. Let's look at this C++ code:

#include <cstdio>    // declares printf
#include <iostream>
using namespace std;

class Simple
{
public:
    static int f()
    {
        X = X + 10;
        return 1;
    }

    static int X;
};
int Simple::X = 0;

int main() {
    Simple::X += Simple::f();
    printf("X = %d", Simple::X);
    return 0;
}

The console shows X = 11 (as seen on IdeOne for C++).

Now let's look at the same code in C#:

class Program
{
    static int x = 0;

    static int f()
    {
        x = x + 10;
        return 1;
    }

    public static void Main()
    {
        x += f();
        System.Console.WriteLine(x);
    }
}

The console shows 1 (not 11!), as seen on IdeOne for C#. I know what you're thinking now: "How is that possible?" But let's move on to the next example.

Java code:

import java.util.*;
import java.lang.*;
import java.io.*;

/* Name of the class has to be "Main" only if the class is public. */
class Ideone
{
    static int X = 0;
    static int f()
    {
        X = X + 10;
        return 1;
    }
    public static void main (String[] args) throws java.lang.Exception
    {
        Formatter f = new Formatter();
        f.format("X = %d", X += f());
        System.out.println(f.toString());
    }
}

The result is the same as in C# (X = 1).

Finally, let's look at the PHP code:

<?php
class Simple
{
    public static $X = 0;

    public static function f()
    {
        self::$X = self::$X + 10;
        return 1;
    }
}

$simple = new Simple();
echo "X = " . $simple::$X += $simple::f();
?>

The result is 11.

I have a little theory: these languages (C# and Java) make a local copy of the static variable X on the stack (are they ignoring the static keyword?), and that is the reason why the result in those languages is 1.

Does anybody here have another explanation?

Wiltshire answered 15/8, 2014 at 8:11 Comment(23)
You can't expect two different languages to act in the same way. For the C# side of it, it appears that since you are mid-calculation, your interim results are ignored, because you are effectively doing x = (current)x + return value of f(). Just because different languages use the same operator syntax does not make them operate the same way. - Kendalkendall
Rather than seeing a bug in C#, Java and PHP, I find the C++ result disturbing. - Bausch
So now that you've fixed your PHP code and are getting the result 11, do you still believe that there is a bug in PHP? Because that's exactly the result I would expect to see. - Grosz
Mark Baker, no (there is no PHP bug), PHP is OK now. I renamed the topic to "C++ and PHP vs C# and Java - not equal results". - Wiltshire
As far as your theory goes (these languages (C#, Java and PHP) are making a local copy of static variable X on the stack): that's the behavior I would expect in a calculation. - Bausch
By the way, kudos for managing to get an upvoted question that contains tags for multiple languages. - Kendalkendall
But if you switch from using X += f() to using X = f() + X, you get 11. My guess, as everyone else is saying, is that the languages that output 1 store the current value of X on the stack before calling f() and then use that value in the calculation. Which, to be honest, is not what I expected would happen from looking at the given sources. - Appetence
Sayse, but X is a static variable. - Wiltshire
@Sheppard_ - Again, just because languages use the same keywords doesn't make their functionality the same. - Kendalkendall
@Sayse, Java, C#, C++ and PHP all say the same thing about the static keyword: "static members belong to the class instead of a specific instance. It means that only one instance of a static field exists[1] even if you create a million instances of the class or you don't create any. It will be shared by all instances." - Wiltshire
@Sheppard_ I believe he means the use of the += operator, not the use of the static keyword. - Appetence
@Sheppard_ You keep repeating the fact that the field is static, but the real meat of this issue is in how the compound assignment operations are processed. Take the compound assignment out of the equation, and C++ will produce the same result as the others. - Bausch
@Sayse, yeah, maybe you're right. But it's a pity, because we can't be sure about that. I think the compiler developers may know the true answer. - Wiltshire
@Robby Cornelissen, yeah, I know (read my last comment). - Wiltshire
If you look at Christophe's answer, you will see that the result is undefined in C++. I assume that is the case for all languages portrayed. - Appetence
@Sheppard_ - It's just one of the caveats of learning a new language: assume nothing. - Kendalkendall
VB.Net returns 1... Just for your knowledge :) - Lauder
@Eminem, thanks :) I think all .NET-based languages will return 1. - Wiltshire
JavaScript - var x = 0;function f(){x = x + 10;return 1;}console.log(x + f()); outputs 1. - Mouthpart
I knew I'd already seen that exact code today: blogs.msdn.com/b/oldnewthing/archive/2014/08/14/10549885.aspx It also contains an explanation of why the result is what it is in C#. - Graph
The answer is: don't do that! - Jigaboo
"I found something a little strange in C# and Java." Did you really? Exercise: predict @Sheppard_'s next "little strange" thing he finds. - Passementerie
Cryptic crap plagiarised! - Superfamily

The C++ standard states:

With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator. —end note ]

§5.17 [expr.ass]

Hence, since the same evaluation both uses X and calls a function with a side effect on X, the result is undefined, because:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

§1.9 [intro.execution]

It happens to be 11 on many compilers, but there is no guarantee that a C++ compiler won't give you 1 as for the other languages.

If you're still skeptical, another analysis of the standard leads to the same conclusion. The standard also says, in the same section as above:

The behavior of an expression of the form E1 op = E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once.

In your case, that means X = X + f(), except that X is evaluated only once.
As there is no guarantee on the order of evaluation in X + f(), you cannot take for granted that f is evaluated first and then X.
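The clause "except that E1 is evaluated only once" is easiest to see when E1 itself has a side effect. The sketch below uses Java, whose specification (JLS 15.26.2) contains the same clause and whose left-to-right rule makes even the expanded form well-defined; the class and method names are hypothetical:

```java
import java.util.Arrays;

// Hypothetical demo: the left operand a[i++] of a compound assignment
// is evaluated once, so the index expression i++ runs exactly once.
class EvaluatedOnce {
    // a[i++] += 5 behaves like a[0] = a[0] + 5, with i incremented once.
    static int[] compound() {
        int[] a = {1, 2, 3};
        int i = 0;
        a[i++] += 5;
        return new int[] {a[0], a[1], i};   // {6, 2, 1}
    }

    // Written out as a[i++] = a[i++] + 5, the index expression runs twice:
    // the target is a[0], but the value read on the right is a[1].
    static int[] expanded() {
        int[] a = {1, 2, 3};
        int i = 0;
        a[i++] = a[i++] + 5;
        return new int[] {a[0], a[1], i};   // {7, 2, 2}
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compound()));  // [6, 2, 1]
        System.out.println(Arrays.toString(expanded()));  // [7, 2, 2]
    }
}
```

So "equivalent to E1 = E1 op E2" is a shorthand; evaluating E1 only once is part of the rule itself.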

Addendum

I'm not a Java expert, but the Java rules clearly specify the order of evaluation in an expression: section 15.7 of the Java Language Specification guarantees that it is left to right. In section 15.26.2, Compound Assignment Operators, the Java specification also says that E1 op= E2 is equivalent to E1 = (T) ((E1) op (E2)).

In your Java program, this again means that your expression is equivalent to X = X + f(), where X is evaluated first and then f(). So the side effect of f() on X is not taken into account in the result.

So your Java compiler doesn't have a bug. It just complies with the specifications.
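The JLS wording can be followed step by step. A minimal sketch (class and method names are hypothetical; Formatter is dropped so that only the assignment remains) that performs X += f() and then the same three steps with explicit temporaries:

```java
// Hypothetical class mirroring the question's Java program, minus Formatter.
class CompoundDesugar {
    static int X = 0;

    static int f() {
        X = X + 10;   // side effect on the same static field
        return 1;
    }

    // The statement from the question.
    static int compound() {
        X = 0;
        X += f();
        return X;
    }

    // JLS 15.26.2 spelled out: save the left operand, evaluate the right
    // operand, then apply + to the saved values and store the result.
    static int desugared() {
        X = 0;
        int left = X;       // 0 is saved before f() runs
        int right = f();    // f() sets X to 10 and returns 1
        X = left + right;   // 0 + 1 overwrites the 10
        return X;
    }

    public static void main(String[] args) {
        System.out.println("X += f()  -> " + compound());   // 1
        System.out.println("desugared -> " + desugared());  // 1
    }
}
```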

Launcelot answered 15/8, 2014 at 8:29 Comment(4)
I assume all languages portrayed include a similar clause and the result is undefined for all of them. They just happen to differ. - Appetence
@Appetence That assumption might bite you. It is defined for C# and Java for sure, to my immediate knowledge. - Mouthpart
@Smith_61: No. While Java is based on Objective-C and not C++ as is often reported, the designers were nonetheless familiar with C++, and because of their experience with C++ they very deliberately decided that there would not be any sort of undefined or implementation-defined behavior whatsoever in either the JVM Spec, the Java Language Spec or the JRE Spec. Iff a Java program is accepted by any Java compiler, its results will be completely defined by the spec. (There is some platform-defined behavior around floats, and there is of course non-deterministic scheduling of threads.) - Cyclamen
@ChrisHayes: How sure are you that the behavior is undefined, instead of there being two possible outcomes and it being unspecified which one happens, if the right-hand side does not modify the left-hand side outside a function call? AFAICT, saying it is undefined is just plain wrong. - Stocks

Thanks to comments by Deduplicator and user694733, here is a modified version of my original answer.


The C++ version has unspecified (not, as I originally wrote, undefined) behaviour.

There is a subtle difference between "undefined" and "unspecified", in that the former allows a program to do anything (including crashing) whereas the latter allows it to choose from a set of particular allowed behaviours without dictating which choice is correct.

Except in very rare cases, you will always want to avoid both.


A good starting point for understanding the whole issue is the set of C++ FAQ entries "Why do some people think x = ++y + y++ is bad?", "What’s the value of i++ + i++?" and "What’s the deal with “sequence points”?":

Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression.

(...)

Basically, in C and C++, if you read a variable twice in an expression where you also write it, the result is undefined.

(...)

At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (...) The “certain specified points” that are called sequence points are (...) after evaluation of all a function’s parameters but before the first expression within the function is executed.

In short, modifying a variable twice between two consecutive sequence points yields undefined behaviour, but a function call introduces an intermediate sequence point (actually, two intermediate sequence points, because the return statement creates another one).

This means the fact that you have a function call in your expression "saves" your Simple::X += Simple::f(); line from being undefined and turns it into "only" unspecified.

Both 1 and 11 are possible and correct outcomes, whereas printing 123, crashing or sending an insulting e-mail to your boss are not allowed behaviours; you'll just never get a guarantee whether 1 or 11 will be printed.


The following example is slightly different. It's seemingly a simplification of the original code but really serves to highlight the difference between undefined and unspecified behaviour:

#include <iostream>

int main() {
    int x = 0;
    x += (x += 10, 1);
    std::cout << x << "\n";
}

Here the behaviour is indeed undefined, because the function call has gone away, so both modifications of x occur between two consecutive sequence points. The compiler is allowed by the C++ language specification to create a program which prints 123, crashes or sends an insulting e-mail to your boss.

(The e-mail thing of course is just a very common humorous attempt at explaining how undefined really means anything goes. Crashes are often a more realistic result of undefined behaviour.)

In fact, the , 1 (just like the return statement in your original code) is a red herring. The following yields undefined behaviour, too:

#include <iostream>

int main() {
    int x = 0;
    x += (x += 10);
    std::cout << x << "\n";
}

This may print 20 (it does so on my machine with VC++ 2013) but the behaviour is still undefined.

(Note: this applies to built-in operators. Operator overloading changes the behaviour back to specified, because overloaded operators copy the syntax from the built-in ones but have the semantics of functions, which means that an overloaded += operator of a custom type that appears in an expression is actually a function call. Therefore, not only are sequence points introduced but the entire ambiguity goes away, the expression becoming equivalent to x.operator+=(x.operator+=(10));, which has guaranteed order of argument evaluation. This is probably irrelevant to your question but should be mentioned anyway.)

In contrast, the Java version

import java.io.*;

class Ideone
{
    public static void main(String[] args)
    {
        int x = 0;
        x += (x += 10);
        System.out.println(x);
    }
}

must print 10. This is because Java has neither undefined nor unspecified behaviour with regards to evaluation order. There are no sequence points to be concerned about. See Java Language Specification 15.7. Evaluation Order:

The Java programming language guarantees that the operands of operators appear to be evaluated in a specific evaluation order, namely, from left to right.

So in the Java case, x += (x += 10), interpreted from left to right, means that first something is added to 0, and that something is 0 + 10. Hence 0 + (0 + 10) = 10.

See also example 15.7.1-2 in the Java specification.
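To see that it is the saved left operand that matters, compare the statement above with a variant that moves the inner assignment to the left, so that it is evaluated before the second read of x. A short sketch with hypothetical method names:

```java
// Hypothetical demo of JLS 15.7 left-to-right operand evaluation.
class LeftToRight {
    // The outer left operand x is evaluated (0) before the inner x += 10 runs.
    static int nestedCompound() {
        int x = 0;
        x += (x += 10);        // 0 + (0 + 10) = 10
        return x;
    }

    // The inner assignment is now the left operand, so it runs first
    // and the later read of x sees 10.
    static int assignmentFirst() {
        int x = 0;
        x = (x += 10) + x;     // 10 + 10 = 20
        return x;
    }

    public static void main(String[] args) {
        System.out.println(nestedCompound());   // 10
        System.out.println(assignmentFirst());  // 20
    }
}
```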

Going back to your original example, this also means that the more complex example with the static variable has defined and specified behaviour in Java.


Honestly, I don't know about C# and PHP, but I would guess that both of them have some guaranteed evaluation order as well. C++, unlike most other programming languages (but like C), tends to allow much more undefined and unspecified behaviour. That's not good or bad. It's a tradeoff between robustness and efficiency. Choosing the right programming language for a particular task or project is always a matter of analysing tradeoffs.

In any case, expressions with such side effects are bad programming style in all four languages.
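One way to follow that advice is to move the call into its own statement, so that the intended order is written down rather than left to the evaluation rules. A minimal sketch in Java with hypothetical names, assuming the intended result is the 11 the question expected; the same two statements give 11 in C++, C# and PHP as well, because the first statement finishes before the second one starts:

```java
// Hypothetical rewrite: the side effect and the compound assignment
// are split into separate statements.
class ExplicitOrder {
    static int X = 0;

    static int f() {
        X = X + 10;
        return 1;
    }

    static int run() {
        X = 0;
        int increment = f();   // completes first: X is now 10
        X += increment;        // X is read only now: 10 + 1
        return X;
    }

    public static void main(String[] args) {
        System.out.println("X = " + run());   // X = 11
    }
}
```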

One final word:

I found a little bug in C# and Java.

You should not assume you have found bugs in language specifications or compilers if you don't have many years of professional experience as a software engineer.

Lempira answered 15/8, 2014 at 9:23 Comment(18)
Yeah, you're right, I fixed it. And yes, that is bad programming style, but it's good for demonstrating this situation. - Wiltshire
I am no expert in C++, but at least in C there is a sequence point involved in a function call. If this is the case, your simplified example is not equivalent, and the original C++ example is well defined. - Edgeworth
@user694733: Well, I am no expert in C, but I am pretty sure that C behaves 100% identically here (and would thus exhibit undefined behaviour as well). The function call being a sequence point only means that the function call's context is evaluated before its body is entered. It defines no guarantee about the order of reads and writes of data shared between the function and the context. - Lempira
I found an interesting bit in N1570: "A compound assignment of the form E1 op= E2 is equivalent to the simple assignment expression E1 = E1 op (E2), except that the lvalue E1 is evaluated only once, and with respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation." Doesn't this mean that E1 is only read after evaluating E2 (X+=f() is safe, but X=X+f() is not)? - Edgeworth
+1 for bad programming style in all four languages - Semitic
@user694733: I thought that was just standardese for defining the meaning of += and similar operators: a += b is mathematically equivalent to a = a + b (which may be obvious but must be mentioned somewhere to be official), just that a is evaluated only once and that the whole operator statement itself is evaluated as a unit, not as two separate operators. Thus, X += f() is still not safe if f modifies X. - Lempira
Here's your answer on the C# version (and probably the inspiration for this question?): blogs.msdn.com/b/oldnewthing/archive/2014/08/14/10549885.aspx. - Lulalulea
@Deduplicator: After giving it some thought, I deemed it necessary to update my original answer. The difference between undefined and unspecified should not be ignored like this. Thank you. - Lempira
@ChristianHackl: Well done. You might consider adding what happens when either side is a user-defined type, because then everything suddenly looks good, which makes the C++ story look superior for a second, or maybe two. - Stocks
@Deduplicator: Done. Although I am starting to feel that answering questions about C++ expressions and evaluation order is a futile task. It's more like writing a book about it... - Lempira
Hm. If op= is a user-defined operator, unless it takes the lhs by value, everything is well-defined and there's no ambiguity at all. And don't write a book, write a few. - Stocks
@Deduplicator: Why would the ambiguity go away if the user-defined operator has the same side effect as the function in the OP's original code? - Lempira
@Deduplicator: Unless I am overlooking something, this depends on whether both += are user-defined. I was thinking more about the case where only the second one is user-defined. - Lempira
@ChristianHackl: Because those side effects are all sequenced between the call to it and the return from it; it's a bog-standard function call. - Stocks
@Deduplicator: Hmm... I guess you are right again. The presence of an overloaded operator for the type of x means that in x += (x += 10), both += must be function calls. I'll modify that note in my answer accordingly. - Lempira
How is the C++ comma operator example any different from the function call example? There is a sequence point after the evaluation of the first operand of the comma operator (C++03 §1.9/18). - Romanaromanas
@AdamRosenfield: I think you are right. That the comma operator injects a sequence point is beyond question; and since a sequence point means that everything that came before must have been evaluated, x += 10 must have been evaluated at that point. Which in turn means it must have taken place before the outer += finished. It's just unspecified whether the outer += adds the result of the comma operator (1) to 0 (0 + 1 => 1) or to 10 (10 + 1 => 11), because it is unspecified whether the (x += 10, 1) part or the x += ... part comes first. Do you think this is a more accurate description? - Lempira
@Christian: Actually, the behavior on the whole is still undefined (not unspecified) in both the function call and comma operator cases, because C++03 §5/4 says "Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined." (emphasis added). - Romanaromanas

As Christophe has already written, this is basically an undefined operation.

So why do C++ and PHP do it one way, and C# and Java the other way?

In this case (which may be different for different compilers and platforms), the order of evaluation of arguments in C++ is inverted compared to C# - C# evaluates arguments in order of writing, while the C++ sample does it the other way around. This boils down to the default calling conventions both use, but again - for C++, this is an undefined operation, so it may differ based on other conditions.

To illustrate, this C# code:

class Program
{
    static int x = 0;

    static int f()
    {
        x = x + 10;
        return 1;
    }

    public static void Main()
    {
        x = f() + x;
        System.Console.WriteLine(x);
    }
}

This produces 11 rather than 1.

That's simply because C# evaluates "in order", so in your example, it first reads x and then calls f(), while in mine, it first calls f() and then reads x.
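Java's left-to-right rule (JLS 15.7) gives the same results as C# here, so the effect can be checked with a Java version of both programs; the class and method names are hypothetical:

```java
// Hypothetical Java counterpart of the two C# programs.
class OperandOrder {
    static int x = 0;

    static int f() {
        x = x + 10;
        return 1;
    }

    // x is read (0) before f() runs: 0 + 1
    static int readFirst() {
        x = 0;
        x = x + f();
        return x;
    }

    // f() runs first and sets x to 10, then x is read: 1 + 10
    static int callFirst() {
        x = 0;
        x = f() + x;
        return x;
    }

    public static void main(String[] args) {
        System.out.println(readFirst());  // 1
        System.out.println(callFirst());  // 11
    }
}
```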

Now, this still might be unreliable. IL (.NET's bytecode) treats + much like any other method, and optimizations by the JIT compiler might result in a different order of evaluation. On the other hand, since C# (and .NET) does define the order of evaluation and execution, I guess a compliant compiler should always produce this result.

In any case, that's a lovely unexpected outcome you've found, and a cautionary tale - side-effects in methods can be a problem even in imperative languages :)

Oh, and of course - static means something different in C# vs. C++. I've seen that mistake made by C++ers coming to C# before.

EDIT:

Let me just expand a bit on the "different languages" issue. You automatically assumed that C++'s result is the correct one, because when you do the calculation manually, you evaluate in a certain order, and you determined this order to match the results from C++. However, neither C++ nor C# does any analysis of the expression; it's simply a bunch of operations over some values.

C++ does store x in a register, just like C#. It's just that C# stores it before evaluating the method call, while C++ does it after. If you change the C++ code to do x = f() + x instead, just like I've done in C#, I expect you'll get 1 as the output.

The most important part is that C++ (and C) simply didn't specify an explicit order of operations, probably because they wanted to exploit architectures and platforms that use either one of those orders. Since C# and Java were developed at a time when this didn't really matter anymore, and since they could learn from all those failures of C/C++, they specified an explicit order of evaluation.

Franz answered 15/8, 2014 at 8:43 Comment(10)
C# Language Spec 7.17.2 for the curious - Mouthpart
+1 - Thanks for these explanations on C#. Just for the record: the order in C++ is not the inverse of C#'s. It's just that it's not defined, so the compiler is free to choose. - Launcelot
@Launcelot Yeah, I was going by the calling conventions that are usually used on Windows. I'm well aware of the whole cluster-problem with different calling and evaluation conventions. So were the designers of Java and C#, it seems :D - Franz
"Since C# and Java were developed in a time when this doesn't really matter anymore, and since they could learn from all those failures of C/C++" - That's not true. First of all, what makes you think it doesn't "matter" anymore? And why would it be a "failure"? C and C++ can hardly be considered unsuccessful languages in the history of computer science. They are alive and well and may very well outlive both Java and C#. Choosing the right language is always a matter of tradeoffs; there is no generic "better" or "worse". - Lempira
-1. "The default order of evaluation of arguments in C++ is inverted". No, there is NO default order of evaluation. I'm not certain to what degree argument evaluation is allowed to overlap, but certainly all orders (including but not limited to left-to-right and right-to-left) are allowed. And with modern CPUs reordering operations, C++ doesn't even care to guarantee consistency. - Winterize
@ChristianHackl At no point did I say that C++ is obsolete or that C# is better. Unnecessary undefined operations are a failure in my mind, but if it doesn't bother you, that's fine too. In any case, whether you consider it a failure or not has no bearing on whether C++ is or isn't a successful language (and vice versa). I'm simply describing differences between those languages; the negative implications are all yours :) - Franz
@Winterize I was talking about this specific example with this specific compiler etc. And as for CPU optimizations, that's a common misconception, but it's flawed. The CPU is not allowed to do a reordering that would make the output different (as long as you're doing single-threaded operations with no shared memory). All of those things are there specifically because undefined operations are a problem: if the output could change, the abstraction you're relying on would fail. Intel and AMD work very hard to ensure that doesn't happen. - Franz
@Luaan: If you're talking about one single very specific example, then don't call it the "default order of evaluation". As for reordering, the CPU can definitely reorder evaluation of independent expressions, and all modern C++ compilers by default treat function arguments as independent expressions. I.e. the compiler may copy a single variable used twice as an argument to two registers, and the CPU may then reorder the accesses to those 2 registers. At assembly level, that's correct, and at C++ level it's correct for different reasons. - Winterize
@Winterize You're right, I've changed the paragraph to be more explicit about the fact that this is an undefined operation (even though it was already there as the very first sentence in my answer :)). And while you're right about the compiler reordering, the CPU reordering simply must not cause this kind of "undefinedness": if the reordering would cause a change in behaviour that violates the contract (which this example would), then it must not be done. - Franz
@Luaan: Please look at the correction Christian made. - Stocks

According to the Java language specification:

JLS 15.26.2, Compound Assignment Operators

A compound assignment expression of the form E1 op= E2 is equivalent to E1 = (T) ((E1) op (E2)) , where T is the type of E1 , except that E1 is evaluated only once.

This small program demonstrates the difference, and exhibits expected behavior based on this standard.

public class Start
{
    int X = 0;
    int f()
    {
        X = X + 10;
        return 1;
    }
    public static void main(String[] args)
    {
        Start actualStart = new Start();
        Start expectedStart = new Start();
        Start diffStart = new Start();  // fresh instance, so expectedStart's side effects don't leak into diff
        int actual = actualStart.X += actualStart.f();              // 0 + 1 = 1
        int expected = (int)(expectedStart.X + expectedStart.f());  // JLS form: 0 + 1 = 1
        int diff = (int)(diffStart.f() + diffStart.X);              // operands swapped: 1 + 10 = 11
        System.out.println(actual == expected);  // true
        System.out.println(actual == diff);      // false
    }
}

In order,

  1. actual is assigned the value of actualStart.X += actualStart.f().
  2. expected is assigned the result of retrieving expectedStart.X, which is 0, and applying the addition operator to it and the return value of invoking expectedStart.f(), which is 1; the result of 0 + 1 is assigned to expected.

I also declared diff to show how changing the order of invocation changes the result.

  1. diffStart.f() is invoked first; it returns 1 and, as a side effect, sets diffStart.X to 10.
  2. diffStart.X is then read, which gives 10.
  3. The addition operator is applied to those two values, and the result of 1 + 10 is assigned to diff.

In Java, this is not undefined behavior.

Edit:

To address your point regarding local copies of variables: that is correct, but it has nothing to do with static. Java saves the result of evaluating each operand (left operand first), then applies the operator to the saved values.
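Since the saved value comes from operand evaluation rather than from anything specific to static, the same thing happens with an instance field or an array element. A short sketch with hypothetical names:

```java
// Hypothetical demo: the left operand's value is saved regardless of where it lives.
class NotAboutStatic {
    int field = 0;   // instance field, not static

    int bump() {
        field += 10;
        return 1;
    }

    static int bumpBox(int[] box) {
        box[0] += 10;
        return 1;
    }

    // o.field is saved (0) before bump() runs: 0 + 1
    static int instanceField() {
        NotAboutStatic o = new NotAboutStatic();
        o.field += o.bump();
        return o.field;
    }

    // box[0] is saved (0) before bumpBox() runs: 0 + 1
    static int arrayElement() {
        int[] box = {0};
        box[0] += bumpBox(box);
        return box[0];
    }

    public static void main(String[] args) {
        System.out.println(instanceField());  // 1
        System.out.println(arrayElement());   // 1
    }
}
```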

Mouthpart answered 15/8, 2014 at 8:51 Comment(0)
