Bug 97843 - Bad code gen when concatenating to array
Summary: Bad code gen when concatenating to array
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: d (show other bugs)
Version: 10.2.0
: P3 normal
Target Milestone: ---
Assignee: Iain Buclaw
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-16 01:11 UTC by Alex
Modified: 2020-11-18 10:38 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alex 2020-11-16 01:11:52 UTC
This code:

import std.stdio;

ubyte sum(ubyte[] bytes)
{
	ubyte result;
	foreach(x;bytes)
		result += x;
	return result;
}

int main()
{
	ubyte[] bytes = [0,1];
	bytes ~= bytes.sum;
	writefln("sum = %s",bytes[$-1]);
	return 0;
}

Expected output :
sum = 1 as confirmed at https://dlang.org/ your code here
Actual output :
sum = <random number>
Comment 1 Iain Buclaw 2020-11-16 11:31:30 UTC
(In reply to Alex from comment #0)
> This code:
> 
> import std.stdio;
> 
> ubyte sum(ubyte[] bytes)
> {
> 	ubyte result;
> 	foreach(x;bytes)
> 		result += x;
> 	return result;
> }
> 
> int main()
> {
> 	ubyte[] bytes = [0,1];
> 	bytes ~= bytes.sum;
> 	writefln("sum = %s",bytes[$-1]);
> 	return 0;
> }
> 
> Expected output :
> sum = 1 as confirmed at https://dlang.org/ your code here
> Actual output :
> sum = <random number>

So the array is extended before calling sum(), which doesn't violate LTR order of precedence that is expected in assignment operations.

    A() = B() + C();   // A, B, C
    A() += B() + C();  // A, B, C

Despite there being a slight deviation with array concatenation, but that down to some implementation specific detail that will be resolved upstream druntime at some point in the future.

X() = Y() ~ Z();  // x86: X, Z, Y
                  // ARM: X, Y, Z


However, order of evaluation in assignment expressions is explicitly undefined, both in 10.2.7 [1] and 12.9.5 [2] of the spec.  So it would be difficult to justify whether it is indeed bad code-gen, and writing code that depends on a specific order of evaluation is not recommended anyway.

[1]: https://dlang.org/spec/expression.html#order-of-evaluation
[2]: https://dlang.org/spec/arrays.html#array-operations
Comment 2 Alex 2020-11-16 12:16:17 UTC
I agree that the order of evaluation of operands is undefined and writing code that depends on that order would not be reliable. In this case it's the execution of the assign expression happening before all the operands have been evaluated that is the problem.

The arithmetic equivalent would be for:
X += 4/2
To be produce:
Immediate load Register with 4
Add register with 4 in it to x
Divide register with 4 in it by 2
Resulting in x being increased by 4 instead of 2

10.2.3 Binary expressions EXCEPT for AssignExpression are left to right

10.2.7 says operand order is undefined

Nowhere does the spec say that the assignment operator has to happen after operand evaluation. I think this is a hole in the spec.
Comment 3 Iain Buclaw 2020-11-16 15:34:46 UTC
(In reply to Alex from comment #2)
> I agree that the order of evaluation of operands is undefined and writing
> code that depends on that order would not be reliable. In this case it's the
> execution of the assign expression happening before all the operands have
> been evaluated that is the problem.
> 

Yes, it certainly is the most surprising outcome of all that could happen, but it's mostly down to how the run-time library helpers are implemented, which may get different results on non-x86.
Comment 4 Iain Buclaw 2020-11-16 15:35:29 UTC
Might be a regression caused by the fix for PR96924
Comment 5 Iain Buclaw 2020-11-16 16:17:57 UTC
(In reply to Alex from comment #2)
> The arithmetic equivalent would be for:
> X += 4/2
> To be produce:
> Immediate load Register with 4
> Add register with 4 in it to x
> Divide register with 4 in it by 2
> Resulting in x being increased by 4 instead of 2
> 
> 10.2.3 Binary expressions EXCEPT for AssignExpression are left to right
> 
> 10.2.7 says operand order is undefined
> 
> Nowhere does the spec say that the assignment operator has to happen after
> operand evaluation. I think this is a hole in the spec.

The equivalent code that `bytes ~= bytes.sum` equates to is:

ref ubyte[] extend(ref ubyte[] bytes)
{
    bytes.length += 1;
    bytes[$-1] = 0xde;
    return bytes;
}

extend(bytes)[bytes.length] = bytes.sum;

Which is also LTR evaluation, so everything is consistent so far.
Comment 6 Alex 2020-11-16 20:56:54 UTC
From cppreference.com :

The behavior of every builtin compound-assignment expression E1 op= E2 (where E1 is a modifiable lvalue expression and E2 is an rvalue expression or a braced-init-list (since C++11)) is exactly the same as the behavior of the expression E1 = E1 op E2.

I think the D spec should contain something similar. It's omission is an oversight. I don't think D developers intend different expression evaluation behavior to C++.

If the above were part of the D spec, the equivalent code would be :

bytes = bytes ~ bytes.sum

The assignment must happen after the ~.

In the same way that the ~= must happen after the sum in bytes ~= bytes.sum

Maybe I should raise an issue with the D spec ?
Either way I don't think the compiler should do this because it doesn't make sense. This is the first compiler release where my unit tests have detected this behaviour.

>ref ubyte[] extend(ref ubyte[] bytes)
>{
>    bytes.length += 1;
>    bytes[$-1] = 0xde;
>    return bytes;
>}

>extend(bytes)[bytes.length] = bytes.sum;

This would be ok if a trailing ~ meant 'extend the array and return the last element'. Then it could be evaluated before the assignment.

The assignment operator ~= should be considered to be both the ~ and the = grouped together.
Comment 7 GCC Commits 2020-11-18 10:32:48 UTC
The releases/gcc-10 branch has been updated by Iain Buclaw <ibuclaw@gcc.gnu.org>:

https://gcc.gnu.org/g:bbb887834d78cf6a444bf9cecc29d14b4dfb9cf8

commit r10-9043-gbbb887834d78cf6a444bf9cecc29d14b4dfb9cf8
Author: Iain Buclaw <ibuclaw@gdcproject.org>
Date:   Tue Nov 17 13:11:33 2020 +0100

    d: Fix LHS of array concatentation evaluated before the RHS.
    
    In an array append expression:
    
        array ~= fun(array);
    
    The array in the left hand side of the expression was extended before
    evaluating the result of the right hand side, which resulted in the
    newly uninitialized array index being used before set.
    
    This fixes that so that the result of the right hand side is always
    saved in a reusable temporary before assigning to the destination.
    
    gcc/d/ChangeLog:
    
            PR d/97843
            * d-codegen.cc (build_assign): Evaluate TARGET_EXPR before use in
            the right hand side of an assignment.
            * expr.cc (ExprVisitor::visit (CatAssignExp *)): Force a TARGET_EXPR
            on the element to append if it is a CALL_EXPR.
    
    gcc/testsuite/ChangeLog:
    
            PR d/97843
            * gdc.dg/pr97843.d: New test.
    
    (cherry picked from commit 798bdfa0ebcf2bd012ffce75a594f783a8cb2dd0)
Comment 8 Iain Buclaw 2020-11-18 10:37:17 UTC
Fix committed to mainline and backported to gcc-10.
Comment 9 Iain Buclaw 2020-11-18 10:38:50 UTC
Another related issue has been created in pr97889.