Source:

typedef struct
{
  int a : 20;
  int b : 1;
  int c : 1;
  int d : 1;
} bitstr;

bitstr fun(int s, int l)
{
  bitstr res;
  res.a = s;
  res.b = 1;
  res.c = 0;
  res.d = l;
  return res;
}

This is caused by the NRV (named return value) optimization.
Here is the asm after the tree-ssa merge:

        lwz r0,0(r3)
        rlwimi r0,r4,12,0,19
        ori r0,r0,2048
        rlwinm r0,r0,0,22,20
        rlwimi r0,r5,9,22,22
        stw r0,0(r3)
        blr

Before the tree-ssa merge:

        slwi r4,r4,12
        ori r4,r4,2048
        rlwimi r4,r5,9,22,22
        stw r4,0(r3)
        blr
It is much worse for x86. (-O3 -fomit-frame-pointer)

After:

fun:
        movl 4(%esp), %eax
        movl 8(%esp), %ecx
        movl (%eax), %edx
        andl $1048575, %ecx
        andl $-1048576, %edx
        orl %ecx, %edx
        movl %edx, (%eax)
        movzbl 2(%eax), %edx
        orb $16, %dl
        andb $-33, %dl
        movb %dl, 2(%eax)
        movl 12(%esp), %ecx
        movl (%eax), %edx
        andl $1, %ecx
        sall $22, %ecx
        andl $-4194305, %edx
        orl %ecx, %edx
        movl %edx, (%eax)
        ret $4

Before:

        pushl %ebp
        andl $-1048576, %ecx
        movl %esp, %ebp
        movl 12(%ebp), %edx
        movl 8(%ebp), %eax
        andl $1048575, %edx
        orl %edx, %ecx
        movl 16(%ebp), %edx
        orl $1048576, %ecx
        andl $-6291457, %ecx
        andl $1, %edx
        sall $22, %edx
        orl %edx, %ecx
        movl %ecx, (%eax)
        popl %ebp
        ret $4
This seems to have been fixed, somewhere, somehow.
Yes, and fixed on powerpc-apple-darwin too. Hmm, it seems like NRV is not working on some structs, or something else is going funny.
We are now back to what we were with the tree-ssa. Darn.
On PPC we get now:

        lwz r0,0(r3)
        rlwimi r0,r5,9,22,22
        rlwinm r0,r0,0,22,20
        ori r0,r0,2048
        rlwimi r0,r4,12,0,19
        stw r0,0(r3)
        blr

(but note it was much worse a little while back with 20050113:

_fun:
        lwz r0,0(r3)
        slwi r5,r5,7
        extsb r5,r5
        slwi r4,r4,12
        rlwimi r0,r5,10,22,22
        rlwinm r0,r0,0,22,20
        ori r0,r0,2048
        rlwimi r0,r4,0,0,19
        stw r0,0(r3)
        blr
)
*** Bug 21198 has been marked as a duplicate of this bug. ***
Actually this is another SRA issue.
Does this mean that 22156 is a dup of this (or viceversa)?
This is such a minor issue, and it does not come from any real code; I just made this example up while looking at code generation in general, so moving to 4.2.
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
FWIW, the patch for PR 22156 does not change the generated code on ppc, although it appears to improve the x86 code.
Closing 4.1 branch.
Closing 4.2 branch.
GCC 4.3.4 is being released, adjusting target milestone.
Looks fixed on ARM. Pinski, what say you on ppc?
GCC 4.3.5 is being released, adjusting target milestone.
Martin, perhaps a test case you want to watch if you or someone else is going to play with SRA vs. bitfields.
Even with bitfield accesses lowered at the tree level we end up with

<bb 2>:
  D.1736_2 = (<unnamed-signed:20>) s_1(D);
  BF.0_3 = MEM[(struct bitstr *)&<retval>];
  D.1741_4 = BF.0_3 & -1048576;
  D.1742_5 = (<unnamed-unsigned:20>) D.1736_2;
  D.1743_6 = (int) D.1742_5;
  BF.0_7 = D.1741_4 | 1048576;
  BF.1_9 = BF.0_7 | D.1743_6;
  D.1737_13 = (signed char) l_12(D);
  D.1738_14 = (<unnamed-signed:1>) D.1737_13;
  D.1747_16 = BF.1_9 & -6291457;
  D.1748_17 = (<unnamed-unsigned:1>) D.1738_14;
  D.1749_18 = (int) D.1748_17;
  D.1750_19 = D.1749_18 << 22;
  BF.3_20 = D.1750_19 | D.1747_16;
  MEM[(struct bitstr *)&<retval>] = BF.3_20;
  return <retval>;

There are some optimization possibilities here if we can recognize the truncating and extending conversions as bit manipulations.
4.3 branch is being closed, moving to 4.4.7 target.
4.4 branch is being closed, moving to 4.5.4 target.
GCC 4.6.4 has been released and the branch has been closed.
The 4.7 branch is being closed, moving target milestone to 4.8.4.
GCC 4.8.4 has been released.
If you initialise the return value to all zeroes, the good code comes out again. The problem is that currently GCC preserves all padding bits. The same is true for normal stores (instead of return value via hidden pointer as in the example code). For this case, for PowerPC, writing 0 to all padding bits is optimal. That is because we write to field "d" which is adjacent to the padding bits, and we do the access as a 32-bit word anyway. For other cases (and other targets, and other ABIs) things will be different.
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
GCC 4.9 branch is being closed
We currently generate

fun:
        lwz 8,0(3)
        rlwimi 8,4,12,0,31-12
        rlwinm 10,8,24,24,31
        stw 8,0(3)
        rlwinm 10,10,0,30,27
        ori 10,10,0x8
        stb 10,2(3)
        ori 2,2,0
        lwz 10,0(3)
        rlwimi 10,5,9,22,22
        stw 10,0(3)
        blr

which is probably the worst it has been :-/
GCC 6 branch is being closed
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Mine for GCC 11. I have patches which improve this a lot. The problem right now is the padding.

I have an idea where we initialize all non-addressed local struct variables that have a non-VOIDmode right before their first use. Kinda like init-regs does on the RTL level, but a little more complex.

That is, having:

bitstr fun(int s, int l)
{
  bitstr res;
  *(int*)&res = 0;
  res.a = s;
  res.b = 1;
  res.c = 0;
  res.d = l;
  return res;
}

will always produce better code :).
On Tue, 14 Jan 2020, pinskia at gcc dot gnu.org wrote:

> Mine for GCC 11. I have patches which improve this a lot.
> The problem right now is the padding.
>
> I have an idea where we initialize all structs non-addressed local variables
> that have a non VOIDmode right before their first use.
> Kinda like init-regs does on the RTL but a little more complex.
>
> That is having:
> bitstr fun(int s, int l)
> {
>   bitstr res;
>   *(int*)&res = 0;
>   res.a = s;
>   res.b = 1;
>   res.c = 0;
>   res.d = l;
>   return res;
> }
>
> Will produce better code always :).

Something like init-regs I'd not like. But the above should be detectable by store-merging in some way: store-merging can merge across "uninitialized" bits (by using zeroing or any other convenient value).
(In reply to rguenther@suse.de from comment #33)
> Something like init-regs I'd not like. But the above should be
> detectable by store-merging in some way - store-merging can
> merge across "uninitialized" bits (by using zeroing or any
> other convenient value).

The problem is that store merging might be too late: I have bit-field lowering right now placed right before PRE, which means the information has been lost by the time we get to store merging. I have been working on the regressions that bit-field lowering causes, and the store-merging testcases are the biggest ones. I have decided to implement this inside reassoc instead, as we are really reassociating BIT_INSERT_EXPRs there and are able to optimize some of them into byte swaps, vector constructors, etc.
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
GCC 11.2 is being released, retargeting bugs to GCC 11.3
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
GCC 11.4 is being released, retargeting bugs to GCC 11.5.