Source:

typedef struct
{
  int a : 20;
  int b : 1;
  int c : 1;
  int d : 1;
} bitstr;

bitstr fun(int s, int l)
{
  bitstr res;
  res.a = s;
  res.b = 1;
  res.c = 0;
  res.d = l;
  return res;
}

This is caused by the NRV (named return value) optimization.
Here is the asm after the tree-ssa merge:

        lwz r0,0(r3)
        rlwimi r0,r4,12,0,19
        ori r0,r0,2048
        rlwinm r0,r0,0,22,20
        rlwimi r0,r5,9,22,22
        stw r0,0(r3)
        blr

Before the tree-ssa merge:

        slwi r4,r4,12
        ori r4,r4,2048
        rlwimi r4,r5,9,22,22
        stw r4,0(r3)
        blr
It is much worse for x86. (-O3 -fomit-frame-pointer)

After:

fun:
        movl 4(%esp), %eax
        movl 8(%esp), %ecx
        movl (%eax), %edx
        andl $1048575, %ecx
        andl $-1048576, %edx
        orl %ecx, %edx
        movl %edx, (%eax)
        movzbl 2(%eax), %edx
        orb $16, %dl
        andb $-33, %dl
        movb %dl, 2(%eax)
        movl 12(%esp), %ecx
        movl (%eax), %edx
        andl $1, %ecx
        sall $22, %ecx
        andl $-4194305, %edx
        orl %ecx, %edx
        movl %edx, (%eax)
        ret $4

Before:

        pushl %ebp
        andl $-1048576, %ecx
        movl %esp, %ebp
        movl 12(%ebp), %edx
        movl 8(%ebp), %eax
        andl $1048575, %edx
        orl %edx, %ecx
        movl 16(%ebp), %edx
        orl $1048576, %ecx
        andl $-6291457, %ecx
        andl $1, %edx
        sall $22, %edx
        orl %edx, %ecx
        movl %ecx, (%eax)
        popl %ebp
        ret $4
This seems to have been fixed, somewhere, somehow.
Yes, and fixed on powerpc-apple-darwin too. Hmm, it seems like NRV is not working on some structs, or something else is going funny.
We are now back to what we were with the tree-ssa. Darn.
On PPC we get now:

        lwz r0,0(r3)
        rlwimi r0,r5,9,22,22
        rlwinm r0,r0,0,22,20
        ori r0,r0,2048
        rlwimi r0,r4,12,0,19
        stw r0,0(r3)
        blr

(but note it was much worse a little while back with 20050113:

_fun:
        lwz r0,0(r3)
        slwi r5,r5,7
        extsb r5,r5
        slwi r4,r4,12
        rlwimi r0,r5,10,22,22
        rlwinm r0,r0,0,22,20
        ori r0,r0,2048
        rlwimi r0,r4,0,0,19
        stw r0,0(r3)
        blr
)
*** Bug 21198 has been marked as a duplicate of this bug. ***
Actually this is another SRA issue.
Does this mean that 22156 is a dup of this (or viceversa)?
This is such a minor issue, and it does not come from any real code; I just made this example up while looking at code generation in general, so moving to 4.2.
Will not be fixed in 4.1.1; adjust target milestone to 4.1.2.
FWIW, the patch for PR 22156 does not change the generated code on ppc, although it appears to improve the x86 code.
Closing 4.1 branch.
Closing 4.2 branch.
GCC 4.3.4 is being released, adjusting target milestone.
Looks fixed on ARM. Pinski, what say you on ppc?
GCC 4.3.5 is being released, adjusting target milestone.
Martin, perhaps a test case you want to watch if you or someone else is going to play with SRA vs. bitfields.
Even with bitfield accesses lowered at the tree level we end up with

<bb 2>:
  D.1736_2 = (<unnamed-signed:20>) s_1(D);
  BF.0_3 = MEM[(struct bitstr *)&<retval>];
  D.1741_4 = BF.0_3 & -1048576;
  D.1742_5 = (<unnamed-unsigned:20>) D.1736_2;
  D.1743_6 = (int) D.1742_5;
  BF.0_7 = D.1741_4 | 1048576;
  BF.1_9 = BF.0_7 | D.1743_6;
  D.1737_13 = (signed char) l_12(D);
  D.1738_14 = (<unnamed-signed:1>) D.1737_13;
  D.1747_16 = BF.1_9 & -6291457;
  D.1748_17 = (<unnamed-unsigned:1>) D.1738_14;
  D.1749_18 = (int) D.1748_17;
  D.1750_19 = D.1749_18 << 22;
  BF.3_20 = D.1750_19 | D.1747_16;
  MEM[(struct bitstr *)&<retval>] = BF.3_20;
  return <retval>;

There are some optimization possibilities here if we can recognize the truncating and extending conversions as bit manipulations.
4.3 branch is being closed, moving to 4.4.7 target.
4.4 branch is being closed, moving to 4.5.4 target.
GCC 4.6.4 has been released and the branch has been closed.
The 4.7 branch is being closed, moving target milestone to 4.8.4.
GCC 4.8.4 has been released.
If you initialise the return value to all zeroes, the good code comes out again. The problem is that currently GCC preserves all padding bits. The same is true for normal stores (instead of return value via hidden pointer as in the example code). For this case, for PowerPC, writing 0 to all padding bits is optimal. That is because we write to field "d" which is adjacent to the padding bits, and we do the access as a 32-bit word anyway. For other cases (and other targets, and other ABIs) things will be different.
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
GCC 4.9 branch is being closed
We currently generate

fun:
        lwz 8,0(3)
        rlwimi 8,4,12,0,31-12
        rlwinm 10,8,24,24,31
        stw 8,0(3)
        rlwinm 10,10,0,30,27
        ori 10,10,0x8
        stb 10,2(3)
        ori 2,2,0
        lwz 10,0(3)
        rlwimi 10,5,9,22,22
        stw 10,0(3)
        blr

which is probably the worst it has been :-/
GCC 6 branch is being closed
The GCC 7 branch is being closed, re-targeting to GCC 8.4.
Mine for GCC 11. I have patches which improve this a lot. The problem right now is the padding.

I have an idea where we initialize all non-addressed local struct variables that have a non-VOIDmode right before their first use. Kinda like init-regs does on the RTL level, but a little more complex.

That is, having:

bitstr fun(int s, int l)
{
  bitstr res;
  *(int*)&res = 0;
  res.a = s;
  res.b = 1;
  res.c = 0;
  res.d = l;
  return res;
}

will always produce better code :).
On Tue, 14 Jan 2020, pinskia at gcc dot gnu.org wrote:

> Mine for GCC 11. I have patches which improve this a lot.
> The problem right now is the padding.
>
> I have an idea where we initialize all structs non-addressed local variables
> that have a non VOIDmode right before their first use.
> Kinda like init-regs does on the RTL but a little more complex.
>
> That is having:
> bitstr fun(int s, int l)
> {
>   bitstr res;
>   *(int*)&res = 0;
>   res.a = s;
>   res.b = 1;
>   res.c = 0;
>   res.d = l;
>   return res;
> }
>
> Will produce better code always :).

Something like init-regs I'd not like. But the above should be detectable by store-merging in some way: store-merging can merge across "uninitialized" bits (by using zeroing or any other convenient value).
(In reply to rguenther@suse.de from comment #33)
> Something like init-regs I'd not like. But the above should be
> detectable by store-merging in some way - store-merging can
> merge across "uninitialized" bits (by using zeroing or any
> other convenient value).

The problem is that store merging might be too late: I have bit-field lowering right now placed right before PRE, which means the information has been lost by the time we get to store merging. I have been working on the regressions that bit-field lowering causes, and the store-merging testcases are the biggest ones. I have decided to implement this inside reassoc instead, as we are really reassociating BIT_INSERT_EXPRs there and are able to optimize some of them into byte swaps, vector constructors, etc.
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
GCC 11.2 is being released, retargeting bugs to GCC 11.3
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
GCC 11.4 is being released, retargeting bugs to GCC 11.5.