56069 – [7 Regression] RA pessimization

Bug 56069 - [7 Regression] RA pessimization

Summary: [7 Regression] RA pessimization

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	rtl-optimization (show other bugs)
Version:	4.8.0

Importance:	P2 normal
Target Milestone:	8.0
Assignee:	Bernd Schmidt

URL:
Keywords:	missed-optimization, ra

Duplicates (1):	39723 (view as bug list)
Depends on:
Blocks:

Reported:	2013-01-21 17:37 UTC by Jakub Jelinek
Modified:	2022-05-27 08:00 UTC (History)
CC List:	7 users (show)

See Also:	https://sourceware.org/bugzilla/show_bug.cgi?id=20295
Host:
Target:	x86_64-linux
Build:
Known to work:
Known to fail:
Last reconfirmed:	2013-10-25 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jakub Jelinek 2013-01-21 17:37:09 UTC

Since http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139993
unsigned long foo (unsigned long mem)
{
  return (mem >> 3) | (1UL << 44);
}

unsigned long bar (unsigned long mem)
{
  return (mem >> 3) + (1UL << 44);
}

on x86_64-linux at -O2 (and -Os) the generated code for foo is one insn longer.
We used to emit:
        shrq    $3, %rdi
        movabsq $17592186044416, %rax
        orq     %rdi, %rax
        ret
but now emit:
        movq    %rdi, %rax
        movabsq $17592186044416, %rdx
        shrq    $3, %rax
        orq     %rdx, %rax
        ret

Comment 1 Vladimir Makarov 2013-01-21 21:56:02 UTC

LRA does not generate any reloads (additional insns).  So I don't think it is LRA problem.  I've checked reload.  It generates the same code.

Although I guess, it is IRA problem.  It should assign other hard registers to improve the code.

I'll try to find a solution.

Comment 2 Vladimir Makarov 2013-01-22 21:01:21 UTC

It is definitely regmove pass drawback.  IRA can do nothing in this
case.

We have the following code before and after regmove:

    2: r62:DI=di:DI                          2: r63:DI=di:DI
      REG_DEAD di:DI                           REG_DEAD di:DI
    6: {r64:DI=r62:DI 0>>0x3;clobber fla     6: {r63:DI=r63:DI 0>>0x3;clobber fl
      REG_DEAD r62:DI
    7: r65:DI=0x100000000000                 7: r65:DI=0x100000000000
    8: {r63:DI=r64:DI|r65:DI;clobber fla     8: {r63:DI=r63:DI|r65:DI;clobber fl
      REG_DEAD r65:DI                          REG_DEAD r65:DI
      REG_DEAD r64:DI
   13: ax:DI=r63:DI                         13: ax:DI=r63:DI
      REG_DEAD r63:DI                          REG_DEAD r63:DI
   16: use ax:DI                            16: use ax:DI

Regmove changes r64 to r63.  It makes two equal hard reg preferences
for r63: AX or DI. Choosing either one results in worse code.

The original generated code can be achieved if regmove changes r65 to
r63.  In this case we have only one hard register preference for r63
(AX) and for r62 (DI).

It can be achieved if regmove tries all orders of commutative operands
(now regmove pass uses the first found order) using additional
heuristics (live range length and/or number of preferred hard regs) to
choose the best order.

It is not a trivial change and can not be done for given release
(gcc4.8).

Also now we have LRA and may be such regmove transformations are not
necessary.  I am going to try this for gcc4.9 when I have more time.
Still to assign ax to r65 and r63 we also need some hard register
preference propagations in IRA which is currently absent.

In any case, I think that older releases were just lucky and generated
the best code as some passes before regmove put r65 as a first input
operand.

Comment 3 Jakub Jelinek 2013-01-22 21:10:34 UTC

Deferring for 4.9 then.

Comment 4 Richard Biener 2013-10-25 12:17:23 UTC

Confirmed on trunk.

Comment 5 Jeffrey A. Law 2013-12-20 21:23:09 UTC

Maybe I'm missing something here.  We have this immediately prior to IRA:

(insn 2 4 3 2 (set (reg/v:DI 86 [ mem ])
        (reg:DI 5 di [ mem ])) j.c:2 89 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 5 di [ mem ])
        (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (parallel [
            (set (reg:DI 88 [ D.1760 ])
                (lshiftrt:DI (reg/v:DI 86 [ mem ])
                    (const_int 3 [0x3])))
            (clobber (reg:CC 17 flags))
        ]) j.c:3 562 {*lshrdi3_1}
     (expr_list:REG_DEAD (reg/v:DI 86 [ mem ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 7 6 8 2 (set (reg:DI 89)
        (const_int 17592186044416 [0x100000000000])) j.c:3 89 {*movdi_internal}
     (nil))
(insn 8 7 13 2 (parallel [
            (set (reg:DI 87 [ D.1760 ])
                (ior:DI (reg:DI 88 [ D.1760 ])
                    (reg:DI 89)))
            (clobber (reg:CC 17 flags))
        ]) j.c:3 421 {*iordi_1}
     (expr_list:REG_DEAD (reg:DI 89)
        (expr_list:REG_DEAD (reg:DI 88 [ D.1760 ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (expr_list:REG_EQUAL (ior:DI (reg:DI 88 [ D.1760 ])
                        (const_int 17592186044416 [0x100000000000]))
                    (nil))))))
(insn 13 8 16 2 (set (reg/i:DI 0 ax)
        (reg:DI 87 [ D.1760 ])) j.c:4 89 {*movdi_internal}
     (expr_list:REG_DEAD (reg:DI 87 [ D.1760 ])
        (nil)))
(insn 16 13 0 2 (use (reg/i:DI 0 ax)) j.c:4 -1
     (nil))

ISTM that we want (reg 86) to prefer di and (reg 87) to prefer ax by way of the hard register copies.  (reg 88) should then prefer di by way of insn 6.

(reg 89) really doesn't need a preference and can go anywhere that doesn't conflict.

So if we look at the IRA dump we have:

Pass 1 for finding pseudo/allocno costs

    r89: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r88: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r87: preferred AREG, alternative GENERAL_REGS, allocno GENERAL_REGS
    r86: preferred DIREG, alternative GENERAL_REGS, allocno GENERAL_REGS

Which I guess is OK.  A bit surprised to not see r88 preferring DIreg at this point, but I guess copies implied by the 2 operand nature are handled later.

ISTM that the copy implied by insn 6 should result in 88 somehow preferring DIreg.  Then, ideally we'd see that while 89 can go anywhere it's best to match it with 87.

Vlad, what am I missing here?

Comment 6 Vladimir Makarov 2013-12-20 22:40:09 UTC

(In reply to Jeffrey A. Law from comment #5)
> Maybe I'm missing something here.  We have this immediately prior to IRA:
> 

> ISTM that we want (reg 86) to prefer di and (reg 87) to prefer ax by way of
> the hard register copies.  (reg 88) should then prefer di by way of insn 6.
> 
> (reg 89) really doesn't need a preference and can go anywhere that doesn't
> conflict.
> 
> So if we look at the IRA dump we have:
> 
> Pass 1 for finding pseudo/allocno costs
> 
>     r89: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
>     r88: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
>     r87: preferred AREG, alternative GENERAL_REGS, allocno GENERAL_REGS
>     r86: preferred DIREG, alternative GENERAL_REGS, allocno GENERAL_REGS
> 
> Which I guess is OK.  A bit surprised to not see r88 preferring DIreg at
> this point, but I guess copies implied by the 2 operand nature are handled
> later.
> 
> ISTM that the copy implied by insn 6 should result in 88 somehow preferring
> DIreg.  Then, ideally we'd see that while 89 can go anywhere it's best to
> match it with 87.
> 
> Vlad, what am I missing here?

Preferred/alternative class is misleading here.  It is not used anywhere in IRA/LRA for allocation decision (but may be it is used by reload -- i don't remember).  It is only used as a temporary result for allocno class calculation.

Hard reg preferences and copies are further in the dump.  They looks as following

cp0:a2(r88)<->a3(r86)@1000:constraint
cp1:a0(r87)<->a1(r89)@1000:constraint
cp2:a0(r87)<->a2(r88)@1000:constraint
pref0:a0(r87)<-hr0@1500
pref1:a3(r86)<-hr5@1500

This problem is well known me.  Jakub filled analogous PR.  I am working on it.  The reason of the problem are copies created for commutative ops insns.  Two copies give different signals for hard reg preferences through the propagation mechanism.  It is easy to make a test for this problem because the test should be small and involve arguments and result on small number of insns.  Therefore we have a few such PRs.  In bigger function, the probability of the problem occurrence is quite small.

Comment 7 Jakub Jelinek 2014-04-22 11:35:59 UTC

GCC 4.9.0 has been released

Comment 8 Jakub Jelinek 2014-07-16 13:28:12 UTC

GCC 4.9.1 has been released.

Comment 9 Jakub Jelinek 2014-10-30 10:38:17 UTC

GCC 4.9.2 has been released.

Comment 10 Jakub Jelinek 2015-06-26 19:52:08 UTC

GCC 4.9.3 has been released.

Comment 11 Jeffrey A. Law 2016-02-11 05:04:34 UTC

*** Bug 39723 has been marked as a duplicate of this bug. ***

Comment 12 Bernd Schmidt 2016-02-17 13:35:33 UTC

Looking at this.

Comment 13 Bernd Schmidt 2016-02-18 16:51:12 UTC

I think I got far enough that I'll just assign myself. Patches in testing.

Comment 14 Bernd Schmidt 2016-03-03 18:02:49 UTC

Patch and discussion here.
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00182.html

Patch approved for gcc-7.

Comment 15 Jeffrey A. Law 2016-06-23 15:51:08 UTC

On 04/27/2016 05:33 AM, Bernd Schmidt wrote:
> On 04/27/2016 06:02 AM, Jeff Law wrote:
>> AFAICT the sra-1.c expects to see the incremented value and I'm at a
>> loss to understand what's really going on here.  Can you give more
>> details?
> 
> Yeah, maybe my first impression wasn't very accurate.
> 
> When I try to run gdb manually, it just crashes:
> 
> (gdb) show version
> GNU gdb (Gentoo 7.10.1 vanilla) 7.10.1
> (gdb) b 43
> Breakpoint 1 at 0x40059b: file sra-1.c, line 43.
> (gdb) run
> Starting program: /local/src/egcs/bscommit/gcc/a.out
> 
> Breakpoint 1, f3 (k=<optimized out>) at sra-1.c:43
> 43      bar (a.j);        /* { dg-final { gdb-test 43 "a.j" "14" } } */
> (gdb) p a.j
> Segmentation fault (core dumped)
> 
[ ... ]

> 
> 
> I don't really understand the var-tracking stuff too well, so no idea
> where to go from here. I suppose I'm withdrawing my patch.
Based on the above, there's some kind of GDB bug.  So your patch may still be a good thing.

I did a build on F23 which has effectively the same version of gdb and can reproduce the gdb segfault.  It also reproduces on F24 which has gdb-7.11.1

AFAICT gdb thinks the value of "a" has been optimized out, but goes absolutely bananas and segfaults if you try to examine a field within "a".

[law@torsion gcc]$ ./xgcc -B./ -g sra-1.c -O2
[law@torsion gcc]$ gdb ./a.out
GNU gdb (GDB) Fedora 7.10.1-30.fc23
[ ... ]
(gdb) b 43
Breakpoint 1 at 0x40056b: file sra-1.c, line 43.
(gdb) r
Starting program: /opt/notnfs/law/gcc-testing/obj/gcc/a.out 
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.22-10.fc23.x86_64

Breakpoint 1, f3 (k=<optimized out>) at sra-1.c:43
43        bar (a.j);            /* { dg-final { gdb-test 43 "a.j" "14" } } */
(gdb) p a
$1 = <optimized out>
(gdb) p a.i
Segmentation fault (core dumped)


My gdb skills are far too rusty to take this further.  I've filed an upstream report (BZ20295 in the gdb tracker).  Probably not much we can do until the GDB side gets fixed.

I'm going to attach this to 56069 for future reference.
jeff

Comment 16 Bernd Schmidt 2017-01-23 15:20:19 UTC

I retested again with a few different combinations of things. With an older gdb, I can still reproduce the issue that sra-1 becomes UNSUPPORTED (presumably through a gdb crash). With gdb-7.12 installed the patch now causes sra-1.c to fail (apparently a value becomes optimized out). If we consider the guality testsuite in some way meaningful we still can't apply this patch.
Also, Vlad's reasoning for waiting until gcc-7 last time now holds for waiting until gcc-8.

Comment 17 Jeffrey A. Law 2017-01-24 14:56:31 UTC

OK.  Changing milestone to gcc-8.  I think we can reasonably allow guality to regress here.  Waiting until gcc-8 also gives the updated gdb more time to get deployed in the wild.

Comment 18 Jeffrey A. Law 2018-01-17 19:55:20 UTC

Same reasoning in c#16/c#17 applies now, sadly.

Comment 19 Jeffrey A. Law 2018-12-10 01:42:55 UTC

This was fixed way back in 2017:

commit a6e6a4df68e9e2cfbf4680b7f794254bc533d27c
Author: uros <uros@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Aug 14 16:42:15 2017 +0000

            PR target/46091
            * config/i386/i386.md (*anddi_1_btr): New insn_and_split pattern.
            (*iordi_1_bts): Ditto.
            (*xordi_1_btc): Ditto.
    
    testsuite/ChangeLog:
    
            PR target/46091
            * gcc.target/i386/pr46091-1.c: New test.
            * gcc.target/i386/pr46091-2.c: Ditto.
            * gcc.target/i386/pr46091-3.c: Ditto.
    
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251095 138bc75d-0d04-0410-961f-82ee72b054a4

:040000 040000 05739aaeccb137e9725a14cd94782096013169fa 53297b094d658372050f33eb62877dd228569169 M      gcc

Comment 20 vfdff 2019-04-11 14:21:05 UTC

> 
> Preferred/alternative class is misleading here.  It is not used anywhere in
> IRA/LRA for allocation decision (but may be it is used by reload -- i don't
> remember).  It is only used as a temporary result for allocno class
> calculation.
> 

In the newest GCC 9.x , Does the Preferred/alternative class still not used in LRA ?

Comment 21 Jakub Jelinek 2019-05-03 09:14:24 UTC

GCC 9.1 has been released.

Comment 22 Jakub Jelinek 2019-08-12 08:53:44 UTC

GCC 9.2 has been released.

Comment 23 Jakub Jelinek 2020-03-12 11:58:51 UTC

GCC 9.3.0 has been released, adjusting target milestone.

Comment 24 Richard Biener 2021-06-01 08:05:51 UTC

GCC 9.4 is being released, retargeting bugs to GCC 9.5.

Comment 25 Richard Biener 2022-05-27 08:00:45 UTC

Fixed for GCC 8 it seems.