Bug 81803 - [7/8 regression] miscompilation at -O1 on mips64el
Summary: [7/8 regression] miscompilation at -O1 on mips64el
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 7.1.1
: P3 normal
Target Milestone: 7.3
Assignee: Eric Botcazou
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-10 15:38 UTC by Aurelien Jarno
Modified: 2017-11-07 07:47 UTC (History)
5 users (show)

See Also:
Host: mips64el-unknown-linux-gnu
Target: mips64el-unknown-linux-gnu
Build: mips64el-unknown-linux-gnu
Known to work:
Known to fail:
Last reconfirmed: 2017-08-10 00:00:00


Attachments
testcase (1.64 KB, text/plain)
2017-08-10 15:38 UTC, Aurelien Jarno
Details
testcase-b (232 bytes, text/plain)
2017-08-10 16:15 UTC, James Cowgill
Details
testcase-c (140 bytes, text/plain)
2017-08-29 14:33 UTC, James Cowgill
Details
Proposed fix (418 bytes, patch)
2017-08-29 15:40 UTC, mpf
Details | Diff
Updated fix (547 bytes, patch)
2017-10-30 10:20 UTC, mpf
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Aurelien Jarno 2017-08-10 15:38:39 UTC
Created attachment 41965 [details]
testcase

Since the switch to GCC 7 as the default compiler, Debian sees many miscompilations on mips64el where variables which are stored on the stack and smaller than 8 bytes long (e.g. uint16_t, bool) are wrongly reloaded with the LD instruction.

I unfortunately haven't been able to get a self-contained reduced testcase, but the attached testcase (from apparmor) should show the issue.

The aa_policy_cache_new function takes a uint16_t as the fourth argument. It gets passed the max_caches, which also has a uint16_t type. When compiled with g++  -Wfatal-errors -g -O2 -fstack-protector-strong -c -o testcase.o testcase.ii the generated code contains:

 164:   dea50000        ld      a1,0(s5)
                        164: R_MIPS_GOT_OFST    .bss+0x38
                        164: R_MIPS_NONE        *ABS*+0x38
                        164: R_MIPS_NONE        *ABS*+0x38
 168:   dfa80000        ld      a4,0(sp)
 16c:   2406ff9c        li      a2,-100
 170:   0320f809        jalr    t9
                        170: R_MIPS_JALR        aa_policy_cache_new
                        170: R_MIPS_NONE        *ABS*


As you can see, a4 is loaded with the LD instruction, so the upper bytes are just garbage and causes the aa_policy_cache_new to not function correctly.
Comment 1 James Cowgill 2017-08-10 16:15:46 UTC
Created attachment 41966 [details]
testcase-b

Here's another testcase which is reduced a bit further using creduce. It requires -O2 to trigger the bug (but that may be unrelated).

Compile using:
> mips64el-linux-gnuabi64-g++ -g -O2 -fstack-protector-strong -c -o test.o test.c

mips64el-linux-gnuabi64-objdump -dr test.o:
>  98:   8c420000        lw      v0,0(v0)
>  9c:   0002102b        sltu    v0,zero,v0
>  a0:   afa20008        sw      v0,8(sp)
>  a4:   df990000        ld      t9,0(gp)
>                        a4: R_MIPS_CALL16       _Z19aa_policy_cache_newPiiiis
>                        a4: R_MIPS_NONE *ABS*
>                        a4: R_MIPS_NONE *ABS*
>  a8:   dfa80008        ld      a4,8(sp)
>  ac:   00003825        move    a3,zero
>  b0:   00003025        move    a2,zero
>  b4:   00002825        move    a1,zero
>  b8:   0320f809        jalr    t9
>                        b8: R_MIPS_JALR _Z19aa_policy_cache_newPiiiis
>                        b8: R_MIPS_NONE *ABS*
>                        b8: R_MIPS_NONE *ABS*

Here, the value calculated in v0 is stored to the stack in 8(sp) using sw, but then loaded later using ld.
Comment 2 Aurelien Jarno 2017-08-10 16:27:13 UTC
(In reply to James Cowgill from comment #1)
> Here's another testcase which is reduced a bit further using creduce. It
> requires -O2 to trigger the bug (but that may be unrelated).

I reported it as a miscompilation with -O1, as at least in the clamav case, the bug is not reproducible with -O0, but is reproducible with both -O1 and -O2. But it seems this bug is quite dependent on the surrounding code, so my guess is that the issue is not triggered by a specific optimization.
Comment 3 Eric Botcazou 2017-08-10 17:45:54 UTC
Very likely LRA, there is a known weakness on little-endian MIPS.
Comment 4 Eric Botcazou 2017-08-10 17:46:41 UTC
Investigating.
Comment 5 James Cowgill 2017-08-17 16:36:59 UTC
I have just noticed this which seems curious. Is the 39 -> 40 combine really a valid transformation? It seems we've lost the sign extension and we're just putting a 32-bit value into a 64-bit register without trying to clear the upper bits anymore?

Before
-----
(insn 38 37 39 3 (set (reg:SI 235)
        (ne:SI (reg:SI 234 [ h ])
            (const_int 0 [0]))) "reduced.c":16 500 {*sne_zero_sisi}
     (expr_list:REG_DEAD (reg:SI 234 [ h ])
        (nil)))
(insn 39 38 40 3 (set (reg:HI 233)
        (subreg:HI (reg:SI 235) 0)) "reduced.c":16 358 {*movhi_internal}
     (expr_list:REG_DEAD (reg:SI 235)
        (nil)))
(insn 40 39 133 3 (set (reg:DI 207 [ iftmp.3_18 ])
        (sign_extend:DI (reg:HI 233))) "reduced.c":16 244 {*extendhidi2_seh}
     (expr_list:REG_DEAD (reg:HI 233)
        (nil)))

combine
-----
Trying 39 -> 40:
Successfully matched this instruction:
(set (reg:DI 207 [ iftmp.3_18 ])
    (subreg:DI (reg:SI 235) 0))
allowing combination of insns 39 and 40
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 39.
modifying insn i3    40: r207:DI=r235:SI#0
      REG_DEAD r235:SI
deferring rescan insn with uid = 40.

Trying 38 -> 40:
Successfully matched this instruction:
(set (subreg:SI (reg:DI 207 [ iftmp.3_18 ]) 0)
    (ne:SI (reg:SI 234 [ h ])
        (const_int 0 [0])))
allowing combination of insns 38 and 40
original costs 4 + 4 = 8
replacement cost 4
deferring deletion of insn with uid = 38.
modifying insn i3    40: r207:DI#0=r234:SI!=0
      REG_DEAD r234:SI
deferring rescan insn with uid = 40.

After
------
(note 38 37 39 3 NOTE_INSN_DELETED)
(note 39 38 40 3 NOTE_INSN_DELETED)
(insn 40 39 133 3 (set (subreg:SI (reg:DI 207 [ iftmp.3_18 ]) 0)
        (ne:SI (reg:SI 234 [ h ])
            (const_int 0 [0]))) "reduced.c":16 500 {*sne_zero_sisi}
     (expr_list:REG_DEAD (reg:SI 234 [ h ])
        (nil)))
Comment 6 Eric Botcazou 2017-08-17 17:42:14 UTC
> I have just noticed this which seems curious. Is the 39 -> 40 combine really
> a valid transformation? It seems we've lost the sign extension and we're
> just putting a 32-bit value into a 64-bit register without trying to clear
> the upper bits anymore?

Yes, for WORD_REGISTER_OPERATIONS architectures the combiner can do things like that, although there might be bugs lurking of course.
Comment 7 mpf 2017-08-29 13:58:37 UTC
(In reply to Eric Botcazou from comment #6)
> > I have just noticed this which seems curious. Is the 39 -> 40 combine really
> > a valid transformation? It seems we've lost the sign extension and we're
> > just putting a 32-bit value into a 64-bit register without trying to clear
> > the upper bits anymore?
> 
> Yes, for WORD_REGISTER_OPERATIONS architectures the combiner can do things
> like that, although there might be bugs lurking of course.

Yes, this looks like a valid transformation to me. The upper 63 bits on r235 are guaranteed to be zero so the sign extension can be eliminated trivially but also it is actually a zero extension and could be eliminated because the LOAD_EXTEND_OP is ZERO_EXT for HImode.

Not looking forward to investigating this!
Comment 8 James Cowgill 2017-08-29 14:33:57 UTC
Created attachment 42072 [details]
testcase-c

Here is another smaller testcase which was manually created. It works by using asm to clobber all the registers and passing -ffixed for the saved registers to force a stack spill and trigger the bug.

Compile with:
> gcc -S -O2 -fstack-protector-strong -ffixed-s0 -ffixed-s1 -ffixed-s2 -ffixed-s3 -ffixed-s4 -ffixed-s5 -ffixed-s6 -ffixed-s7 test.c
Comment 9 mpf 2017-08-29 15:40:38 UTC
Created attachment 42075 [details]
Proposed fix

Off-thread James pointed out that one of my patches I did last year appeared to fix this issue but it was one I reverted owing to breaking ARM (and probably others). The thread was:

https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00518.html

At the time I thought I had fixed the same problem twice and that the changes to simplify_operand_subreg were sufficient. It occurs to me however that simplify_operand_subreg only has the opportunity to fix WORD_REGISTER_OPERATIONS issues affecting subreg(mem) patterns not subreg(reg) patterns where reg is a pseudo that may yet be spilled.

I can't say that I am 100% convinced yet with my thinking here but I've attached an updated version of the original patch with some changes:

* Incorporated Eric's feedback on the original patch to check GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is strictly narrower
* Limited the test to word sized inner modes or smaller
* Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it would matter if we do a narrower input reload

This is barely tested but does fix testcase-c which is the only one I can get to trigger.
Comment 10 Eric Botcazou 2017-08-29 19:10:42 UTC
> Created attachment 42075 [details]
> Proposed fix
> 
> Off-thread James pointed out that one of my patches I did last year appeared
> to fix this issue but it was one I reverted owing to breaking ARM (and
> probably others). The thread was:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00518.html

Great, reassigned to you then, but I can provide testing on SPARC of course.
Comment 11 Chen Qi 2017-10-30 02:32:27 UTC
mpf,

I'm working on Yocto project. I also encountered a similar problem.

In brief, on qemumips64, `systemctl status <xxx>' has the output of `systemctl show <xxx>'. I debugged a little bit and found that it's not the code logic's problem. Instead it's gcc optimization problem. With '-O1', the problem is reproduced; with '-O0', the problem is gone.

Please see https://bugzilla.yoctoproject.org/show_bug.cgi?id=12266 for more details.

I used the patch you created in attachment. And things work correctly with '-O2'.

Is there any plan sending the patch to gcc mailing list?
Is there any known problem about the patch?

Best Regards,
Chen Qi

(In reply to mpf from comment #9)
> Created attachment 42075 [details]
> Proposed fix
> 
> Off-thread James pointed out that one of my patches I did last year appeared
> to fix this issue but it was one I reverted owing to breaking ARM (and
> probably others). The thread was:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00518.html
> 
> At the time I thought I had fixed the same problem twice and that the
> changes to simplify_operand_subreg were sufficient. It occurs to me however
> that simplify_operand_subreg only has the opportunity to fix
> WORD_REGISTER_OPERATIONS issues affecting subreg(mem) patterns not
> subreg(reg) patterns where reg is a pseudo that may yet be spilled.
> 
> I can't say that I am 100% convinced yet with my thinking here but I've
> attached an updated version of the original patch with some changes:
> 
> * Incorporated Eric's feedback on the original patch to check
> GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is
> strictly narrower
> * Limited the test to word sized inner modes or smaller
> * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it
> would matter if we do a narrower input reload
> 
> This is barely tested but does fix testcase-c which is the only one I can
> get to trigger.
Comment 12 Eric Botcazou 2017-10-30 08:25:52 UTC
(In reply to Chen Qi from comment #11)
> mpf,
> 
> I'm working on Yocto project. I also encountered a similar problem.
> 
> I used the patch you created in attachment. And things work correctly with
> '-O2'.

Great, thanks for testing.

> Is there any plan sending the patch to gcc mailing list?
> Is there any known problem about the patch?

I'm going to submit the patch for review.
Comment 13 Eric Botcazou 2017-10-30 09:37:08 UTC
> I can't say that I am 100% convinced yet with my thinking here but I've
> attached an updated version of the original patch with some changes:
> 
> * Incorporated Eric's feedback on the original patch to check
> GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is
> strictly narrower
> * Limited the test to word sized inner modes or smaller
> * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it
> would matter if we do a narrower input reload

I'm not convinced for the 3rd restriction though, as push_reload treats the IN and OUT cases exactly the same wrt WORD_REGISTER_OPERATIONS.
Comment 14 mpf 2017-10-30 10:20:03 UTC
Created attachment 42498 [details]
Updated fix

I'm really sorry for my lack of work on this. Too much going on in day to day work for quite a while now.

(In reply to Eric Botcazou from comment #13)
> > I can't say that I am 100% convinced yet with my thinking here but I've
> > attached an updated version of the original patch with some changes:
> > 
> > * Incorporated Eric's feedback on the original patch to check
> > GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is
> > strictly narrower
> > * Limited the test to word sized inner modes or smaller
> > * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it
> > would matter if we do a narrower input reload
> 
> I'm not convinced for the 3rd restriction though, as push_reload treats the
> IN and OUT cases exactly the same wrt WORD_REGISTER_OPERATIONS.

I don't think the restriction is required for functional correctness but I thought we may as well take advantage of a narrower load in the OP_IN case if we could get away with it.

The patch has a serious bug that I started testing but failed to report here. The bracketing was wrong by one level, an updated version is attached.
Comment 15 Eric Botcazou 2017-10-30 10:36:11 UTC
> I don't think the restriction is required for functional correctness but I
> thought we may as well take advantage of a narrower load in the OP_IN case
> if we could get away with it.

I don't think you're guaranteed that this reload will yield a load though.

> The patch has a serious bug that I started testing but failed to report
> here. The bracketing was wrong by one level, an updated version is attached.

Right, but the compiler fortunately complained.
Comment 16 mpf 2017-10-30 11:08:58 UTC
(In reply to Eric Botcazou from comment #15)
> > I don't think the restriction is required for functional correctness but I
> > thought we may as well take advantage of a narrower load in the OP_IN case
> > if we could get away with it.
> 
> I don't think you're guaranteed that this reload will yield a load though.

If it ends up as a simple register move though then that shouldn't be harmful either as it would be a WORD_REGISTER_OPERATION. I have no objection to removing the restriction though; it is easier to reason about without the special case.
Comment 17 Eric Botcazou 2017-10-30 11:35:10 UTC
> If it ends up as a simple register move though then that shouldn't be
> harmful either as it would be a WORD_REGISTER_OPERATION. I have no objection
> to removing the restriction though; it is easier to reason about without the
> special case.

Yes, this would be my recommendation.  I just successfully tested this variant of the fix on SPARC64/Linux.
Comment 18 Eric Botcazou 2017-10-30 12:16:41 UTC
Recategorizing.
Comment 19 Eric Botcazou 2017-10-31 18:28:23 UTC
Author: ebotcazou
Date: Tue Oct 31 18:27:52 2017
New Revision: 254275

URL: https://gcc.gnu.org/viewcvs?rev=254275&root=gcc&view=rev
Log:
	PR rtl-optimization/81803
	* lra-constraints.c (curr_insn_transform): Also reload the whole
	register for a strict subreg no wider than a word if this is for
	a WORD_REGISTER_OPERATIONS target.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-constraints.c
Comment 20 Eric Botcazou 2017-11-01 13:44:44 UTC
Assigning back.
Comment 21 Eric Botcazou 2017-11-07 07:45:39 UTC
Author: ebotcazou
Date: Tue Nov  7 07:44:58 2017
New Revision: 254488

URL: https://gcc.gnu.org/viewcvs?rev=254488&root=gcc&view=rev
Log:
	PR rtl-optimization/81803
	* lra-constraints.c (curr_insn_transform): Also reload the whole
	register for a strict subreg no wider than a word if this is for
	a WORD_REGISTER_OPERATIONS target.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/lra-constraints.c
Comment 22 Eric Botcazou 2017-11-07 07:47:36 UTC
Reopen if necessary.