Created attachment 41965 [details] testcase Since the switch to GCC 7 as the default compiler, Debian sees many miscompilations on mips64el where variables which are stored on the stack and smaller than 8 bytes long (e.g. uint16_t, bool) are wrongly reloaded with the LD instruction. I unfortunately haven't been able to get a self-contained reduced testcase, but the attached testcase (from apparmor) should show the issue. The aa_policy_cache_new function takes a uint16_t as the fourth argument. It gets passed the max_caches, which also has a uint16_t type. When compiled with g++ -Wfatal-errors -g -O2 -fstack-protector-strong -c -o testcase.o testcase.ii the generated code contains: 164: dea50000 ld a1,0(s5) 164: R_MIPS_GOT_OFST .bss+0x38 164: R_MIPS_NONE *ABS*+0x38 164: R_MIPS_NONE *ABS*+0x38 168: dfa80000 ld a4,0(sp) 16c: 2406ff9c li a2,-100 170: 0320f809 jalr t9 170: R_MIPS_JALR aa_policy_cache_new 170: R_MIPS_NONE *ABS* As you can see, a4 is loaded with the LD instruction, so the upper bytes are just garbage and causes the aa_policy_cache_new to not function correctly.
Created attachment 41966 [details] testcase-b Here's another testcase which is reduced a bit further using creduce. It requires -O2 to trigger the bug (but that may be unrelated). Compile using: > mips64el-linux-gnuabi64-g++ -g -O2 -fstack-protector-strong -c -o test.o test.c mips64el-linux-gnuabi64-objdump -dr test.o: > 98: 8c420000 lw v0,0(v0) > 9c: 0002102b sltu v0,zero,v0 > a0: afa20008 sw v0,8(sp) > a4: df990000 ld t9,0(gp) > a4: R_MIPS_CALL16 _Z19aa_policy_cache_newPiiiis > a4: R_MIPS_NONE *ABS* > a4: R_MIPS_NONE *ABS* > a8: dfa80008 ld a4,8(sp) > ac: 00003825 move a3,zero > b0: 00003025 move a2,zero > b4: 00002825 move a1,zero > b8: 0320f809 jalr t9 > b8: R_MIPS_JALR _Z19aa_policy_cache_newPiiiis > b8: R_MIPS_NONE *ABS* > b8: R_MIPS_NONE *ABS* Here, the value calculated in v0 is stored to the stack in 8(sp) using sw, but then loaded later using ld.
(In reply to James Cowgill from comment #1) > Here's another testcase which is reduced a bit further using creduce. It > requires -O2 to trigger the bug (but that may be unrelated). I reported it as a miscompilation with -O1, as at least in the clamav case, the bug is not reproducible with -O0, but is reproducible with both -O1 and -O2. But it seems this bug is quite dependent on the surrounding code, so my guess is that the issue is not triggered by a specific optimization.
Very likely LRA, there is a known weakness on little-endian MIPS.
Investigating.
I have just noticed this which seems curious. Is the 39 -> 40 combine really a valid transformation? It seems we've lost the sign extension and we're just putting a 32-bit value into a 64-bit register without trying to clear the upper bits anymore? Before ----- (insn 38 37 39 3 (set (reg:SI 235) (ne:SI (reg:SI 234 [ h ]) (const_int 0 [0]))) "reduced.c":16 500 {*sne_zero_sisi} (expr_list:REG_DEAD (reg:SI 234 [ h ]) (nil))) (insn 39 38 40 3 (set (reg:HI 233) (subreg:HI (reg:SI 235) 0)) "reduced.c":16 358 {*movhi_internal} (expr_list:REG_DEAD (reg:SI 235) (nil))) (insn 40 39 133 3 (set (reg:DI 207 [ iftmp.3_18 ]) (sign_extend:DI (reg:HI 233))) "reduced.c":16 244 {*extendhidi2_seh} (expr_list:REG_DEAD (reg:HI 233) (nil))) combine ----- Trying 39 -> 40: Successfully matched this instruction: (set (reg:DI 207 [ iftmp.3_18 ]) (subreg:DI (reg:SI 235) 0)) allowing combination of insns 39 and 40 original costs 4 + 4 = 8 replacement cost 4 deferring deletion of insn with uid = 39. modifying insn i3 40: r207:DI=r235:SI#0 REG_DEAD r235:SI deferring rescan insn with uid = 40. Trying 38 -> 40: Successfully matched this instruction: (set (subreg:SI (reg:DI 207 [ iftmp.3_18 ]) 0) (ne:SI (reg:SI 234 [ h ]) (const_int 0 [0]))) allowing combination of insns 38 and 40 original costs 4 + 4 = 8 replacement cost 4 deferring deletion of insn with uid = 38. modifying insn i3 40: r207:DI#0=r234:SI!=0 REG_DEAD r234:SI deferring rescan insn with uid = 40. After ------ (note 38 37 39 3 NOTE_INSN_DELETED) (note 39 38 40 3 NOTE_INSN_DELETED) (insn 40 39 133 3 (set (subreg:SI (reg:DI 207 [ iftmp.3_18 ]) 0) (ne:SI (reg:SI 234 [ h ]) (const_int 0 [0]))) "reduced.c":16 500 {*sne_zero_sisi} (expr_list:REG_DEAD (reg:SI 234 [ h ]) (nil)))
> I have just noticed this which seems curious. Is the 39 -> 40 combine really > a valid transformation? It seems we've lost the sign extension and we're > just putting a 32-bit value into a 64-bit register without trying to clear > the upper bits anymore? Yes, for WORD_REGISTER_OPERATIONS architectures the combiner can do things like that, although there might be bugs lurking of course.
(In reply to Eric Botcazou from comment #6) > > I have just noticed this which seems curious. Is the 39 -> 40 combine really > > a valid transformation? It seems we've lost the sign extension and we're > > just putting a 32-bit value into a 64-bit register without trying to clear > > the upper bits anymore? > > Yes, for WORD_REGISTER_OPERATIONS architectures the combiner can do things > like that, although there might be bugs lurking of course. Yes, this looks like a valid transformation to me. The upper 63 bits on r235 are guaranteed to be zero so the sign extension can be eliminated trivially but also it is actually a zero extension and could be eliminated because the LOAD_EXTEND_OP is ZERO_EXT for HImode. Not looking forward to investigating this!
Created attachment 42072 [details] testcase-c Here is another smaller testcase which was manually created. It works by using asm to clobber all the registers and passing -ffixed for the saved registers to force a stack spill and trigger the bug. Compile with: > gcc -S -O2 -fstack-protector-strong -ffixed-s0 -ffixed-s1 -ffixed-s2 -ffixed-s3 -ffixed-s4 -ffixed-s5 -ffixed-s6 -ffixed-s7 test.c
Created attachment 42075 [details] Proposed fix Off-thread James pointed out that one of my patches I did last year appeared to fix this issue but it was one I reverted owing to breaking ARM (and probably others). The thread was: https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00518.html At the time I thought I had fixed the same problem twice and that the changes to simplify_operand_subreg were sufficient. It occurs to me however that simplify_operand_subreg only has the opportunity to fix WORD_REGISTER_OPERATIONS issues affecting subreg(mem) patterns not subreg(reg) patterns where reg is a pseudo that may yet be spilled. I can't say that I am 100% convinced yet with my thinking here but I've attached an updated version of the original patch with some changes: * Incorporated Eric's feedback on the original patch to check GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is strictly narrower * Limited the test to word sized inner modes or smaller * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it would matter if we do a narrower input reload This is barely tested but does fix testcase-c which is the only one I can get to trigger.
> Created attachment 42075 [details] > Proposed fix > > Off-thread James pointed out that one of my patches I did last year appeared > to fix this issue but it was one I reverted owing to breaking ARM (and > probably others). The thread was: > > https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00518.html Great, reassigned to you then, but I can provide testing on SPARC of course.
mpf, I'm working on Yocto project. I also encountered a similar problem. In brief, on qemumips64, `systemctl status <xxx>' has the output of `systemctl show <xxx>'. I debugged a little bit and found that it's not the code logic's problem. Instead it's gcc optimization problem. With '-O1', the problem is reproduced; with '-O0', the problem is gone. Please see https://bugzilla.yoctoproject.org/show_bug.cgi?id=12266 for more details. I used the patch you created in attachment. And things work correctly with '-O2'. Is there any plan sending the patch to gcc mailing list? Is there any known problem about the patch? Best Regards, Chen Qi (In reply to mpf from comment #9) > Created attachment 42075 [details] > Proposed fix > > Off-thread James pointed out that one of my patches I did last year appeared > to fix this issue but it was one I reverted owing to breaking ARM (and > probably others). The thread was: > > https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00518.html > > At the time I thought I had fixed the same problem twice and that the > changes to simplify_operand_subreg were sufficient. It occurs to me however > that simplify_operand_subreg only has the opportunity to fix > WORD_REGISTER_OPERATIONS issues affecting subreg(mem) patterns not > subreg(reg) patterns where reg is a pseudo that may yet be spilled. > > I can't say that I am 100% convinced yet with my thinking here but I've > attached an updated version of the original patch with some changes: > > * Incorporated Eric's feedback on the original patch to check > GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is > strictly narrower > * Limited the test to word sized inner modes or smaller > * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it > would matter if we do a narrower input reload > > This is barely tested but does fix testcase-c which is the only one I can > get to trigger.
(In reply to Chen Qi from comment #11) > mpf, > > I'm working on Yocto project. I also encountered a similar problem. > > I used the patch you created in attachment. And things work correctly with > '-O2'. Great, thanks for testing. > Is there any plan sending the patch to gcc mailing list? > Is there any known problem about the patch? I'm going to submit the patch for review.
> I can't say that I am 100% convinced yet with my thinking here but I've > attached an updated version of the original patch with some changes: > > * Incorporated Eric's feedback on the original patch to check > GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is > strictly narrower > * Limited the test to word sized inner modes or smaller > * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it > would matter if we do a narrower input reload I'm not convinced for the 3rd restriction though, as push_reload treats the IN and OUT cases exactly the same wrt WORD_REGISTER_OPERATIONS.
Created attachment 42498 [details] Updated fix I'm really sorry for my lack of work on this. Too much going on in day to day work for quite a while now. (In reply to Eric Botcazou from comment #13) > > I can't say that I am 100% convinced yet with my thinking here but I've > > attached an updated version of the original patch with some changes: > > > > * Incorporated Eric's feedback on the original patch to check > > GET_MODE_PRECISION instead of GET_MODE_SIZE for comparing whether a mode is > > strictly narrower > > * Limited the test to word sized inner modes or smaller > > * Limited the test to OP_OUT or OP_INOUT as I can't see any reason why it > > would matter if we do a narrower input reload > > I'm not convinced for the 3rd restriction though, as push_reload treats the > IN and OUT cases exactly the same wrt WORD_REGISTER_OPERATIONS. I don't think the restriction is required for functional correctness but I thought we may as well take advantage of a narrower load in the OP_IN case if we could get away with it. The patch has a serious bug that I started testing but failed to report here. The bracketing was wrong by one level, an updated version is attached.
> I don't think the restriction is required for functional correctness but I > thought we may as well take advantage of a narrower load in the OP_IN case > if we could get away with it. I don't think you're guaranteed that this reload will yield a load though. > The patch has a serious bug that I started testing but failed to report > here. The bracketing was wrong by one level, an updated version is attached. Right, but the compiler fortunately complained.
(In reply to Eric Botcazou from comment #15) > > I don't think the restriction is required for functional correctness but I > > thought we may as well take advantage of a narrower load in the OP_IN case > > if we could get away with it. > > I don't think you're guaranteed that this reload will yield a load though. If it ends up as a simple register move though then that shouldn't be harmful either as it would be a WORD_REGISTER_OPERATION. I have no objection to removing the restriction though; it is easier to reason about without the special case.
> If it ends up as a simple register move though then that shouldn't be > harmful either as it would be a WORD_REGISTER_OPERATION. I have no objection > to removing the restriction though; it is easier to reason about without the > special case. Yes, this would be my recommendation. I just successfully tested this variant of the fix on SPARC64/Linux.
Recategorizing.
Author: ebotcazou Date: Tue Oct 31 18:27:52 2017 New Revision: 254275 URL: https://gcc.gnu.org/viewcvs?rev=254275&root=gcc&view=rev Log: PR rtl-optimization/81803 * lra-constraints.c (curr_insn_transform): Also reload the whole register for a strict subreg no wider than a word if this is for a WORD_REGISTER_OPERATIONS target. Modified: trunk/gcc/ChangeLog trunk/gcc/lra-constraints.c
Assigning back.
Author: ebotcazou Date: Tue Nov 7 07:44:58 2017 New Revision: 254488 URL: https://gcc.gnu.org/viewcvs?rev=254488&root=gcc&view=rev Log: PR rtl-optimization/81803 * lra-constraints.c (curr_insn_transform): Also reload the whole register for a strict subreg no wider than a word if this is for a WORD_REGISTER_OPERATIONS target. Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/lra-constraints.c
Reopen if necessary.