This is a slightly modified test case: gcc.c-torture/execute/pr34971.c struct foo { unsigned long long b:40; } x; extern void abort (void); void test1(unsigned long long res) { /* Build a rotate expression on a 40 bit argument. */ if ((x.b<<9) + (x.b>>31) != res) abort (); } int main() { x.b = 0x0100000001; test1(0x0000000202); x.b = 0x0100000000; test1(0x0000000002); return 0; } when compiled with these options it fails: gcc -mcpu=cortex-a9 -mthumb -mfpu=neon-fp16 -O0
some background about this bug can be found here: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html
(In reply to Bernd Edlinger from comment #1) > some background about this bug can be found here: > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html The <shift>di3_neon pattern does not constrain the input not to overlap with the output for immediate shifts, and the arm_emit_coreregs_64bit_shift code does not handle partial overlaps. However it is feasible to fix that by swapping the order for the various cases.
(In reply to Wilco from comment #2) > (In reply to Bernd Edlinger from comment #1) > > some background about this bug can be found here: > > > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html > > The <shift>di3_neon pattern does not constrain the input not to overlap with > the output for immediate shifts, and the arm_emit_coreregs_64bit_shift code > does not handle partial overlaps. However it is feasible to fix that by > swapping the order for the various cases. from <shift>di3_neon: if (INTVAL (operands[2]) < 1) { emit_insn (gen_movdi (operands[0], operands[1])); DONE; } Will the movdi pattern work with partial overlaps? It does basically this: emit_move_insn (gen_lowpart (SImode, operands[0]), gen_lowpart (SImode, operands[1])); emit_move_insn (gen_highpart (SImode, operands[0]), gen_highpart (SImode, operands[1]));
(In reply to Bernd Edlinger from comment #3) > (In reply to Wilco from comment #2) > > (In reply to Bernd Edlinger from comment #1) > > > some background about this bug can be found here: > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01561.html > > > > The <shift>di3_neon pattern does not constrain the input not to overlap with > > the output for immediate shifts, and the arm_emit_coreregs_64bit_shift code > > does not handle partial overlaps. However it is feasible to fix that by > > swapping the order for the various cases. > > from <shift>di3_neon: > > if (INTVAL (operands[2]) < 1) > { > emit_insn (gen_movdi (operands[0], operands[1])); > DONE; > } > > Will the movdi pattern work with partial overlaps? > It does basically this: > > emit_move_insn (gen_lowpart (SImode, operands[0]), > gen_lowpart (SImode, operands[1])); > emit_move_insn (gen_highpart (SImode, operands[0]), > gen_highpart (SImode, operands[1])); I think it's OK - that code only triggers if a movdi has a physical register that is not a valid DI register which is not the case we're dealing with. movdi has a split that does check for partial overlap around line 5900 in arm.md: /* Handle a partial overlap. */ if (rtx_equal_p (operands[0], operands[3])) ... However dealing with partial overlaps is complex so maybe the best option would be to add alternatives to <shift>di3_neon to either allow full overlap "r 0 X X X" or no overlap "&r r X X X". The shift code works with full overlap.
(In reply to Wilco from comment #4) > However dealing with partial overlaps is complex so maybe the best option > would be to add alternatives to <shift>di3_neon to either allow full overlap > "r 0 X X X" or no overlap "&r r X X X". The shift code works with full > overlap. That sounds like a good idea. Then this condition in <shift>di3_neon could go away too: && (!reg_overlap_mentioned_p (operands[0], operands[1]) || REGNO (operands[0]) == REGNO (operands[1])))
(In reply to Bernd Edlinger from comment #5) > (In reply to Wilco from comment #4) > > However dealing with partial overlaps is complex so maybe the best option > > would be to add alternatives to <shift>di3_neon to either allow full overlap > > "r 0 X X X" or no overlap "&r r X X X". The shift code works with full > > overlap. > > That sounds like a good idea. > > Then this condition in <shift>di3_neon could go away too: > > && (!reg_overlap_mentioned_p (operands[0], operands[1]) > || REGNO (operands[0]) == REGNO (operands[1]))) Note that we don't want to restrict complete overlaps, only partial overlaps. Restricting complete overlaps leads to significant increase in register pressure and a lot of redundant copying.
(In reply to Richard Earnshaw from comment #6) > (In reply to Bernd Edlinger from comment #5) > > (In reply to Wilco from comment #4) > > > However dealing with partial overlaps is complex so maybe the best option > > > would be to add alternatives to <shift>di3_neon to either allow full overlap > > > "r 0 X X X" or no overlap "&r r X X X". The shift code works with full > > > overlap. > > > > That sounds like a good idea. > > > > Then this condition in <shift>di3_neon could go away too: > > > > && (!reg_overlap_mentioned_p (operands[0], operands[1]) > > || REGNO (operands[0]) == REGNO (operands[1]))) > > Note that we don't want to restrict complete overlaps, only partial > overlaps. Restricting complete overlaps leads to significant increase in > register pressure and a lot of redundant copying. Yes. That is Wilco's idea: instead of =r 0r X X X use =r 0 X X X and =&r r X X X, that should ensure that no partial overlap happens, just full overlap or nothing. That's what arm_emit_coreregs_64bit_shift and arm_ashldi3_1bit can handle. Who will do it?
(In reply to Bernd Edlinger from comment #7) > (In reply to Richard Earnshaw from comment #6) > > (In reply to Bernd Edlinger from comment #5) > > > (In reply to Wilco from comment #4) > > > > However dealing with partial overlaps is complex so maybe the best option > > > > would be to add alternatives to <shift>di3_neon to either allow full overlap > > > > "r 0 X X X" or no overlap "&r r X X X". The shift code works with full > > > > overlap. > > > > > > That sounds like a good idea. > > > > > > Then this condition in <shift>di3_neon could go away too: > > > > > > && (!reg_overlap_mentioned_p (operands[0], operands[1]) > > > || REGNO (operands[0]) == REGNO (operands[1]))) > > > > Note that we don't want to restrict complete overlaps, only partial > > overlaps. Restricting complete overlaps leads to significant increase in > > register pressure and a lot of redundant copying. > > Yes. > > That is Wilco's idea: instead of =r 0r X X X > use =r 0 X X X and =&r r X X X, that should ensure that > no partial overlap happens, just full overlap or nothing. > > That's what arm_emit_coreregs_64bit_shift > and arm_ashldi3_1bit can handle. > > Who will do it? I've got a patch that fixes it, it's being tested. While looking at how DI mode operations get expanded, I noticed there is a CQ issue with your shift change. Shifts that are expanded early now use extra registers due to the DI mode write of zero. Given all other DI mode operations are expanded after reload, it may be better to do the same for shifts too.
(In reply to Wilco from comment #8) > > I've got a patch that fixes it, it's being tested. > > While looking at how DI mode operations get expanded, I noticed there is a > CQ issue with your shift change. Shifts that are expanded early now use > extra registers due to the DI mode write of zero. Given all other DI mode > operations are expanded after reload, it may be better to do the same for > shifts too. Interesting idea. After reload there is no need anymore to zero the DI mode register at all, so that could become completely unnecessary.
Confirmed then. Wilco, if you're working on this can you please assign it to yourself?
(In reply to ktkachov from comment #10) > Confirmed then. Wilco, if you're working on this can you please assign it to > yourself? Unfortunately the form doesn't allow me to do anything with the headers...
(In reply to Wilco from comment #11) > (In reply to ktkachov from comment #10) > > Confirmed then. Wilco, if you're working on this can you please assign it to > > yourself? > > Unfortunately the form doesn't allow me to do anything with the headers... I know that happens.
.
Patch at: https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01926.html
FYI: You could merge the two alternatives into one. =?r,?&r 0, r i, i is equivalent to =?&r 0r i
Author: wilco Date: Tue Oct 25 10:25:28 2016 New Revision: 241508 URL: https://gcc.gnu.org/viewcvs?rev=241508&root=gcc&view=rev Log: With -fpu=neon DI mode shifts are expanded after reload. DI mode registers can either fully or partially overlap on both ARM and Thumb-2. However the shift expansion code can only deal with the full overlap case, and generates incorrect code for partial overlaps. The fix is to add new variants that support either full overlap or no overlap. gcc/ PR target/78041 * config/arm/neon.md (ashldi3_neon): Add "r 0 i" and "&r r i" variants. Remove partial overlap check for shift by 1. (ashldi3_neon): Likewise. testsuite/ * gcc.target/arm/pr78041.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/pr78041.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/neon.md trunk/gcc/testsuite/ChangeLog
(In reply to Bernd Edlinger from comment #15) > FYI: You could merge the two alternatives into one. > > =?r,?&r > 0, r > i, i > > is equivalent to > > =?&r > 0r > i Yes, that seems possible. These patterns will need to be changed significantly to fix PR 77308, so this can be done as part of that work.
Author: wilco Date: Fri Jan 6 14:26:06 2017 New Revision: 244161 URL: https://gcc.gnu.org/viewcvs?rev=244161&root=gcc&view=rev Log: With -fpu=neon DI mode shifts are expanded after reload. DI mode registers can either fully or partially overlap on both ARM and Thumb-2. However the shift expansion code can only deal with the full overlap case, and generates incorrect code for partial overlaps. The fix is to add new variants that support either full overlap or no overlap. Backport from mainline 2016-10-25 Wilco Dijkstra <wdijkstr@arm.com> gcc/ PR target/78041 * config/arm/neon.md (ashldi3_neon): Add "r 0 i" and "&r r i" variants. Remove partial overlap check for shift by 1. (ashldi3_neon): Likewise. testsuite/ * gcc.target/arm/pr78041.c: New test. Added: branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/pr78041.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/config/arm/neon.md branches/gcc-6-branch/gcc/testsuite/ChangeLog
Author: wilco Date: Tue Jan 24 14:14:12 2017 New Revision: 244872 URL: https://gcc.gnu.org/viewcvs?rev=244872&root=gcc&view=rev Log: With -fpu=neon DI mode shifts are expanded after reload. DI mode registers can either fully or partially overlap on both ARM and Thumb-2. However the shift expansion code can only deal with the full overlap case, and generates incorrect code for partial overlaps. The fix is to add new variants that support either full overlap or no overlap. Backport from mainline gcc/ PR target/78041 * config/arm/neon.md (ashldi3_neon): Add "r 0 i" and "&r r i" variants. Remove partial overlap check for shift by 1. (ashldi3_neon): Likewise. testsuite/ * gcc.target/arm/pr78041.c: New test. Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/config/arm/neon.md branches/gcc-5-branch/gcc/testsuite/ChangeLog
Fixed in GCC5, GCC6 and GCC7.