This bug _might_ be related to PR56339, although that report talks about a regression compared to 4.7, while this bug seems to be a regression compared to 4.4. I was converting some hand-written asm code to SSE-intrinsics, but unfortunately the version using intrinsics generates worse code. It contains two unnecessary 'movdqa' instructions. I managed to reduce my test to this routine: //-------------------------------------------------------------- #include <emmintrin.h> void test1(const __m128i* in1, const __m128i* in2, __m128i* out, __m128i f, __m128i zero) { __m128i c = _mm_avg_epu8(*in1, *in2); __m128i l = _mm_unpacklo_epi8(c, zero); __m128i h = _mm_unpackhi_epi8(c, zero); __m128i m = _mm_mulhi_epu16(l, f); __m128i n = _mm_mulhi_epu16(h, f); *out = _mm_packus_epi16(m, n); } //-------------------------------------------------------------- A (few days old) gcc snapshot generates the following code. Versions 4.5, 4.6 and 4.7 generate similar code: 0: 66 0f 6f 17 movdqa (%rdi),%xmm2 4: 66 0f e0 16 pavgb (%rsi),%xmm2 8: 66 0f 6f da movdqa %xmm2,%xmm3 c: 66 0f 68 d1 punpckhbw %xmm1,%xmm2 10: 66 0f 60 d9 punpcklbw %xmm1,%xmm3 14: 66 0f e4 d0 pmulhuw %xmm0,%xmm2 18: 66 0f 6f cb movdqa %xmm3,%xmm1 1c: 66 0f e4 c8 pmulhuw %xmm0,%xmm1 20: 66 0f 6f c1 movdqa %xmm1,%xmm0 24: 66 0f 67 c2 packuswb %xmm2,%xmm0 28: 66 0f 7f 02 movdqa %xmm0,(%rdx) 2c: c3 retq Gcc version 4.3 and 4.4 (and clang) generate the following optimal(?) code: 0: 66 0f 6f 17 movdqa (%rdi),%xmm2 4: 66 0f e0 16 pavgb (%rsi),%xmm2 8: 66 0f 6f da movdqa %xmm2,%xmm3 c: 66 0f 68 d1 punpckhbw %xmm1,%xmm2 10: 66 0f 60 d9 punpcklbw %xmm1,%xmm3 14: 66 0f e4 d8 pmulhuw %xmm0,%xmm3 18: 66 0f e4 c2 pmulhuw %xmm2,%xmm0 1c: 66 0f 67 d8 packuswb %xmm0,%xmm3 20: 66 0f 7f 1a movdqa %xmm3,(%rdx) 24: c3 retq
Confirmed.
It is caused by revision 156641: http://gcc.gnu.org/ml/gcc-cvs/2010-02/msg00222.html
Re-confirmed on trunk.
It seems like incomplete reload inheritance: (insn 19 16 21 2 (set (reg:V8HI 107) (truncate:V8HI (lshiftrt:V8SI (mult:V8SI (zero_extend:V8SI (subreg:V8HI (reg:V16QI 105) 0)) (zero_extend:V8SI (subreg:V8HI (reg/v:V2DI 101 [ f ]) 0))) (const_int 16 [0x10])))) include/emmintrin.h:1362 2134 {*umulv8hi3_highpart} (expr_list:REG_DEAD (reg:V16QI 105) (nil))) Creating newreg=111 from oldreg=107, assigning class SSE_REGS to r111 19: r111:V8HI=trunc(zero_extend(r111:V8HI)*zero_extend(r101:V2DI#0) 0>>0x10) REG_DEAD r105:V16QI Inserting insn reload before: 31: r111:V8HI=r105:V16QI#0 Inserting insn reload after: 32: r107:V8HI=r111:V8HI The new register r111 does wind up inheriting from r107, but not transitively to r105. Thus we wind up leaving the copy insn 31.
The 4.7 branch is being closed, moving target milestone to 4.8.4.
GCC 4.8.4 has been released.
The gcc-4_8-branch is being closed, re-targeting regressions to 4.9.3.
GCC 4.9.3 has been released.
It looks like the situation is as follows (X is the LRA-created reload reg) X = a op on X b = X where a and b are different registers already allocated by IRA, hence we can avoid one copy at most. I'm not very familiar with LRA yet, but I see no code to rethink such register allocation choices. -frename-registers gets rid of one unnecessary copy, it was enhanced to detect such situations for gcc-6. Maybe we should finally enable that for -O2 and higher?
Look in lra-coalesce, if we have code to eliminate those copies, that's where I'd expect to find it.
Author: bernds Date: Tue Apr 26 12:43:42 2016 New Revision: 235442 URL: https://gcc.gnu.org/viewcvs?rev=235442&root=gcc&view=rev Log: Enable -frename-registers at -O2. PR rtl-optimization/57193 * opts.c (default_options_table): Add OPT_frename_registers at -O2 and above. * doc/invoke.texi (-frename-registers, -O2): Update documentation. Modified: trunk/gcc/ChangeLog trunk/gcc/doc/invoke.texi trunk/gcc/opts.c
Author: bernds Date: Tue May 3 22:48:03 2016 New Revision: 235848 URL: https://gcc.gnu.org/viewcvs?rev=235848&root=gcc&view=rev Log: PR rtl-optimization/57193 * opts.c (default_options_table): Revert OPT_frename_registers change. * doc/invoke.texi (-frename-registers, -O2): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/doc/invoke.texi trunk/gcc/opts.c
GCC 4.9 branch is being closed
GCC 5 branch is being closed
Reconfirmed.
Author: vmakarov Date: Fri Feb 9 18:23:58 2018 New Revision: 257537 URL: https://gcc.gnu.org/viewcvs?rev=257537&root=gcc&view=rev Log: 2018-02-09 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/57193 * ira-color.c (struct allocno_color_data): Add member conflict_allocno_hard_prefs. (update_conflict_allocno_hard_prefs): New. (bucket_allocno_compare_func): Add a preference based on conflict_allocno_hard_prefs. (push_allocno_to_stack): Update conflict_allocno_hard_prefs. (color_allocnos): Remove a dead code. Initiate conflict_allocno_hard_prefs. Call update_costs_from_prefs. 2018-02-09 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/57193 * gcc.target/i386/57193.c: New. Added: trunk/gcc/testsuite/gcc.target/i386/pr57193.c Modified: trunk/gcc/ChangeLog trunk/gcc/ira-color.c trunk/gcc/testsuite/ChangeLog
GCC 6 branch is being closed
It's back in GCC 9.1.0 and trunk: https://gcc.godbolt.org/z/KrrZW7
(In reply to Xi Ruoyao from comment #18) > It's back in GCC 9.1.0 and trunk: > > https://gcc.godbolt.org/z/KrrZW7 Wrong link. Should be: https://gcc.godbolt.org/z/bxAeCR
Hi, It looks like the GCC 9/10 re-occurence is being tracked in this bug 87716. Cheers, Romain
Fixed.