$ cat bad.c unsigned ud_x_641_mul(unsigned x) { /* optimized version of x / 641 */ return ((unsigned long long)x * 0x663d81) >> 32; } With gcc from current svn: $ gcc -m32 -fomit-frame-pointer -O2 bad.c -S && cat bad.s ... ud_x_641_mul: .cfi_startproc movl $6700417, %ecx movl %ecx, %eax mull 4(%esp) movl %edx, %ecx movl %ecx, %eax ret Same result with -Os. Note two pointless mov insns. gcc 5.3.1 is "better", it adds only one unnecessary insn: ud_x_641_mul: .cfi_startproc movl $6700417, %ecx movl %ecx, %eax mull 4(%esp) movl %edx, %eax ret gcc 4.4.x and 4.7.2 were generating this code, which looks optimal: ud_x_641_mul: .cfi_startproc movl $6700417, %eax mull 4(%esp) movl %edx, %eax ret I did not test other versions of gcc yet.
Cc Uros
GCC 6.1 has been released.
CCing Vlad and Richard S. Started with r211072 https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02482.html
GCC 6.2 is being released, adjusting target milestone.
GCC 6.3 is being released, adjusting target milestone.
Vlad, any thoughts on this? I'll try to deal with this in the widening_mul pass too, that pass converts: _1 = (long long unsigned int) x_4(D); - _2 = _1 * 6700417; + _2 = x_4(D) w* 6700417; _3 = _2 >> 32; _5 = (unsigned int) _3; but as the only user cares about the high part of the widened result, we could convert it into: _1 = (long long unsigned int) x_4(D); _2 = _1 * 6700417; _3 = _2 >> 32; - _5 = (unsigned int) _3; + _5 = x_4(D) h* 6700417; There is nothing that would attempt match.pd foldings after this.
Created attachment 41094 [details] gcc7-pr70703-widen.patch The widening_mult change. We get tiny bit better code with it with the #c0 testcase: - movl $6700417, %ecx - movl %ecx, %eax + movl $6700417, %edx + movl %edx, %eax mull 4(%esp) - movl %edx, %ecx - movl %ecx, %eax + movl %edx, %eax but still not ideal. On the other side, we regress on -m64: unsigned long foo (unsigned long x) { return ((__uint128_t) x * 0x663d811234567ULL) >> 64; } - movabsq $1798629511873895, %rax - mulq %rdi + movq %rdi, %rax + movabsq $1798629511873895, %rdx + mulq %rdx Another option is to deal with this at combine time, I see on the unpatched compiler: Failed to match this instruction: (set (reg:SI 95) (subreg:SI (mult:DI (zero_extend:DI (mem/c:SI (reg/f:SI 16 argp) [1 x+0 S4 A32])) (const_int 6700417 [0x663d81])) 4)) Maybe we could add some define_insn_and_split that would deal with this and make sure the constant is forced into a register (if the constant has depending on <s> all upper bits zero or set) and transform it into the highpart insns? Though, I'm worried about the regression above we got with the TImode highpart.
(In reply to Jakub Jelinek from comment #9) > Vlad, any thoughts on this? > I've been working on this and a wrong allocation is a culprit. According to IRA dump, the allocation should be what we expect. But it is not. I guess there are a few factors to this. First of all IRA has cost 65535 for moving AREG into/from GENERAL_REGS. And second factor is an overflow on 32-bit machine when such big cost is used in IRA. I'll continue my work on it next week. I hope the patch will be ready in the middle of the next week.
Author: vmakarov Date: Wed Apr 5 15:07:51 2017 New Revision: 246707 URL: https://gcc.gnu.org/viewcvs?rev=246707&root=gcc&view=rev Log: 2017-04-05 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70703 * ira-color.c (update_costs_from_allocno): Use the smallest mode. (update_conflict_hard_regno_costs): Use long instead of unsigned arithmetic for cost calculation. 2017-04-05 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70703 * gcc.target/i386/pr70703.c: New. Added: trunk/gcc/testsuite/gcc.target/i386/pr70703.c Modified: trunk/gcc/ChangeLog trunk/gcc/ira-color.c trunk/gcc/testsuite/ChangeLog
Author: vmakarov Date: Wed Apr 5 16:14:28 2017 New Revision: 246711 URL: https://gcc.gnu.org/viewcvs?rev=246711&root=gcc&view=rev Log: 2017-04-05 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70703 * ira-color.c (update_conflict_hard_regno_costs): Use HOST_WIDE_INT instead of long. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-color.c
Fixed on the trunk, not sure about backports if it is desirable or not. Thanks Vlad!
Author: vmakarov Date: Fri Apr 7 16:06:28 2017 New Revision: 246765 URL: https://gcc.gnu.org/viewcvs?rev=246765&root=gcc&view=rev Log: 2017-04-07 Vladimir Makarov <vmakarov@redhat.com> PR rtl-optimization/70703 * ira-color.c (update_conflict_hard_regno_costs): Use int64_t instead of HOST_WIDE_INT. Modified: trunk/gcc/ChangeLog trunk/gcc/ira-color.c
GCC 6.4 is being released, adjusting target milestone.
(In reply to Jakub Jelinek from comment #14) > Fixed on the trunk, not sure about backports if it is desirable or not. > Thanks Vlad! Well now's the last chance for backports; if the "not sure" is a "let's not bother" then time to close this I guess, but if the "not sure" is "maybe there's still a chance" then let's leave it open for now.
GCC 6 branch is being closed, fixed in 7.x.