Bug 70703 - [6 regression] Regression in register usage on x86
Summary: [6 regression] Regression in register usage on x86
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: rtl-optimization (show other bugs)
Version: 6.0
: P2 normal
Target Milestone: 7.0
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization, ra
Depends on:
Blocks: 16996
  Show dependency treegraph
 
Reported: 2016-04-17 15:23 UTC by Denis Vlasenko
Modified: 2018-10-26 10:48 UTC (History)
9 users (show)

See Also:
Host:
Target: i?86-*-linux-gnu
Build:
Known to work: 4.7.2, 4.9.3
Known to fail: 5.1.0
Last reconfirmed: 2016-06-14 00:00:00


Attachments
gcc7-pr70703-widen.patch (1.16 KB, patch)
2017-03-31 10:16 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Denis Vlasenko 2016-04-17 15:23:24 UTC
$ cat bad.c
unsigned ud_x_641_mul(unsigned x) {
    /* optimized version of x / 641 */
    return ((unsigned long long)x * 0x663d81) >> 32;
}

With gcc from current svn:
$ gcc -m32 -fomit-frame-pointer -O2 bad.c -S && cat bad.s
...
ud_x_641_mul:
	.cfi_startproc
	movl	$6700417, %ecx
	movl	%ecx, %eax
	mull	4(%esp)
	movl	%edx, %ecx
	movl	%ecx, %eax
	ret

Same result with -Os. Note two pointless mov insns.

gcc 5.3.1 is "better", it adds only one unnecessary insn:

ud_x_641_mul:
	.cfi_startproc
	movl	$6700417, %ecx
	movl	%ecx, %eax
	mull	4(%esp)
	movl	%edx, %eax
	ret

gcc 4.4.x and 4.7.2 were generating this code, which looks optimal:
ud_x_641_mul:
	.cfi_startproc
	movl	$6700417, %eax
	mull	4(%esp)
	movl	%edx, %eax
	ret

I did not test other versions of gcc yet.
Comment 1 Bernhard Reutner-Fischer 2016-04-20 11:55:28 UTC
Cc Uros
Comment 2 Jakub Jelinek 2016-04-27 10:59:13 UTC
GCC 6.1 has been released.
Comment 3 Bernhard Reutner-Fischer 2016-06-14 19:52:53 UTC
CCing Vlad and Richard S.
Started with r211072

https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02482.html
Comment 4 Richard Biener 2016-08-22 08:52:56 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 5 Richard Biener 2016-08-22 08:53:49 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 6 Richard Biener 2016-08-22 08:59:44 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 7 Richard Biener 2016-08-22 09:01:16 UTC
GCC 6.2 is being released, adjusting target milestone.
Comment 8 Jakub Jelinek 2016-12-21 10:59:59 UTC
GCC 6.3 is being released, adjusting target milestone.
Comment 9 Jakub Jelinek 2017-03-31 08:49:54 UTC
Vlad, any thoughts on this?

I'll try to deal with this in the widening_mul pass too, that pass converts:
   _1 = (long long unsigned int) x_4(D);
-  _2 = _1 * 6700417;
+  _2 = x_4(D) w* 6700417;
   _3 = _2 >> 32;
   _5 = (unsigned int) _3;
but as the only user cares about the high part of the widened result, we could convert it into:
   _1 = (long long unsigned int) x_4(D);
   _2 = _1 * 6700417;
   _3 = _2 >> 32;
-  _5 = (unsigned int) _3;
+  _5 = x_4(D) h* 6700417;
There is nothing that would attempt match.pd foldings after this.
Comment 10 Jakub Jelinek 2017-03-31 10:16:53 UTC
Created attachment 41094 [details]
gcc7-pr70703-widen.patch

The widening_mult change.  We get tiny bit better code with it with the #c0 testcase:
-       movl    $6700417, %ecx
-       movl    %ecx, %eax
+       movl    $6700417, %edx
+       movl    %edx, %eax
        mull    4(%esp)
-       movl    %edx, %ecx
-       movl    %ecx, %eax
+       movl    %edx, %eax
but still not ideal.  On the other side, we regress on -m64:
unsigned long
foo (unsigned long x)
{
  return ((__uint128_t) x * 0x663d811234567ULL) >> 64;
}

-       movabsq $1798629511873895, %rax
-       mulq    %rdi
+       movq    %rdi, %rax
+       movabsq $1798629511873895, %rdx
+       mulq    %rdx

Another option is to deal with this at combine time, I see on the unpatched compiler:
Failed to match this instruction:
(set (reg:SI 95)
    (subreg:SI (mult:DI (zero_extend:DI (mem/c:SI (reg/f:SI 16 argp) [1 x+0 S4 A32]))
            (const_int 6700417 [0x663d81])) 4))

Maybe we could add some define_insn_and_split that would deal with this and make sure the constant is forced into a register (if the constant has depending on <s> all upper bits zero or set) and transform it into the highpart insns?
Though, I'm worried about the regression above we got with the TImode highpart.
Comment 11 Vladimir Makarov 2017-04-01 16:27:31 UTC
(In reply to Jakub Jelinek from comment #9)
> Vlad, any thoughts on this?
> 
I've been working on this and a wrong allocation is a culprit.  According to IRA dump,  the allocation should be what we expect.  But it is not.  I guess there are a few factors to this.  First of all IRA has cost 65535 for moving AREG into/from GENERAL_REGS.  And second factor is an overflow on 32-bit machine when such big cost is used in IRA.

I'll continue my work on it next week.  I hope the patch will be ready in the middle of the next week.
Comment 12 Vladimir Makarov 2017-04-05 15:08:23 UTC
Author: vmakarov
Date: Wed Apr  5 15:07:51 2017
New Revision: 246707

URL: https://gcc.gnu.org/viewcvs?rev=246707&root=gcc&view=rev
Log:
2017-04-05  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70703
	* ira-color.c (update_costs_from_allocno): Use the smallest mode.
	(update_conflict_hard_regno_costs): Use long instead of unsigned
	arithmetic for cost calculation.

2017-04-05  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70703
	* gcc.target/i386/pr70703.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr70703.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-color.c
    trunk/gcc/testsuite/ChangeLog
Comment 13 Vladimir Makarov 2017-04-05 16:15:00 UTC
Author: vmakarov
Date: Wed Apr  5 16:14:28 2017
New Revision: 246711

URL: https://gcc.gnu.org/viewcvs?rev=246711&root=gcc&view=rev
Log:
2017-04-05  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70703
	* ira-color.c (update_conflict_hard_regno_costs): Use
	HOST_WIDE_INT instead of long.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-color.c
Comment 14 Jakub Jelinek 2017-04-06 05:50:52 UTC
Fixed on the trunk, not sure about backports if it is desirable or not.
Thanks Vlad!
Comment 15 Vladimir Makarov 2017-04-07 16:07:00 UTC
Author: vmakarov
Date: Fri Apr  7 16:06:28 2017
New Revision: 246765

URL: https://gcc.gnu.org/viewcvs?rev=246765&root=gcc&view=rev
Log:
2017-04-07  Vladimir Makarov  <vmakarov@redhat.com>

	PR rtl-optimization/70703
	* ira-color.c (update_conflict_hard_regno_costs): Use
	int64_t instead of HOST_WIDE_INT.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/ira-color.c
Comment 16 Richard Biener 2017-07-04 08:50:53 UTC
GCC 6.4 is being released, adjusting target milestone.
Comment 17 Eric Gallager 2018-10-12 23:20:30 UTC
(In reply to Jakub Jelinek from comment #14)
> Fixed on the trunk, not sure about backports if it is desirable or not.
> Thanks Vlad!

Well now's the last chance for backports; if the "not sure" is a "let's not bother" then time to close this I guess, but if the "not sure" is "maybe there's still a chance" then let's leave it open for now.
Comment 18 Jakub Jelinek 2018-10-26 10:48:36 UTC
GCC 6 branch is being closed, fixed in 7.x.