[RFC] enable flags-unchanging asms, add_overflow/expand/combine woes
Alexandre Oliva
oliva@adacore.com
Thu Sep 3 19:31:35 GMT 2020
On Sep 3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote:
>> On Sep 3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
>> >> On Sep 2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> >> >> we might succeed, but only if we had a pattern
>> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
>> >> >> the second element of the parallel, because that's where combine adds it
>> >> >> to the new i3 pattern, after splitting it out of i2.
>> >>
>> >> > That sounds like the backend pattern has it wrong then? There is a
>> >> > canonical order for this?
>> >>
>> >> Much as I can tell, there isn't, it's just an arbitrary choice of
>> >> backends, some do it one way or the other, and that causes combine to be
>> >> able to perform some combinations but not others.
>>
>> > For instructions that inherently set a condition code register, the
>> > @code{compare} operator is always written as the first RTL expression of
>> > the @code{parallel} instruction pattern.
>>
>> Interesting. I'm pretty sure I read email recently that suggested it
>> was really up to the port, but I've caught up with GCC emails from years
>> ago, so that might have been it. Or I just misremember. Whatever.
> I think you remember right. But combine depends on the documented
> order
Except when it doesn't ;-)
Under:
/* If the actions of the earlier insns must be kept
in addition to substituting them into the latest one,
we must make a new PARALLEL for the latest insn
to hold additional the SETs. */
it turns the RMW add pattern into a PARALLEL and adds to it i0's
flags-setter, that had been split out of i2. The PARALLEL
has the flag-setter as the second in the parallel, as you can see below,
and nothing (other than my patch) attempts to change that order.
> What does that RTL look like exactly?
Trying 5, 7 -> 15:
5: r91:SI=zero_extend([`i'])
7: {flags:CCC=cmp(r91:SI#0+0x1,r91:SI#0);r92:QI=r91:SI#0+0x1;}
REG_DEAD r91:SI
15: [`i']=r92:QI
REG_DEAD r92:QI
Successfully matched this instruction:
(parallel [
(set (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ffff7ff7b4
0 i>) [0 i+0 S1 A8])
(plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ff
ff7ff7b40 i>) [0 i+0 S1 A8])
(const_int 1 [0x1])))
(set (reg:CCC 17 flags)
(compare:CCC (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
(const_int 1 [0x1]))
(mem/c:QI (symbol_ref:DI ("i") [flags 0x2] <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])))
])
The reason it successfully matched in this combine dump is that the
above had the permuting recog for combine that I posted earlier enabled.
Without that patch, it wouldn't be recognized, and ended up rejected.
> Do you have a simple testcase? Or a -fdump-rtl-combine-all dump.
Not so simple, I'm afraid. I mean, the testcase is simple, but
reproducing this behavior isn't. The small testcase I'm using only gets
to that point if:
- we allow flags to pass through extended asms, as in the other patch
- I arrange for combine to give the above set of insns a second shot,
after the removal of the intervening flag-store insn, that prevents
the combination in the first try
typedef unsigned char T;
T i;
int inc ()
{
T c = __builtin_add_overflow (i, 1, &i);
asm ("" : "+m" (i) : : "none");
i += c;
}
--
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer
More information about the Gcc-patches
mailing list