[RFC] enable flags-unchanging asms, add_overflow/expand/combine woes

Alexandre Oliva oliva@adacore.com
Thu Sep 3 19:31:35 GMT 2020


On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:

> On Thu, Sep 03, 2020 at 02:07:24PM -0300, Alexandre Oliva wrote:
>> On Sep  3, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> > On Thu, Sep 03, 2020 at 07:03:53AM -0300, Alexandre Oliva wrote:
>> >> On Sep  2, 2020, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> >> >> we might succeed, but only if we had a pattern
>> >> >> that matched add<mode>3_cc_overflow_1's parallel with the flag-setter as
>> >> >> the second element of the parallel, because that's where combine adds it
>> >> >> to the new i3 pattern, after splitting it out of i2.
>> >> 
>> >> > That sounds like the backend pattern has it wrong then?  There is a
>> >> > canonical order for this?
>> >> 
>> >> Much as I can tell, there isn't, it's just an arbitrary choice of
>> >> backends, some do it one way or the other, and that causes combine to be
>> >> able to perform some combinations but not others.
>> 
>> > For instructions that inherently set a condition code register, the
>> > @code{compare} operator is always written as the first RTL expression of
>> > the @code{parallel} instruction pattern.
>> 
>> Interesting.  I'm pretty sure I read email recently that suggested it
>> was really up to the port, but I've caught up with GCC emails from years
>> ago, so that might have been it.  Or I just misremember.  Whatever.

> I think you remember right.  But combine depends on the documented
> order

Except when it doesn't ;-)

Under:

  /* If the actions of the earlier insns must be kept
     in addition to substituting them into the latest one,
     we must make a new PARALLEL for the latest insn
     to hold additional the SETs.  */

it turns the RMW add pattern into a PARALLEL and adds to it i0's
flags-setter, that had been split out of i2.  The PARALLEL 
has the flag-setter as the second in the parallel, as you can see below,
and nothing (other than my patch) attempts to change that order.

> What does that RTL look like exactly?

Trying 5, 7 -> 15:
    5: r91:SI=zero_extend([`i'])
    7: {flags:CCC=cmp(r91:SI#0+0x1,r91:SI#0);r92:QI=r91:SI#0+0x1;}
      REG_DEAD r91:SI
   15: [`i']=r92:QI
      REG_DEAD r92:QI
Successfully matched this instruction:
(parallel [
        (set (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b4
0 i>) [0 i+0 S1 A8])
            (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ff
ff7ff7b40 i>) [0 i+0 S1 A8])
                (const_int 1 [0x1])))
        (set (reg:CCC 17 flags)
            (compare:CCC (plus:QI (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])
                    (const_int 1 [0x1]))
                (mem/c:QI (symbol_ref:DI ("i") [flags 0x2]  <var_decl 0x7ffff7ff7b40 i>) [0 i+0 S1 A8])))
    ])

The reason it successfully matched in this combine dump is that the
above had the permuting recog for combine that I posted earlier enabled.
Without that patch, it wouldn't be recognized, and ended up rejected.

> Do you have a simple testcase?  Or a -fdump-rtl-combine-all dump.

Not so simple, I'm afraid.  I mean, the testcase is simple, but
reproducing this behavior isn't.  The small testcase I'm using only gets
to that point if:

- we allow flags to pass through extended asms, as in the other patch

- I arrange for combine to give the above set of insns a second shot,
  after the removal of the intervening flag-store insn, that prevents
  the combination in the first try

typedef unsigned char T;
T i;
int inc ()
{
  T c = __builtin_add_overflow (i, 1, &i);
  asm ("" : "+m" (i) : : "none");
  i += c;
}


-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


More information about the Gcc-patches mailing list