This is the mail archive of the gcc@gcc.gnu.org mailing list for the GCC project.



RTL CSE picking simpler but more expensive expressions (re: PR 65932)


Hi all,

I'm looking into mitigating some of the codegen regressions on arm that come from a proposed fix
for PR 65932, and I believe RTL CSE is getting in the way.

The problematic sequences involve sign-extending loads and sign-extending multiplies.
From the gcc.target/arm/wmul-1.c case we have:
(set (reg:SI 136)
     (sign_extend:SI (mem:HI (reg:SI 135))))
(set (reg:SI 138)
     (sign_extend:SI (mem:HI (reg:SI 144))))

(set (reg:SI 141)
     (plus:SI
       (mult:SI (sign_extend:SI (subreg:HI (reg:SI 136) 0))
                (sign_extend:SI (subreg:HI (reg:SI 138) 0)))
       (reg:SI 141)))

And cse1 transforms this into:
(set (reg:SI 136)
     (sign_extend:SI (mem:HI (reg:SI 135))))
(set (reg:SI 138)
     (sign_extend:SI (mem:HI (reg:SI 144))))
(set (reg:SI 141)
     (plus:SI
       (mult:SI (reg:SI 136)
                (reg:SI 138))
       (reg:SI 141)))
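
For reference, the kind of C source that gives rise to this RTL is a short * short
multiply-accumulate loop. The sketch below is illustrative only and is not a verbatim copy of
gcc.target/arm/wmul-1.c:

/* Illustrative sketch: 16-bit operands widened to 32 bits feeding a
   multiply-accumulate, i.e. an smlabb candidate on arm.  */
int
dot_product (const short *a, const short *b, int n, int acc)
{
  for (int i = 0; i < n; i++)
    acc += a[i] * b[i];
  return acc;
}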

Now, for some arm targets the second sequence is more expensive: the sign-extending multiply-add (smlabb)
that the first form can use is cheaper than the full 32-bit multiply-add (mla) required by the second,
and this is reflected in the rtx costs.
That is, the second form is simpler from an rtx structure point of view but more expensive.
I see that we have some costing logic in the cse_insn function in cse.c that is supposed to guard against this,
but it doesn't seem to be doing the right thing in this case.
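
To illustrate how that preference can show up in rtx costs, here is a purely hypothetical helper
(example_mult_cost and the cost values are made up and this is not the arm cost hook, whose
interface differs); only the rtl.h macros GET_CODE, XEXP, MULT, SIGN_EXTEND and COSTS_N_INSNS are real:

/* Hypothetical sketch, not the arm backend: report a widening
   (sign_extend x sign_extend) multiply as cheaper than a full
   32-bit multiply.  */
static int
example_mult_cost (rtx x)
{
  if (GET_CODE (x) == MULT
      && GET_CODE (XEXP (x, 0)) == SIGN_EXTEND
      && GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
    return COSTS_N_INSNS (1);   /* smlabb-class widening multiply  */
  return COSTS_N_INSNS (2);     /* mla-class 32-bit multiply       */
}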

I've been able to get it to do the right thing by changing the line where cse_insn initially folds the source
of the SET, that is, line 4639 in cse.c:
      /* Simplify and foldable subexpressions in SRC.  Then get the fully-
         simplified result, which may not necessarily be valid.  */
      src_folded = fold_rtx (src, insn);

In this instance SRC is:
     (plus:SI
       (mult:SI (sign_extend:SI (subreg:HI (reg:SI 136) 0))
                (sign_extend:SI (subreg:HI (reg:SI 138) 0)))
       (reg:SI 141))

and the resulting src_folded is:
     (plus:SI
       (mult:SI (reg:SI 136)
                (reg:SI 138))
       (reg:SI 141))

However, fold_rtx also modifies src itself, so after the call both src and src_folded contain the
plus+mult form without the extends.
Further down in cse_insn, where the cost analysis of src, src_folded and the other candidate
expressions happens, the original form of src is therefore never considered.
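
To make the hazard concrete, here is a sketch of the situation (not verbatim cse.c code; the
copy_rtx call is only there to illustrate keeping the original form around):

  /* Sketch of the hazard, not actual cse.c code: fold_rtx rewrites SRC in
     place, so both candidates below end up being the extend-less form and
     the cheaper original never enters the cost comparison.  Keeping an
     unmodified copy would let it compete.  */
  rtx src_orig   = copy_rtx (src);        /* still has the sign_extends  */
  rtx src_folded = fold_rtx (src, insn);  /* also rewrites SRC itself    */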
Changing that call so that fold_rtx does not modify its argument, like so:
      src_folded = fold_rtx (src, 0); // An argument of 0 means "make a copy of src before modifying"

fixes the testcase, allows CSE to properly select the cheaper multiply-extend-add form, and doesn't
seem to regress codegen in any way on SPEC2006 for arm.
Archeology says this line has been that way since forever, so does anyone know the rationale for
passing insn to fold_rtx at that callsite?

Thanks,
Kyrill

