This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Combine performance regression (was: Fix PR target/18701)


Hans-Peter Nilsson wrote:

> 	PR target/18701
> 	* combine.c (combine_simplify_rtx): Do not allow paradoxical
> 	subregs of MEM.
> 
> Index: combine.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/combine.c,v
> retrieving revision 1.463
> diff -p -c -u -p -r1.463 combine.c
> --- combine.c	15 Dec 2004 01:02:55 -0000	1.463
> +++ combine.c	20 Dec 2004 05:23:55 -0000
> @@ -3983,10 +3983,14 @@ combine_simplify_rtx (rtx x, enum machin
>        }
> 
>        /* Don't change the mode of the MEM if that would change the meaning
> -	 of the address.  */
> +	 of the address.  Similarly, don't allow widening, as that may
> +	 access memory outside the defined object or using an address
> +	 that is invalid for a wider mode.  */
>        if (MEM_P (SUBREG_REG (x))
>  	  && (MEM_VOLATILE_P (SUBREG_REG (x))
> -	      || mode_dependent_address_p (XEXP (SUBREG_REG (x), 0))))
> +	      || mode_dependent_address_p (XEXP (SUBREG_REG (x), 0))
> +	      || (GET_MODE_SIZE (mode)
> +		  > GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))))))
>  	return gen_rtx_CLOBBER (mode, const0_rtx);

This causes a noticeable performance regression on s390x-ibm-linux,
in particular the bzip2 SPEC test case is about 5% slower.

The reason for this is that we do not have a reg-to-reg zero-extend
instruction, only a mem-to-reg one.  Thus, we emulate reg-to-reg
zero-extend using an AND, which is expensive as it requires an extra
instruction to load the constant.

A crucial optimization for the bzip2 case is thus to combine a
mem-to-reg load into such an AND and form a mem-to-reg zero-extend
pattern from it.  However, the AND in question necessarily uses a
paradoxical subreg, so we have this situation:

(insn 225 224 226 5 (set (reg:QI 79)
        (mem/s:QI (plus:DI (reg/f:DI 118)
                (reg:DI 77 [ D.1074 ])) [0 len S1 A8])) 53 {*movqi} (insn_list:REG_DEP_TRUE 223 (nil))
    (nil))

(insn 226 225 227 5 (parallel [
            (set (subreg:SI (reg:HI 78) 0)
                (and:SI (subreg:SI (reg:QI 79) 0)
                    (const_int 255 [0xff])))
            (clobber (reg:CC 33 %cc))
        ]) 214 {*andsi3_zarch} (insn_list:REG_DEP_TRUE 225 (nil))
    (expr_list:REG_DEAD (reg:QI 79)
        (expr_list:REG_UNUSED (reg:CC 33 %cc)
            (nil))))

Combine would now substitute the (mem ...) for (reg 79) in
insn 226, and further optimization would detect that the
(and (subreg (mem))) pattern is actually equivalent to a
zero-extend.

However, due to the way that subst works, every intermediate
step in the substitution is run through combine_simplify_rtx,
and after your change it immediately dies as soon as the
(subreg (mem)) is noticed.


Thus, I do not like this patch ;-)  In fact, I rather disagree
with the reasoning leading up to it.  In my opinion, the construct
of a paradoxcial subreg of MEM is perfectly well defined in the
general case (at least in the absence of LOAD_EXTEND_OP): its
value is the bits addressed by the inner MEM, extended by 
'don't care' bits at the most significant end, and interpreted
in the outer mode. 

The semantics of the paradoxical subreg of MEM, as such, have
nothing do to with anything like 'changing the address and/or
accessing memory in a wider mode'.  That *may* be one option
to *implement* the above semantics, or it may not be, depending
on where the MEM came from and what the instruction set allows.
If this method happens to be invalid, there are always other
options, like reloading the MEM to a register and forming a
parodoxical subreg of that register.

In short, a paradoxical subreg of MEM is not invalid RTL, it is
at worst RTL that is not directly implementable on the target
machine.  But during a pass like combine, the intermediate steps
produced during simplifcation are *often* not directly implementable
on the target machine; this is no reason to quit simplifying.

After combine has finished the whole simplifcation process it
will re-recognize the insn.  If a target doesn't want to accept
this particular construct, it should be rejected *here*, if it
is still present, not earlier.  (Alternatively, it might be
accepted, and later on cleaned up by reload -- there is code in
reload to do so, have you investigated why it didn't fix your
problem?)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  Ulrich.Weigand@de.ibm.com


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]