This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/74585] powerpc64: Very poor code generation for homogeneous vector aggregates passed in registers


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Bill Schmidt from comment #11)
> With the original test case, -mcpu=power8 is problematic because of the use
> of the "swapping stores," whose RHS is a vec_select rather than a register
> or subreg.  This prevents us from saving the RHS of the store for use in
> replacing subsequent loads, running afoul of this logic in
> dse.c:record_store ():
> 
>   if (GET_CODE (body) == SET
>       /* No place to keep the value after ra.  */
>       && !reload_completed
>       && (REG_P (SET_SRC (body))                   <= this part
>           || GET_CODE (SET_SRC (body)) == SUBREG
>           || CONSTANT_P (SET_SRC (body)))
>       && !MEM_VOLATILE_P (mem)
>       /* Sometimes the store and reload is used for truncation and
>          rounding.  */
>       && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))
> 
> We can circumvent this if we can use stvx to force the parameters to the
> stack, which is legal since the stack slots are properly aligned.
> 
> However, even using -mcpu=power9, we don't handle removing the stores and
> replacing the partial loads with register logic.

You mean stores like the following?

(insn 13 12 14 2 (set (mem/c:V4SI (plus:DI (reg/f:DI 150 virtual-stack-vars)
                (const_int 112 [0x70])) [1 a+48 S16 A128])
        (vec_select:V4SI (reg:V4SI 190)
            (parallel [
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                ]))) t.c:14 -1
     (nil))
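For readers less familiar with RTL: the `parallel` selector in the insn above lists, for each result lane, which source lane is copied. The following is a minimal plain-array model of `(vec_select:V4SI src (parallel [2 3 0 1]))`. The helper `vec_select_v4si` and the table `swap_dwords` are hypothetical names for illustration, not GCC code. Whatever lane-numbering convention is used, this selector exchanges the two 64-bit halves of the vector and is its own inverse, which fits the description of these as "swapping stores".

```c
#include <assert.h>
#include <stdint.h>

/* Plain-array model of (vec_select:V4SI src (parallel [s0 s1 s2 s3])):
   lane i of the result is lane sel[i] of the source.
   Hypothetical helper for illustration; not part of GCC.
   out must not alias src. */
static void vec_select_v4si(const int32_t src[4], const int sel[4],
                            int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(sel[i] >= 0 && sel[i] < 4); /* selector must name a valid lane */
        out[i] = src[sel[i]];
    }
}

/* The selector used by the swapping store in insn 13:
   lanes {2,3} and {0,1} trade places, i.e. the 64-bit halves swap. */
static const int swap_dwords[4] = { 2, 3, 0, 1 };
```

Applying `swap_dwords` to `{10, 11, 12, 13}` yields `{12, 13, 10, 11}`, and applying it again restores the original order. Since the bytes written to `a+48` are the permuted value, forwarding them to a later load would need that permuted value available in a register, not just `(reg:V4SI 190)`.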

I wonder why DSE can't simply force the RHS into a register?  Of course, if
POWER really has stores that perform this vec_select but no non-store
instruction with the same operation, then this might not be valid ...
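To make the failing check concrete, here is a toy C model of the condition quoted from `dse.c:record_store ()`, together with the "force the RHS into a register" idea. Everything in it (`toy_store`, `toy_can_keep_rhs`, `toy_force_rhs_to_reg`, the `TOY_*` codes) is a hypothetical simplification, not GCC's internal API: it assumes the insn body is already a `SET`, and it stands in a single integer-constant code for `CONSTANT_P`, which covers more kinds of constants. The forcing step also assumes, as the reply cautions, that the target can compute the `vec_select` in a non-store instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for RTX codes of SET_SRC.
   These are NOT GCC's real data structures. */
enum toy_rtx_code { TOY_REG, TOY_SUBREG, TOY_CONST_INT, TOY_VEC_SELECT };

struct toy_store {
    enum toy_rtx_code src_code; /* code of SET_SRC (body) */
    bool mem_volatile;          /* MEM_VOLATILE_P (mem) */
    bool float_mode;            /* FLOAT_MODE_P (GET_MODE (mem)) */
};

/* Mirrors the shape of the quoted condition: the stored value is kept
   for later load replacement only before reload, and only when the RHS
   is a register, a subreg or a constant. */
static bool toy_can_keep_rhs(const struct toy_store *s,
                             bool reload_completed, bool flag_float_store)
{
    bool simple_src = s->src_code == TOY_REG
                      || s->src_code == TOY_SUBREG
                      || s->src_code == TOY_CONST_INT;
    return !reload_completed
           && simple_src
           && !s->mem_volatile
           && !(s->float_mode && flag_float_store);
}

/* Sketch of the suggestion: before reload, compute a vec_select RHS
   into a fresh pseudo and store that pseudo instead.  Only the store
   is modelled; the extra (set (reg new) (vec_select ...)) insn is
   implied.  After reload there is no new pseudo to use, so the store
   is left unchanged. */
static struct toy_store toy_force_rhs_to_reg(struct toy_store s,
                                             bool reload_completed)
{
    if (!reload_completed && s.src_code == TOY_VEC_SELECT)
        s.src_code = TOY_REG; /* the store now reads the new pseudo */
    return s;
}
```

In this model the swapping store is rejected as-is and accepted once its RHS has been moved into a new pseudo before reload. After reload every store is rejected, matching the "No place to keep the value after ra" comment.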

Now, in the end this example just shows that lowering register passing
only at RTL expansion leads to a lot of missed optimizations regarding
parameter setup ... Some scheme to apply the lowering already on GIMPLE
would be interesting to explore (albeit quite a bit of work).  We'd
have a second set of "parameter decls" somewhere, for example in struct
function, and use that when the IL is in lowered form.  Same for DECL_RESULT
of course.  And then the interesting part is whether to expose the stack in
some way or to restrict the lowering to decomposing into and combining from
registers.
