This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug rtl-optimization/74585] powerpc64: Very poor code generation for homogeneous vector aggregates passed in registers


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=74585

--- Comment #11 from Bill Schmidt <wschmidt at gcc dot gnu.org> ---
With the original test case, -mcpu=power8 is problematic because of the use of
the "swapping stores," whose RHS is a vec_select rather than a register or
subreg.  This prevents us from saving the RHS of the store for use in replacing
subsequent loads, running afoul of this logic in dse.c:record_store ():

  if (GET_CODE (body) == SET
      /* No place to keep the value after ra.  */
      && !reload_completed
      && (REG_P (SET_SRC (body))                   /* <= this part */
          || GET_CODE (SET_SRC (body)) == SUBREG
          || CONSTANT_P (SET_SRC (body)))
      && !MEM_VOLATILE_P (mem)
      /* Sometimes the store and reload is used for truncation and
         rounding.  */
      && !(FLOAT_MODE_P (GET_MODE (mem)) && (flag_float_store)))
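
To make the effect of that condition concrete, here is a toy model of it in plain C.  It is only a sketch: the enum, the struct, and every toy_* name are hypothetical stand-ins invented for illustration, not GCC's RTL types or the real code in dse.c.  It shows that a store whose source is a vec_select fails the test even when every other clause is satisfied.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for RTL expression codes; NOT GCC's real enum rtx_code.  */
enum toy_code { TOY_REG, TOY_SUBREG, TOY_CONST, TOY_VEC_SELECT };

/* Hypothetical summary of the facts the quoted condition inspects.  */
struct toy_store
{
  enum toy_code src_code;  /* Shape of the store's right-hand side.  */
  bool reload_completed;   /* Are we past register allocation?  */
  bool mem_volatile;       /* Is the destination volatile memory?  */
  bool float_mode;         /* Does the stored value have a float mode?  */
  bool flag_float_store;   /* Is -ffloat-store in effect?  */
};

/* Mirror the structure of the quoted test: return true when the stored
   value could be remembered for replacing later loads.  */
bool
toy_can_remember_rhs (const struct toy_store *s)
{
  bool simple_src = (s->src_code == TOY_REG
                     || s->src_code == TOY_SUBREG
                     || s->src_code == TOY_CONST);

  return (!s->reload_completed
          && simple_src
          && !s->mem_volatile
          && !(s->float_mode && s->flag_float_store));
}

/* Small self-check contrasting a plain register store with a
   "swapping store" whose source is a vec_select.  */
void
toy_self_check (void)
{
  struct toy_store plain = { TOY_REG, false, false, false, false };
  struct toy_store swapping = { TOY_VEC_SELECT, false, false, false, false };

  assert (toy_can_remember_rhs (&plain));
  assert (!toy_can_remember_rhs (&swapping));
}
```

In this model the only difference between the two stores is the shape of the source, which matches the clause marked "this part" above.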

We can circumvent this if we can use stvx to force the parameters to the stack,
which is legal since the stack slots are properly aligned.

However, even using -mcpu=power9, we don't handle removing the stores and
replacing the partial loads with register logic.
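
For readers unfamiliar with the term, a homogeneous vector aggregate is a struct whose members all have the same vector type.  The sketch below uses GCC's vector_size extension to show one.  It is not the test case from this bug: the type and function names are hypothetical, and whether it reproduces the poor code depends on the compiler version and options.  Reading a single element of a by-value argument is the kind of partial access that presumably turns into a store to the stack followed by a narrower load.

```c
#include <assert.h>

/* A 16-byte vector of four floats, using GCC's vector extension.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical homogeneous aggregate: every member is a v4sf.  */
struct hva
{
  v4sf a;
  v4sf b;
};

/* Takes the aggregate by value and reads one element of one member.  */
float
second_lane_of_b (struct hva h)
{
  return h.b[1];
}

/* Small self-check of the accessor itself.  */
void
hva_self_check (void)
{
  struct hva h = { { 1.0f, 2.0f, 3.0f, 4.0f }, { 5.0f, 6.0f, 7.0f, 8.0f } };

  assert (second_lane_of_b (h) == 6.0f);
}
```

Compiling a function like this with -O2 and -mcpu=power8 or -mcpu=power9 and inspecting the assembly is one way to see whether the stores and partial loads remain.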
