Unreviewed patch: [PR 16815] RFA: Fix a BLOCK_REG_PADDING check

Thu Sep 30 12:26:00 GMT 2004

Pinging the patch below, originally posted here:
http://gcc.gnu.org/ml/gcc-patches/2004-09/msg01735.html

Richard Sandiford <rsandifo@redhat.com> writes:
> This patch fixes PR target/16815, which complains about very inefficient
> handling of floating-point arguments in big-endian n32 and n64 MIPS code.
> It also fixes gcc.dg/mips-movcc-3.c for mips-sgi-irix6.5.
>
> Although this patch does fix a testsuite failure, there's no wrong code
> as such, so I suppose it might not qualify as a true bugfix.  However,
> the current output is a significant code quality regression from gcc 3.3
> and earlier, so I hope the patch is still acceptable for stage 3.
>
> The problem is that, in big-endian n32 and n64 code, float arguments
> are always being forced onto the stack.  For example:
>
>     float
>     sub (float f, float g)
>     {
>       return f + g;
>     }
>
> gives the horrendous:
>
>         addiu   $sp,$sp,-32
>         swc1    $f12,0($sp)
>         swc1    $f13,16($sp)
>         lwc1    $f1,0($sp)
>         lwc1    $f0,16($sp)
>         addiu   $sp,$sp,32
>         j       $31
>         add.s   $f0,$f1,$f0
>
> when compiled with "-mabi=n32 -EB -O2".  The problem lies in the
> BLOCK_REG_PADDING check in function.c:assign_parm_setup_block_p():
>
> -------------------------------------------------------------------------
> /* A subroutine of assign_parms.  Return true if the current parameter
>    should be stored as a BLKmode in the current frame.  */
>
> static bool
> assign_parm_setup_block_p (struct assign_parm_data_one *data)
> {
>   if (data->nominal_mode == BLKmode)
>     return true;
>   if (GET_CODE (data->entry_parm) == PARALLEL)
>     return true;
>
> #ifdef BLOCK_REG_PADDING
>   if (data->locate.where_pad == (BYTES_BIG_ENDIAN ? upward : downward)
>       && GET_MODE_SIZE (data->promoted_mode) < UNITS_PER_WORD)
>     return true;
> #endif
>
>   return false;
> }
> -------------------------------------------------------------------------
>
> This check is looking for register arguments that are padded at the
> least significant end.  At the moment, such arguments do need to be
> handled by assign_parm_setup_block, but the check is too general in
> two respects:
>
>   (1) It doesn't check whether the argument is actually passed in a
>       register.  An argument that is:
>
>         (a) passed on the stack;
>         (b) less than a word in size; and
>         (c) padded at the least significant end
>
>       will be caught by the check and thus will never be assigned a
>       pseudo register.  This seems unnecessary, since the padding of any
>       such argument is controlled entirely by FUNCTION_ARG_PADDING and
>       will be handled in the same way as it would have been on a
>       non-B_R_P target.
>
>   (2) It doesn't depend on the value of BLOCK_REG_PADDING.  (Note that
>       where_pad is the padding provided by FUNCTION_ARG_PADDING, not
>       BLOCK_REG_PADDING.)
>
>       This is the crux of the problem.  With big-endian n32 and n64,
>       float arguments are passed naturally in registers but are
>       left-justified when passed on the stack.  Thus FUNCTION_ARG_PADDING
>       says "upward" but BLOCK_REG_PADDING says "downward".
>
>       Now this combination _is_ handled correctly by
>       assign_parm_setup_block, but it simply does a normal store of the
>       register into the stack slot.  This stack slot then becomes the
>       PARM_DECL's DECL_RTL.  Since no special handling is required,
>       it would be much better to use a pseudo register for the PARM_DECL
>       where appropriate.
>
> The patch below fixes both (1) and (2).
>
> FWIW, I wondered about simply shifting the affected registers right so
> that they are padded in the way gcc expects, but it wasn't always a win.
> It would sometimes lead to spill code like:
>
>         dsrl    reg,reg,56
>         sb      reg,stack
>
> instead of simply:
>
>         sd      reg,stack
>
> It also felt less like stage 3 work.
>
> Two other targets use BLOCK_REG_PADDING: powerpc and hppa.  Both powerpc
> definitions are:
>
>   #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
>     (!(FIRST) ? upward : FUNCTION_ARG_PADDING (MODE, TYPE))
>
> so changing the where_pad check to use BLOCK_REG_PADDING is a no-op
> in this case.  The pa.h version is more problematic:
>
>   #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
>     (TARGET_64BIT ? upward : downward)
>
> This seems to be a simplification of function_arg_padding that relies
> on the fact that BLOCK_REG_PADDING is only used for aggregates.   At the
> moment, that's true, but since non-aggregate subword values are padded
> downward if TARGET_64BIT, it seems a little dangerous to rely on that.
> (FWIW, BLOCK_REG_PADDING is defined in terms of block moves, but it
> doesn't say what kind of arguments will and will not use a block move.)
>
> So there seemed to be two choices:
>
>    (a) In assign_parm_setup_block_p, check BLOCK_REG_PADDING as well as,
>        not instead of, where_pad.  Only return true if both specify
>        padding at the least significant end.
>
>        This fixes the MIPS problem without changing the gcc.c-torture
>        or gcc.dg assembly output for either powerpc64-linux-gnu or
>        hppa64-linux-gnu.  But it doesn't feel right conceptually.
>        It also fails to to handle the pathological (hopefully non-existent)
>        case of an argument that is padded at the least significant end in
>        registers and at the most significant end on the stack.
>
>    (b) Make hppa's definition use function_arg_padding, just like the
>        powerpc and MIPS versions do.
>
> I went for (b) here.  I compared the gcc.c-torture and gcc.dg output
> of powerpc64-linux-gnu and hppa64-linux-gnu before and after the patch
> (including subdirectories like gcc.dg/compat).  There were no changes.
> The patch has also been bootstrapped & regression tested on mips-sgi-irix6.5,
> where it fixes the mips-movcc-3.c test and introduced no new failures.
> OK to install?
>
> Richard
>
>
> 	PR target/16815
> 	* function.c (assign_parm_setup_block_p): Tighten BLOCK_REG_PADDING
> 	check.
> 	* config/pa/pa.h (BLOCK_REG_PADDING): Define in terms of
> 	function_arg_padding.
>
> Index: function.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/function.c,v
> retrieving revision 1.575
> diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.575 function.c
> *** function.c	10 Sep 2004 00:50:24 -0000	1.575
> --- function.c	17 Sep 2004 06:04:03 -0000
> *************** assign_parm_setup_block_p (struct assign
> *** 2513,2520 ****
>       return true;
>   
>   #ifdef BLOCK_REG_PADDING
> !   if (data->locate.where_pad == (BYTES_BIG_ENDIAN ? upward : downward)
> !       && GET_MODE_SIZE (data->promoted_mode) < UNITS_PER_WORD)
>       return true;
>   #endif
>   
> --- 2513,2524 ----
>       return true;
>   
>   #ifdef BLOCK_REG_PADDING
> !   /* Only assign_parm_setup_block knows how to deal with register arguments
> !      that are padded at the least significant end.  */
> !   if (REG_P (data->entry_parm)
> !       && GET_MODE_SIZE (data->promoted_mode) < UNITS_PER_WORD
> !       && (BLOCK_REG_PADDING (data->passed_mode, data->passed_type, 1)
> ! 	  == (BYTES_BIG_ENDIAN ? upward : downward)))
>       return true;
>   #endif
>   
> Index: config/pa/pa.h
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/config/pa/pa.h,v
> retrieving revision 1.234
> diff -c -p -F^\([(a-zA-Z0-9_]\|#define\) -r1.234 pa.h
> *** config/pa/pa.h	8 Sep 2004 18:45:16 -0000	1.234
> --- config/pa/pa.h	17 Sep 2004 06:04:05 -0000
> *************** #define FUNCTION_ARG_PADDING(MODE, TYPE)
> *** 918,924 ****
>      We use a DImode register in the parallel for 5 to 7 byte structures
>      so that there is only one element.  This allows the object to be
>      correctly padded.  */
> ! #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) (TARGET_64BIT ? upward : downward)
>   
>   /* Do not expect to understand this without reading it several times.  I'm
>      tempted to try and simply it, but I worry about breaking something.  */
> --- 918,925 ----
>      We use a DImode register in the parallel for 5 to 7 byte structures
>      so that there is only one element.  This allows the object to be
>      correctly padded.  */
> ! #define BLOCK_REG_PADDING(MODE, TYPE, FIRST) \
> !   function_arg_padding ((MODE), (TYPE))
>   
>   /* Do not expect to understand this without reading it several times.  I'm
>      tempted to try and simply it, but I worry about breaking something.  */