[x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

Fri Aug 20 19:55:12 GMT 2021

Hi Richard,

Benchmarking this patch using CSiBE on x86_64-pc-linux-gnu with -Os -m32 saves 2432 bytes.
Of the 893 tests, 34 have size differences: 30 are improvements and 4 are regressions (of a few bytes).

> Also I'm missing an 'else' - in the default case there's no cost/benefit of using SSE vs. GPR regs?
> For SSE it would be a constant pool load.

The code size regression I primarily wanted to tackle was the zero vs. non-zero case when
dealing with immediate operands, which was the piece affected by my and Jakub's xor
improvements.

Alas, my first attempt to specify a non-zero gain in the default (doesn't fit in SImode) case
increased the code size slightly.  The use of the constant pool complicates things, as the number
of times the same value is used becomes an issue.  If the constant being loaded is unique, then
clearly the increase in constant pool size should (ideally) be taken into account.  But if the same
constant is used multiple times in a chain (or is already in the constant pool), the observed cost
is much cheaper.  Empirically, a value of zero isn't a poor choice, so the decision on whether to
use vector instructions is shifted to the gains from operations being performed, rather than the
loading of integer constants.  No doubt, like rtx_costs, these are free parameters that future
generations will continue to tweak and refine.
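
A minimal sketch of the three CONST_INT cases as they stand after this discussion (illustrative
only, not the patch itself). The helper names fits_simm32 and const_load_size_gain are
hypothetical, fits_simm32 is an assumed stand-in for x86_64_immediate_operand (src, SImode),
and COSTS_N_BYTES is assumed to scale by 2:

```c
#include <stdint.h>

/* Assumed to match the i386 backend's size-cost unit: 2 cost units per byte.  */
#define COSTS_N_BYTES(N) ((N) * 2)

/* Hypothetical stand-in for x86_64_immediate_operand (src, SImode):
   true when the value fits in a sign-extended 32-bit immediate.  */
static int
fits_simm32 (int64_t value)
{
  return value >= INT32_MIN && value <= INT32_MAX;
}

/* Size gain (in cost units) of loading VALUE in an SSE register instead
   of a GPR.  Negative means the vector form is larger.  */
static int
const_load_size_gain (int64_t value)
{
  /* xor (2 bytes) vs. xorps (3 bytes).  */
  if (value == 0)
    return -COSTS_N_BYTES (1);
  /* mov (5 bytes) vs. movaps (7 bytes).  */
  if (fits_simm32 (value))
    return -COSTS_N_BYTES (2);
  /* Default case: constant pool load.  A gain of zero leaves the
     decision to the operations in the chain.  */
  return 0;
}
```

With these values a chain is penalised only for small immediates; a wide constant neither helps
nor hurts, whether or not it is already in the constant pool.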

Given that this patch reduces code size with -Os, both with and without -m32, ok for mainline?

Thanks in advance,
Roger
--

-----Original Message-----
From: Richard Biener <richard.guenther@gmail.com> 
Sent: 20 August 2021 08:29
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.

On Thu, Aug 19, 2021 at 6:01 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Doh!  ENOPATCH.
>
> -----Original Message-----
> From: Roger Sayle <roger@nextmovesoftware.com>
> Sent: 19 August 2021 16:59
> To: 'GCC Patches' <gcc-patches@gcc.gnu.org>
> Subject: [x86_64 PATCH] Tweak -Os costs for scalar-to-vector pass.
>
>
> Back in June I briefly mentioned in one of my gcc-patches posts that a 
> change that should have always reduced code size, would mysteriously 
> occasionally result in slightly larger code (according to CSiBE):
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573233.html
>
> Investigating further, the cause turns out to be that x86_64's 
> scalar-to-vector (stv) pass is relying on poor estimates of the size 
> costs/benefits.  This patch tweaks the backend's compute_convert_gain 
> method to provide slightly more accurate values when compiling with -Os.
> Compilation without -Os is (should be) unaffected.  And for 
> completeness, I'll mention that the stv pass is a net win for code 
> size so it's much better to improve its heuristics than simply gate 
> the pass on !optimize_for_size.
>
> The net effect of this change is to save 1399 bytes on the CSiBE code 
> size benchmark when compiling with -Os.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?

+                   /* xor (2 bytes) vs. xorps (3 bytes).  */
+                   if (src == const0_rtx)
+                     igain -= COSTS_N_BYTES (1);
+                   /* movdi_internal vs. movv2di_internal.  */
+                   /* => mov (5 bytes) vs. movaps (7 bytes).  */
+                   else if (x86_64_immediate_operand (src, SImode))
+                     igain -= COSTS_N_BYTES (2);

Doesn't it need two GPR xors for 32-bit DImode, and two movs?  Thus the non-SSE cost should be times 'm'?  For const0_rtx we may possibly re-use the zero reg for the high part, so that case may already be correct.
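
The arithmetic behind the "times 'm'" question can be sketched as follows, using the byte counts
from the patch comments. The function size_gain_bytes is hypothetical, and m is assumed to be
the number of GPR instructions needed per operation (1 normally, 2 for DImode with -m32):

```c
/* Bytes saved by using one SSE instruction instead of M GPR instructions.
   Positive means the vector form is smaller.  */
static int
size_gain_bytes (int gpr_insn_bytes, int sse_insn_bytes, int m)
{
  return m * gpr_insn_bytes - sse_insn_bytes;
}
```

Under these assumptions, xor vs. xorps gives 2 - 3 = -1 byte for m = 1 but 2*2 - 3 = +1 byte for
m = 2, and mov vs. movaps gives 5 - 7 = -2 bytes for m = 1 but 2*5 - 7 = +3 bytes for m = 2. If
the zero register is re-used for the high part, the m = 2 xor case falls back to the m = 1 figure.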

Also I'm missing an 'else' - in the default case there's no cost/benefit of using SSE vs. GPR regs?  For SSE it would be a constant pool load.

I also wonder, since I now see COSTS_N_BYTES for the first time (heh), whether with -Os we'd need to replace all COSTS_N_INSNS (1) scaling with COSTS_N_BYTES scaling?  OTOH costs_add_n_insns uses COSTS_N_INSNS for the size part as well.

That said, it looks like we may be mixing apples and oranges now, or may even have been doing so previously?
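
For reference, a sketch of how the two scales relate, assuming the usual definitions
(COSTS_N_INSNS as (N) * 4 in rtl.h and COSTS_N_BYTES as (N) * 2 in the i386 backend). The helper
cost_units_to_bytes is hypothetical:

```c
/* Assumed definitions; check rtl.h and config/i386 for the real ones.  */
#define COSTS_N_INSNS(N) ((N) * 4)
#define COSTS_N_BYTES(N) ((N) * 2)

/* Convert a cost expressed in these units back to bytes.  */
static int
cost_units_to_bytes (int cost)
{
  return cost / COSTS_N_BYTES (1);
}
```

Under these assumptions COSTS_N_INSNS (1) equals COSTS_N_BYTES (2), so any gain still expressed
as one "insn" is implicitly treated as a 2-byte instruction when summed with byte-based costs.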

Thanks,
Richard.

>
>
> 2021-08-19  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.c (compute_convert_gain): Provide
>         more accurate values for CONST_INT, when optimizing for size.
>         * config/i386/i386.c (COSTS_N_BYTES): Move definition from here...
>         * config/i386/i386.h (COSTS_N_BYTES): to here.
>
> Roger
> --
>