This is the mail archive of the gcc-patches@gcc.gnu.org mailing list for the GCC project.
Re: [PATCH] Use xchg for -Os (PR target/92549)
- From: Uros Bizjak <ubizjak at gmail dot com>
- To: Jakub Jelinek <jakub at redhat dot com>
- Cc: "gcc-patches at gcc dot gnu dot org" <gcc-patches at gcc dot gnu dot org>
- Date: Tue, 19 Nov 2019 10:20:23 +0100
- Subject: Re: [PATCH] Use xchg for -Os (PR target/92549)
- References: <20191119090435.GE4650@tucnak>
On Tue, Nov 19, 2019 at 10:04 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> The xchg instruction is smaller than 3 moves, in some cases much smaller
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but judging from Agner Fog's tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
>
> While we have *swap<mode> patterns, those are very unlikely to be triggered
> during combine; usually we have different pseudos in there and the actual
> need for swapping only materializes during RA.
>
> The following patch adds a peephole2 that emits the swap only when
> optimizing the insn for size.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-11-19 Jakub Jelinek <jakub@redhat.com>
>
> PR target/92549
> * config/i386/i386.md (peephole2 for *swap<mode>): New peephole2.
>
> * gcc.target/i386/pr92549.c: New test.
OK with some test adjustments.
Thanks,
Uros.
> --- gcc/config/i386/i386.md.jj 2019-10-28 22:16:14.583008121 +0100
> +++ gcc/config/i386/i386.md 2019-11-18 17:06:36.050742545 +0100
> @@ -2787,6 +2787,17 @@
> (set_attr "amdfam10_decode" "double")
> (set_attr "bdver1_decode" "double")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "register_operand")
> + (match_operand:SWI 1 "register_operand"))
> + (set (match_dup 1)
> + (match_operand:SWI 2 "register_operand"))
> + (set (match_dup 2) (match_dup 0))]
> + "peep2_reg_dead_p (3, operands[0])
> + && optimize_insn_for_size_p ()"
> + [(parallel [(set (match_dup 1) (match_dup 2))
> + (set (match_dup 2) (match_dup 1))])])
> +
> (define_expand "movstrict<mode>"
> [(set (strict_low_part (match_operand:SWI12 0 "register_operand"))
> (match_operand:SWI12 1 "general_operand"))]
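For readers following the pattern: the peephole2 matches three register moves through a scratch register (operand 0) and replaces them with a single parallel swap of operands 1 and 2. The replacement no longer writes operand 0, which is why the condition requires it to be dead after the third insn. A minimal C sketch of the two forms follows. The function names are hypothetical and only model the RTL semantics; this is not code from the patch.

```c
#include <stdint.h>

/* Models the three matched insns.  r0 stands for operand 0, the scratch
   register; it is a local here, so it is trivially dead afterwards,
   which mirrors the peep2_reg_dead_p (3, operands[0]) condition.  */
static void
swap_three_moves (uint32_t *r1, uint32_t *r2)
{
  uint32_t r0 = *r1;   /* (set (operand 0) (operand 1)) */
  *r1 = *r2;           /* (set (operand 1) (operand 2)) */
  *r2 = r0;            /* (set (operand 2) (operand 0)) */
}

/* Models the replacement parallel: both sources are read before either
   destination is written, which is what a single xchg does.  */
static void
swap_parallel (uint32_t *r1, uint32_t *r2)
{
  uint32_t old1 = *r1;
  uint32_t old2 = *r2;
  *r1 = old2;          /* (set (operand 1) (operand 2)) */
  *r2 = old1;          /* (set (operand 2) (operand 1)) */
}
```

Both functions leave *r1 and *r2 exchanged; the only observable difference in the RTL is whether the scratch register is written.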
> --- gcc/testsuite/gcc.target/i386/pr92549.c.jj 2019-11-18 17:48:35.533177377 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92549.c 2019-11-18 17:49:31.888336444 +0100
> @@ -0,0 +1,28 @@
> +/* PR target/92549 */
> +/* { dg-do compile } */
> +/* { dg-options "-Os -masm=att" } */
> +/* { dg-final { scan-assembler "xchgl" } } */
> +
> +#ifdef __i386__
> +#define R , regparm (2)
> +#else
> +#define R
> +#endif
Please use
/* { dg-additional-options "-mregparm=2" { target ia32 } } */
instead.
> +
> +__attribute__((noipa R)) int
> +bar (int a, int b)
> +{
> + return b - a + 5;
> +}
> +
> +__attribute__((noipa R)) int
> +foo (int a, int b)
> +{
> + return 1 + bar (b, a);
> +}
> +
> +int
> +main ()
> +{
> + return foo (39, 3);
> +}
No need for main in compile-only tests.
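Putting the two comments above together, the adjusted test might look roughly like the sketch below. This is an assumed reading of the review, not the committed test: the R macro is dropped in favour of the dg-additional-options line, and main is removed.

```c
/* PR target/92549 */
/* { dg-do compile } */
/* { dg-options "-Os -masm=att" } */
/* { dg-additional-options "-mregparm=2" { target ia32 } } */
/* { dg-final { scan-assembler "xchgl" } } */

/* noipa keeps bar from being inlined or otherwise analysed into foo,
   so the call with swapped arguments survives to register allocation.  */
__attribute__((noipa)) int
bar (int a, int b)
{
  return b - a + 5;
}

/* foo passes its arguments to bar in the opposite order, which forces
   the two argument registers to be exchanged before the call.  */
__attribute__((noipa)) int
foo (int a, int b)
{
  return 1 + bar (b, a);
}
```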
> Jakub
>