This is the mail archive of the gcc-patches@gcc.gnu.org
mailing list for the GCC project.
Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
- From: Martin Jambor <mjambor at suse dot cz>
- To: Richard Biener <rguenther at suse dot de>, gcc-patches at gcc dot gnu dot org
- Cc: Jan Hubicka <hubicka at ucw dot cz>, ubizjak at gmail dot com, kirill dot yukhin at gmail dot com, vmakarov at redhat dot com, hjl dot tools at gmail dot com, Martin Jambor <mjambor at suse dot de>
- Date: Thu, 25 Jul 2019 11:13:23 +0200
- Subject: Re: [PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs
- References: <alpine.LSU.firstname.lastname@example.org>
On Tue, Jul 23 2019, Richard Biener wrote:
> The following fixes the runtime regression of 456.hmmer caused
> by matching ICC in code generation and using cmov more aggressively
> (through GIMPLE level MAX_EXPR usage).  Apparently (discovered
> by manual assembler editing) using the SSE unit for performing
> SImode loads, adds and then two signed max operations plus stores
> is quite a bit faster than cmovs - even faster than the original
> single cmov plus branchy second max. Even more so for AMD CPUs
> than Intel CPUs.
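To make the shape concrete, here is a minimal, made-up C loop with the
load / add / signed-max / store structure being discussed (a simplified
sketch only, not the actual 456.hmmer source):

```c
/* Illustrative sketch (not the 456.hmmer code): each iteration does
   SImode loads, an add, a signed max and a store -- the kind of
   memory-to-memory chain where GIMPLE forms MAX_EXPR and the backend
   previously expanded it via a cmov sequence. */
void max_chain(int *dst, const int *a, const int *b, const int *c, int n)
{
    for (int i = 0; i < n; i++) {
        int t = a[i] + b[i];    /* SImode loads + add */
        if (c[i] > t)           /* recognized as MAX_EXPR at GIMPLE level */
            t = c[i];
        dst[i] = t;             /* SImode store */
    }
}
```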
> Instead of hacking up some pattern recognition pass to transform
> integer mode memory-to-memory computation chains involving
> conditional moves to "vector" code (similar to what STV does
> for TImode ops on x86_64) the following simply allows SImode
> into SSE registers (support for this is already there in some
> important places like move patterns!). For the particular
> case of 456.hmmer the required support is loads/stores
> (already implemented), SImode adds and SImode smax.
> So the patch adds a smax pattern for SImode (we don't have any
> for scalar modes but currently expand via a conditional move sequence)
> emitting as SSE vector max or cmp/cmov depending on the alternative.
> And it amends the *add<mode>_1 pattern with SSE alternatives
> (which have to come before the memory alternative as IRA otherwise
> doesn't consider reloading a memory operand to a register).
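The instruction sequence these alternatives are meant to enable can be
sketched with SSE4.1 intrinsics; this is an illustration of the target
code shape (movd / paddd / pmaxsd) with made-up function names, not code
taken from the patch:

```c
#include <smmintrin.h>   /* SSE4.1 intrinsics: _mm_max_epi32 (pmaxsd) */

/* Hedged sketch: keep scalar SImode values in SSE registers and use
   paddd for the add and pmaxsd for the signed max, avoiding cmp/cmov
   in the integer unit.  Illustrative only. */
__attribute__((target("sse4.1")))
static int add_then_smax(int a, int b, int c)
{
    __m128i va = _mm_cvtsi32_si128(a);   /* movd into %xmm */
    __m128i vb = _mm_cvtsi32_si128(b);
    __m128i vc = _mm_cvtsi32_si128(c);
    __m128i t  = _mm_add_epi32(va, vb);  /* paddd: SImode add */
    t = _mm_max_epi32(t, vc);            /* pmaxsd: signed SImode max */
    return _mm_cvtsi128_si32(t);         /* movd back out */
}
```

On hardware without SSE4.1 this would fault at runtime, so it is strictly
a sketch of the intended sequence, not a portable implementation.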
> With this in place the runtime of 456.hmmer improves by 10%
> on Haswell which is back to before regression speed but not
> to same levels as seen with manually editing just the single
> important loop.
> I'm currently benchmarking all SPEC CPU 2006 on Haswell. More
> interesting is probably Zen where moves crossing the
> integer - vector domain are excessively expensive (they get
> done via the stack).
There was a znver2 CPU machine not doing anything useful overnight here,
so I benchmarked your patch using SPEC 2006 and SPEC CPUrate 2017 on top
of trunk r273663 (I forgot to pull, so before Honza's znver2 tuning
patches, I am afraid). All benchmarks were run only once with options
-Ofast -march=native -mtune=native.
By far the biggest change was indeed 456.hmmer, which improved by an
incredible 35%. No other change in SPEC 2006 was bigger than ±1.5%,
so the SPECint score grew by almost 3.4%.
I understand this patch fixes a regression in that benchmark, but even
so: 456.hmmer built with the Monday trunk was 23% slower than with gcc 9,
and with the patch it is 20% faster than gcc 9.
In SPEC 2017, there were two changes worth mentioning although they
probably need to be confirmed and re-measured on top of the new tuning
changes. 525.x264_r regressed by 3.37% and 511.povray_r improved by
> Clearly this approach will run into register allocation issues
> but it looks cleaner than writing yet another STV-like pass
> (STV itself is quite awkwardly structured so I refrain from
> touching it...).
> Anyway - comments? It seems to me that MMX-in-SSE does
> something very similar.
> Bootstrapped on x86_64-unknown-linux-gnu; previous testing
> revealed an issue.  Forgot that *add<mode>_1 also handles
> DImode..., fixed below, re-testing in progress.
> 2019-07-23 Richard Biener <email@example.com>
> PR target/91154
> * config/i386/i386.md (smaxsi3): New.
> (*add<mode>_1): Add SSE and AVX variants.
> * config/i386/i386.c (ix86_lea_for_add_ok): Do not allow
> SSE registers.