Bug 54400 - recognize vector reductions
Summary: recognize vector reductions
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.8.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Duplicates: 55071
Depends on:
Blocks: vectorizer
Reported: 2012-08-29 06:10 UTC by Marc Glisse
Modified: 2016-10-28 08:17 UTC (History)
CC List: 5 users

See Also:
Host:
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-09-03 00:00:00


Description Marc Glisse 2012-08-29 06:10:00 UTC
Hello,

for this program:

#include <x86intrin.h>
double f(__m128d v){return v[1]+v[0];}

gcc -O3 -msse4 (same with -Os) generates:

	movapd	%xmm0, %xmm2
	unpckhpd	%xmm2, %xmm2
	movapd	%xmm2, %xmm1
	addsd	%xmm0, %xmm1
	movapd	%xmm1, %xmm0

(yes, the number of mov instructions is a bit high...)

Looking at the x86 backend, it can expand reduc_splus_v2df and __builtin_ia32_haddpd, but it doesn't provide any pattern that could be recognized from code like this. hsubpd has even less support.

It seems to me that, considering only the low part of the result of haddpd, the pattern should be small enough to be matched: (plus (vec_select (match_operand 1) const_a) (vec_select (match_dup 1) const_b)) where a and b are 0 and 1 in any order.
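For illustration, here is a minimal sketch of the two source forms that the proposed pattern would need to cover. It assumes GCC's generic vector extension rather than __m128d so that it does not depend on x86 headers; the typedef name v2df and the function names are hypothetical and do not come from the report.

```c
/* Two-lane double vector via the GNU vector extension (assumed stand-in
   for __m128d; compiles with GCC and Clang). */
typedef double v2df __attribute__ ((vector_size (16)));

/* Lane 0 first, lane 1 second. */
double sum_lo_hi (v2df v) { return v[0] + v[1]; }

/* Lane 1 first, lane 0 second: the form used in the report. */
double sum_hi_lo (v2df v) { return v[1] + v[0]; }
```

Both functions compute the same value, so a matcher that accepts the lane indices 0 and 1 in either order could, in principle, map either one to the low half of a haddpd result.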
Comment 1 Marc Glisse 2012-09-01 09:40:14 UTC
The code below seems to optimize v[0]-v[1] and v[1]+v[0]. It doesn't recognize v[0]+v[1], but that would not be too hard to add, I guess. Compared to the true hadd insn, I removed (set_attr "type" "sseadd") because it crashed the compiler (in cost computation, maybe). Apart from the things left in here that may not make sense, I don't know whether a peephole would be more appropriate. Maybe the insn helps more if I want to recognize dot products (dppd) later on? At least, thanks to it, {v[0]-v[1],w[0]-w[1]} is now recognized as a hsub (although it doesn't work if v==w, because vec_duplicate doesn't match vec_concat).

(define_insn "*sse3_h<plusminus_insn>v2df3_low_MARC"
  [(set (match_operand:DF 0 "register_operand" "=x,x")
        (plusminus:DF
          (vec_select:DF
            (match_operand:V2DF 1 "register_operand" "0,x")
            (parallel [(const_int 0)]))
          (vec_select:DF
            (match_dup 1)
            (parallel [(const_int 1)]))))]
  "TARGET_SSE3"
  "@
   h<plusminus_mnemonic>pd\t{%0, %0|%0, %0}
   vh<plusminus_mnemonic>pd\t{%1, %1, %0|%0, %1, %1}"
  [(set_attr "isa" "noavx,avx")
   (set_attr "prefix" "orig,vex")
   (set_attr "mode" "V2DF")])
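
As a rough sketch of the source-level shapes discussed in this comment, the following C functions show the scalar difference and the two-vector form {v[0]-v[1],w[0]-w[1]}. The typedef v2df and the function names are hypothetical, and the GNU vector extension is assumed in place of __m128d.

```c
/* Two-lane double vector via the GNU vector extension (assumed stand-in
   for __m128d). */
typedef double v2df __attribute__ ((vector_size (16)));

/* Scalar form: low lane minus high lane of a single vector. */
double diff_low (v2df v) { return v[0] - v[1]; }

/* Two-vector form: result lane 0 comes from v, lane 1 from w, which is
   the lane layout hsubpd produces for operands (v, w). */
v2df hsub_pair (v2df v, v2df w)
{
  v2df r = { v[0] - v[1], w[0] - w[1] };
  return r;
}
```

Calling hsub_pair (v, v) is the v==w case mentioned above, the one the comment says is not matched.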
Comment 2 Richard Biener 2012-09-03 10:12:36 UTC
The basic-block vectorizer does not handle reductions and it would be another
natural place to perform this optimization.
Comment 3 Marc Glisse 2012-09-03 10:21:48 UTC
(In reply to comment #2)
> The basic-block vectorizer does not handle reductions and it would be another
> natural place to perform this optimization.

I thought about turning a PLUS_EXPR of BIT_FIELD_REF into a REDUC_PLUS_EXPR (in forwprop), but that wouldn't handle the MINUS_EXPR case (can still be worth doing though, especially if the code is common to the other reductions).
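
To make the two cases concrete, here is a small sketch with hypothetical function names. It assumes that lane accesses such as v[i] are what appear as BIT_FIELD_REF in GIMPLE, and it uses an integer vector so that floating-point reassociation rules do not get in the way.

```c
/* Four-lane int vector via the GNU vector extension. */
typedef int v4si __attribute__ ((vector_size (16)));

/* A chain of additions over every lane: the shape that a
   REDUC_PLUS_EXPR-style rewrite could cover. */
int reduce_plus (v4si v) { return v[0] + v[1] + v[2] + v[3]; }

/* A lane difference: not expressible as a plain sum reduction, which is
   the MINUS_EXPR case left out above. */
int lane_minus (v4si v) { return v[0] - v[1]; }
```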
Comment 4 Marc Glisse 2012-10-08 20:46:04 UTC
Author: glisse
Date: Mon Oct  8 20:45:56 2012
New Revision: 192223

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192223
Log:
2012-10-08  Marc Glisse  <marc.glisse@inria.fr>

gcc/
	PR target/54400
	* config/i386/i386.md (type attribute): Add sseadd1.
	(unit attribute): Add support for sseadd1.
	(memory attribute): Likewise.
	* config/i386/athlon.md: Likewise.
	* config/i386/core2.md: Likewise.
	* config/i386/atom.md: Likewise.
	* config/i386/ppro.md: Likewise.
	* config/i386/bdver1.md: Likewise.
	* config/i386/sse.md (sse3_h<plusminus_insn>v2df3): split into...
	(sse3_haddv2df3): ... expander.
	(*sse3_haddv2df3): ... define_insn. Accept permuted operands.
	(sse3_hsubv2df3): ... define_insn.
	(*sse3_haddv2df3_low): New define_insn.
	(*sse3_hsubv2df3_low): New define_insn.

gcc/testsuite/
	PR target/54400
	* gcc.target/i386/pr54400.c: New testcase.


Added:
    trunk/gcc/testsuite/gcc.target/i386/pr54400.c   (with props)
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/athlon.md
    trunk/gcc/config/i386/atom.md
    trunk/gcc/config/i386/bdver1.md
    trunk/gcc/config/i386/core2.md
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/ppro.md
    trunk/gcc/config/i386/sse.md
    trunk/gcc/testsuite/ChangeLog

Propchange: trunk/gcc/testsuite/gcc.target/i386/pr54400.c
            ('svn:eol-style' added)

Propchange: trunk/gcc/testsuite/gcc.target/i386/pr54400.c
            ('svn:keywords' added)
Comment 5 Marc Glisse 2012-10-08 21:02:56 UTC
Renaming since the specific x86 case is done, and this is now a middle-end PR.
Comment 6 vincenzo Innocente 2012-10-25 08:40:27 UTC
*** Bug 55071 has been marked as a duplicate of this bug. ***