This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/63677] Failure to constant fold with vectorization.

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 30 Oct 2014 09:33:12 +0000
Subject: [Bug tree-optimization/63677] Failure to constant fold with vectorization.
Auto-submitted: auto-generated
References: <bug-63677-4 at http dot gcc dot gnu dot org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63677

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
> The problem is that the loop is first vectorized, then several passes later
> slp vectorizes the initialization, so after some cleanups we have e.g. in
> cddce2:
>   MEM[(int *)&a] = { 0, 1, 2, 3 };
>   MEM[(int *)&a + 16B] = { 4, 5, 6, 7 };
>   vect__13.6_20 = MEM[(int *)&a];
>   vect__13.6_17 = MEM[(int *)&a + 16B];
> But there is no further FRE pass that would optimize the loads into
>   vect__13.6_20 = { 0, 1, 2, 3 };
>   vect__13.6_17 = { 4, 5, 6, 7 };
> (supposedly that would need to be done before forwprop4 that could in theory
> refold all the stmts into constant).
> 
> Richard, how expensive would it be to schedule another FRE pass if anything has
> been vectorized in the current function (either vect pass, or slp)?  Or are
> there other passes that handle this?  Looking at e.g.
> typedef int V __attribute__((vector_size (4 * sizeof (int))));
> struct S { int a[4]; };
> V __attribute__ ((noinline)) foo (struct S *p)
> {
>   *(V *) p = (V) { 1, 2, 3, 4 };
>   return *(V *) p;
> }
> with -O2 -fno-tree-fre, it seems DOM is able to do that, but unfortunately
> at dom2 time the values have not been sufficiently forward propagated for
> dom2 to optimize this.

For the case in question, only FRE can handle CSEing of
the MEM[(int *)&a] load (DOM should handle the load of _17 fine).
I'm not very fond of adding more passes, but in theory an FRE pass right
after pass_tree_loop_done could do the trick.  Ideally, though, you'd
want it a bit later, after vector lowering and after tracer
(that is, where the current DOM sits, replacing DOM).  Of course, FRE is
more expensive than DOM, and DOM might catch some jump threading
opportunities (though VRP does that as well).
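
For illustration, here is a minimal sketch of the kind of source that
could produce the store/load pairs quoted above.  It is an assumed
shape, not the testcase from the bug report; the function name
sum_after_init is hypothetical.  The init loop may become two vector
constant stores and the reduction loop two vector loads from the same
addresses, which is the redundancy a late FRE would have to remove.

```c
/* Hypothetical reduction, NOT the testcase attached to the bug.
   The first loop stores 0..7 into a[]; the second reads them back.
   If both loops are vectorized, the loads can only be replaced by
   the stored constants when some later pass value-numbers the
   memory accesses.  */
int
sum_after_init (void)
{
  int a[8];
  int i, sum = 0;

  /* Initialization: candidate for { 0, 1, 2, 3 } and { 4, 5, 6, 7 }
     vector stores.  */
  for (i = 0; i < 8; i++)
    a[i] = i;

  /* Reduction: loads from the addresses just stored to.  */
  for (i = 0; i < 8; i++)
    sum += a[i];

  /* 0 + 1 + ... + 7; ideally the whole function folds to this
     constant.  */
  return sum;
}
```

Whether a given compiler version folds this to a constant can be
checked by compiling with -O2 plus vectorization enabled and
inspecting the -fdump-tree-optimized output; the result depends on
target and version.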

References:
- [Bug tree-optimization/63677] New: Failure to constant fold with vectorization.
  - From: belagod at gcc dot gnu.org
