Bug 50969 - 17% degradation in 168.wupwise for interleave via permutation
Summary: 17% degradation in 168.wupwise for interleave via permutation
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 4.7.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-02 21:26 UTC by Pat Haugen
Modified: 2012-03-02 14:52 UTC
CC List: 4 users

See Also:
Host: powerpc64-linux
Target: powerpc64-linux
Build: powerpc64-linux
Known to work:
Known to fail:
Last reconfirmed:


Attachments
benchmark file (389 bytes, text/plain)
2011-11-02 21:26 UTC, Pat Haugen

Description Pat Haugen 2011-11-02 21:26:54 UTC
Created attachment 25694
benchmark file

Revision 180450 (along with 180567 to fix the ICE) causes a large degradation in the cpu2000 benchmark wupwise. Additional loops are now being vectorized, but the vectorized code performs worse; I'm not sure whether that points to a cost issue or something else. Based on prior observations, the degradation is most likely due to the permute instructions being used: they are restricted to a single VSU pipe, so two of them can't be executed in parallel.

The attached file zaxpy.f is just one of the files containing a function that degraded (zscal.f is another). The time in the function is spent in the second loop. The following degradations (compared to revision 180449) were observed with oprofile.

-m64 -O3 -mcpu=power7
zaxpy : -24%
zscal : -79%

-m64 -O3 -mcpu=power7 -funroll-loops
zaxpy : -65%
zscal : -61%
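
For readers without the attachment, here is a rough C sketch of the kind of loop involved. It is not the attached Fortran source. It assumes the hot loop is the usual unit-stride complex AXPY update over arrays stored as interleaved (re, im) pairs, and the name zaxpy_interleaved is made up for illustration. Vectorizing such a loop requires de-interleaving the real and imaginary parts on load and re-interleaving them on store, which is presumably where the permutes come from.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] += a * x[i] for n complex doubles.  Each complex value is stored
   as an interleaved (re, im) pair, so element i occupies slots 2*i and
   2*i + 1.  Hypothetical stand-in for the unit-stride loop in zaxpy.f. */
void zaxpy_interleaved(size_t n, const double a[2],
                       const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; i++) {
        double xr = x[2 * i];      /* real part of x[i] */
        double xi = x[2 * i + 1];  /* imaginary part of x[i] */

        /* Complex multiply-accumulate: real and imaginary results each
           need both halves of the input pair. */
        y[2 * i]     += a[0] * xr - a[1] * xi;
        y[2 * i + 1] += a[0] * xi + a[1] * xr;
    }
}
```

Because every output slot depends on both halves of the input pair, a vectorized version cannot simply operate on consecutive lanes; it has to shuffle them first.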
Comment 1 Pat Haugen 2011-11-02 21:38:28 UTC
I swapped the numbers; they should be:

-m64 -O3 -mcpu=power7
zaxpy : -79%
zscal : -24%

-m64 -O3 -mcpu=power7 -funroll-loops
zaxpy : -61%
zscal : -65%
Comment 2 Richard Biener 2011-11-03 08:19:01 UTC
Yes, sounds like a cost model issue.
Comment 3 Bill Schmidt 2012-02-06 21:39:38 UTC
Author: wschmidt
Date: Mon Feb  6 21:39:34 2012
New Revision: 183944

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183944
Log:
2012-02-06  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

	PR tree-optimization/50969
	* tree-vect-stmts.c (vect_model_store_cost): Correct statement cost to
	use vec_perm rather than vector_stmt.
	(vect_model_load_cost): Likewise.
	* config/i386/i386.c (ix86_builtin_vectorization_cost): Change cost of
	vec_perm to be the same as other vector statements.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Revise
	cost of vec_perm for TARGET_VSX.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/tree-vect-stmts.c
Comment 4 Bill Schmidt 2012-02-06 21:41:47 UTC
Fixed with simple permute cost change for now.  A better analysis of permutes will be considered in 4.8.
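
To illustrate why costing permutes separately can change the vectorization decision, here is a toy model in C. It is only a sketch and not the code in tree-vect-stmts.c or rs6000.c: the enum, the function names, and all numeric costs are assumptions chosen for the example.

```c
#include <assert.h>

/* Hypothetical statement kinds, loosely modeled on two entries of
   GCC's enum vect_cost_for_stmt. */
enum stmt_kind { VECTOR_STMT, VEC_PERM };

/* Illustrative per-statement cost.  Ordinary vector statements cost 1;
   the permute cost is a parameter so both settings can be compared. */
int stmt_cost(enum stmt_kind kind, int perm_cost)
{
    return kind == VEC_PERM ? perm_cost : 1;
}

/* Cost of one vector loop iteration made of n_other ordinary vector
   statements and n_perms permutes. */
int vector_body_cost(int n_other, int n_perms, int perm_cost)
{
    assert(n_other >= 0 && n_perms >= 0 && perm_cost >= 0);
    return n_other * stmt_cost(VECTOR_STMT, perm_cost)
         + n_perms * stmt_cost(VEC_PERM, perm_cost);
}

/* Vectorization pays off only if one vector iteration is cheaper than
   the vf scalar iterations it replaces. */
int vectorization_profitable(int scalar_iter_cost, int vf, int vector_cost)
{
    return vector_cost < scalar_iter_cost * vf;
}
```

With made-up inputs of 6 ordinary statements, 4 permutes, a scalar iteration cost of 6 and a vectorization factor of 2, the loop looks profitable when a permute is costed like any other vector statement (10 < 12) and unprofitable when a permute costs 4 (22 >= 12).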
Comment 5 Bill Schmidt 2012-02-14 19:40:22 UTC
Author: wschmidt
Date: Tue Feb 14 19:40:13 2012
New Revision: 184225

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184225
Log:
2012-02-14  Bill Schmidt <wschmidt@linux.vnet.ibm.com>
	    Ira Rosen <irar@il.ibm.com>

	PR tree-optimization/50031
	PR tree-optimization/50969
	* targhooks.c (default_builtin_vectorization_cost): Handle
	vec_promote_demote.
	* target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
	* tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
	all types of reduction and pattern statements.
	(vect_estimate_min_profitable_iters): Likewise.
	* tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
	(vect_model_store_cost): Use vec_perm rather than vector_stmt for
	statement cost.
	(vect_model_load_cost): Likewise.
	(vect_get_load_cost): Likewise; add dump logic for explicit realigns.
	(vectorizable_type_demotion): Call vect_model_promotion_demotion_cost.
	(vectorizable_type_promotion): Likewise.
	* config/spu/spu.c (spu_builtin_vectorization_cost): Handle
	vec_promote_demote.
	* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
	vec_perm for VSX and handle vec_promote_demote.


Modified:
    branches/ibm/gcc-4_6-branch/gcc/ChangeLog.ibm
    branches/ibm/gcc-4_6-branch/gcc/config/i386/i386.c
    branches/ibm/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
    branches/ibm/gcc-4_6-branch/gcc/config/spu/spu.c
    branches/ibm/gcc-4_6-branch/gcc/target.h
    branches/ibm/gcc-4_6-branch/gcc/targhooks.c
    branches/ibm/gcc-4_6-branch/gcc/tree-vect-loop.c
    branches/ibm/gcc-4_6-branch/gcc/tree-vect-stmts.c
Comment 6 Bill Schmidt 2012-03-02 14:52:09 UTC
Author: wschmidt
Date: Fri Mar  2 14:51:58 2012
New Revision: 184787

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=184787
Log:
2012-03-02  Bill Schmidt <wschmidt@linux.vnet.ibm.com>
	    Ira Rosen <irar@il.ibm.com>

	PR tree-optimization/50031
	PR tree-optimization/50969
	* targhooks.c (default_builtin_vectorization_cost): Handle
	vec_promote_demote.
	* target.h (enum vect_cost_for_stmt): Add vec_promote_demote.
	* tree-vect-loop.c (vect_get_single_scalar_iteraion_cost): Handle
	all types of reduction and pattern statements.
	(vect_estimate_min_profitable_iters): Likewise.
	* tree-vect-stmts.c (vect_model_promotion_demotion_cost): New function.
	(vect_model_store_cost): Use vec_perm rather than vector_stmt for
	statement cost.
	(vect_model_load_cost): Likewise.
	(vect_get_load_cost): Likewise; add dump logic for explicit realigns.
	(vectorizable_type_demotion): Call vect_model_promotion_demotion_cost.
	(vectorizable_type_promotion): Likewise.
	* config/spu/spu.c (spu_builtin_vectorization_cost): Handle
	vec_promote_demote.
	* config/i386/i386.c (ix86_builtin_vectorization_cost): Likewise.
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost): Update
	vec_perm for VSX and handle vec_promote_demote.


Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/i386/i386.c
    branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_6-branch/gcc/config/spu/spu.c
    branches/gcc-4_6-branch/gcc/target.h
    branches/gcc-4_6-branch/gcc/targhooks.c
    branches/gcc-4_6-branch/gcc/tree-vect-loop.c
    branches/gcc-4_6-branch/gcc/tree-vect-stmts.c