This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.



[Bug tree-optimization/84037] [8 Regression] Speed regression of polyhedron benchmark since r256644


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037

--- Comment #23 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #21)
> So after r257453 we improve the situation pre-IVOPTs to just
> 6 IVs (duplicated but trivially equivalent) plus one counting IV.  But then
> when SLP is enabled IVOPTs comes along and adds another 4 IVs which makes us
> spill... (for AVX256, so you need -march=core-avx2 for example).
> 
> Bin, any chance you can take a look?  In the IVO dump I see
> 
>   target_avail_regs 15
>   target_clobbered_regs 9
>   target_reg_cost 4
>   target_spill_cost 8
>   regs_used 3
> ^^^
> 
> and regs_used looks awfully low to me.  The loop has even more IVs initially
> plus variable steps for those IVs, which means we need two regs per IV.
> 
> There doesn't seem to be a way to force IVOPTs to use the minimal set of IVs?
> Or just use the original set, removing the obvious redundancies?  There is
> a microarchitectural issue left with the vectorization but the spilling
> obscures the picture quite a bit :/

Sure, I will have a look based on your commit.  Thanks.
