This is the mail archive of the
gcc-bugs@gcc.gnu.org
mailing list for the GCC project.
[Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038
- From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
- To: gcc-bugs at gcc dot gnu dot org
- Date: Tue, 15 Nov 2016 11:49:43 +0000
- Subject: [Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038
- Auto-submitted: auto-generated
- References: <bug-78348-4@http.gcc.gnu.org/bugzilla/>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2016-11-15
Ever confirmed|0 |1
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
> The issue is that memcpy must be produced instead of memmove, which does
> not have an optimized version for avx2 x86 and simply uses byte copy.
I'd have expected at least an if (! overlap) memcpy () else byte-copy.
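
The dispatch described above can be sketched in C as follows. This is only
an illustration of the "if (! overlap) memcpy () else byte-copy" idea, not
glibc's or GCC's actual memmove; the names regions_overlap and copy_dispatch
are hypothetical and introduced here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if [dst, dst+n) and [src, src+n) share
   at least one byte.  Addresses are compared as integers.  */
static int
regions_overlap (const void *dst, const void *src, size_t n)
{
  uintptr_t d = (uintptr_t) dst;
  uintptr_t s = (uintptr_t) src;
  return d < s ? s - d < n : d - s < n;
}

/* Hypothetical memmove-like routine: take the memcpy fast path when the
   regions are disjoint, otherwise fall back to a byte copy in the
   direction that is safe for the overlap.  */
static void *
copy_dispatch (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  assert (n == 0 || (dst != NULL && src != NULL));

  if (!regions_overlap (dst, src, n))
    return memcpy (dst, src, n);

  if ((uintptr_t) d < (uintptr_t) s)
    for (size_t i = 0; i < n; i++)	/* Forward copy.  */
      d[i] = s[i];
  else
    for (size_t i = n; i > 0; i--)	/* Backward copy.  */
      d[i - 1] = s[i - 1];
  return dst;
}
```

With such a check the slow byte copy would only be paid for when the two
regions really overlap.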
Note the loop distribution code doesn't try to be clever in choosing memcpy
over memmove (using dependence analysis). So improving loop distribution
(adding a PKIND_MEMMOVE and conservatively using that from dependence analysis)
is a possibility as well. But we have
(compute_affine_dependence
  stmt_a: _2 = par.0_1->x2[i_19][j_20];
  stmt_b: par.0_1->x1[i_19][j_20] = _2;
  (analyze_overlapping_iterations
    (chrec_a = {0, +, 1}_2)
    (chrec_b = {0, +, 1}_2)
    (overlap_iterations_a = [0])
    (overlap_iterations_b = [0]))
  (analyze_overlapping_iterations
    (chrec_a = i_19)
    (chrec_b = i_19)
    (overlap_iterations_a = [0])
    (overlap_iterations_b = [0]))
  (analyze_overlapping_iterations
    (chrec_a = 33280)
    (chrec_b = 12800)
    (analyze_ziv_subscript
    )
    (overlap_iterations_a = no dependence)
    (overlap_iterations_b = no dependence))
) -> no dependence
so I think we could use memcpy for all no-dependence cases?
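
To make the dump concrete, here is a small C sketch of a loop of the same
shape and of the ZIV (zero index variable) reasoning that yields "no
dependence". Everything in it is an assumption for illustration: struct
params, its array sizes, and the function names are hypothetical, and
ziv_subscript is a simplified model of the test, not GCC's
analyze_ziv_subscript.

```c
#include <assert.h>
#include <string.h>

#define ROWS 8			/* Hypothetical sizes.  */
#define COLS 16

/* Hypothetical layout: two distinct arrays inside one object, so their
   constant offsets from the base pointer differ.  */
struct params
{
  float x1[ROWS][COLS];
  float x2[ROWS][COLS];
};

static_assert (sizeof (((struct params *) 0)->x1)
	       == sizeof (((struct params *) 0)->x2),
	       "x1 and x2 must have the same size");

enum ziv_result { ZIV_NO_DEPENDENCE, ZIV_ALL_ITERATIONS };

/* Simplified ZIV test: both subscripts are loop-invariant constants, so
   the accesses either always conflict (equal) or never do (different).  */
static enum ziv_result
ziv_subscript (long chrec_a, long chrec_b)
{
  return chrec_a == chrec_b ? ZIV_ALL_ITERATIONS : ZIV_NO_DEPENDENCE;
}

/* Loop of the shape seen in the dump: x1[i][j] = x2[i][j].  */
static void
copy_loop (struct params *par)
{
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      par->x1[i][j] = par->x2[i][j];
}

/* What could be emitted instead once the accesses are known to be
   independent: a single memcpy of the whole array.  */
static void
copy_memcpy (struct params *par)
{
  memcpy (par->x1, par->x2, sizeof par->x1);
}
```

In this sketch ziv_subscript (33280, 12800) reports no dependence, which is
the situation where copy_loop and copy_memcpy are interchangeable.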