This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Tue, 15 Nov 2016 11:49:43 +0000
Subject: [Bug tree-optimization/78348] [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038
Auto-submitted: auto-generated
References: <bug-78348-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78348

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2016-11-15
     Ever confirmed|0                           |1

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
> The issue is that memcpy must be produced instead of memmove, which does
> not have an optimized version for avx2 x86 and simply uses byte copy.

I'd have expected at least an if (! overlap) memcpy () else byte-copy.
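
For illustration only, the kind of runtime dispatch meant here could be sketched in C as follows. The names ranges_overlap and copy_checked are hypothetical and are not taken from GCC or glibc; the sketch assumes a flat address space in which pointers can be compared after conversion to uintptr_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if the n-byte ranges starting at a and b
   share at least one byte.  Assumes a flat address space.  */
static int
ranges_overlap (const void *a, const void *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a, pb = (uintptr_t) b;
  return pa < pb ? pb - pa < n : pa - pb < n;
}

/* Hypothetical copy routine: take the memcpy fast path when the ranges
   are disjoint, otherwise fall back to a direction-aware byte copy.  */
static void *
copy_checked (void *dst, const void *src, size_t n)
{
  if (!ranges_overlap (dst, src, n))
    return memcpy (dst, src, n);

  unsigned char *d = dst;
  const unsigned char *s = src;
  if ((uintptr_t) d < (uintptr_t) s)
    for (size_t i = 0; i < n; i++)      /* forward copy is safe here */
      d[i] = s[i];
  else
    for (size_t i = n; i-- > 0;)        /* copy backward to avoid clobbering */
      d[i] = s[i];
  return dst;
}
```

With a check like this, only genuinely overlapping copies pay for the byte loop; everything else reaches the optimized memcpy.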

Note the loop distribution code doesn't try to be clever in choosing memcpy
over memmove (using dependence analysis).  So improving loop distribution
(adding a PKIND_MEMMOVE and conservatively using that from dependence analysis)
is a possibility as well.  But we have

(compute_affine_dependence
  stmt_a: _2 = par.0_1->x2[i_19][j_20];
  stmt_b: par.0_1->x1[i_19][j_20] = _2;
(analyze_overlapping_iterations
  (chrec_a = {0, +, 1}_2)
  (chrec_b = {0, +, 1}_2)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = i_19)
  (chrec_b = i_19)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = 33280)
  (chrec_b = 12800)
(analyze_ziv_subscript
)
  (overlap_iterations_a = no dependence)
  (overlap_iterations_b = no dependence))
) -> no dependence

so I think we could use memcpy for all no dependence cases?
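
As a rough illustration of why memcpy looks safe in this case, here is a C sketch of the loop shape seen in the dump. The struct layout, element type and array sizes are assumptions made up for the example (the real type behind par.0_1 is not shown in this message); only the member names x1 and x2 and the copy direction come from the dump.

```c
#include <string.h>

#define ROWS 4   /* assumed sizes, for illustration only */
#define COLS 8

/* Hypothetical stand-in for the type behind par.0_1.  */
struct params
{
  float x1[ROWS][COLS];
  float x2[ROWS][COLS];
};

/* Loop shape from the dump: load x2[i][j], store to x1[i][j].  */
static void
copy_rows_loop (struct params *par)
{
  for (int i = 0; i < ROWS; i++)
    for (int j = 0; j < COLS; j++)
      par->x1[i][j] = par->x2[i][j];
}

/* x1 and x2 are distinct members of the same object, so a row of one
   can never overlap a row of the other and memcpy is valid.  */
static void
copy_rows_memcpy (struct params *par)
{
  for (int i = 0; i < ROWS; i++)
    memcpy (par->x1[i], par->x2[i], sizeof par->x1[i]);
}
```

Both functions leave the structure in the same state; the second is what loop distribution could emit once dependence analysis reports no dependence between the load and the store.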

References:
- [Bug tree-optimization/78348] New: [7 REGRESSION] 15% performance drop for coremark-pro/nnet-test after r242038
  - From: ysrumyan at gmail dot com
