Bug 69299 - [6 Regression] -mavx performance degradation with r232088
Summary: [6 Regression] -mavx performance degradation with r232088
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.0
: P1 normal
Target Milestone: 6.0
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 68991
Blocks:
  Show dependency treegraph
 
Reported: 2016-01-15 14:20 UTC by Jakub Jelinek
Modified: 2016-01-29 19:10 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-01-15 00:00:00


Attachments
A patch (726 bytes, patch)
2016-01-15 22:13 UTC, H.J. Lu
Details | Diff
An updated patch (1.52 KB, patch)
2016-01-16 15:09 UTC, H.J. Lu
Details | Diff
A followup patch to call constraint_satisfied_p to check memory operand (763 bytes, patch)
2016-01-16 23:21 UTC, H.J. Lu
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2016-01-15 14:20:59 UTC
Reduced testcase:

-Ofast -mavx -mno-avx2 -mtune=bdver2

float *a, *b;
int c, d, e, f;
void
foo (void)
{
  for (; c; c++)
    a[c] = 0;
  if (!d)
    for (; c < f; c++)
      b[c] = (double) e / b[c];
}

There are various differences of the kind:
-       vcvtps2pd       16(%rsp), %xmm3
+       vmovaps 16(%rsp), %xmm7
+       vcvtps2pd       %xmm7, %xmm3
Comment 1 H.J. Lu 2016-01-15 14:46:15 UTC
With

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index bac9d66..1a7d386 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -161,7 +161,7 @@
   "@internal GOT memory operand."
   (match_operand 0 "GOT_memory_operand"))
 
-(define_constraint "Bm"
+(define_memory_constraint "Bm"
   "@internal Vector memory operand."
   (match_operand 0 "vector_memory_operand"))
 
I got

-	vmovaps	16(%rsp), %xmm7
-	vcvtps2pd	%xmm7, %xmm3
+	vcvtps2pd	16(%rsp), %xmm3

But we can't use define_memory_constraint:

https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00125.html
Comment 2 H.J. Lu 2016-01-15 14:48:07 UTC
The related LRA code is

                   case CT_MEMORY:
                      if (MEM_P (op)
                          && satisfies_memory_constraint_p (op, cn))
                        win = true;
                      else if (spilled_pseudo_p (op))
                        win = true;

                      /* If we didn't already win, we can reload constants
                         via force_const_mem or put the pseudo value into
                         memory, or make other memory by reloading the
                         address like for 'o'.  */
                      if (CONST_POOL_OK_P (mode, op)
                          || MEM_P (op) || REG_P (op))
                        badop = false;

             /* If this operand accepts a register, and if the
                 register class has at least one allocatable register,
                 then this operand can be reloaded.  */
              if (winreg && !no_regs_p)
                badop = false;
Comment 3 Jakub Jelinek 2016-01-15 14:57:16 UTC
Maybe we really need to have two types of memory
constraints, ones which can be worst case always satisfied by reloading
their address into an address register and another ones which can be worst
case always satisfied by loading the memory into a temporary register (for
loads) or storing it from a temporary register.

Or consider the constraint as CT_MEMORY only if the operand satisfies the constraint predicate and as CT_FIXED_FORM (or whatever is the default) otherwise?  Only normal CT_MEMORY, for Bm as long as it satisfies memory_operand, the exact address form doesn't really matters, but what matters is something inherent in the memory (and ISA flags).
Comment 4 H.J. Lu 2016-01-15 22:13:47 UTC
Created attachment 37364 [details]
A patch

Does it make sense?
Comment 5 H.J. Lu 2016-01-16 15:09:46 UTC
Created attachment 37372 [details]
An updated patch

This patch also changes LRA to skip bad MEM.  Although there are no
regressions on x86-64, there could be latent issues.  Someone should
exam IRA/LRA to see if it is OK.
Comment 6 H.J. Lu 2016-01-16 23:21:20 UTC
Created attachment 37378 [details]
A followup patch to call constraint_satisfied_p to check memory operand
Comment 7 Vladimir Makarov 2016-01-27 02:27:12 UTC
(In reply to Jakub Jelinek from comment #3)
> Maybe we really need to have two types of memory
> constraints, ones which can be worst case always satisfied by reloading
> their address into an address register and another ones which can be worst
> case always satisfied by loading the memory into a temporary register (for
> loads) or storing it from a temporary register.
> 
> Or consider the constraint as CT_MEMORY only if the operand satisfies the
> constraint predicate and as CT_FIXED_FORM (or whatever is the default)
> otherwise?  Only normal CT_MEMORY, for Bm as long as it satisfies
> memory_operand, the exact address form doesn't really matters, but what
> matters is something inherent in the memory (and ISA flags).

I think it is a good idea.  We need memory constraint which prevents memory address reloading.  So I've started to implement the new type of constraint.
Comment 8 Vladimir Makarov 2016-01-29 18:47:49 UTC
Author: vmakarov
Date: Fri Jan 29 18:47:17 2016
New Revision: 232993

URL: https://gcc.gnu.org/viewcvs?rev=232993&root=gcc&view=rev
Log:
2016-01-29  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/69299
	* config/i386/constraints.md (Bm): Describe as special memory
	constraint.
	* doc/md.texi (DEFINE_SPECIAL_MEMORY_CONSTRAINT): Describe it.
	* genoutput.c (main): Process DEFINE_SPECIAL_MEMORY_CONSTRAINT.
	* genpreds.c (struct constraint_data): Add is_special_memory.
	(have_special_memory_constraints, special_memory_start): New
	static vars.
	(special_memory_end): Ditto.
	(add_constraint): Add new arg is_special_memory.  Add code to
	process its true value.  Update have_special_memory_constraints.
	(process_define_constraint): Pass the new arg.
	(process_define_register_constraint): Ditto.
	(choose_enum_order): Process special memory.
	(write_tm_preds_h): Generate enum const CT_SPECIAL_MEMORY and
	function insn_extra_special_memory_constraint.
	(main): Process DEFINE_SPECIAL_MEMORY_CONSTRAINT.
	* gensupport.c (process_rtx): Process
	DEFINE_SPECIAL_MEMORY_CONSTRAINT.
	* ira-costs.c (record_reg_classes): Process CT_SPECIAL_MEMORY.
	* ira-lives.c (single_reg_class): Use
	insn_extra_special_memory_constraint.
	* ira.c (ira_setup_alts): Process CT_SPECIAL_MEMORY.
	* lra-constraints.c (process_alt_operands): Ditto.
	(curr_insn_transform): Use insn_extra_special_memory_constraint.
	* recog.c (asm_operand_ok, preprocess_constraints): Process
	CT_SPECIAL_MEMORY.
	* reload.c (find_reloads): Ditto.
	* rtl.def (DEFINE_SPECIFAL_MEMORY_CONSTRAINT): New.
	* stmt.c (parse_input_constraint): Use
	insn_extra_special_memory_constraint.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/constraints.md
    trunk/gcc/doc/md.texi
    trunk/gcc/genoutput.c
    trunk/gcc/genpreds.c
    trunk/gcc/gensupport.c
    trunk/gcc/ira-costs.c
    trunk/gcc/ira-lives.c
    trunk/gcc/ira.c
    trunk/gcc/lra-constraints.c
    trunk/gcc/recog.c
    trunk/gcc/reload.c
    trunk/gcc/rtl.def
    trunk/gcc/stmt.c
Comment 9 Jeffrey A. Law 2016-01-29 19:10:04 UTC
Fixed by Vlad's commit on the trunk.