This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug rtl-optimization/83628] New: performance regression when accessing arrays on alpha


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83628

            Bug ID: 83628
           Summary: performance regression when accessing arrays on alpha
           Product: gcc
           Version: 7.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mikulas at artax dot karlin.mff.cuni.cz
  Target Milestone: ---

The alpha architecture has instructions s4add, s8add, s4sub and s8sub. These
instructions multiply the first argument by 4 or 8 (that is, shift it left by
2 or 3 bits) and add or subtract the second argument.
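
As a rough illustration of these semantics, the 64-bit forms can be modelled
in C as below. This is only a sketch: the model_* and s4a_* function names are
hypothetical and are not part of GCC or of any alpha toolchain. The last two
functions contrast a scaled add with a zero second argument followed by a
separate add against a single scaled add; both compute a + b * 4.

```c
#include <stdint.h>

/* Hypothetical C models of the alpha scaled add/subtract instructions
   (64-bit forms).  The names are illustrative only. */

/* s4addq ra,rb,rc:  rc = ra * 4 + rb */
static uint64_t model_s4addq(uint64_t ra, uint64_t rb)
{
        return (ra << 2) + rb;
}

/* s8addq ra,rb,rc:  rc = ra * 8 + rb */
static uint64_t model_s8addq(uint64_t ra, uint64_t rb)
{
        return (ra << 3) + rb;
}

/* s4subq ra,rb,rc:  rc = ra * 4 - rb */
static uint64_t model_s4subq(uint64_t ra, uint64_t rb)
{
        return (ra << 2) - rb;
}

/* s8subq ra,rb,rc:  rc = ra * 8 - rb */
static uint64_t model_s8subq(uint64_t ra, uint64_t rb)
{
        return (ra << 3) - rb;
}

/* Two-instruction form: scaled add with a zero second argument,
   followed by a separate add. */
static uint64_t s4a_two_insns(uint64_t a, uint64_t b)
{
        uint64_t t = model_s4addq(b, 0);   /* s4addq b,0,t */
        return t + a;                      /* addq   t,a,result */
}

/* One-instruction form: the addition is folded into the scaled add. */
static uint64_t s4a_one_insn(uint64_t a, uint64_t b)
{
        return model_s4addq(b, a);         /* s4addq b,a,result */
}
```

Both s4a_* variants return the same value for every input; the only
difference is the number of instructions needed.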

GCC versions 6 and 7 fail to use these instructions to perform the shift and
the addition in one step. They always generate these instructions with the
second argument zero and then generate a separate add or sub instruction.
GCC 5 and earlier use these instructions correctly.

These instructions are used to access arrays, so this bug slows down any code
that works with arrays.


Example:
$ cat index.c 
int get_int(int *p, long idx)
{
        return p[idx];
}
long long get_long_long(long long *p, long idx)
{
        return p[idx];
}
$ alpha-linux-gnu-gcc-6 -c -O2 index.c
$ alpha-linux-gnu-objdump -d index.o
0000000000000000 <get_int>:
   0:   51 14 20 42     s4addq  a1,0,a1
   4:   11 04 11 42     addq    a0,a1,a1
   8:   00 00 11 a0     ldl     v0,0(a1)
   c:   01 80 fa 6b     ret

0000000000000010 <get_long_long>:
  10:   51 16 20 42     s8addq  a1,0,a1
  14:   11 04 11 42     addq    a0,a1,a1
  18:   00 00 11 a4     ldq     v0,0(a1)
  1c:   01 80 fa 6b     ret

$ cat s4add.c
unsigned long s4a(unsigned long a, unsigned long b)
{
        return a + b * 4;
}

unsigned long s8a(unsigned long a, unsigned long b)
{
        return a + b * 8;
}
$ alpha-linux-gnu-gcc-6 -c -O2 s4add.c
$ alpha-linux-gnu-objdump -d s4add.o
0000000000000000 <s4a>:
   0:   51 14 20 42     s4addq  a1,0,a1
   4:   00 04 30 42     addq    a1,a0,v0
   8:   01 80 fa 6b     ret
   c:   00 00 fe 2f     unop

0000000000000010 <s8a>:
  10:   51 16 20 42     s8addq  a1,0,a1
  14:   00 04 30 42     addq    a1,a0,v0
  18:   01 80 fa 6b     ret
  1c:   00 00 fe 2f     unop


With gcc 5 and earlier, optimal code is generated:
$ alpha-linux-gnu-gcc-5 -c -O2 index.c
$ alpha-linux-gnu-objdump -d index.o
0000000000000000 <get_int>:
   0:   51 04 30 42     s4addq  a1,a0,a1
   4:   00 00 11 a0     ldl     v0,0(a1)
   8:   01 80 fa 6b     ret
   c:   00 00 fe 2f     unop

0000000000000010 <get_long_long>:
  10:   51 06 30 42     s8addq  a1,a0,a1
  14:   00 00 11 a4     ldq     v0,0(a1)
  18:   01 80 fa 6b     ret
  1c:   00 00 fe 2f     unop

$ alpha-linux-gnu-gcc-5 -c -O2 s4add.c
$ alpha-linux-gnu-objdump -d s4add.o
0000000000000000 <s4a>:
   0:   40 04 30 42     s4addq  a1,a0,v0
   4:   01 80 fa 6b     ret
   8:   1f 04 ff 47     nop
   c:   00 00 fe 2f     unop

0000000000000010 <s8a>:
  10:   40 06 30 42     s8addq  a1,a0,v0
  14:   01 80 fa 6b     ret
  18:   1f 04 ff 47     nop
  1c:   00 00 fe 2f     unop



I bisected the problem and it is caused by this commit:

commit fabf26080cb4cc3fecd30d409ec9c63f0ec42eff
Author: vekumar <vekumar@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu May 7 10:47:54 2015 +0000

    2015-05-07  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>

            * combine.c (make_compound_operation): Remove checks for PLUS/MINUS
            rtx type.


    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@222874 138bc75d-0d04-0410-961f-82ee72b054a4

--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2015-05-07  Venkataramanan Kumar  <venkataramanan.kumar@amd.com>
+
+       * combine.c (make_compound_operation): Remove checks for PLUS/MINUS
+       rtx type.
+
 2015-05-07  Richard Biener  <rguenther@suse.de>

        PR tree-optimization/66002
diff --git a/gcc/combine.c b/gcc/combine.c
index c04146ae645..9e3eb030a63 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7723,9 +7723,8 @@ extract_left_shift (rtx x, int count)
    We try, as much as possible, to re-use rtl expressions to save memory.

    IN_CODE says what kind of expression we are processing.  Normally, it is
-   SET.  In a memory address (inside a MEM, PLUS or minus, the latter two
-   being kludges), it is MEM.  When processing the arguments of a comparison
-   or a COMPARE against zero, it is COMPARE.  */
+   SET.  In a memory address it is MEM.  When processing the arguments of
+   a comparison or a COMPARE against zero, it is COMPARE.  */

 rtx
 make_compound_operation (rtx x, enum rtx_code in_code)
@@ -7745,8 +7744,6 @@ make_compound_operation (rtx x, enum rtx_code in_code)
      but once inside, go back to our default of SET.  */

   next_code = (code == MEM ? MEM
-              : ((code == PLUS || code == MINUS)
-                 && SCALAR_INT_MODE_P (mode)) ? MEM
               : ((code == COMPARE || COMPARISON_P (x))
                  && XEXP (x, 1) == const0_rtx) ? COMPARE
               : in_code == COMPARE ? SET : in_code);
