This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug tree-optimization/37194] New: Autovectorization of constant iteration loop degrades performance

From: "pthaugen at gcc dot gnu dot org" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: 21 Aug 2008 19:21:56 -0000
Subject: [Bug tree-optimization/37194] New: Autovectorization of constant iteration loop degrades performance
Reply-to: gcc-bugzilla at gcc dot gnu dot org

I am seeing a performance degradation in the SPEC CPU2000 benchmark 252.eon
that is caused by autovectorization of a simple loop in the function
ggSpectrum::Set(float).

Here is a simplified C version:

void ggSpectrum_Set(float * data, float d) {
   int i;
   for (i = 0; i < 8; i++)
      data[i] = d;
}


When compiled with -O3 -mcpu=970, the following code is generated:

ggSpectrum_Set:
        mfvrsave 0
        stwu 1,-48(1)
        stw 0,44(1)
        oris 0,0,0x8000
        mtvrsave 0
        li 10,0
        rlwinm 0,3,30,30,31
        subfic 0,0,4
        andi. 9,0,3
        beq- 0,.L16
        mtctr 9
        .p2align 4,,15
.L10:
        slwi 0,10,2
        addi 10,10,1
        stfsx 1,3,0
        subfic 8,10,8
        bdnz .L10
.L3:
        subfic 6,9,8
        srwi 0,6,2
        slwi. 7,0,2
        beq- 0,.L5
        mtctr 0
        stfs 1,16(1)
        cmpwi 7,0,0
        li 0,16
        slwi 9,9,2
        li 11,0
        add 9,3,9
        lvewx 0,1,0
        vspltw 0,0,0
        beq- 7,.L17
        .p2align 4,,15
.L6:
        slwi 0,11,4
        addi 11,11,1
        stvx 0,9,0
        bdnz .L6
        cmpw 7,6,7
        subf 8,7,8
        add 10,10,7
        beq- 7,.L9
.L5:
        mtctr 8
        slwi 0,10,2
        add 3,3,0
        .p2align 4,,15
.L8:
        stfs 1,0(3)
        addi 3,3,4
        bdnz .L8
.L9:
        lwz 12,44(1)
        mtvrsave 12
        addi 1,1,48
        blr
.L16:
        mr 10,9
        li 8,8
        b .L3
.L17:
        li 0,1
        mtctr 0
        b .L6
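
For readers who do not want to trace the assembly, here is a rough C sketch of
the three-phase structure it appears to implement: a scalar prologue that
peels stores until the address is 16-byte aligned, a vector body, and a scalar
epilogue for the leftovers. This is my reading of the generated code, not
output from the compiler. The names peel_count and ggSpectrum_Set_peeled are
hypothetical, and the four scalar stores in the middle loop merely stand in
for a single stvx.

```c
#include <stdint.h>

/* Hypothetical helper: number of scalar stores needed before the address
   is 16-byte aligned. Mirrors the rlwinm / subfic / andi. sequence. */
static unsigned peel_count(const float *data)
{
    unsigned word_in_vector = (unsigned)(((uintptr_t)data >> 2) & 3u);
    return (4u - word_in_vector) & 3u;
}

/* Sketch of the peeled loop; assumes data is at least 4-byte aligned. */
void ggSpectrum_Set_peeled(float *data, float d)
{
    unsigned i = 0;
    unsigned peel = peel_count(data);

    /* Prologue (.L10): scalar stores up to the alignment boundary. */
    for (; i < peel; i++)
        data[i] = d;

    /* Vector body (.L6): each iteration stands in for one stvx. */
    unsigned remaining = 8u - peel;
    unsigned vec_iters = remaining >> 2;
    for (unsigned v = 0; v < vec_iters; v++) {
        data[i + 0] = d;
        data[i + 1] = d;
        data[i + 2] = d;
        data[i + 3] = d;
        i += 4;
    }

    /* Epilogue (.L8): scalar stores for whatever is left. */
    for (; i < 8; i++)
        data[i] = d;
}
```

With an already aligned pointer, peel_count returns 0, so both scalar loops
are skipped and only two vector iterations run. All of the surrounding
setup and branching is still executed for just eight stores.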


Adding -mno-altivec results in this much simpler sequence and a significant
boost in performance (~40% speedup for the benchmark):

ggSpectrum_Set:
        stfs 1,28(3)
        stfs 1,0(3)
        stfs 1,4(3)
        stfs 1,8(3)
        stfs 1,12(3)
        stfs 1,16(3)
        stfs 1,20(3)
        stfs 1,24(3)
        blr


Another thing that stood out in the benchmark run was that the code took a
sizable hit on a couple of the statically predicted branches (the beq-
instructions above); apparently the address was already 16-byte aligned much
of the time. It therefore seems best to remove the static prediction hints
and let the hardware branch predictor take over.


-- 
           Summary: Autovectorization of constant iteration loop degrades
                    performance
           Product: gcc
           Version: 4.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: pthaugen at gcc dot gnu dot org
 GCC build triplet: powerpc64-linux
  GCC host triplet: powerpc64-linux
GCC target triplet: powerpc64-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37194

Follow-Ups:
- [Bug tree-optimization/37194] Autovectorization of small constant iteration loop degrades performance
  - From: pinskia at gcc dot gnu dot org
- [Bug tree-optimization/37194] Autovectorization of small constant iteration loop degrades performance
  - From: rguenth at gcc dot gnu dot org
- [Bug tree-optimization/37194] Autovectorization of small constant iteration loop degrades performance
  - From: dorit at gcc dot gnu dot org
