This is the mail archive of the gcc-bugs@gcc.gnu.org mailing list for the GCC project.


[Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA

From: "ysrumyan at gmail dot com" <gcc-bugzilla at gcc dot gnu dot org>
To: gcc-bugs at gcc dot gnu dot org
Date: Thu, 15 Nov 2012 15:25:20 +0000
Subject: [Bug rtl-optimization/55342] New: [LRA,x86] Non-optimal code for simple loop with LRA
Auto-submitted: auto-generated

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55342

             Bug #: 55342
           Summary: [LRA,x86] Non-optimal code for simple loop with LRA
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ysrumyan@gmail.com
            Target: x86


For a simple test case we see a 15% regression with LRA on x86 in 32-bit mode.
The test case is:

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

void convert_image(byte *in, byte *out, int size) {
    int i;
    byte * read = in,
     * write = out;
    for(i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k, tmp;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN (c, y);
        else
            k = MIN (m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}
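
For reference, the loop is a naive RGB-to-CMYK conversion: three input bytes per pixel, four output bytes per pixel. The following is a minimal sketch for checking that any build of the test case still computes the same values. The helper check_convert_image and its sample pixels are hypothetical and were not part of the original test; the expected values were computed by hand from the code above.

```c
#include <assert.h>
#include <string.h>

#define byte unsigned char
#define MIN(a, b) ((a) > (b)?(b):(a))

/* Test case from the report, reproduced so the sketch is self-contained. */
void convert_image(byte *in, byte *out, int size) {
    int i;
    byte *read = in, *write = out;
    for (i = 0; i < size; i++) {
        byte r = *read++;
        byte g = *read++;
        byte b = *read++;
        byte c, m, y, k;
        c = 255 - r;
        m = 255 - g;
        y = 255 - b;
        if (c < m)
            k = MIN (c, y);
        else
            k = MIN (m, y);
        *write++ = c - k;
        *write++ = m - k;
        *write++ = y - k;
        *write++ = k;
    }
}

/* Hypothetical helper (not in the report): converts two sample pixels
   and aborts if the output differs from the hand-computed values. */
void check_convert_image(void) {
    byte in[6] = { 255, 0, 0,   10, 20, 30 };
    byte out[8];
    /* Pixel 1: c=0,   m=255, y=255, k=0   -> 0, 255, 255, 0
       Pixel 2: c=245, m=235, y=225, k=225 -> 20, 10, 0, 225 */
    static const byte expected[8] = { 0, 255, 255, 0,   20, 10, 0, 225 };

    convert_image(in, out, 2);
    assert(memcmp(out, expected, sizeof expected) == 0);
}
```

Calling check_convert_image() from a small driver is enough to confirm that code built with and without LRA differs only in the quality of the generated code, not in its results.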

The essential part of the assembly, corresponding to the write part of the loop, is:

Without LRA:

.L4:
    movl    %esi, %ecx
    addl    $4, %eax
    subl    %ecx, %ebx
    movzbl    3(%esp), %ecx
    movb    %bl, -4(%eax)
    movl    %esi, %ebx
    subl    %ebx, %edx
    movb    %dl, -2(%eax)
    subl    %ebx, %ecx
    movb    %cl, -3(%eax)
    cmpl    %ebp, 4(%esp)
    movb    %bl, -1(%eax)
    je    .L1

With LRA:

.L4:
    movl    %esi, %eax
    subl    %eax, %ebx
    movl    28(%esp), %eax
    movb    %bl, (%eax)
    movl    %esi, %eax
    subl    %eax, %ecx
    movl    28(%esp), %eax
    movb    %cl, 1(%eax)
    movl    %esi, %eax
    subl    %eax, %edx
    movl    28(%esp), %eax
    movb    %dl, 2(%eax)
    addl    $4, %eax
    movl    %eax, 28(%esp)
    movl    28(%esp), %ecx
    movl    %esi, %eax
    cmpl    %ebp, (%esp)
    movb    %al, -1(%ecx)
    je    .L1

I also wonder why an additional move is needed to perform each subtraction:

    movl  %esi, %eax
    subl  %eax, %ebx

whereas a single instruction would suffice:

    subl  %esi, %ebx

I assume that this part is not related to LRA.

Follow-Ups:
- [Bug rtl-optimization/55342] [LRA,x86] Non-optimal code for simple loop with LRA
  - From: hjl.tools at gmail dot com
- [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  - From: vmakarov at gcc dot gnu.org
- [Bug rtl-optimization/55342] [4.8 Regression] [LRA,x86] Non-optimal code for simple loop with LRA
  - From: ysrumyan at gmail dot com
