Bug List: (This bug is not in your last search results)   Show last search results      Search page      Enter new bug
Bug#: 32593
Status: NEW
Resolution:
Assigned To: Not yet assigned to anyone <unassigned@gcc.gnu.org>
Reporter: Alexander Strange <astrange@ithinksw.com>

Attachment Description Type Created Size Actions
sub.c testcase text/plain 2007-07-02 18:47 299 bytes Edit




Description:   Last confirmed: 2009-11-07 09:35 Opened: 2007-07-02 18:47
> /usr/local/gcc43/bin/g++ -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: ../gcc/configure --prefix=/usr/local/gcc43 --disable-multilib
--with-arch=pentium-m --with-tune=nocona --enable-target-optspace
--disable-bootstrap --with-gmp=/sw --with-system-zlib
--enable-languages=c,c++,objc,obj-c++
Thread model: posix
gcc version 4.3.0 20070702 (experimental)

> /usr/local/gcc43/bin/gcc -Os -fno-pic -fomit-frame-pointer -S sub.c

"i= 7 - ff_h264_norm_shift[x>>(CABAC_BITS-1)];" generates:

        movl    $7, %ecx
        subl    %eax, %ecx
        sall    %cl, %edx

It would be better if it generated:

        negl %eax
        addl $7, %eax
        sall    %al, %edx

which would leave a register free (which helps if this function is inlined);
this is safe since eax isn't used again later.

This could be done by transforming 'y = constant - x' into 'y = -x + constant'.
It looks like gcc actually does the reverse: if I rewrite the source as
'-x + constant' to match the preferred output, it still generates the same
movl/subl sequence.
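
For illustration, the two source forms can be written side by side. This is a minimal sketch, not the attached sub.c: the function names and the checked range are made up here, and it only shows that the two forms compute the same value, so the choice between them is purely a code generation matter.

```c
/* Form as written in the report: constant - x. */
int count_sub(int x)
{
    return 7 - x;
}

/* Proposed form: negate x, then add the constant. */
int count_neg_add(int x)
{
    return -x + 7;
}

/* Hypothetical helper: returns 1 if both forms agree for every x in [lo, hi]. */
int forms_agree(int lo, int hi)
{
    for (int x = lo; x <= hi; x++)
        if (count_sub(x) != count_neg_add(x))
            return 0;
    return 1;
}
```

Compiling both functions with the flags above and comparing the assembly is one way to check whether a given gcc version canonicalizes them to the same sequence.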

------- Comment #1 From Alexander Strange 2007-07-02 18:47 -------
Created an attachment (id=13827) [edit]
testcase

------- Comment #2 From Andrew Pinski 2007-07-02 19:06 -------
This is a target issue (or a semi-generic one for two-operand targets).

------- Comment #3 From Alexander Strange 2008-02-16 18:16 -------
Also, 'x >> (32 - y)' can be transformed into 'x >> -y', since x86 shift
instructions only use the lowest 5 bits of the count for 32-bit operands. I'm
not sure about other targets.
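
A small sketch of why the two counts are interchangeable on such a target. The helper shr_masked is an assumed model of the hardware behaviour (count masked to 5 bits), not a gcc or libc function. Plain C 'x >> -y' is undefined, so the equivalence only holds at the instruction level.

```c
#include <stdint.h>

/* Model of a 32-bit x86 logical right shift:
   only the low 5 bits of the count are used. */
uint32_t shr_masked(uint32_t x, uint32_t count)
{
    return x >> (count & 31u);
}

/* Returns 1 if counts 32 - y and -y select the same shift for all y in 0..31.
   Unsigned arithmetic is used so that 0 - y wraps instead of overflowing. */
int counts_agree(uint32_t x)
{
    for (uint32_t y = 0; y < 32; y++)
        if (shr_masked(x, 32u - y) != shr_masked(x, 0u - y))
            return 0;
    return 1;
}
```

Both counts reduce to the same value modulo 32, including y = 0, where the masked count is 0 in either form.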

Messing with the backend doesn't seem very popular these days. I guess I should
figure out how those parts work.

------- Comment #4 From Alexander Strange 2008-12-17 22:10 -------
Causes silly code on i386 with this:
void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride){
    int i;
    for(i=0; i<8; i++){
        int j;
        for (j=0; j<8; j++){
            pix[j] = pix[j-stride] + block[j];
        }
        pix+= stride;
        block+= 8;
    }
}

where it calculates each of [0-7] - stride and spills it to the stack,
instead of keeping -stride in a register and incrementing it.
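
As a source-level sketch of the addressing the report asks for, the row above can be reached through its own pointer, so that only 'pix - stride' is computed once per row. The variant name pred8x8l_vertical_add_rowptr and the helper variants_match are hypothetical. Whether any particular gcc version generates better code for the variant has not been checked here.

```c
#include <string.h>

/* Original form from the report: index j - stride is formed per element. */
void pred8x8l_vertical_add_c(unsigned char *pix, const short *block, int stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            pix[j] = pix[j - stride] + block[j];
        pix += stride;
        block += 8;
    }
}

/* Hypothetical variant: address the row above through one pointer,
   so -stride is applied once per row and j is a plain increment. */
void pred8x8l_vertical_add_rowptr(unsigned char *pix, const short *block, int stride)
{
    for (int i = 0; i < 8; i++) {
        const unsigned char *above = pix - stride;
        for (int j = 0; j < 8; j++)
            pix[j] = above[j] + block[j];
        pix += stride;
        block += 8;
    }
}

/* Run both versions on identical inputs; returns 1 if the results match. */
int variants_match(void)
{
    enum { STRIDE = 16, ROWS = 9 };   /* one row above plus eight written rows */
    unsigned char a[STRIDE * ROWS], b[STRIDE * ROWS];
    short block[64];

    for (int k = 0; k < STRIDE * ROWS; k++)
        a[k] = b[k] = (unsigned char)(k * 7 + 3);
    for (int k = 0; k < 64; k++)
        block[k] = (short)(k - 32);

    pred8x8l_vertical_add_c(a + STRIDE, block, STRIDE);
    pred8x8l_vertical_add_rowptr(b + STRIDE, block, STRIDE);
    return memcmp(a, b, sizeof a) == 0;
}
```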

------- Comment #5 From Paolo Bonzini 2009-11-07 09:35 -------
Confirmed, the code for -O2 -funroll-loops includes things such as

        movzwl  2(%eax), %esi
        movl    $1, -44(%ebp)
        subl    %ecx, -44(%ebp)
        movl    -44(%ebp), %edi
        movzbl  (%edx,%edi), %ebx
        addl    %ebx, %esi
        movl    %esi, %ebx
        movb    %bl, 1(%edx)
...
        movl    -44(%ebp), %ebx
        movzwl  2(%esi), %edi
        movzbl  (%edx,%ebx), %ebx
        addl    %ebx, %edi
        movl    %edi, %ebx
        movb    %bl, 1(%edx)

The unrolling is done at the tree level, so it is CSE that would need to know
about this.

------- Comment #6 From Richard Guenther 2009-11-07 14:35 -------
On the tree level we see (after FRE):

  D.1996_50 = pix_1 + 1;
  D.1997_51 = 1 - stride_11(D);
  D.1998_52 = (unsigned int) D.1997_51;
  D.1999_53 = pix_1 + D.1998_52;
  D.2000_54 = *D.1999_53;
  D.2003_57 = block_2 + 2;
  D.2004_58 = *D.2003_57;
  D.2005_59 = (unsigned char) D.2004_58;
  D.2006_60 = D.2000_54 + D.2005_59;
  *D.1996_50 = D.2006_60;

etc.

which again shows the weakness of POINTER_PLUS_EXPR and the
conversions it causes for the offset operand.  IVOPTs cannot cope
with it and PRE/LIM make a mess out of the code as well.

------- Comment #7 From Richard Guenther 2009-11-07 14:44 -------
Oh, there's no loop.  Then the issue is that strength reduction on scalar
(straight-line) code is not implemented.  In theory strength reduction can be
integrated into our global value-numbering / PRE code, but nobody has done
that.
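
To make the missing transformation concrete, here is a hand-written sketch of what strength reduction on the unrolled offsets would amount to. It is not output from any gcc pass; the function names are made up for illustration.

```c
enum { N = 8 };

/* As in the unrolled code: every offset is an independent 'k - stride'. */
void offsets_independent(int stride, int out[N])
{
    out[0] = 0 - stride;
    out[1] = 1 - stride;
    out[2] = 2 - stride;
    out[3] = 3 - stride;
    out[4] = 4 - stride;
    out[5] = 5 - stride;
    out[6] = 6 - stride;
    out[7] = 7 - stride;
}

/* Strength-reduced by hand: one negation, then increments of a single value. */
void offsets_reduced(int stride, int out[N])
{
    int off = -stride;
    out[0] = off;
    out[1] = ++off;
    out[2] = ++off;
    out[3] = ++off;
    out[4] = ++off;
    out[5] = ++off;
    out[6] = ++off;
    out[7] = ++off;
}

/* Returns 1 if both versions produce the same eight offsets. */
int offsets_match(int stride)
{
    int a[N], b[N];
    offsets_independent(stride, a);
    offsets_reduced(stride, b);
    for (int k = 0; k < N; k++)
        if (a[k] != b[k])
            return 0;
    return 1;
}
```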
