Bug 41742 - Unnecessary zero-extension at -O2 but not -O1
Status: ASSIGNED
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 4.5.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Ajit Kumar Agarwal
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2009-10-18 13:56 UTC by Segher Boessenkool
Modified: 2023-04-11 17:07 UTC
CC List: 5 users

See Also:
Host:
Target: powerpc-linux
Build:
Known to work:
Known to fail:
Last reconfirmed: 2012-08-16 00:00:00


Description Segher Boessenkool 2009-10-18 13:56:15 UTC
Take the following example:

void *memset(void *b, int c, unsigned long len)
{
        unsigned long i;

        for (i = 0; i < len; i++)
                ((unsigned char *)b)[i] = c;

        return b;
}

-O2 generates:

memset:
        cmpwi 0,5,0
        beqlr 0
        mtctr 5
        rlwinm 4,4,0,0xff
        li 9,0
        .p2align 4,,15
.L3:
        stbx 4,3,9
        addi 9,9,1
        bdnz .L3
        blr

The zero-extension of GPR4 isn't needed, and in fact, -O1 doesn't
generate it:

memset:
        cmpwi 0,5,0
        beqlr 0
        li 9,0
        subf 5,9,5
        mtctr 5
.L3:
        stbx 4,3,9
        addi 9,9,1
        bdnz .L3
        blr

(the subf here is superfluous though).
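
To illustrate why the rlwinm is redundant: stbx only stores the low 8 bits of the source register, so masking the value to 8 bits first cannot change what ends up in memory. Below is a minimal C sketch of that equivalence. The names fill_plain, fill_masked and fills_match are hypothetical and not part of the testcase; fill_masked only mimics the effect of the extra instruction at the source level.

```c
#include <string.h>

/* Hypothetical helper: same loop as the testcase. The assignment to an
   unsigned char element already truncates c to its low 8 bits. */
void *fill_plain(void *b, int c, unsigned long len)
{
        unsigned long i;

        for (i = 0; i < len; i++)
                ((unsigned char *)b)[i] = c;

        return b;
}

/* Hypothetical helper: masks c up front, mirroring "rlwinm 4,4,0,0xff". */
void *fill_masked(void *b, int c, unsigned long len)
{
        unsigned int low = (unsigned int)c & 0xff;
        unsigned long i;

        for (i = 0; i < len; i++)
                ((unsigned char *)b)[i] = (unsigned char)low;

        return b;
}

/* Returns 1 if both variants write identical bytes for the given c. */
int fills_match(int c)
{
        unsigned char a[16], m[16];

        fill_plain(a, c, sizeof a);
        fill_masked(m, c, sizeof m);

        return memcmp(a, m, sizeof a) == 0;
}
```

Since both variants write the same bytes for every c, the mask is dead work whenever the only use of c is a byte store.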
Comment 1 Segher Boessenkool 2012-08-16 10:39:45 UTC
Still happens on mainline: -O2 still has the superfluous zero-extend,
but now the -O1 code is perfect.
Comment 2 Segher Boessenkool 2017-03-02 10:53:55 UTC
With current trunk the loop code is better (uses stbu), but the
unnecessary extend is still there:

memset:
        cmpwi 0,5,0
        beqlr 0
        rlwinm 4,4,0,0xff
        mtctr 5
        addi 9,3,-1
        .p2align 4,,15
.L3:
        stbu 4,1(9)
        bdnz .L3
        blr
Comment 3 Peter Bergner 2023-03-23 13:54:19 UTC
Update: current code (both powerpc64-linux and powerpc64le-linux) still contains the unneeded rlwinm, but now the loop has been replaced with a call to libc's memset, and the call is not shrink-wrapped because the entry block uses the non-volatile r31.

foo:
	cmpdi 0,5,0
	std 31,-8(1)
	stdu 1,-128(1)
	mr 31,3
	beq 0,.L4
	mflr 0
	rlwinm 4,4,0,0xff
	std 0,144(1)
	bl memset
	nop
	ld 0,144(1)
	mtlr 0
.L4:
	addi 1,1,128
	mr 3,31
	ld 31,-8(1)
	blr
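
For readers less familiar with the assembly, the listing above corresponds roughly to the C below. This is a hand-written reading of the listing, not compiler output, and the name foo_equivalent is hypothetical. It also shows why the rlwinm remains unneeded in this form: the C standard specifies that memset converts its int argument to unsigned char itself, so masking before the call changes nothing.

```c
#include <string.h>

/* Hypothetical source-level reading of the generated code for foo. */
void *foo_equivalent(void *b, int c, unsigned long len)
{
        /* cmpdi 0,5,0 / beq 0,.L4: skip the call when len is zero. */
        if (len != 0) {
                /* rlwinm 4,4,0,0xff: the redundant mask; memset would
                   convert c to unsigned char anyway. */
                memset(b, c & 0xff, len);
        }

        /* b is kept in r31 across the call and moved back to r3. */
        return b;
}
```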