User account creation filtered due to spam.

Bug 77729 - aarch64 inserts unneeded uxtb after ldrb, orr ...32
Summary: aarch64 inserts unneeded uxtb after ldrb, orr ...32
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 6.1.1
: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2016-09-25 14:49 UTC by Julian Andres Klode
Modified: 2016-11-30 11:18 UTC (History)
2 users (show)

See Also:
Host:
Target: aarch64
Build:
Known to work:
Known to fail:
Last reconfirmed: 2016-09-25 00:00:00


Attachments
C reproducer (125 bytes, text/plain)
2016-09-25 14:51 UTC, Julian Andres Klode
Details
Aarch64 output at -O2 (278 bytes, text/plain)
2016-09-25 14:51 UTC, Julian Andres Klode
Details
arm (thumb2) output at -O2 (429 bytes, text/plain)
2016-09-25 14:52 UTC, Julian Andres Klode
Details
ppc64le (308 bytes, text/plain)
2016-09-25 15:04 UTC, Julian Andres Klode
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Julian Andres Klode 2016-09-25 14:49:48 UTC
In the attached test case, the aarch64 target inserts a uxtb instruction that is not needed:

	ldrb	w1, [x0]
	orr	w1, w1, 32
	uxtb	w1, w1
	cmp	w1, 116

The arm backend handles that just fine:

	ldrb	r3, [r0]	@ zero_extendqisi2
	orr	r3, r3, #32
	cmp	r3, #116

It also works with 33 or 34 instead of 32 for whatever reasons.
Comment 1 Julian Andres Klode 2016-09-25 14:51:10 UTC
Created attachment 39675 [details]
C reproducer
Comment 2 Julian Andres Klode 2016-09-25 14:51:31 UTC
Created attachment 39676 [details]
Aarch64 output at -O2
Comment 3 Julian Andres Klode 2016-09-25 14:52:01 UTC
Created attachment 39677 [details]
arm (thumb2) output at -O2
Comment 4 Andrew Pinski 2016-09-25 14:54:40 UTC
Note this testcase needs to be improved as I have a patch which converts a switch with just one case and a default into anew if statement.
Comment 5 Julian Andres Klode 2016-09-25 15:04:37 UTC
Created attachment 39678 [details]
ppc64le

Hmm, AFAICT the same seems to happen on powerpc64le:

	lbz 9,0(3)                      # Load zero
	ori 9,9,0x20                    # ors in 32
	rlwinm 9,9,0,0xff               # zero extend value AFAICT
	cmpwi 7,9,116


So far tested:

good: mipsel, x86_64, armhf (thumb2)
bad: aarch64, powerpc64le
Comment 6 Julian Andres Klode 2016-09-25 15:09:33 UTC
(In reply to Andrew Pinski from comment #4)
> Note this testcase needs to be improved as I have a patch which converts a
> switch with just one case and a default into anew if statement.

FWIW, The issue is exactly the same with if statements, like in:

int TrieCase3(const char *string)
{
    if((string[0] | 32) == 't') {
        if((string[1] | 32) == 'a') {
            if((string[2] | 32) == 'g') {
                return 42;
            }
        }
    }
    return -1;
}
Comment 7 Florian Weimer 2016-09-25 15:11:54 UTC
Something similar happens on x86_64 with -Os.  The OR instruction operates on %eax instead of %al:

0000000000000000 <TrieCase3>:
   0:	8a 07                	mov    (%rdi),%al
   2:	83 c8 20             	or     $0x20,%eax
   5:	3c 74                	cmp    $0x74,%al

(Byte-wise OR would be 0x0c 0x20, one byte shorter.)
Comment 8 Andrew Pinski 2016-09-25 15:15:59 UTC
This looks like missing removal of casts.

Note in C, char_var|32 is really the same as ((int)char_var)|32
Comment 9 Julian Andres Klode 2016-09-25 15:19:01 UTC
(In reply to Andrew Pinski from comment #8)
> This looks like missing removal of casts.
> 
> Note in C, char_var|32 is really the same as ((int)char_var)|32

Well. The loads of the byte are already zero-extend loads. The current code is like saying:

   (int) ((int)char|32)

:)
Comment 10 Andrew Pinski 2016-09-25 15:42:04 UTC
This is how we expand it on aarch64:
(insn 10 9 11 (set (reg:QI 81)
        (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8])) t.c:4 -1
     (nil))

(insn 11 10 12 (set (reg:SI 82)
        (ior:SI (subreg:SI (reg:QI 81) 0)
            (const_int 32 [0x20]))) t.c:4 -1
     (nil))

(insn 12 11 13 (set (reg:SI 83)
        (zero_extend:SI (subreg:QI (reg:SI 82) 0))) t.c:4 -1
     (nil))

(insn 13 12 14 (set (reg:CC 66 cc)
        (compare:CC (reg:SI 83)
            (const_int 116 [0x74]))) t.c:4 -1
     (nil))


-----
(set (reg:SI 83)
    (ior:SI (and:SI (subreg:SI (mem:QI (reg/v/f:DI 80 [ string ]) [0 *string_9(D)+0 S1 A8]) 0)
            (const_int 223 [0xdf]))
        (const_int 32 [0x20])))

Notice how the and there is 223, but really that can be still a zero_extend.  Basically combine is going funny.

---- CUT ----

Note for 33, orr does not accept 33 so combine does not see 33 and does not change the and part around the subreg.