Bug 95891 - Missing optimization: comparing for equality two fields of the same struct that is passed via a register
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization
Version: 10.1.0
Importance: P3 enhancement
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks: argument, return
 
Reported: 2020-06-25 08:38 UTC by jm
Modified: 2021-08-16 01:05 UTC
CC List: 0 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2020-06-25 00:00:00


Attachments

Description jm 2020-06-25 08:38:57 UTC
I'm sorry, this is perhaps not the correct component, but my knowledge of gcc internals does not allow me to do more than guess.

For all versions of gcc I've tried, the following code:

struct point {
    int x, y;
};

bool f(point a, point b) {
    return a.x == b.x && a.y == b.y;
}

bool f(unsigned long long a, unsigned long long b) {
    return a == b;
}

is compiled to

f(point, point):
        xor     eax, eax
        cmp     edi, esi
        je      .L5
        ret
.L5:
        sar     rdi, 32
        sar     rsi, 32
        cmp     edi, esi
        sete    al
        ret
f(unsigned long long, unsigned long long):
        cmp     rdi, rsi
        sete    al
        ret

I'd expect f(point, point) to have the same assembly as f(unsigned long long, unsigned long long).
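
For reference, writing the comparison by hand on the whole 8-byte object should reduce to the same single 64-bit compare. (This is a sketch of mine, not part of the original report; it assumes C++20 std::bit_cast, 32-bit int and no padding in point, and the helper name g is made up.)

#include <bit>   // std::bit_cast, C++20

struct point {   // same struct as above
    int x, y;
};

// Compare the whole object as one 64-bit value; equivalent to the
// field-wise test when point has no padding bits.
bool g(point a, point b) {
    return std::bit_cast<unsigned long long>(a)
        == std::bit_cast<unsigned long long>(b);
}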

Yours,

-- Jean-Marc Bourguet
Comment 1 Richard Biener 2020-06-25 11:09:38 UTC
Confirmed.
Comment 2 Andrew Pinski 2021-05-30 23:26:44 UTC
Confirmed.  Happens on aarch64 too:
        cmp     w0, w1
        beq     .L5
        mov     w0, 0
        ret
        .p2align 2,,3
.L5:
        asr     x0, x0, 32
        asr     x1, x1, 32
        cmp     w0, w1
        cset    w0, eq
        ret

I wonder if we could expose at the tree level that point is passed via a 64-bit argument and then use BIT_FIELD_REF to do the extraction, or lower the field extractions to BIT_FIELD_REF.
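
As a rough source-level picture of that lowering (my own sketch in C++, not GCC's actual GIMPLE; it assumes x sits in the low 32 bits and y in the high 32 bits of the 64-bit argument register, and needs C++20 for std::bit_cast):

#include <bit>   // std::bit_cast, C++20

struct point {
    int x, y;
};

bool f_lowered(point a, point b) {
    // View each argument as the 64-bit register it is passed in.
    unsigned long long wa = std::bit_cast<unsigned long long>(a);
    unsigned long long wb = std::bit_cast<unsigned long long>(b);
    // Field accesses become bit-range extractions from that word,
    // roughly what BIT_FIELD_REF <a, 32, 0> and <a, 32, 32> express.
    int ax = (int)(wa & 0xffffffffULL);
    int ay = (int)(wa >> 32);
    int bx = (int)(wb & 0xffffffffULL);
    int by = (int)(wb >> 32);
    // Same shape as f1() below; the hope is that it then folds to wa == wb.
    return ax == bx && ay == by;
}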

Also we don't optimize:
bool f1(unsigned long long a, unsigned long long b) {
  return (((int)a) == ((int)b)) && ((int)(a>>32) == (int)(b>>32));
}

into just return a==b; either.
That is another transformation which needs to happen after the BIT_FIELD_REF change.
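
A quick sanity check of the identity that fold relies on (my own snippet, not from the report): the low and high 32-bit halves compare equal exactly when the full 64-bit values do.

#include <cassert>
#include <cstdint>

// Same shape as f1() above, spelled with fixed-width types; the casts
// truncate each operand to its low 32 bits.
static bool halves_equal(uint64_t a, uint64_t b) {
    return (int32_t)a == (int32_t)b
        && (int32_t)(a >> 32) == (int32_t)(b >> 32);
}

int main() {
    const uint64_t samples[] = {0, 1, 0xffffffffULL, 0x100000000ULL,
                                0x123456789abcdef0ULL, ~0ULL};
    for (uint64_t a : samples)
        for (uint64_t b : samples)
            assert(halves_equal(a, b) == (a == b));
}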