[Bug target/102952] New code-gen options for retpolines and straight line speculation

Thu Oct 28 22:07:47 GMT 2021

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #22 from Andrew Cooper <andrew.cooper3 at citrix dot com> ---
One curious thing I have discovered.  While auditing the -mharden-sls=all code
generation in Xen, I found examples where I got "ret int3 ret int3" with no
intervening instructions.

It turns out this is not a regression in this change.  It is a pre-existing
missing optimisation, which is made more obvious when every ret is extended
with an int3.

It occurs for functions with either no stack frame at all, or functions which
have an early exit before setting up the stack frame.  Some examples which
occur at -O1 do not occur at -O2.

One curious example which does still repro at -O2 is this.  We have a hash
lookup function:

struct context *sidtab_search(struct sidtab *s, u32 sid)
{
    int hvalue;
    struct sidtab_node *cur;

    if ( !s )
        return NULL;

    hvalue = SIDTAB_HASH(sid);
    cur = s->htable[hvalue];
    while ( cur != NULL && sid > cur->sid )
        cur = cur->next;

    if ( cur == NULL || sid != cur->sid )
    {
        /* Remap invalid SIDs to the unlabeled SID. */
        sid = SECINITSID_UNLABELED;
        hvalue = SIDTAB_HASH(sid);
        cur = s->htable[hvalue];
        while ( cur != NULL && sid > cur->sid )
            cur = cur->next;
        if ( !cur || sid != cur->sid )
            return NULL;
    }

    return &cur->context;
}

which compiles (reformatted a little for width - unmodified:
https://paste.debian.net/hidden/7bf675d6/) to:

<sidtab_search>:
         48 85 ff             test   %rdi,%rdi
/------- 74 63                je     <sidtab_search+0x68>
|        48 8b 17             mov    (%rdi),%rdx
|        89 f0                mov    %esi,%eax
|        83 e0 7f             and    $0x7f,%eax
|        48 8b 04 c2          mov    (%rdx,%rax,8),%rax
|        48 85 c0             test   %rax,%rax
|   /--- 75 13                jne    <sidtab_search+0x29>
|  /|--- eb 17                jmp    <sidtab_search+0x2f>
|  ||    0f 1f 84 00 00 00 00 nopl   0x0(%rax,%rax,1)
|  ||    00 
|  ||/-> 48 8b 40 48          mov    0x48(%rax),%rax
|  |||   48 85 c0             test   %rax,%rax
|  +||-- 74 06                je     <sidtab_search+0x2f>
|  |\|-> 39 30                cmp    %esi,(%rax)
|  | \-- 72 f3                jb     <sidtab_search+0x20>
| /|---- 74 24                je     <sidtab_search+0x53>
| |\---> 48 8b 42 28          mov    0x28(%rdx),%rax
| |      48 85 c0             test   %rax,%rax
| | /--- 75 11                jne    <sidtab_search+0x49>
|/|-|--- eb 32                jmp    <sidtab_search+0x6c> // (1)
||| |    66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
||| |/-> 48 8b 40 48          mov    0x48(%rax),%rax
||| ||   48 85 c0             test   %rax,%rax
|||/||-- 74 17                je     <sidtab_search+0x60> // (2)
||||\|-> 83 38 04             cmpl   $0x4,(%rax)
|||| \-- 76 f2                jbe    <sidtab_search+0x40>
||||     83 38 05             cmpl   $0x5,(%rax)
+|||---- 75 15                jne    <sidtab_search+0x68>
||\|---> 48 83 c0 08          add    $0x8,%rax
|| |     c3                   retq   
|| |     cc                   int3   
|| |     0f 1f 80 00 00 00 00 nopl   0x0(%rax)
|| \---> c3                   retq                        // Target of (2)
||       cc                   int3   
||       66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
\|-----> 31 c0                xor    %eax,%eax
 |       c3                   retq   
 |       cc                   int3   
 \-----> c3                   retq                        // Target of (1)
         cc                   int3   
         66 90                xchg   %ax,%ax

There are 4 exits in total.  Two have to set up %eax, so they can't usefully be
merged.

However, the unconditional jmp at (1) is 2 bytes, and could fully contain its
target ret;int3 without even impacting the surrounding padding.  Whether it
inlines or merges, this drops 4 bytes.

The conditional jump at (2) could be folded in to any of the other exit paths,
dropping 16 bytes from the total size size.

I have no idea how easy/hard this may be to track down, or whether it is worth
pursuing urgently, but it probably does want looking at, seeing as SLS
hardening doubles the hit.