85516 – [missed-optimization] gcc does not convert multiple compares against constants to a shift+bitmask test

Bug 85516 - [missed-optimization] gcc does not convert multiple compares against constants to a shift+bitmask test

Summary: [missed-optimization] gcc does not convert multiple compares against constant...

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	8.0

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:	missed-optimization

Depends on:
Blocks:

Reported:	2018-04-24 15:03 UTC by Avi Kivity
Modified:	2021-11-29 15:29 UTC (History)
CC List:	2 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed:	2018-04-25 00:00:00

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Avi Kivity 2018-04-24 15:03:33 UTC

Consider

=== example starts ===
enum class E { a, b, c, d, e, f, g, h, i, j };

bool check_mask(E v) {
    return v == E::b || v == E::e || v == E::f || v == E::j;
}
=== example ends ===

This can be compiled to just three instructions (x86):

  mov $0x232, %eax
  bt %edi, %eax
  setc %al

but instead gcc compiles it to:

  cmpl $1, %edi
  sete %al
  cmpl $4, %edi
  sete %dl
  orb %dl, %al
  jne .L1
  subl $5, %edi
  andl $-5, %edi
  sete %al
 .L1:

which is three times as large and contains a possibly unpredictable branch. More bits in the mask will presumably generate larger code.

Comment 1 Jakub Jelinek 2018-04-24 15:25:42 UTC

GCC can do this for switch expansion right now,
enum class E { a, b, c, d, e, f, g, h, i, j };

bool check_mask(E v) {
  switch (v) {
    case E::b: case E::e: case E::f: case E::j: return true;
    default: return false;
  }
}

	xorl	%eax, %eax
	cmpl	$9, %edi
	ja	.L1
	movl	$1, %eax
	movl	%edi, %ecx
	salq	%cl, %rax
	testl	$562, %eax
	setne	%al
.L1:
	ret

Comment 2 Avi Kivity 2018-04-24 15:29:04 UTC

Interesting. Still missing optimizations - v cannot be larger than 9 (UB), and bt is faster than mov+shift+and.

Comment 3 Jakub Jelinek 2018-04-24 15:35:11 UTC

UB only with -fstrict-enums which isn't the default.

Comment 4 Richard Biener 2018-04-25 07:26:33 UTC

I expected fold_truth_andor to eventually catch this via fold_range_test.  Probably the "bad" association in the testcase prevents this.

Anyway, this is a job for reassoc / if-conversion.

Comment 5 Andrew Pinski 2021-07-24 09:53:23 UTC

Note on the trunk we get the full optimized version with the switch now:
check_mask(E):
        cmpl    $9, %edi
        ja      .L3
        movl    $562, %eax
        btq     %rdi, %rax
        setc    %al
        ret
.L3:
        xorl    %eax, %eax
        ret

Comment 6 Avi Kivity 2021-11-29 15:28:19 UTC

gcc 11.2 produces optimized code with the original example:

check_mask(E):
  cmpl $9, %edi
  ja .L3
  movl %edi, %ecx
  movl $562, %eax
  shrq %cl, %rax
  andl $1, %eax
  ret
.L3:
  xorl %eax, %eax
  ret

Comment 7 Avi Kivity 2021-11-29 15:29:29 UTC

Marking as fixed, since at least 11.2.