Bug 110772 - strange code generated for bit-field access
Summary: strange code generated for bit-field access
Status: UNCONFIRMED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 10.5.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2023-07-21 23:26 UTC by Roland Illig
Modified: 2023-07-22 15:50 UTC (History)
0 users

See Also:
Host:
Target: arm
Build:
Known to work:
Known to fail:
Last reconfirmed:


Attachments
precompiled code that generates unrelated diagnostics (25.83 KB, text/plain)
2023-07-21 23:26 UTC, Roland Illig
Details
precompiled code that works as intended (25.82 KB, text/plain)
2023-07-21 23:27 UTC, Roland Illig
Details
Assembler code from comment 5 (47.28 KB, text/plain)
2023-07-22 15:23 UTC, Roland Illig
Details
Preprocessed source from comment 5 (36.99 KB, text/plain)
2023-07-22 15:29 UTC, Roland Illig
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Roland Illig 2023-07-21 23:26:55 UTC
Created attachment 55598 [details]
precompiled code that generates unrelated diagnostics

In NetBSD's lint, I changed 'struct type_qualifier' from an enum to a set of bit-fields. It worked well on x86_64 but not on arm and powerpc.

https://github.com/NetBSD/src/commit/35c652085d26b93b94f55312f715361ee0cd2043

On these two 32-bit platforms, lint generated some unrelated and wrong diagnostics.

I tracked the difference down to a single line of code, and changing that line changes unrelated code.

In the attached p5.i, applying the following diff changes not only the code in cgram.y:529, but also in cgram.y:489.

$ diff -u p5.i p3.i
--- p5.i        2023-07-22 00:22:30.748103516 +0200
+++ p3.i        2023-07-22 00:22:05.448298739 +0200
@@ -6424,7 +6424,7 @@
 case 68:
 # 528 "cgram.y"
  {
-  if (!yystack.l_mark[0].y_type_qualifiers.tq_const)
+  if (!yystack.l_mark[0].y_seen_statement)
    yyerror("Bad attribute");
  }
 # 1 ""


$ gcc --version
gcc (nb2 20230710) 10.5.0

$ uname -mp
evbarm earmv7hfeb

$ gcc -O2 -S p5.i -fverbose-asm
$ gcc -O2 -S p3.i -fverbose-asm
$ gcc -O2 -c p5.i
$ gcc -O2 -c p3.i
$ objdump -dr p5.o > p5.dis
$ objdump -dr p3.o > p3.dis

$ diff -u p5.dis p3.dis
...
-     ba8:      e1b033a3        lsrs    r3, r3, #7
+     ba8:      e3530000        cmp     r3, #0
...
-    1010:      e1b033a3        lsrs    r3, r3, #7
+    1010:      e3530000        cmp     r3, #0

The code has changed in two places. Searching for the text "#7" in the p5.s file shows that the two places where the code has changed are in cgram.y:529 and cgram.y:489.

I doubt that the code invokes undefined behavior, as it is fairly standard yacc code. Therefore I suspect a compiler error.

Compiling the code with -Os or -O1 instead of -O2 does not show this behavior.

Removing the ':1' from the members in struct type_qualifier does not show this behavior.

Compiling the code on x86_64 or i386 does not show this behavior.
Comment 1 Roland Illig 2023-07-21 23:27:46 UTC
Created attachment 55599 [details]
precompiled code that works as intended
Comment 2 Andrew Pinski 2023-07-21 23:53:48 UTC
I am trying to understand what you think is wrong here?

lsrs    r3, r3, #7

means logical shift right by 7 and compare against 0.

Also this is big-endian arm so the order of bit fields will be different than little-endian.

This is extracting one bit from the whole byte.

x86 has an instruction (testb) which does the testing that way.
arm does not.
powerpc does not either.
aarch64 has an instruction too.


I don't see anything wrong with the code generation here at all. 

Take:
```
struct t
{
  _Bool a0:1;
  _Bool a1:1;
  _Bool a2:1;
  _Bool a3:1;
  _Bool a4:1;
  _Bool a5:1;
  _Bool a6:1;
  _Bool a7:1;
};

int g(); int h();
int f(struct t *a)
{
  if(a->a0) return g();
  return h();
}
int f1(struct t *a)
{
  if(a->a7) return g();
  return h();
}
int f2(_Bool *a)
{
  if(*a) return g();
  return h();
}
```

I don't see anything wrong with the above code generation for either arm or x86_64 (or powerpc).
Comment 3 Andrew Pinski 2023-07-22 00:08:57 UTC
Maybe PR 108498  ....
Comment 4 Andrew Pinski 2023-07-22 00:11:01 UTC
Note 10.5.0 was the last release in the GCC 10.x series, can you test out GCC 11.4.0 out?
Comment 5 Roland Illig 2023-07-22 15:21:30 UTC
Sorry for the confusing description. Let me try again.

NetBSD lint includes a yacc parser for C code. This parser contains the rules 'block_item_list' and 'block_item':

https://github.com/NetBSD/src/blob/7ebcd1679b455e619909ec9c128e8ad7f103ded9/usr.bin/xlint/lint1/cgram.y#L1804
> block_item_list: /* C99 6.8.2 */
>   block_item
> | block_item_list block_item {
>     if ($1 && !$2)
>       /* declarations after statements is a C99 feature */
>       c99ism(327);
>     $$ = $1 || $2;
>   }


The rule 'block_item' remembers in 'y_seen_statement' whether the item was a statement, so that the parser rule 'block_item_list' can warn in C90 mode.

In another part of the parser, the rule 'gcc_attribute' handles the keyword 'const', by accessing 'y_type_qualifiers.tq_const', which on 2023-07-15 was a _Bool bit-field member in 'struct type_qualifiers'.

https://github.com/NetBSD/src/blob/7ebcd1679b455e619909ec9c128e8ad7f103ded9/usr.bin/xlint/lint1/cgram.y#L2197
> gcc_attribute:
> | type_qualifier {
>     if (!$1.tq_const)
>       yyerror("Bad attribute");
>   }

On big-endian arm and powerpc, the code that GCC 10.5.0 generates for the 'block_item_list' parser rule depends on whether the 'gcc_attribute' parser rule accesses a bit-field member or not. This does not make sense to me, as I expect the parser rules to be independent.

When I compile this parser on arm with -O2 and run lint in C90 mode, it not only warns about declarations after statements, but also about statements after statements.

$ gcc -O2 -ftrapv -fPIE -std=gnu99 -S cgram.c -o cgram.casmv -fverbose-asm

The code that is generated for the condition '$1 && !$2' is:

> @ cgram.y:1802:                 if ($1 && !$2)
>         ldr     r6, .L580+796
>         add     r6, pc, r6
>         ldr     r4, [r6, #20]
>         ldrb    r3, [r4, #-8]   @ $1.y_seen_statement
>         cmp     r3, #0
>         beq     .L368
> @ cgram.y:550:          $$ = build_unary($2 == INC ? INCAFT : DECAFT, $3, $1);
>         ldrb    r3, [r4]
> @ cgram.y:1802:
>         lsrs    r3, r3, #7      @ $2.y_type_qualifiers.tq_const
>         beq     .L562

(Annotations hand-edited by me.)

Two strange things happen here:

1. The code from cgram.y:550 is completely unrelated, so it should not have been mentioned here. The 'ldrb' is correct, so maybe it's just the attribution that is wrong.

2. The expressions '$1' and '$2' both have type _Bool. Nevertheless, the second bool is accessed as if it were a bit-field. Due to this access, no matter whether '$2 as bool' is 1 or 0, the the expression '($2 as bit-field) >> 7' always evaluates to 0, so the condition '$1 && !$2' evaluates to true, which results in the additional warnings from lint.

Instead of lsrs, GCC should have generated a cmp instruction.

I don't have any other GCC version installed on the machine, so I cannot test whether GCC 11.4.0 behaves differently.
Comment 6 Roland Illig 2023-07-22 15:23:31 UTC
Created attachment 55611 [details]
Assembler code from comment 5
Comment 7 Roland Illig 2023-07-22 15:29:41 UTC
Created attachment 55612 [details]
Preprocessed source from comment 5
Comment 8 Roland Illig 2023-07-22 15:50:14 UTC
When I compile the attached code with "ARM GCC 10.5.0" and "-O2 -fPIE -ftrapv" on godbolt.org, the generated code is correct (you can search for "#327" in the output and then go back one branch).

The code generated by godbolt.org "ARM GCC 11.4.0" with "-O2 -fPIE -ftrapv" looks good as well.

So it could also be that the NetBSD version of GCC is missing a bug-fix or two.