Bug 106725 - LTO semantics for __attribute__((leaf))
Summary: LTO semantics for __attribute__((leaf))
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end
Version: 12.2.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: documentation, lto, wrong-code
Depends on:
Blocks:
 
Reported: 2022-08-23 18:09 UTC by Daniel Thornburgh
Modified: 2022-12-20 18:03 UTC
CC List: 7 users

See Also:
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2022-11-05 00:00:00


Description Daniel Thornburgh 2022-08-23 18:09:49 UTC
After implementing GCC's `__attribute__((leaf))` in Clang/LLVM, discussion arose (https://reviews.llvm.org/D131628#3740602) about the semantics of LTO.

It seems like merging together object files in LTO might or might not produce erroneous results, depending on how the term "compilation unit" in the docs is interpreted.

A minimal example consists of three source files:

main.c:
  __attribute__((leaf)) void foo(void);
  int main(void) {
    foo();
    return 0;
  }

foo.c:
  void bar(void);
  void foo(void) {
    bar();
  }

bar.c:
  void bar(void) {}

If "compilation unit" in the `__attribute__((leaf))` manual means "translation unit", `__attribute__((leaf))` should be valid in main.c, since foo() only calls bar(), which is not in the main.c translation unit.

Compile main.c and bar.c to LTO GIMPLE, but compile foo.c without LTO:
  $ gcc -flto -c main.c bar.c
  $ gcc -c foo.c

The resulting GIMPLE and object files can then be linked together and the LTO trees dumped (Here I'm using -nostdlib just because I built gcc in isolation). I installed a simple printf hack to gimple-pretty-print to print "leaf" if ECF_LEAF is set on a printed call. This gives the following results:

  $ gcc -fdump-tree-all main.o bar.o foo.o

  $ cat a.ltrans0.ltrans.252t.optimized
<...>
int main ()
{
<...>
  foo (); check 1 1024 leaf
<...>
}

;; Function bar (bar, funcdef_no=1, decl_uid=4720, cgraph_uid=1, symbol_order=2)
<...>

From the above, it appears that the main and bar translation units are merged together by LTO to form a new module without altering the `__attribute__((leaf))` semantics given on the call to foo() in main().

If "compilation unit" is interpreted as "translation unit", this may be incorrect behavior, since any post-LTO passes that are expecting that foo() cannot call bar() (since it's in the same TU) would have their expectations violated.

If instead "compilation unit" is interpreted as "LTO unit", then the above program has undefined behavior, since `__attribute__((leaf))` should never have been used on foo(): it calls something (bar()) in the same LTO unit as main.c.

Accordingly, if "compilation unit" means "translation unit" and this case is incorrect behavior, this bug is a report of that incorrect behavior.

If "compilation unit" instead means "LTO unit", this bug is to request clarification in the GCC manual about the precise behavior in case of LTO.

Finally, if "compilation unit" means "translation unit", but GCC has some other internal mechanisms for dealing with this, feel free to close this. Please do report back with those mechanisms though, as we'll have to do something similar in LLVM in that case. It still may be worth altering the docs, given the ambiguity in the term "compilation unit" in the presence of LTO.

System: Debian rodete

GCC build commands used:
  ../configure --disable-bootstrap  --enable-languages=c,c++,lto \
    --prefix=$(realpath ..)/gcc-install
  make -j96 -l96 all-gcc
  make install-gcc
Comment 1 Richard Biener 2022-08-24 07:18:12 UTC
For GCC "leaf" is interpreted at the LTO WPA stage where 'compilation unit' then
means the whole program.  Note that "leaf" doesn't mean that calls "back" into
the CU that GCC _can see_ are invalid - those are treated correctly.  It's
basically an optimization promise that if GCC doesn't see such a call it can
assume there are no "hidden" ones.
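
To make the "optimization promise" concrete, here is a minimal single-file sketch in C. It assumes the reading of leaf given in the GCC manual: the callee returns to the current unit only by return or exception handling. The names `counter`, `log_event` and `read_after_leaf_call` are hypothetical and do not come from this report. The callee is defined in the same file only to keep the sketch self-contained; the GCC manual states that the attribute has no effect on functions defined within the current compilation unit, so in real use only the declaration would be visible here.

```c
#include <assert.h>

/* State private to this unit; its address never escapes. */
static int counter;

/* Hypothetical external routine. In real use only this declaration would be
   visible, and the definition would live in another translation unit. */
__attribute__((leaf)) void log_event(void);

/* Defined here only so the sketch builds on its own. It never calls back
   into this unit, so the leaf promise holds. */
void log_event(void) {}

int read_after_leaf_call(void) {
  counter = 42;
  log_event();            /* leaf: no hidden call back that could touch counter */
  assert(counter == 42);  /* holds because log_event() keeps the promise */
  return counter;         /* a compiler may legitimately fold this load to 42 */
}
```

If `log_event()` instead reached code that writes `counter` through a path the compiler cannot see, folding the load would change the result, which is the situation discussed in this bug.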

If GCC, with LTO, would partition the program into two LTRANS partitions,
one containing main and bar and one containing foo then applying this
optimization promise during LTRANS time on the main/bar partition would
be wrong as you say - but I think GCC doesn't do this.

As for documentation I think 'compilation unit' should be changed to
'translation unit', since that's the only thing a user can reason about.
The compiler then has to make sure to apply compatible reasoning when
combining multiple translation units.

Honza?
Comment 2 Daniel Thornburgh 2022-08-24 16:42:08 UTC
(In reply to Richard Biener from comment #1)
> If GCC, with LTO, would partition the program into two LTRANS partitions,
> one containing main and bar and one containing foo then applying this
> optimization promise during LTRANS time on the main/bar partition would
> be wrong as you say - but I think GCC doesn't do this.

In this case, foo() was already compiled to native code outside of LTO. Wouldn't this then mean that its contents wouldn't be available for the WPA and LTRANS phases of the LTO code generation? It seems like the compiler wouldn't know that foo() might call bar(), and the presence of `__attribute__((leaf))` would cause it to assume that it doesn't call bar().
Comment 3 rguenther@suse.de 2022-08-25 05:48:30 UTC
As said, GCC shouldn't assume this since leaf is defined at translation
unit level, not at LTO level.
Comment 4 Daniel Thornburgh 2022-08-25 16:26:16 UTC
(In reply to rguenther@suse.de from comment #3)
> As said, GCC shouldn't assume this since leaf is defined at translation
> unit level, not at LTO level.

Sure, but what prevents GCC from making this assumption? Are all uses of leaf evaluated before the TUs are merged? Does GCC have some provenance tracking for which TU a given function came from in the merged view? Is there a pass I missed to drop leaf after merging but before it's used?
Comment 5 rguenther@suse.de 2022-08-26 07:15:25 UTC
Honza should be able to answer this.
Comment 6 Daniel Thornburgh 2022-11-01 01:32:59 UTC
I spent a little more time on this, and here's a more concrete reproducer of GCC's current behavior.

The setup again has 3 files: main.c, lto.c, and ext.c. lto.c is a simple getter-setter interface wrapping a file-local static int. main.c sets the value using this interface, then makes an __attribute__((leaf)) call to external_call() in ext.c, which sets the value to 0. This should be legal, since the call doesn't call back into main.c; it calls into lto.c.

$ tail -n+1 *.c

==> ext.c <==
void set_value(int v);

void external_call(void) {
  set_value(0);
}

==> lto.c <==
static int value;
void set_value(int v) { value = v; }
int get_value(void) { return value; }

==> main.c <==
#include <stdio.h>

void set_value(int v);
int get_value(void);
__attribute__((leaf)) void external_call(void);

int main(void) {
  set_value(42);
  external_call();
  printf("%d\n", get_value());
}


If we compile main.c and lto.c together using the pre-WHOPR module-merging flow, the resulting binary assumes that the external call cannot clobber the value, and it thus prints 42 rather than zero.

$ gcc -c -O2 ext.c
$ gcc -O2 -flto-partition=none main.o lto.o ext.o
$ ./a.out
42

If you instead use WHOPR, it looks like this optimization doesn't trigger:
$ gcc -O2 -flto main.o lto.o ext.o
$ ./a.out
0

At least in the unpartitioned case, it looks like the optimizer is considering __attribute__((leaf)) to apply to the whole LTO unit. I'm unsure what WPA's semantics are, since there may be other reasons why this optimization wasn't taken there.
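
The two outputs above can be modelled by hand in a single file. This is only a sketch of the presumed effect, assuming the 42 comes from the load of `value` being folded across the call; the actual dumps are not shown in this report. `result_as_written` and `result_as_folded` are hypothetical names, and the leaf attribute is left out on purpose so that the sketch itself cannot be miscompiled.

```c
#include <assert.h>

/* lto.c, as in the reproducer */
static int value;
void set_value(int v) { value = v; }
int get_value(void) { return value; }

/* ext.c, as in the reproducer */
void external_call(void) { set_value(0); }

/* Body of main() with source semantics: the store made by
   external_call() is observed by the later load. */
int result_as_written(void) {
  set_value(42);
  assert(get_value() == 42);  /* sanity check before the call */
  external_call();
  return get_value();
}

/* Hand-written model of the folded form: the load after the call is
   replaced by the constant stored before it. Not actual GCC output. */
int result_as_folded(void) {
  set_value(42);
  external_call();
  return 42;
}
```

`result_as_written()` corresponds to the WHOPR run that printed 0, and `result_as_folded()` to the -flto-partition=none run that printed 42.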
Comment 7 Daniel Thornburgh 2022-11-01 01:50:18 UTC
Correction: A compilation line was missed:

+$ gcc -flto -c -O2 main.c lto.c
 $ gcc -c -O2 ext.c
Comment 8 Richard Biener 2022-11-05 14:09:14 UTC
I think that shows we have a correctness issue here (if 'leaf' should be usable).

Honza - can you please investigate?