Bug 83653 - [6/7/8 Regression] GCC fails to remove a can't-happen call on ia64
Summary: [6/7/8 Regression] GCC fails to remove a can't-happen call on ia64
Status: RESOLVED INVALID
Alias: None
Product: gcc
Classification: Unclassified
Component: middle-end (show other bugs)
Version: 6.4.0
: P3 normal
Target Milestone: 6.5
Assignee: Not yet assigned to anyone
URL:
Keywords: missed-optimization
Depends on:
Blocks:
 
Reported: 2018-01-02 19:04 UTC by Matthew Wilcox
Modified: 2018-01-15 18:39 UTC (History)
3 users (show)

See Also:
Host: ia64-unknown-linux-gnu
Target: ia64-unknown-linux-gnu
Build: ia64-unknown-linux-gnu
Known to work: 4.6.3
Known to fail: 4.9.4, 5.5.0, 6.2.0, 7.2.0
Last reconfirmed: 2018-01-11 00:00:00


Attachments
Gzipped preprocessed source (293.66 KB, application/gzip)
2018-01-02 19:04 UTC, Matthew Wilcox
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Matthew Wilcox 2018-01-02 19:04:25 UTC
Created attachment 43009 [details]
Gzipped preprocessed source

This is an excerpt from the Linux kernel, with some patches that I'm preparing to go in.  The 0day build-bot reports a problem with an undefined __bad_increment_for_ia64_fetch_and_add.

I can reproduce the problem by taking the attached preprocessed source and compiling it with gcc -O2 -o shmem.o -c shmem.i and then running nm shmem.o |grep __bad

Manually inlining page_ref_sub() in the included test-case makes the problem go away, as does simply deleting the call to page_ref_sub() (at line 50756).

Command line:   gcc -Wp,-MD,mm/.shmem.o.d  -nostdinc -isystem /usr/lib/gcc/ia64-linux-gnu/4.6/include -I./arch/ia64/include -I./arch/ia64/include/generated  -I./include -I./arch/ia64/include/uapi -I./arch/ia64/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -DHAVE_WORKING_TEXT_ALIGN -DHAVE_MODEL_SMALL_ATTRIBUTE -DHAVE_SERIALIZE_DIRECTIVE -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -Wno-unused-but-set-variable -fno-PIE -pipe -ffixed-r13 -mfixed-range=f12-f15,f32-f127 -falign-functions=32 -frename-registers -fno-optimize-sibling-calls -fno-delete-null-pointer-checks -O2 -DCC_HAVE_ASM_GOTO -Wframe-larger-than=2048 -fno-stack-protector -fomit-frame-pointer -fno-var-tracking-assignments -g -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-stack-check -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init  -mconstant-gp  -DKBUILD_BASENAME='"shmem"'  -DKBUILD_MODNAME='"shmem"' -c -o mm/.tmp_shmem.o mm/shmem.c
Comment 1 Richard Biener 2018-01-05 12:37:45 UTC
Do GCC 7 or trunk work?
Comment 2 Matthew Wilcox 2018-01-08 14:33:03 UTC
7.2, 6.2, 5.5 and 4.9 fail.  4.6.3 succeeds.
Comment 3 Richard Biener 2018-01-08 15:33:06 UTC
So I just assume trunk fails as well.
Comment 4 Aldy Hernandez 2018-01-11 03:28:23 UTC
I can reproduce this on mainline, however the testcase is suspect.

I see that page_ref_sub() is defined as:

int __ia64_asr_i = ((nr))
...
if (__ia64_asr_i == xxx)
else if (__ia64_asr_i == yyy)
else if (__ia64_asr_i == yyy)
etc
else _tmp = __bad_increment_for_ia64_fetch_and_add();

The thing I see is that NR doesn't seem like an inlineable constant when passed from the caller:

nr = 1UL << compound_order(page)
...
page_ref_sub(page, nr)

because:

unsigned int compound_order(struct page *page)
{
 if (!PageHead(page))
  return 0;
 return page[1].compound_order;
}

And sure enough...after early inlining, both compound_order and page_ref_sub are inlined into shmem_add_to_page_cache and we can see:

  _117 = MEM[(struct page *)page_49(D) + 56B].D.16951.D.16950.compound_order;

There's no way the compiler can know that _117 is a known constant if it's reading the value from memory.

OTOH, the other *_bad* thinggies do get inlined correctly because they depend on sizeof(stuff), whose size can be determined at compile time.

Matthew, could you double check here?  Maybe I missed something, but perhaps a reduced testcase would help analyze better (at least for me :)).
Comment 5 Matthew Wilcox 2018-01-11 04:49:28 UTC
Hi Aldy!

Thanks for looking into this.  Yes, I agree, there's no way that GCC can know this is a constant, but that *should* have been taken care of.  Please pardon me copying and pasting from the original source file rather than the preprocessed source, but I find it utterly impossible to work with the preprocessed source ...

#define atomic_sub_return(i,v)                                          \
({                                                                      \
        int __ia64_asr_i = (i);                                         \
        (__builtin_constant_p(i)                                        \
         && (   (__ia64_asr_i ==   1) || (__ia64_asr_i ==   4)          \
             || (__ia64_asr_i ==   8) || (__ia64_asr_i ==  16)          \
             || (__ia64_asr_i ==  -1) || (__ia64_asr_i ==  -4)          \
             || (__ia64_asr_i ==  -8) || (__ia64_asr_i == -16)))        \
                ? ia64_fetch_and_add(-__ia64_asr_i, &(v)->counter)      \
                : ia64_atomic_sub(__ia64_asr_i, v);                     \
})

That __builtin_constant_p() *should* have led GCC to throw up its hands, not bother checking for the +/- 1, 4, 8, 16 cases and just call ia64_atomic_sub().  Looking at the disassembly, I see a BBB bundle, indicating quite strongly to me that it is testing for all of these cases, and the __builtin_constant_p is being ... ignored?  misunderstood?

Thanks!
Comment 6 rguenther@suse.de 2018-01-11 08:31:31 UTC
On Thu, 11 Jan 2018, matthew at wil dot cx wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83653
> 
> --- Comment #5 from Matthew Wilcox <matthew at wil dot cx> ---
> Hi Aldy!
> 
> Thanks for looking into this.  Yes, I agree, there's no way that GCC can know
> this is a constant, but that *should* have been taken care of.  Please pardon
> me copying and pasting from the original source file rather than the
> preprocessed source, but I find it utterly impossible to work with the
> preprocessed source ...
> 
> #define atomic_sub_return(i,v)                                          \
> ({                                                                      \
>         int __ia64_asr_i = (i);                                         \
>         (__builtin_constant_p(i)                                        \
>          && (   (__ia64_asr_i ==   1) || (__ia64_asr_i ==   4)          \
>              || (__ia64_asr_i ==   8) || (__ia64_asr_i ==  16)          \
>              || (__ia64_asr_i ==  -1) || (__ia64_asr_i ==  -4)          \
>              || (__ia64_asr_i ==  -8) || (__ia64_asr_i == -16)))        \
>                 ? ia64_fetch_and_add(-__ia64_asr_i, &(v)->counter)      \
>                 : ia64_atomic_sub(__ia64_asr_i, v);                     \
> })
> 
> That __builtin_constant_p() *should* have led GCC to throw up its hands, not
> bother checking for the +/- 1, 4, 8, 16 cases and just call ia64_atomic_sub(). 
> Looking at the disassembly, I see a BBB bundle, indicating quite strongly to me
> that it is testing for all of these cases, and the __builtin_constant_p is
> being ... ignored?  misunderstood?

The usual events are that we perform some jump threading across
the __builtin_constant_p before it is resolved, exposing the constant
path somewhere 'i' isn't constant.

The builtin really doesn't guarantee what kernel folks want, because
at one point they want it resolved very late so optimization can
eventually figure out 'i' _is_ constant on the other hand some
optimizations might pull out stuff across it and break expectations.

Note I didn't try to figure what exactly goes wrong but I believe
we had duplicate issues (also in the kernel) that were worked
around in the source by just removing the stupid optimization
and relying on GCC to perform it for constant 'i'.
Comment 7 Matthew Wilcox 2018-01-11 14:33:11 UTC
OK, so how should we write this function/macro to accomplish what we want?

And the requirement is "If the argument is one of these eight special constants, use this special instruction, otherwise call this function".  Even if the argument happens to be one of the eight special values at runtime, we should still call the function, because it's not worth doing runtime checks; we only want a compile-time optimisation.  And it's worth the compile-time optimisation because adding constant 1/-1 is really, really common.
Comment 8 Aldy Hernandez 2018-01-11 19:19:24 UTC
Ah, I see what the problem is.

 unsigned long i, nr = 1UL << compound_order(page);
...
 page_ref_sub(page, nr); // calls __builtin_constant_p(nr) eventually

The problem is that there is a path to __builtin_constant_p(nr) for which NR is 0, and thus a constant.  This is because compound_order() is defined as:

static inline unsigned int compound_order(struct page *page)
{
        if (!PageHead(page))
                return 0;
        return page[1].compound_order;
}

The dom2 threading pass is isolating the path returning 0, and realizing that that particular path is a constant.  Your series of if's does not handle 0, and you get the __bad_increment_for_ia64_fetch_and_add exposed.

Don't blame me, I'm just the messenger :).

Perhaps Richi has a suggestion on how to code your macro.

If, as Richard says, this is a known quirk, perhaps we should document it in the section for __builtin_constant_p() in the manual, along with a suggestion on how to code an alternative.
Comment 9 Matthew Wilcox 2018-01-11 19:26:13 UTC
Maybe I'm a little slow, but I don't see what the path is that sets 'nr' to 0.  It's 1UL << compound_order.  Typically, compound_order is 0, although it may be anything up to log2(number of pages in the machine).  Are you saying that nr could be 0 because DOM2 thinks compound_order() could be larger than 64, and thus undefined?
Comment 10 Jeffrey A. Law 2018-01-11 19:45:18 UTC
So look for something similar to this:

 <bb 29> [local count: 96151367]:
  # i_280 = PHI <0(28), 0(30)>
  if (i_280 < nr_300)
    goto <bb 81>; [92.50%]
  else
    goto <bb 40>; [7.50%]

The subscripts may change, but I think that's what you're looking for.

i_280 will be replaced with zero resulting in

if (0 < nr_300)
  true
else
  false


And because nr is unsigned that will turn into an equality comparison

if (nr_300 != 0)
  true
else
  false

At that point the compiler knows that nr_300 has the value 0 on the false arm of the conditional.


Anyway, this isn't a GCC bug.  I'll note that it may be worth reviewing pr72785  which is essentially the same issue with misunderstanding how builtin_constant_p works.
Comment 11 Matthew Wilcox 2018-01-11 20:32:01 UTC
I'm sorry, I still don't get it.

What I think you're saying is that GCC performs this optimisation:

nr = 1UL << compound_order(page);
atomic_sub_return(x, nr);

into:

if (PageHead(page))
  atomic_sub_return(x, 1);
else
  atomic_sub_return(x, 1UL << page[1].order)

That seems like a great optimisation to me, and I'm all for it.  What I don't understand is where the b_c_p call gets lost.

Also, this bug is pretty fragile.  If I replace the 'nr' in the call to atomic_sub_return with '1UL << compound_order(page)', the bug goes away.

Anyway, I have no vested interest in ia64 or the code that's currently using __b_c_p.  I just want it to stop blocking me getting my patch in.  It's a bit different from 72785 because I can't just resolve it by removing the checking code.  Just tell me what the macro should look like.
Comment 12 Aldy Hernandez 2018-01-11 20:39:56 UTC
(In reply to Matthew Wilcox from comment #9)
> Maybe I'm a little slow, but I don't see what the path is that sets 'nr' to
> 0.  It's 1UL << compound_order.  Typically, compound_order is 0, although it
> may be anything up to log2(number of pages in the machine).  Are you saying

Sorry, yes 1<<0 will be 1, which is handled.  Let me dig into the threading.
Comment 13 Jakub Jelinek 2018-01-11 21:14:28 UTC
What about this approach, force __builtin_constant_p to evaluate in the FE before optimizations and decide just on that.  Will work with constant literals passed to the macro, will do the fallback if you say use it in inline and hope that if the inline is called with a constant it will propagate:

struct V { int counter; };
int ia64_fetch_and_add(int, int *);
int ia64_atomic_sub(int, struct V *);

#ifdef __OPTIMIZE__
#define atomic_sub_return_1(i,v,c)                                      \
({                                                                      \
        int __ia64_asr_i = (i);                                         \
        static const int __ia64_asr_p_##c				\
          = __builtin_constant_p(i)                                     \
            ? ((i) == 1 || (i) == 4 || (i) == 8 || (i) == 16       	\
               || (i) == -1 || (i) == -4 || (i) == -8 || (i) == -16)	\
            : 0;							\
        __ia64_asr_p_##c						\
          ? ia64_fetch_and_add(-__ia64_asr_i, &(v)->counter)      	\
          : ia64_atomic_sub(__ia64_asr_i, v);                     	\
})
#define atomic_sub_return_2(i,v,c) atomic_sub_return_1(i,v,c)
#define atomic_sub_return(i,v) atomic_sub_return_2(i,v,__COUNTER__)
#else
#define atomic_sub_return(i,v) ia64_atomic_sub(i,v)
#endif

void
foo (struct V *p, int i)
{
  atomic_sub_return (4, p);
  atomic_sub_return (8, p);
  atomic_sub_return (16, p);
  atomic_sub_return (7, p);
  atomic_sub_return (i, p);
}
Comment 14 Matthew Wilcox 2018-01-11 21:55:19 UTC
Confirmed this fixes the problem.  I'll send it to Tony and see if he likes it.  May I add your Signed-off-by to the patch?
Comment 15 Jakub Jelinek 2018-01-11 22:03:40 UTC
(In reply to Matthew Wilcox from comment #14)
> Confirmed this fixes the problem.  I'll send it to Tony and see if he likes
> it.  May I add your Signed-off-by to the patch?

Sure.  Feel free to reformat it as the kernel wants, it has been almost 20 years since I've been active in the kernel, so don't know the formatting rules anymore.
Comment 16 Jakub Jelinek 2018-01-15 15:13:56 UTC
Actually, there is no need to use the __COUNTER__ stuff, just use
static const int __ia64_asr_p = ...;
in the scope.  It is just important __OPTIMIZE__ is on, because at -O0 the compiler wouldn't optimize those static vars away.