Bug 81763 - Issues with BMI on 32bit x86 apps on GCC 7.1+
Summary: Issues with BMI on 32bit x86 apps on GCC 7.1+
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: target (show other bugs)
Version: 7.1.0
: P3 normal
Target Milestone: 7.4
Assignee: Not yet assigned to anyone
URL:
Keywords: wrong-code
: 84377 (view as bug list)
Depends on: 53399
Blocks:
  Show dependency treegraph
 
Reported: 2017-08-08 01:15 UTC by Mike Lothian
Modified: 2023-06-07 01:43 UTC (History)
7 users (show)

See Also:
Host:
Target: x86_64-pc-linux-gnu
Build:
Known to work:
Known to fail:
Last reconfirmed: 2017-08-08 00:00:00


Attachments
si_shader objdumps (169.24 KB, application/x-xz)
2017-08-09 17:00 UTC, Mike Lothian
Details
llvm-tblgen Working and Broken binaries (891.50 KB, application/x-xz)
2018-01-05 11:16 UTC, Mike Lothian
Details
preprocessed source (256.42 KB, application/x-bzip)
2018-01-25 17:41 UTC, Manuel Lauss
Details
compressed preprocessed source (329.74 KB, application/x-bzip)
2018-01-26 12:37 UTC, Manuel Lauss
Details
reducest testcase (1.18 KB, text/plain)
2018-01-26 15:52 UTC, Manuel Lauss
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Lothian 2017-08-08 01:15:51 UTC
I've been seeing issues with 32bit x86 llvm & mesa when using GCC 7.1.0 and I've just tested GCC 7.2 RC

When llvm & mesa are compiled with GCC 7.1.0+ and I run a 32bit glxgears it segfaults, the 64bit version work fine and if I recompile with GCC 6.3.0 everything works fine too (games also segfault but glxgears is a nice cut down example)

This is only with the radeonsi driver, i965 works fine

https://bugs.freedesktop.org/show_bug.cgi?id=101881

It would be nice to rule out a GCC issue before 7.2.0 is released, the last time I tried turning on debugging the issue no longer manifested.

Is there anything you can think of that might help pinpoint the bug?
Comment 1 Andrew Pinski 2017-08-08 01:19:02 UTC
There are some known issues with older versions of LLVM, maybe you are using too older version of LLVM.  That is some versions of LLVM have undefined C++ code in them.  GCC 7.1 is more aggressive of optimizing things.
Comment 2 Mike Lothian 2017-08-08 01:24:55 UTC
Sorry I should have been more clear, this is LLVM trunk

I'm using these flags: 

CFLAGS="-O2 -march=native -pipe"
CXXFLAGS="-O2 -march=native -pipe"
LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed"

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/7.1.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/g++-v7 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 7.1.0-r1 p1.1' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --enable-libmpx --enable-vtable-verify --enable-libvtv --enable-lto --with-isl --disable-isl-version-check --enable-libsanitizer --disable-default-pie --disable-default-ssp
Thread model: posix
gcc version 7.1.0 (Gentoo 7.1.0-r1 p1.1)
Comment 3 Mike Lothian 2017-08-08 15:10:37 UTC
So dropping the -march=native allows everything to work again no matter which version of GCC

Just using -mbmi breaks things and using -march=native -mno-bmi allows it all to work

This is on a Skylake processor with the following in /proc/cpuinfo

flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
Comment 4 Andrew Pinski 2017-08-08 18:55:19 UTC
This might be PR 53399.
Comment 5 H.J. Lu 2017-08-08 19:13:02 UTC
Please extract a small testcase.
Comment 6 Mike Lothian 2017-08-09 00:40:34 UTC
I tried the test case with

gcc -O2 -march=native test.c -o test

and 

gcc -O2 -march=native -mno-bmi test.c -o test

Both executables seem to run with no output

I've only seen the issue in radeonsi in Mesa with LLVM, I'm not sure what's happening as the problem doesn't manifest when debugging is enabled so I'm not sure how to create a smaller test case
Comment 7 H.J. Lu 2017-08-09 02:03:33 UTC
(In reply to Mike Lothian from comment #6)
> I tried the test case with
> 
> gcc -O2 -march=native test.c -o test
> 
> and 
> 
> gcc -O2 -march=native -mno-bmi test.c -o test
> 
> Both executables seem to run with no output
> 
> I've only seen the issue in radeonsi in Mesa with LLVM, I'm not sure what's
> happening as the problem doesn't manifest when debugging is enabled so I'm
> not sure how to create a smaller test case

Compile radeonsi one function at a time with and without -mbmi to
find out the smallest function which causes the problem with -mbmi.
If the function is small enough, we can take a guess.
Comment 8 Alexander Monakov 2017-08-09 14:20:16 UTC
Mike, you can start by preparing two build trees, one with faulty mesa compiled with -march=native, the other with working-fine mesa compiled with -march=native -mno-bmi. You should be able to collect:

- .o files from each tree, and
- link commands from build logs

and be able to re-link mesa libraries by hand and verify they still exhibit the same behavior (one fails, the other doesn't).

From there you can proceed by building "hybrid" libraries by taking a set of good .o files and a complementary set of bad .o files. This will allow you to find a single .o file that makes the library behave wrongly. More explanation and a helper script for bisecting is available at https://gcc.gnu.org/wiki/Analysing_Large_Testcases

At that point please share your status (once you're down to one file there's no generic recipe, you'd have to get creative).
Comment 9 Alexander Monakov 2017-08-09 14:45:48 UTC
A (potentially simpler) alternative is to use sequential builds (make without -j) and bisect by index of compiled source file, i.e. have a wrapper script around gcc that uses some global counter to pass -mno-bmi to first N compiler invocations.
Comment 10 Mike Lothian 2017-08-09 14:54:49 UTC
Unfortunately it also depends on LLVM not just Mesa which makes it a much bigger target for figuring this out
Comment 11 Mike Lothian 2017-08-09 16:58:58 UTC
So a lot of the segfaults I see are in si_shader so I thought I'd compile Mesa with and without BMI and compare the onjdumps of the two .o files

CFLAGS="-O2 -march=native -pipe -mno-bmi -m32" CXXFLAGS="-O2 -march=native -pipe -mno-bmi -m32" LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -fuse-ld=bfd -m32" ./autogen.sh --prefix=/usr --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/mesa-9999 --htmldir=/usr/share/doc/mesa-9999/html --libdir=/usr/lib32 --enable-dri --enable-glx --enable-shared-glapi --enable-texture-float --enable-nine --disable-debug --enable-dri3 --enable-egl --enable-gbm --enable-gles1 --enable-gles2 --enable-glx-tls --disable-libunwind --enable-valgrind=no --enable-llvm-shared-libs --with-dri-drivers=,swrast,i965 --with-gallium-drivers=,swrast,radeonsi --with-vulkan-drivers=,intel,radeon PYTHON2=/usr/bin/python2.7 --with-platforms=x11,surfaceless,wayland,drm --enable-nine --enable-llvm --enable-omx --enable-va --enable-vdpau --disable-xa --disable-xvmc --with-va-libdir=/usr/lib32/va/drivers --disable-glx-read-only-text --disable-gallium-osmesa


CFLAGS="-O2 -march=native -pipe -m32" CXXFLAGS="-O2 -march=native -pipe -m32" LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -fuse-ld=bfd -m32" ./autogen.sh --prefix=/usr --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/mesa-9999 --htmldir=/usr/share/doc/mesa-9999/html --libdir=/usr/lib32 --enable-dri --enable-glx --enable-shared-glapi --enable-texture-float --enable-nine --disable-debug --enable-dri3 --enable-egl --enable-gbm --enable-gles1 --enable-gles2 --enable-glx-tls --disable-libunwind --enable-valgrind=no --enable-llvm-shared-libs --with-dri-drivers=,swrast,i965 --with-gallium-drivers=,swrast,radeonsi --with-vulkan-drivers=,intel,radeon PYTHON2=/usr/bin/python2.7 --with-platforms=x11,surfaceless,wayland,drm --enable-nine --enable-llvm --enable-omx --enable-va --enable-vdpau --disable-xa --disable-xvmc --with-va-libdir=/usr/lib32/va/drivers --disable-glx-read-only-text --disable-gallium-osmesa

Were my configure options, and LLVM was compiled with -mno-bmi 

I'm attaching the dumps
Comment 12 Mike Lothian 2017-08-09 17:00:30 UTC
Created attachment 41960 [details]
si_shader objdumps
Comment 13 H.J. Lu 2017-08-09 18:48:40 UTC
(In reply to Mike Lothian from comment #12)
> Created attachment 41960 [details]
> si_shader objdumps

We need a small testcase in C.
Comment 14 Mike Lothian 2018-01-05 11:03:58 UTC
I've been playing around with GCC 7.2.0 again

When compiling a 32bit LLVM with -O2 -march=native, llvm-tblgen runs indefinitely during the build so it never completes

Doing the same with -O2 -march=native -mno-bmi or using GCC 6.4.0 allows it to build 

From what I can see llvm-tblgen is a C++ program
Comment 15 Mike Lothian 2018-01-05 11:16:22 UTC
Created attachment 43041 [details]
llvm-tblgen Working and Broken binaries
Comment 16 Manuel Lauss 2018-01-25 16:00:51 UTC
I've seen this as well.
Build llvm-7 with gcc-7.3, -m32 -march=znver1 -mtune=broadwell
-O1 / -Og: all good
-O2: llvm-tblgen runs in an infinite(?) loop
-O3: llvm-tblgen segfaults

-mno-bmi: all good, at all optimization levels

In case it helps, here's gdb info:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x080a96c9 in llvm::MachineValueTypeSet::const_iterator::find_from_pos (P=<optimized out>, this=<synthetic pointer>)
    at /tmp-ram/x1/portage/sys-devel/llvm-9999/work/llvm-9999/utils/TableGen/CodeGenDAGPatterns.h:148
148             W &= maskLeadingOnes<WordType>(WordWidth-SkipBits);

(gdb) list
143
144           // If P is in the middle of a word, process it manually here, because
145           // the trailing bits need to be masked off to use findFirstSet.
146           if (SkipBits != 0) {
147             WordType W = Set->Words[SkipWords];
148             W &= maskLeadingOnes<WordType>(WordWidth-SkipBits);
149             if (W != 0)
150               return Count + findFirstSet(W);
151             Count += WordWidth;
152             SkipWords++;

[...]
   0x080a968d <+1293>:  shr    $0x6,%ebx
   0x080a9690 <+1296>:  and    $0xffffffc0,%esi
   0x080a9693 <+1299>:  and    $0x3f,%eax
   0x080a9696 <+1302>:  je     0x80a96e0 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+1376>
   0x080a9698 <+1304>:  mov    $0x40,%ecx
   0x080a969d <+1309>:  mov    $0xffffffff,%edx
   0x080a96a2 <+1314>:  xor    %edi,%edi
   0x080a96a4 <+1316>:  sub    %eax,%ecx
   0x080a96a6 <+1318>:  mov    $0xffffffff,%eax
   0x080a96ab <+1323>:  shrd   %cl,%edx,%eax
   0x080a96ae <+1326>:  test   $0x20,%cl
   0x080a96b1 <+1329>:  shrx   %ecx,%edx,%edx
   0x080a96b6 <+1334>:  cmovne %edx,%eax
   0x080a96b9 <+1337>:  cmovne %edi,%edx
   0x080a96bc <+1340>:  mov    %edx,%ecx
   0x080a96be <+1342>:  mov    %eax,%edx
   0x080a96c0 <+1344>:  mov    -0x3c(%ebp),%eax
   0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
=> 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx
   0x080a96d0 <+1360>:  mov    %edx,%edi
   0x080a96d2 <+1362>:  or     %eax,%edi
   0x080a96d4 <+1364>:  jne    0x80a9e70 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+3312>
   0x080a96da <+1370>:  add    $0x40,%esi
   0x080a96dd <+1373>:  add    $0x1,%ebx
   0x080a96e0 <+1376>:  cmp    $0x4,%ebx
   0x080a96e3 <+1379>:  je     0x80a9e60 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+3296>
   0x080a96e9 <+1385>:  mov    -0x3c(%ebp),%eax
   0x080a96ec <+1388>:  mov    0x4(%eax,%ebx,8),%edx
   0x080a96f0 <+1392>:  mov    (%eax,%ebx,8),%eax
   0x080a96f3 <+1395>:  mov    %edx,%ecx
   0x080a96f5 <+1397>:  or     %eax,%ecx
   0x080a96f7 <+1399>:  jne    0x80a9e42 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+3266>
   0x080a96fd <+1405>:  lea    0x1(%ebx),%eax
[...]


(gdb) info registers
eax            0x18000040       402653248
ecx            0x0      0
edx            0x3f     63
ebx            0x0      0
esp            0xff8f9660       0xff8f9660
ebp            0xff8f96c8       0xff8f96c8
esi            0x0      0
edi            0x0      0
eip            0x80a96c9        0x80a96c9 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+1353>
eflags         0x10202  [ IF RF ]
cs             0x23     35
ss             0x2b     43
ds             0x2b     43
es             0x2b     43
fs             0x0      0
gs             0x63     99
Comment 17 Uroš Bizjak 2018-01-25 16:57:54 UTC
(In reply to Manuel Lauss from comment #16)
>    0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
> => 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx

This looks like double-word andn is clobbering %eax too early.
Comment 18 Jakub Jelinek 2018-01-25 17:21:43 UTC
(In reply to Uroš Bizjak from comment #17)
> (In reply to Manuel Lauss from comment #16)
> >    0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
> > => 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx
> 
> This looks like double-word andn is clobbering %eax too early.

Yeah.  In that case, can you please attach the preprocessed source of whatever source contains that and g++ command line options used to compile that?
Thanks.
Comment 19 Jakub Jelinek 2018-01-25 17:34:03 UTC
Haven't managed to reproduce it e.g. with
long long
foo (long long *p, int q, unsigned r1, unsigned r2)
{
  int t, u;
  asm ("" : "+a" (p), "+b" (q), "+d" (r1), "+c" (r2), "=S" (t), "=D" (u));
  unsigned long long r = ((unsigned long long) r2 << 32) | r1;
  long long a = p[q] & ~r;
  asm volatile ("" : "+A" (a) : "S" (t), "D" (u));
  return a;
}
Comment 20 Manuel Lauss 2018-01-25 17:41:37 UTC
Created attachment 43242 [details]
preprocessed source

preprocessed source of file that contains the function "llvm::TypeInfer::EnforceSmallerThan", compressed due it being 2.5MB.

The Makefile builds it with these parameters: (stripped the includes)
g++ -m32 -O3 -ggdb -march=znver1 -mtune=broadwell -pipe -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections
Comment 21 Uroš Bizjak 2018-01-25 17:42:37 UTC
Following patch should fix the problem:

--cut here--
Index: i386.md
===================================================================
--- i386.md     (revision 256935)
+++ i386.md     (working copy)
@@ -9250,10 +9250,10 @@
 })
 
 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
        (and:DI
          (not:DI (match_operand:DI 1 "register_operand" "r,0"))
-         (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+         (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
    (clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
--cut here--

So, the pattern now reads:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,r")
	(and:DI
	  (not:DI (match_operand:DI 1 "register_operand" "r,0"))
	  (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "bmi,*")])

Instead of using earlyclobber on the output operand (which guarantees that no input register will be clobbered), we can use a little trick and in case of memory operand 2, match the output with a register operand 1. This will keep output registers separate from registers in the memory operand (and is in fact what we do in all other _doubleword patterns).

Jakub, I don't have Haswell target to test BMI instructions via native bootstrap, can you perhaps bootstrap the compiler with the above patch?
Comment 22 Jakub Jelinek 2018-01-25 18:01:04 UTC
Will do, for now I'm including it with my normal options bootstraps (testing other patches and need the same baseline), then will try some --with-arch/--with-tune).
Wonder though if it wouldn't give the RA more choices by also including another
&r <- (r, m) alternative with bmi2 isa attribute.
Comment 23 Uroš Bizjak 2018-01-25 18:09:27 UTC
The above patch builds on the promise, that with (=r,r,r) alternative, the register allocator won't allocate (=r1,=r2) = ~(r0,r1) & (r2,r3). This would again clobber the r1 too early:

r1 = ~r0 & r2
r2 = ~r1 & r3

The safest choice of the pattern would be:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,&r")
	(and:DI
	  (not:DI (match_operand:DI 1 "register_operand" "0,0"))
	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "bmi,*")])

(The earlyclobber of non-BMI case is needed due to separate not insn).
Comment 24 Uroš Bizjak 2018-01-25 18:15:30 UTC
(In reply to Jakub Jelinek from comment #22)
> Wonder though if it wouldn't give the RA more choices by also including
> another
> &r <- (r, m) alternative with bmi2 isa attribute.

This would be worse than r <- (0, m) alternative on register starved x86_32 architecture. The above approach can use up to 6 registers, while r <- (0, m) uses up to 4.
Comment 25 Jakub Jelinek 2018-01-25 18:19:56 UTC
I believe for double-word pseudos the RA will not do that, CCing Vlad about it.
Anyway, by having all of r <- (r, r), r <- (0, rm) and &r <- (r, m) alternatives I'd think the RA has more choices than when it just has the first 2.
If it sees it as beneficial to have the middle operand in the destination, it can due to the second alternative even if third one is a memory, if it wants some other, it can, just needs to make sure the destination doesn't overlap with mem's address.
Comment 26 Mike Lothian 2018-01-25 18:23:11 UTC
Is this the patch you want us to test then:

diff -ur a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
--- a/gcc/config/i386/i386.md       2018-01-16 11:17:49.509247000 +0000
+++ b/gcc/config/i386/i386.md   2018-01-25 18:21:25.562225621 +0000
@@ -8586,7 +8586,7 @@
 (define_insn "*andndi3_doubleword"
   [(set (match_operand:DI 0 "register_operand" "=r,&r")
        (and:DI
-         (not:DI (match_operand:DI 1 "register_operand" "r,0"))
+         (not:DI (match_operand:DI 1 "register_operand" "0,0"))
          (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
    (clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
Comment 27 Manuel Lauss 2018-01-25 18:24:56 UTC
(In reply to Uroš Bizjak from comment #21)
> Following patch should fix the problem:
> 
> --cut here--
> Index: i386.md
> ===================================================================
> --- i386.md     (revision 256935)
> +++ i386.md     (working copy)
> @@ -9250,10 +9250,10 @@
>  })
>  
>  (define_insn "*andndi3_doubleword"
> -  [(set (match_operand:DI 0 "register_operand" "=r,&r")
> +  [(set (match_operand:DI 0 "register_operand" "=r,r")
>         (and:DI
>           (not:DI (match_operand:DI 1 "register_operand" "r,0"))
> -         (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
> +         (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
>     (clobber (reg:CC FLAGS_REG))]
>    "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>    "#"
> --cut here--
> 
> So, the pattern now reads:
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,r")
> 	(and:DI
> 	  (not:DI (match_operand:DI 1 "register_operand" "r,0"))
> 	  (match_operand:DI 2 "nonimmediate_operand" "r,rm")))
>    (clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "bmi,*")])


I rebuilt gcc-7.3 with this applied, now the generated code looks much better:
  ab:   0f ad d0                shrd   %cl,%edx,%eax
  ae:   f6 c1 20                test   $0x20,%cl
  b1:   c4 e2 73 f7 d2          shrx   %ecx,%edx,%edx
  b6:   0f 45 c2                cmovne %edx,%eax
  b9:   0f 45 d5                cmovne %ebp,%edx
  bc:   c4 e2 68 f2 54 df 04    andn   0x4(%edi,%ebx,8),%edx,%edx
  c3:   89 d1                   mov    %edx,%ecx
  c5:   c4 e2 78 f2 04 df       andn   (%edi,%ebx,8),%eax,%eax
  cb:   09 c1                   or     %eax,%ecx

And it seems to work as well, 32bit llvm now built successfully.
Comment 28 Uroš Bizjak 2018-01-25 18:27:55 UTC
(In reply to Uroš Bizjak from comment #23)
> (The earlyclobber of non-BMI case is needed due to separate not insn).

It is not needed. I have added earlyclobber in r243202 without much thought.
Comment 29 Uroš Bizjak 2018-01-25 18:37:47 UTC
(In reply to Jakub Jelinek from comment #25)
> I believe for double-word pseudos the RA will not do that, CCing Vlad about
> it.

I start to worry about it due to allocated:

   0x080a96c3 <+1347>:  andn   (%eax,%ebx,8),%edx,%eax
=> 0x080a96c9 <+1353>:  andn   0x4(%eax,%ebx,8),%ecx,%edx

In addition to memory input operand, (dx,cx) regpair is used and (ax,dx) is the output operand. These regpairs do in fact interleave!

> Anyway, by having all of r <- (r, r), r <- (0, rm) and &r <- (r, m)
> alternatives I'd think the RA has more choices than when it just has the
> first 2.
> If it sees it as beneficial to have the middle operand in the destination,
> it can due to the second alternative even if third one is a memory, if it
> wants some other, it can, just needs to make sure the destination doesn't
> overlap with mem's address.

Taking all those facts into account, I think we can allow two alternatives:

r <- (0, rm)

and additionally

&r <- (r, m) for BMI
Comment 30 Uroš Bizjak 2018-01-25 18:45:37 UTC
So, I'll bootstrap:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,&r")
	(and:DI
	  (not:DI (match_operand:DI 1 "register_operand" "0,r"))
	  (match_operand:DI 2 "nonimmediate_operand" "rm,m")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "*,bmi")])
Comment 31 Uroš Bizjak 2018-01-25 18:57:35 UTC
(In reply to Uroš Bizjak from comment #30)
> So, I'll bootstrap:

Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23):

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=r,&r")
	(and:DI
	  (not:DI (match_operand:DI 1 "register_operand" "0,r"))
	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "*,bmi")])

Manuel, can you please test this pattern?
Comment 32 Manuel Lauss 2018-01-25 19:29:04 UTC
(In reply to Uroš Bizjak from comment #31)
> (In reply to Uroš Bizjak from comment #30)
> > So, I'll bootstrap:
> 
> Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23):
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,&r")
> 	(and:DI
> 	  (not:DI (match_operand:DI 1 "register_operand" "0,r"))
> 	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
>    (clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "*,bmi")])
> 
> Manuel, can you please test this pattern?

Seems to work as well:

[...]
  b1:   c4 e2 73 f7 d2          shrx   %ecx,%edx,%edx
  b6:   0f 45 c2                cmovne %edx,%eax
  b9:   0f 45 d5                cmovne %ebp,%edx
  bc:   c4 e2 68 f2 54 df 04    andn   0x4(%edi,%ebx,8),%edx,%edx
  c3:   89 d1                   mov    %edx,%ecx
  c5:   c4 e2 78 f2 04 df       andn   (%edi,%ebx,8),%eax,%eax
  cb:   09 c1                   or     %eax,%ecx
[...]
Comment 33 Jakub Jelinek 2018-01-25 19:42:33 UTC
(In reply to Uroš Bizjak from comment #31)
> (In reply to Uroš Bizjak from comment #30)
> > So, I'll bootstrap:
> 
> Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23):
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=r,&r")
> 	(and:DI
> 	  (not:DI (match_operand:DI 1 "register_operand" "0,r"))
> 	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
>    (clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "*,bmi")])
> 
> Manuel, can you please test this pattern?

At least with a smarter splitter we don't really need to avoid no overlap at all for the r <- (r, r) bmi case, we can choose which of the two 32-bit andn's we do first depending on the overlap, all we need to guarantee is that the splitter is not impossible and ideally doesn't need any instructions but the two.
Hard registers for DImode must be consecutive because we identify them by the (lowest) register number and mode and for r <- (r, r) there can't be any overlap between the two input operands.  So, even if DImode registers can start at any GPR number other than the last, not just even ones, either there is no overlap at all in between output and inputs, or the output is the same as the first, or as the second input (all these cases are fine), or there is a partial overlap with one or both of the operands.
For the partial operand I can think of DI:N = DI:N+1 &~ DI:unrelated, or
DI:N+1 = DI:N &~ DI:unrelated, or DI:N+1 = DI:N &~ DI:N+2 (or swapped operands),
the last case is partial overlap with both inputs.  So, right now we'd split those into:
  SI:N = SI:N+1 &~ SI:unrelated; SI:N+1 = SI:N+2 &~ SI:unrelated+1
which is fine, or
  SI:N+1 = SI:N &~ SI:unrelated; SI:N+2 = SI:N+1 &~ SI:unrelated+1
which is wrong but we can swap those:
  SI:N+2 = SI:N+1 &~ SI:unrelated+1; SI:N+1 = SI:N &~ SI:unrelated
and it should work.  The last case would be right now:
  SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
and is again wrong, but we could again swap:
  SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
and all is fine.
Comment 34 Uroš Bizjak 2018-01-25 19:58:56 UTC
(In reply to Jakub Jelinek from comment #33)

> At least with a smarter splitter we don't really need to avoid no overlap at
> all for the r <- (r, r) bmi case, we can choose which of the two 32-bit
> andn's we do first depending on the overlap, all we need to guarantee is
> that the splitter is not impossible and ideally doesn't need any
> instructions but the two.

This would work nicely. I have seen a couple of places with the attached source (using only -mbmi -m32) where RA satisfied & or 0 constraint only with spills and would benefit from relaxed constraints.

BTW: AFAICS, "andn" is the only double-word three-operand instruction that is split after reload. So, andn r <- (r,r) post-reload splitter is a precedent in x86 world.
Comment 35 Jakub Jelinek 2018-01-25 19:59:55 UTC
So, what about following?
--- gcc/config/i386/i386.md.jj	2018-01-16 09:28:19.721432394 +0100
+++ gcc/config/i386/i386.md	2018-01-25 20:58:18.382378827 +0100
@@ -9250,14 +9250,14 @@ (define_split
 })
 
 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
 	(and:DI
-	  (not:DI (match_operand:DI 1 "register_operand" "r,0"))
-	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+	  (not:DI (match_operand:DI 1 "register_operand" "0,r,r"))
+	  (match_operand:DI 2 "nonimmediate_operand" "rm,r,m")))
    (clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
-  [(set_attr "isa" "bmi,*")])
+  [(set_attr "isa" "*,bmi2,bmi2")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -9273,7 +9273,22 @@ (define_split
    (parallel [(set (match_dup 3)
 		   (and:SI (not:SI (match_dup 4)) (match_dup 5)))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
+  /* For the =r <- (r, r) alternative of *andndi3_doubleword, there could
+     be overlap between the output and input registers.  If the output
+     is equal to one of the input operands, this is fine, if there is
+     partial overlap, we can resolve it by swapping the two instructions.  */
+  if (reg_overlap_mentioned_p (operands[0], operands[4])
+      || reg_overlap_mentioned_p (operands[0], operands[5]))
+    {
+      std::swap (operands[0], operands[3]);
+      std::swap (operands[1], operands[4]);
+      std::swap (operands[2], operands[5]);
+      gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[4])
+		  && !reg_overlap_mentioned_p (operands[0], operands[5]));
+    }
+})
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
Comment 36 Jakub Jelinek 2018-01-25 20:03:32 UTC
Ah, bmi, not bmi2.  And the =r <- (r, r) alternative might be best first.
--- gcc/config/i386/i386.md.jj	2018-01-16 09:28:19.721432394 +0100
+++ gcc/config/i386/i386.md	2018-01-25 20:58:18.382378827 +0100
@@ -9250,14 +9250,14 @@ (define_split
 })
 
 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r,&r")
 	(and:DI
-	  (not:DI (match_operand:DI 1 "register_operand" "r,0"))
-	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+	  (not:DI (match_operand:DI 1 "register_operand" "r,0,r"))
+	  (match_operand:DI 2 "nonimmediate_operand" "r,rm,m")))
    (clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
-  [(set_attr "isa" "bmi,*")])
+  [(set_attr "isa" "bmi,*,bmi")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -9273,7 +9273,22 @@ (define_split
    (parallel [(set (match_dup 3)
 		   (and:SI (not:SI (match_dup 4)) (match_dup 5)))
 	      (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);")
+{
+  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
+  /* For the =r <- (r, r) alternative of *andndi3_doubleword, there could
+     be overlap between the output and input registers.  If the output
+     is equal to one of the input operands, this is fine, if there is
+     partial overlap, we can resolve it by swapping the two instructions.  */
+  if (reg_overlap_mentioned_p (operands[0], operands[4])
+      || reg_overlap_mentioned_p (operands[0], operands[5]))
+    {
+      std::swap (operands[0], operands[3]);
+      std::swap (operands[1], operands[4]);
+      std::swap (operands[2], operands[5]);
+      gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[4])
+		  && !reg_overlap_mentioned_p (operands[0], operands[5]));
+    }
+})
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
Comment 37 Uroš Bizjak 2018-01-25 20:10:19 UTC
(In reply to Jakub Jelinek from comment #33)

> and it should work.  The last case would be right now:
>   SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
> and is again wrong, but we could again swap:
>   SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
> and all is fine.

Whoops, it looks that SI:N+2 is clobbered in the swapped case.
Comment 38 Manuel Lauss 2018-01-25 20:59:27 UTC
(In reply to Jakub Jelinek from comment #36)

Your patch does fix my llvm issue. Thank you!
Comment 39 Mike Lothian 2018-01-26 02:24:02 UTC
I can confirm it fixes things for me too. 

Is that the final patch in Comment 36? If so I'll try and get the Gentoo devs to include it in the GCC ebuilds

Will this be added to GCC 8.1 and 7.4?

Thanks again, this bug has been plaguing me for the last 8 months
Comment 40 Jakub Jelinek 2018-01-26 09:00:45 UTC
(In reply to Uroš Bizjak from comment #37)
> (In reply to Jakub Jelinek from comment #33)
> 
> > and it should work.  The last case would be right now:
> >   SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3;
> > and is again wrong, but we could again swap:
> >   SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2;
> > and all is fine.
> 
> Whoops, it looks that SI:N+2 is clobbered in the swapped case.

You're right.  So the question is if IRA/LRA can ever allow that case where
there is partial overlap with both registers.  I've tried hard to simulate that case with:
unsigned long long
foo (unsigned long long x, unsigned long long y)
{
  unsigned long long z;
  asm ("" : "+A" (x), "+Q" (y));
  z = x & ~y;
  asm ("" : "+Q" (z) : "a" (0), "b" (0));
  return z;
}
where IRA indeed allocates the used pseudos such that x is in ax:dx, y in cx:bx and z in dx:cx.  Now, if I try this and testcase with ~x & y instead of x & ~y with GCC patched with #c36, I get:
        andn    %eax, %ecx, %ecx
        xorl    %eax, %eax
        andn    %edx, %ebx, %ebx
        movl    %ecx, %edx
        movl    %ebx, %ecx
        movl    %eax, %ebx
resp.
        andn    %ecx, %eax, %ecx
        xorl    %eax, %eax
        andn    %ebx, %edx, %ebx
        movl    %ecx, %edx
        movl    %ebx, %ecx
        movl    %eax, %ebx
between the two inline asms, and if I leave just the =r <- (r, r) alternative and nothing else, LRA ICEs on it (on both variants).  All is with -O2 -m32 -mbmi -mstv -msse2.  So, is there something in LRA that prevents these partial overlaps?
Comment 41 Uroš Bizjak 2018-01-26 09:27:42 UTC
Let's go forward with this pattern:

(define_insn "*andndi3_doubleword"
  [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r")
	(and:DI
	  (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0"))
	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm")))
   (clobber (reg:CC FLAGS_REG))]
  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
  "#"
  [(set_attr "isa" "bmi,bmi,bmi,*")])

(=&r,r,rm) alternative avoids matching of output to all other overlapped and partial overlapped operands.

(=r,0,rm) alternative allows matching with op1, so we are sure output won't *partially* overlap with op2. This is true for register and memory operands. It can still fully overlap with register op2 in case the same reg is allocated for op1, op2 and op3, which is OK for BMI.

(=r,r,0) alternative will prevent *partial* overlap of output with op1 in a similar way.

(=&r,0,rm) is non-bmi alternative. Earlyclobber is needed, otherwise RA can match op2 with op0 and op1, so the same reg is allocated for op0, op1 and op2. If this is the case, after the split, NOT/AND sequence won't match ANDN instruction, since NOT will change both input operands to a follow-up AND insn.
Comment 42 Mike Lothian 2018-01-26 11:07:17 UTC
With the patch in Comment 36 I get the following error compiling Clang

FAILED: lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o
/usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Lex -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include -Iinclude -I/usr/lib/llvm/7/include  -DNDEBUG -O2 -march=native -pipe -mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register -fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o -MF lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o.d -o lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o -c /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp
/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp: In function ‘bool EvaluateValue({anonymous}::PPValue&, clang::Token&, DefinedTracker&, bool, clang::Preprocessor&)’:
/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp:492:1: internal compiler error: in gen_split_178, at config/i386/i386.md:8623
 }
 ^
Comment 43 Jakub Jelinek 2018-01-26 12:11:26 UTC
(In reply to Uroš Bizjak from comment #41)
> Let's go forward with this pattern:
> 
> (define_insn "*andndi3_doubleword"
>   [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r")
> 	(and:DI
> 	  (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0"))
> 	  (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm")))
>    (clobber (reg:CC FLAGS_REG))]
>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>   "#"
>   [(set_attr "isa" "bmi,bmi,bmi,*")])

This looks reasonable to me.

(In reply to Mike Lothian from comment #42)
> With the patch in Comment 36 I get the following error compiling Clang

Just out of interest, could you attach the preprocessed source here, I'd like to see the case when the RA assigns such a partial overlap.  As you're using -march=native, will need g++ output with -v added to the command too (what flags it expands to).
Comment 44 Manuel Lauss 2018-01-26 12:37:44 UTC
Created attachment 43252 [details]
compressed preprocessed source

g++ -m32 -O3 -ggdb -march=znver1 -mtune=broadwell -pipe -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -o test.o -c PPExpressions.i

/tmp-ram/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp: In function ‘bool EvaluateValue({anonymous}::PPValue&, clang::Token&, DefinedTracker&, bool, clang::Preprocessor&)’:
/tmp-ram/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp:492:1: internal compiler error: in gen_split_178, at config/i386/i386.md:8623
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
Comment 45 uros 2018-01-26 15:37:03 UTC
Author: uros
Date: Fri Jan 26 15:36:32 2018
New Revision: 257096

URL: https://gcc.gnu.org/viewcvs?rev=257096&root=gcc&view=rev
Log:
	PR target/81763
	* config/i386/i386.md (*andndi3_doubleword): Add earlyclobber
	to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
Comment 46 Manuel Lauss 2018-01-26 15:52:29 UTC
Created attachment 43257 [details]
reducest testcase

reduced testcase for Jakub's patch in comment #36 and the build failure it causes in comment #42:

g++ (Gentoo 7.2.0-r1 p1.1) 7.2.0

g++ -m32 -O2 -mbmi -fPIC -fno-strict-aliasing -c testcase.i
testcase.i:126:12: warning: ‘unsigned int {anonymous}::bc::bn()’ used but never defined
   unsigned bn();
            ^~
testcase.i:98:8: warning: ‘void {anonymous}::bc::bo({anonymous}::bc&)’ used but never defined
   void bo(bc &);
        ^~
testcase.i:154:3: warning: ‘{anonymous}::bw::bw(unsigned int)’ used but never defined
   bw(unsigned);
   ^~
testcase.i: In function ‘bool bz({anonymous}::bw&, ai::as, by&, bool, ai::av&)’:
testcase.i:178:1: internal compiler error: in gen_split_178, at config/i386/i386.md:8623
 }
 ^
Comment 47 Mike Lothian 2018-01-26 18:48:26 UTC
With the patch you've committed to gcc master, applied on top of GCC 7.3 I'm now seeing the following error building Clang:

[294/954] /usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include -Iinclude -I/usr/lib/llvm/7/include  -DNDEBUG -O2 -march=native -pipe -mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register -fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -MF lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o.d -o lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -c /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization/ASTReader.cpp
FAILED: lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o
/usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include -Iinclude -I/usr/lib/llvm/7/include  -DNDEBUG -O2 -march=native -pipe -mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register -fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -MF lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o.d -o lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -c /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization/ASTReader.cpp
In file included from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/APValue.h:20:0,
                 from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:17,
                 from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/DeclObjC.h:17,
                 from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/Serialization/ASTReader.h:17,
                 from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization/ASTReader.cpp:14:
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘struct llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >’:
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:56:57:   required from ‘PointerTy llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::getPointer() const [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’
/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:2859:39:   required from here
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:132:3: error: static assertion failed: PointerIntPair with integer size too large for pointer
   static_assert(IntBits <= PtrTraits::NumLowBitsAvailable,
   ^~~~~~~~~~~~~
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:147:42: warning: left shift count >= width of type [-Wshift-count-overflow]
     ShiftedIntMask = (uintptr_t)(IntMask << IntShift)
                                 ~~~~~~~~~^~~~~~~~~~~~
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:147:42: warning: right operand of shift expression ‘(3 << 4294967295)’ is >= than the precision of the left operand [-fpermissive]
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘static intptr_t llvm::PointerIntPairInfo<PointerT, IntBits, PtrTraits>::getInt(intptr_t) [with PointerT = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; intptr_t = int]’:
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:58:56:   required from ‘IntType llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::getInt() const [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’
/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:2897:32:   required from here
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:156:19: warning: right shift count >= width of type [-Wshift-count-overflow]
     return (Value >> IntShift) & IntMask;
            ~~~~~~~^~~~~~~~~~~~
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘static intptr_t llvm::PointerIntPairInfo<PointerT, IntBits, PtrTraits>::updateInt(intptr_t, intptr_t) [with PointerT = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; intptr_t = int]’:
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:73:28:   required from ‘void llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::setPointerAndInt(PointerTy, IntType) [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:51:21:   required from ‘llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::PointerIntPair(PointerTy, IntType) [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’
/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:2831:33:   required from here
/usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:173:52: warning: left shift count >= width of type [-Wshift-count-overflow]
     return (OrigValue & ~ShiftedIntMask) | IntWord << IntShift;
                                            ~~~~~~~~^~~~~~~~~~~
Comment 48 Uroš Bizjak 2018-01-26 20:04:46 UTC
(In reply to Mike Lothian from comment #47)
> With the patch you've committed to gcc master, applied on top of GCC 7.3 I'm
> now seeing the following error building Clang:

I don't think this is related to the original bugreport involving BMI instructions.  Could be Clang's source incompatibility with 32bit target.

In any case, please open a new PR.
Comment 49 uros 2018-01-29 16:03:49 UTC
Author: uros
Date: Mon Jan 29 16:03:17 2018
New Revision: 257154

URL: https://gcc.gnu.org/viewcvs?rev=257154&root=gcc&view=rev
Log:
	Backport from mainline
	2018-01-26  Uros Bizjak  <ubizjak@gmail.com>

	PR target/81763
	* config/i386/i386.md (*andndi3_doubleword): Add earlyclobber
	to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives.


Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/i386/i386.md
Comment 50 Uroš Bizjak 2018-01-29 16:05:36 UTC
Fixed for 7.4+.
Comment 51 Andrew Pinski 2023-06-07 01:43:00 UTC
*** Bug 84377 has been marked as a duplicate of this bug. ***