I've been seeing issues with 32bit x86 llvm & mesa when using GCC 7.1.0 and I've just tested GCC 7.2 RC When llvm & mesa are compiled with GCC 7.1.0+ and I run a 32bit glxgears it segfaults, the 64bit version work fine and if I recompile with GCC 6.3.0 everything works fine too (games also segfault but glxgears is a nice cut down example) This is only with the radeonsi driver, i965 works fine https://bugs.freedesktop.org/show_bug.cgi?id=101881 It would be nice to rule out a GCC issue before 7.2.0 is released, the last time I tried turning on debugging the issue no longer manifested. Is there anything you can think of that might help pinpoint the bug?
There are some known issues with older versions of LLVM, maybe you are using too older version of LLVM. That is some versions of LLVM have undefined C++ code in them. GCC 7.1 is more aggressive of optimizing things.
Sorry I should have been more clear, this is LLVM trunk I'm using these flags: CFLAGS="-O2 -march=native -pipe" CXXFLAGS="-O2 -march=native -pipe" LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed" gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/7.1.0/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /var/tmp/portage/sys-devel/gcc-7.1.0-r1/work/gcc-7.1.0/configure --host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/7.1.0 --includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include --datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0 --mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/man --infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/info --with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/include/g++-v7 --with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/7.1.0/python --enable-languages=c,c++,fortran --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --enable-checking=release --with-bugurl=https://bugs.gentoo.org/ --with-pkgversion='Gentoo 7.1.0-r1 p1.1' --disable-esp --enable-libstdcxx-time --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --disable-altivec --disable-fixed-point --enable-targets=all --disable-libgcj --enable-libgomp --disable-libmudflap --disable-libssp --disable-libcilkrts --enable-libmpx --enable-vtable-verify --enable-libvtv --enable-lto --with-isl --disable-isl-version-check --enable-libsanitizer --disable-default-pie --disable-default-ssp Thread model: posix gcc version 7.1.0 (Gentoo 7.1.0-r1 p1.1)
So dropping the -march=native allows everything to work again no matter which version of GCC Just using -mbmi breaks things and using -march=native -mno-bmi allows it all to work This is on a Skylake processor with the following in /proc/cpuinfo flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
This might be PR 53399.
Please extract a small testcase.
I tried the test case with gcc -O2 -march=native test.c -o test and gcc -O2 -march=native -mno-bmi test.c -o test Both executables seem to run with no output I've only seen the issue in radeonsi in Mesa with LLVM, I'm not sure what's happening as the problem doesn't manifest when debugging is enabled so I'm not sure how to create a smaller test case
(In reply to Mike Lothian from comment #6) > I tried the test case with > > gcc -O2 -march=native test.c -o test > > and > > gcc -O2 -march=native -mno-bmi test.c -o test > > Both executables seem to run with no output > > I've only seen the issue in radeonsi in Mesa with LLVM, I'm not sure what's > happening as the problem doesn't manifest when debugging is enabled so I'm > not sure how to create a smaller test case Compile radeonsi one function at a time with and without -mbmi to find out the smallest function which causes the problem with -mbmi. If the function is small enough, we can take a guess.
Mike, you can start by preparing two build trees, one with faulty mesa compiled with -march=native, the other with working-fine mesa compiled with -march=native -mno-bmi. You should be able to collect: - .o files from each tree, and - link commands from build logs and be able to re-link mesa libraries by hand and verify they still exhibit the same behavior (one fails, the other doesn't). From there you can proceed by building "hybrid" libraries by taking a set of good .o files and a complementary set of bad .o files. This will allow you to find a single .o file that makes the library behave wrongly. More explanation and a helper script for bisecting is available at https://gcc.gnu.org/wiki/Analysing_Large_Testcases At that point please share your status (once you're down to one file there's no generic recipe, you'd have to get creative).
A (potentially simpler) alternative is to use sequential builds (make without -j) and bisect by index of compiled source file, i.e. have a wrapper script around gcc that uses some global counter to pass -mno-bmi to first N compiler invocations.
Unfortunately it also depends on LLVM not just Mesa which makes it a much bigger target for figuring this out
So a lot of the segfaults I see are in si_shader so I thought I'd compile Mesa with and without BMI and compare the onjdumps of the two .o files CFLAGS="-O2 -march=native -pipe -mno-bmi -m32" CXXFLAGS="-O2 -march=native -pipe -mno-bmi -m32" LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -fuse-ld=bfd -m32" ./autogen.sh --prefix=/usr --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/mesa-9999 --htmldir=/usr/share/doc/mesa-9999/html --libdir=/usr/lib32 --enable-dri --enable-glx --enable-shared-glapi --enable-texture-float --enable-nine --disable-debug --enable-dri3 --enable-egl --enable-gbm --enable-gles1 --enable-gles2 --enable-glx-tls --disable-libunwind --enable-valgrind=no --enable-llvm-shared-libs --with-dri-drivers=,swrast,i965 --with-gallium-drivers=,swrast,radeonsi --with-vulkan-drivers=,intel,radeon PYTHON2=/usr/bin/python2.7 --with-platforms=x11,surfaceless,wayland,drm --enable-nine --enable-llvm --enable-omx --enable-va --enable-vdpau --disable-xa --disable-xvmc --with-va-libdir=/usr/lib32/va/drivers --disable-glx-read-only-text --disable-gallium-osmesa CFLAGS="-O2 -march=native -pipe -m32" CXXFLAGS="-O2 -march=native -pipe -m32" LDFLAGS="-Wl,-O1 -Wl,--hash-style=gnu -Wl,--as-needed -fuse-ld=bfd -m32" ./autogen.sh --prefix=/usr --build=i686-pc-linux-gnu --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --disable-dependency-tracking --disable-silent-rules --docdir=/usr/share/doc/mesa-9999 --htmldir=/usr/share/doc/mesa-9999/html --libdir=/usr/lib32 --enable-dri --enable-glx --enable-shared-glapi --enable-texture-float --enable-nine --disable-debug --enable-dri3 --enable-egl --enable-gbm --enable-gles1 --enable-gles2 --enable-glx-tls --disable-libunwind --enable-valgrind=no --enable-llvm-shared-libs --with-dri-drivers=,swrast,i965 --with-gallium-drivers=,swrast,radeonsi --with-vulkan-drivers=,intel,radeon PYTHON2=/usr/bin/python2.7 --with-platforms=x11,surfaceless,wayland,drm --enable-nine --enable-llvm --enable-omx --enable-va --enable-vdpau --disable-xa --disable-xvmc --with-va-libdir=/usr/lib32/va/drivers --disable-glx-read-only-text --disable-gallium-osmesa Were my configure options, and LLVM was compiled with -mno-bmi I'm attaching the dumps
Created attachment 41960 [details] si_shader objdumps
(In reply to Mike Lothian from comment #12) > Created attachment 41960 [details] > si_shader objdumps We need a small testcase in C.
I've been playing around with GCC 7.2.0 again When compiling a 32bit LLVM with -O2 -march=native, llvm-tblgen runs indefinitely during the build so it never completes Doing the same with -O2 -march=native -mno-bmi or using GCC 6.4.0 allows it to build From what I can see llvm-tblgen is a C++ program
Created attachment 43041 [details] llvm-tblgen Working and Broken binaries
I've seen this as well. Build llvm-7 with gcc-7.3, -m32 -march=znver1 -mtune=broadwell -O1 / -Og: all good -O2: llvm-tblgen runs in an infinite(?) loop -O3: llvm-tblgen segfaults -mno-bmi: all good, at all optimization levels In case it helps, here's gdb info: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x080a96c9 in llvm::MachineValueTypeSet::const_iterator::find_from_pos (P=<optimized out>, this=<synthetic pointer>) at /tmp-ram/x1/portage/sys-devel/llvm-9999/work/llvm-9999/utils/TableGen/CodeGenDAGPatterns.h:148 148 W &= maskLeadingOnes<WordType>(WordWidth-SkipBits); (gdb) list 143 144 // If P is in the middle of a word, process it manually here, because 145 // the trailing bits need to be masked off to use findFirstSet. 146 if (SkipBits != 0) { 147 WordType W = Set->Words[SkipWords]; 148 W &= maskLeadingOnes<WordType>(WordWidth-SkipBits); 149 if (W != 0) 150 return Count + findFirstSet(W); 151 Count += WordWidth; 152 SkipWords++; [...] 0x080a968d <+1293>: shr $0x6,%ebx 0x080a9690 <+1296>: and $0xffffffc0,%esi 0x080a9693 <+1299>: and $0x3f,%eax 0x080a9696 <+1302>: je 0x80a96e0 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+1376> 0x080a9698 <+1304>: mov $0x40,%ecx 0x080a969d <+1309>: mov $0xffffffff,%edx 0x080a96a2 <+1314>: xor %edi,%edi 0x080a96a4 <+1316>: sub %eax,%ecx 0x080a96a6 <+1318>: mov $0xffffffff,%eax 0x080a96ab <+1323>: shrd %cl,%edx,%eax 0x080a96ae <+1326>: test $0x20,%cl 0x080a96b1 <+1329>: shrx %ecx,%edx,%edx 0x080a96b6 <+1334>: cmovne %edx,%eax 0x080a96b9 <+1337>: cmovne %edi,%edx 0x080a96bc <+1340>: mov %edx,%ecx 0x080a96be <+1342>: mov %eax,%edx 0x080a96c0 <+1344>: mov -0x3c(%ebp),%eax 0x080a96c3 <+1347>: andn (%eax,%ebx,8),%edx,%eax => 0x080a96c9 <+1353>: andn 0x4(%eax,%ebx,8),%ecx,%edx 0x080a96d0 <+1360>: mov %edx,%edi 0x080a96d2 <+1362>: or %eax,%edi 0x080a96d4 <+1364>: jne 0x80a9e70 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+3312> 0x080a96da <+1370>: add $0x40,%esi 0x080a96dd <+1373>: add $0x1,%ebx 0x080a96e0 <+1376>: cmp $0x4,%ebx 0x080a96e3 <+1379>: je 0x80a9e60 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+3296> 0x080a96e9 <+1385>: mov -0x3c(%ebp),%eax 0x080a96ec <+1388>: mov 0x4(%eax,%ebx,8),%edx 0x080a96f0 <+1392>: mov (%eax,%ebx,8),%eax 0x080a96f3 <+1395>: mov %edx,%ecx 0x080a96f5 <+1397>: or %eax,%ecx 0x080a96f7 <+1399>: jne 0x80a9e42 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+3266> 0x080a96fd <+1405>: lea 0x1(%ebx),%eax [...] (gdb) info registers eax 0x18000040 402653248 ecx 0x0 0 edx 0x3f 63 ebx 0x0 0 esp 0xff8f9660 0xff8f9660 ebp 0xff8f96c8 0xff8f96c8 esi 0x0 0 edi 0x0 0 eip 0x80a96c9 0x80a96c9 <llvm::TypeInfer::EnforceSmallerThan(llvm::TypeSetByHwMode&, llvm::TypeSetByHwMode&)+1353> eflags 0x10202 [ IF RF ] cs 0x23 35 ss 0x2b 43 ds 0x2b 43 es 0x2b 43 fs 0x0 0 gs 0x63 99
(In reply to Manuel Lauss from comment #16) > 0x080a96c3 <+1347>: andn (%eax,%ebx,8),%edx,%eax > => 0x080a96c9 <+1353>: andn 0x4(%eax,%ebx,8),%ecx,%edx This looks like double-word andn is clobbering %eax too early.
(In reply to Uroš Bizjak from comment #17) > (In reply to Manuel Lauss from comment #16) > > 0x080a96c3 <+1347>: andn (%eax,%ebx,8),%edx,%eax > > => 0x080a96c9 <+1353>: andn 0x4(%eax,%ebx,8),%ecx,%edx > > This looks like double-word andn is clobbering %eax too early. Yeah. In that case, can you please attach the preprocessed source of whatever source contains that and g++ command line options used to compile that? Thanks.
Haven't managed to reproduce it e.g. with long long foo (long long *p, int q, unsigned r1, unsigned r2) { int t, u; asm ("" : "+a" (p), "+b" (q), "+d" (r1), "+c" (r2), "=S" (t), "=D" (u)); unsigned long long r = ((unsigned long long) r2 << 32) | r1; long long a = p[q] & ~r; asm volatile ("" : "+A" (a) : "S" (t), "D" (u)); return a; }
Created attachment 43242 [details] preprocessed source preprocessed source of file that contains the function "llvm::TypeInfer::EnforceSmallerThan", compressed due it being 2.5MB. The Makefile builds it with these parameters: (stripped the includes) g++ -m32 -O3 -ggdb -march=znver1 -mtune=broadwell -pipe -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections
Following patch should fix the problem: --cut here-- Index: i386.md =================================================================== --- i386.md (revision 256935) +++ i386.md (working copy) @@ -9250,10 +9250,10 @@ }) (define_insn "*andndi3_doubleword" - [(set (match_operand:DI 0 "register_operand" "=r,&r") + [(set (match_operand:DI 0 "register_operand" "=r,r") (and:DI (not:DI (match_operand:DI 1 "register_operand" "r,0")) - (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) + (match_operand:DI 2 "nonimmediate_operand" "r,rm"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" --cut here-- So, the pattern now reads: (define_insn "*andndi3_doubleword" [(set (match_operand:DI 0 "register_operand" "=r,r") (and:DI (not:DI (match_operand:DI 1 "register_operand" "r,0")) (match_operand:DI 2 "nonimmediate_operand" "r,rm"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" [(set_attr "isa" "bmi,*")]) Instead of using earlyclobber on the output operand (which guarantees that no input register will be clobbered), we can use a little trick and in case of memory operand 2, match the output with a register operand 1. This will keep output registers separate from registers in the memory operand (and is in fact what we do in all other _doubleword patterns). Jakub, I don't have Haswell target to test BMI instructions via native bootstrap, can you perhaps bootstrap the compiler with the above patch?
Will do, for now I'm including it with my normal options bootstraps (testing other patches and need the same baseline), then will try some --with-arch/--with-tune). Wonder though if it wouldn't give the RA more choices by also including another &r <- (r, m) alternative with bmi2 isa attribute.
The above patch builds on the promise, that with (=r,r,r) alternative, the register allocator won't allocate (=r1,=r2) = ~(r0,r1) & (r2,r3). This would again clobber the r1 too early: r1 = ~r0 & r2 r2 = ~r1 & r3 The safest choice of the pattern would be: (define_insn "*andndi3_doubleword" [(set (match_operand:DI 0 "register_operand" "=r,&r") (and:DI (not:DI (match_operand:DI 1 "register_operand" "0,0")) (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" [(set_attr "isa" "bmi,*")]) (The earlyclobber of non-BMI case is needed due to separate not insn).
(In reply to Jakub Jelinek from comment #22) > Wonder though if it wouldn't give the RA more choices by also including > another > &r <- (r, m) alternative with bmi2 isa attribute. This would be worse than r <- (0, m) alternative on register starved x86_32 architecture. The above approach can use up to 6 registers, while r <- (0, m) uses up to 4.
I believe for double-word pseudos the RA will not do that, CCing Vlad about it. Anyway, by having all of r <- (r, r), r <- (0, rm) and &r <- (r, m) alternatives I'd think the RA has more choices than when it just has the first 2. If it sees it as beneficial to have the middle operand in the destination, it can due to the second alternative even if third one is a memory, if it wants some other, it can, just needs to make sure the destination doesn't overlap with mem's address.
Is this the patch you want us to test then: diff -ur a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md --- a/gcc/config/i386/i386.md 2018-01-16 11:17:49.509247000 +0000 +++ b/gcc/config/i386/i386.md 2018-01-25 18:21:25.562225621 +0000 @@ -8586,7 +8586,7 @@ (define_insn "*andndi3_doubleword" [(set (match_operand:DI 0 "register_operand" "=r,&r") (and:DI - (not:DI (match_operand:DI 1 "register_operand" "r,0")) + (not:DI (match_operand:DI 1 "register_operand" "0,0")) (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
(In reply to Uroš Bizjak from comment #21) > Following patch should fix the problem: > > --cut here-- > Index: i386.md > =================================================================== > --- i386.md (revision 256935) > +++ i386.md (working copy) > @@ -9250,10 +9250,10 @@ > }) > > (define_insn "*andndi3_doubleword" > - [(set (match_operand:DI 0 "register_operand" "=r,&r") > + [(set (match_operand:DI 0 "register_operand" "=r,r") > (and:DI > (not:DI (match_operand:DI 1 "register_operand" "r,0")) > - (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) > + (match_operand:DI 2 "nonimmediate_operand" "r,rm"))) > (clobber (reg:CC FLAGS_REG))] > "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" > "#" > --cut here-- > > So, the pattern now reads: > > (define_insn "*andndi3_doubleword" > [(set (match_operand:DI 0 "register_operand" "=r,r") > (and:DI > (not:DI (match_operand:DI 1 "register_operand" "r,0")) > (match_operand:DI 2 "nonimmediate_operand" "r,rm"))) > (clobber (reg:CC FLAGS_REG))] > "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" > "#" > [(set_attr "isa" "bmi,*")]) I rebuilt gcc-7.3 with this applied, now the generated code looks much better: ab: 0f ad d0 shrd %cl,%edx,%eax ae: f6 c1 20 test $0x20,%cl b1: c4 e2 73 f7 d2 shrx %ecx,%edx,%edx b6: 0f 45 c2 cmovne %edx,%eax b9: 0f 45 d5 cmovne %ebp,%edx bc: c4 e2 68 f2 54 df 04 andn 0x4(%edi,%ebx,8),%edx,%edx c3: 89 d1 mov %edx,%ecx c5: c4 e2 78 f2 04 df andn (%edi,%ebx,8),%eax,%eax cb: 09 c1 or %eax,%ecx And it seems to work as well, 32bit llvm now built successfully.
(In reply to Uroš Bizjak from comment #23) > (The earlyclobber of non-BMI case is needed due to separate not insn). It is not needed. I have added earlyclobber in r243202 without much thought.
(In reply to Jakub Jelinek from comment #25) > I believe for double-word pseudos the RA will not do that, CCing Vlad about > it. I start to worry about it due to allocated: 0x080a96c3 <+1347>: andn (%eax,%ebx,8),%edx,%eax => 0x080a96c9 <+1353>: andn 0x4(%eax,%ebx,8),%ecx,%edx In addition to memory input operand, (dx,cx) regpair is used and (ax,dx) is the output operand. These regpairs do in fact interleave! > Anyway, by having all of r <- (r, r), r <- (0, rm) and &r <- (r, m) > alternatives I'd think the RA has more choices than when it just has the > first 2. > If it sees it as beneficial to have the middle operand in the destination, > it can due to the second alternative even if third one is a memory, if it > wants some other, it can, just needs to make sure the destination doesn't > overlap with mem's address. Taking all those facts into account, I think we can allow two alternatives: r <- (0, rm) and additionally &r <- (r, m) for BMI
So, I'll bootstrap: (define_insn "*andndi3_doubleword" [(set (match_operand:DI 0 "register_operand" "=r,&r") (and:DI (not:DI (match_operand:DI 1 "register_operand" "0,r")) (match_operand:DI 2 "nonimmediate_operand" "rm,m"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" [(set_attr "isa" "*,bmi")])
(In reply to Uroš Bizjak from comment #30) > So, I'll bootstrap: Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23): (define_insn "*andndi3_doubleword" [(set (match_operand:DI 0 "register_operand" "=r,&r") (and:DI (not:DI (match_operand:DI 1 "register_operand" "0,r")) (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" [(set_attr "isa" "*,bmi")]) Manuel, can you please test this pattern?
(In reply to Uroš Bizjak from comment #31) > (In reply to Uroš Bizjak from comment #30) > > So, I'll bootstrap: > > Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23): > > (define_insn "*andndi3_doubleword" > [(set (match_operand:DI 0 "register_operand" "=r,&r") > (and:DI > (not:DI (match_operand:DI 1 "register_operand" "0,r")) > (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) > (clobber (reg:CC FLAGS_REG))] > "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" > "#" > [(set_attr "isa" "*,bmi")]) > > Manuel, can you please test this pattern? Seems to work as well: [...] b1: c4 e2 73 f7 d2 shrx %ecx,%edx,%edx b6: 0f 45 c2 cmovne %edx,%eax b9: 0f 45 d5 cmovne %ebp,%edx bc: c4 e2 68 f2 54 df 04 andn 0x4(%edi,%ebx,8),%edx,%edx c3: 89 d1 mov %edx,%ecx c5: c4 e2 78 f2 04 df andn (%edi,%ebx,8),%eax,%eax cb: 09 c1 or %eax,%ecx [...]
(In reply to Uroš Bizjak from comment #31) > (In reply to Uroš Bizjak from comment #30) > > So, I'll bootstrap: > > Maybe we can also allow &r <- (r,r) for BMI, to be safe (c.f. comment #23): > > (define_insn "*andndi3_doubleword" > [(set (match_operand:DI 0 "register_operand" "=r,&r") > (and:DI > (not:DI (match_operand:DI 1 "register_operand" "0,r")) > (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) > (clobber (reg:CC FLAGS_REG))] > "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" > "#" > [(set_attr "isa" "*,bmi")]) > > Manuel, can you please test this pattern? At least with a smarter splitter we don't really need to avoid no overlap at all for the r <- (r, r) bmi case, we can choose which of the two 32-bit andn's we do first depending on the overlap, all we need to guarantee is that the splitter is not impossible and ideally doesn't need any instructions but the two. Hard registers for DImode must be consecutive because we identify them by the (lowest) register number and mode and for r <- (r, r) there can't be any overlap between the two input operands. So, even if DImode registers can start at any GPR number other than the last, not just even ones, either there is no overlap at all in between output and inputs, or the output is the same as the first, or as the second input (all these cases are fine), or there is a partial overlap with one or both of the operands. For the partial operand I can think of DI:N = DI:N+1 &~ DI:unrelated, or DI:N+1 = DI:N &~ DI:unrelated, or DI:N+1 = DI:N &~ DI:N+2 (or swapped operands), the last case is partial overlap with both inputs. So, right now we'd split those into: SI:N = SI:N+1 &~ SI:unrelated; SI:N+1 = SI:N+2 &~ SI:unrelated+1 which is fine, or SI:N+1 = SI:N &~ SI:unrelated; SI:N+2 = SI:N+1 &~ SI:unrelated+1 which is wrong but we can swap those: SI:N+2 = SI:N+1 &~ SI:unrelated+1; SI:N+1 = SI:N &~ SI:unrelated and it should work. The last case would be right now: SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3; and is again wrong, but we could again swap: SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2; and all is fine.
(In reply to Jakub Jelinek from comment #33) > At least with a smarter splitter we don't really need to avoid no overlap at > all for the r <- (r, r) bmi case, we can choose which of the two 32-bit > andn's we do first depending on the overlap, all we need to guarantee is > that the splitter is not impossible and ideally doesn't need any > instructions but the two. This would work nicely. I have seen a couple of places with the attached source (using only -mbmi -m32) where RA satisfied & or 0 constraint only with spills and would benefit from relaxed constraints. BTW: AFAICS, "andn" is the only double-word three-operand instruction that is split after reload. So, andn r <- (r,r) post-reload splitter is a precedent in x86 world.
So, what about following? --- gcc/config/i386/i386.md.jj 2018-01-16 09:28:19.721432394 +0100 +++ gcc/config/i386/i386.md 2018-01-25 20:58:18.382378827 +0100 @@ -9250,14 +9250,14 @@ (define_split }) (define_insn "*andndi3_doubleword" - [(set (match_operand:DI 0 "register_operand" "=r,&r") + [(set (match_operand:DI 0 "register_operand" "=r,r,&r") (and:DI - (not:DI (match_operand:DI 1 "register_operand" "r,0")) - (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) + (not:DI (match_operand:DI 1 "register_operand" "0,r,r")) + (match_operand:DI 2 "nonimmediate_operand" "rm,r,m"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" - [(set_attr "isa" "bmi,*")]) + [(set_attr "isa" "*,bmi2,bmi2")]) (define_split [(set (match_operand:DI 0 "register_operand") @@ -9273,7 +9273,22 @@ (define_split (parallel [(set (match_dup 3) (and:SI (not:SI (match_dup 4)) (match_dup 5))) (clobber (reg:CC FLAGS_REG))])] - "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);") +{ + split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]); + /* For the =r <- (r, r) alternative of *andndi3_doubleword, there could + be overlap between the output and input registers. If the output + is equal to one of the input operands, this is fine, if there is + partial overlap, we can resolve it by swapping the two instructions. */ + if (reg_overlap_mentioned_p (operands[0], operands[4]) + || reg_overlap_mentioned_p (operands[0], operands[5])) + { + std::swap (operands[0], operands[3]); + std::swap (operands[1], operands[4]); + std::swap (operands[2], operands[5]); + gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[4]) + && !reg_overlap_mentioned_p (operands[0], operands[5])); + } +}) (define_split [(set (match_operand:DI 0 "register_operand")
Ah, bmi, not bmi2. And the =r <- (r, r) alternative might be best first. --- gcc/config/i386/i386.md.jj 2018-01-16 09:28:19.721432394 +0100 +++ gcc/config/i386/i386.md 2018-01-25 20:58:18.382378827 +0100 @@ -9250,14 +9250,14 @@ (define_split }) (define_insn "*andndi3_doubleword" - [(set (match_operand:DI 0 "register_operand" "=r,&r") + [(set (match_operand:DI 0 "register_operand" "=r,r,&r") (and:DI - (not:DI (match_operand:DI 1 "register_operand" "r,0")) - (match_operand:DI 2 "nonimmediate_operand" "rm,rm"))) + (not:DI (match_operand:DI 1 "register_operand" "r,0,r")) + (match_operand:DI 2 "nonimmediate_operand" "r,rm,m"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" - [(set_attr "isa" "bmi,*")]) + [(set_attr "isa" "bmi,*,bmi")]) (define_split [(set (match_operand:DI 0 "register_operand") @@ -9273,7 +9273,22 @@ (define_split (parallel [(set (match_dup 3) (and:SI (not:SI (match_dup 4)) (match_dup 5))) (clobber (reg:CC FLAGS_REG))])] - "split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);") +{ + split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]); + /* For the =r <- (r, r) alternative of *andndi3_doubleword, there could + be overlap between the output and input registers. If the output + is equal to one of the input operands, this is fine, if there is + partial overlap, we can resolve it by swapping the two instructions. */ + if (reg_overlap_mentioned_p (operands[0], operands[4]) + || reg_overlap_mentioned_p (operands[0], operands[5])) + { + std::swap (operands[0], operands[3]); + std::swap (operands[1], operands[4]); + std::swap (operands[2], operands[5]); + gcc_assert (!reg_overlap_mentioned_p (operands[0], operands[4]) + && !reg_overlap_mentioned_p (operands[0], operands[5])); + } +}) (define_split [(set (match_operand:DI 0 "register_operand")
(In reply to Jakub Jelinek from comment #33) > and it should work. The last case would be right now: > SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3; > and is again wrong, but we could again swap: > SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2; > and all is fine. Whoops, it looks that SI:N+2 is clobbered in the swapped case.
(In reply to Jakub Jelinek from comment #36) Your patch does fix my llvm issue. Thank you!
I can confirm it fixes things for me too. Is that the final patch in Comment 36? If so I'll try and get the Gentoo devs to include it in the GCC ebuilds Will this be added to GCC 8.1 and 7.4? Thanks again, this bug has been plaguing me for the last 8 months
(In reply to Uroš Bizjak from comment #37) > (In reply to Jakub Jelinek from comment #33) > > > and it should work. The last case would be right now: > > SI:N+1 = SI:N &~ SI:N+2; SI:N+2 = SI:N+1 &~ SI:N+3; > > and is again wrong, but we could again swap: > > SI:N+2 = SI:N+1 &~ SI:N+3; SI:N+1 = SI:N &~ SI:N+2; > > and all is fine. > > Whoops, it looks that SI:N+2 is clobbered in the swapped case. You're right. So the question is if IRA/LRA can ever allow that case where there is partial overlap with both registers. I've tried hard to simulate that case with: unsigned long long foo (unsigned long long x, unsigned long long y) { unsigned long long z; asm ("" : "+A" (x), "+Q" (y)); z = x & ~y; asm ("" : "+Q" (z) : "a" (0), "b" (0)); return z; } where IRA indeed allocates the used pseudos such that x is in ax:dx, y in cx:bx and z in dx:cx. Now, if I try this and testcase with ~x & y instead of x & ~y with GCC patched with #c36, I get: andn %eax, %ecx, %ecx xorl %eax, %eax andn %edx, %ebx, %ebx movl %ecx, %edx movl %ebx, %ecx movl %eax, %ebx resp. andn %ecx, %eax, %ecx xorl %eax, %eax andn %ebx, %edx, %ebx movl %ecx, %edx movl %ebx, %ecx movl %eax, %ebx between the two inline asms, and if I leave just the =r <- (r, r) alternative and nothing else, LRA ICEs on it (on both variants). All is with -O2 -m32 -mbmi -mstv -msse2. So, is there something in LRA that prevents these partial overlaps?
Let's go forward with this pattern: (define_insn "*andndi3_doubleword" [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r") (and:DI (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0")) (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" "#" [(set_attr "isa" "bmi,bmi,bmi,*")]) (=&r,r,rm) alternative avoids matching of output to all other overlapped and partial overlapped operands. (=r,0,rm) alternative allows matching with op1, so we are sure output won't *partially* overlap with op2. This is true for register and memory operands. It can still fully overlap with register op2 in case the same reg is allocated for op1, op2 and op3, which is OK for BMI. (=r,r,0) alternative will prevent *partial* overlap of output with op1 in a similar way. (=&r,0,rm) is non-bmi alternative. Earlyclobber is needed, otherwise RA can match op2 with op0 and op1, so the same reg is allocated for op0, op1 and op2. If this is the case, after the split, NOT/AND sequence won't match ANDN instruction, since NOT will change both input operands to a follow-up AND insn.
With the patch in Comment 36 I get the following error compiling Clang FAILED: lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o /usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Lex -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include -Iinclude -I/usr/lib/llvm/7/include -DNDEBUG -O2 -march=native -pipe -mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register -fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o -MF lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o.d -o lib/Lex/CMakeFiles/clangLex.dir/PPExpressions.cpp.o -c /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp: In function ‘bool EvaluateValue({anonymous}::PPValue&, clang::Token&, DefinedTracker&, bool, clang::Preprocessor&)’: /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp:492:1: internal compiler error: in gen_split_178, at config/i386/i386.md:8623 } ^
(In reply to Uroš Bizjak from comment #41) > Let's go forward with this pattern: > > (define_insn "*andndi3_doubleword" > [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r") > (and:DI > (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0")) > (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm"))) > (clobber (reg:CC FLAGS_REG))] > "!TARGET_64BIT && TARGET_STV && TARGET_SSE2" > "#" > [(set_attr "isa" "bmi,bmi,bmi,*")]) This looks reasonable to me. (In reply to Mike Lothian from comment #42) > With the patch in Comment 36 I get the following error compiling Clang Just out of interest, could you attach the preprocessed source here, I'd like to see the case when the RA assigns such a partial overlap. As you're using -march=native, will need g++ output with -v added to the command too (what flags it expands to).
Created attachment 43252 [details] compressed preprocessed source g++ -m32 -O3 -ggdb -march=znver1 -mtune=broadwell -pipe -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -o test.o -c PPExpressions.i /tmp-ram/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp: In function ‘bool EvaluateValue({anonymous}::PPValue&, clang::Token&, DefinedTracker&, bool, clang::Preprocessor&)’: /tmp-ram/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Lex/PPExpressions.cpp:492:1: internal compiler error: in gen_split_178, at config/i386/i386.md:8623 } ^ Please submit a full bug report, with preprocessed source if appropriate.
Author: uros Date: Fri Jan 26 15:36:32 2018 New Revision: 257096 URL: https://gcc.gnu.org/viewcvs?rev=257096&root=gcc&view=rev Log: PR target/81763 * config/i386/i386.md (*andndi3_doubleword): Add earlyclobber to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
Created attachment 43257 [details] reducest testcase reduced testcase for Jakub's patch in comment #36 and the build failure it causes in comment #42: g++ (Gentoo 7.2.0-r1 p1.1) 7.2.0 g++ -m32 -O2 -mbmi -fPIC -fno-strict-aliasing -c testcase.i testcase.i:126:12: warning: ‘unsigned int {anonymous}::bc::bn()’ used but never defined unsigned bn(); ^~ testcase.i:98:8: warning: ‘void {anonymous}::bc::bo({anonymous}::bc&)’ used but never defined void bo(bc &); ^~ testcase.i:154:3: warning: ‘{anonymous}::bw::bw(unsigned int)’ used but never defined bw(unsigned); ^~ testcase.i: In function ‘bool bz({anonymous}::bw&, ai::as, by&, bool, ai::av&)’: testcase.i:178:1: internal compiler error: in gen_split_178, at config/i386/i386.md:8623 } ^
With the patch you've committed to gcc master, applied on top of GCC 7.3 I'm now seeing the following error building Clang: [294/954] /usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include -Iinclude -I/usr/lib/llvm/7/include -DNDEBUG -O2 -march=native -pipe -mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register -fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -MF lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o.d -o lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -c /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization/ASTReader.cpp FAILED: lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o /usr/bin/x86_64-pc-linux-gnu-g++ -m32 -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization -I/var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include -Iinclude -I/usr/lib/llvm/7/include -DNDEBUG -O2 -march=native -pipe -mindirect-branch=thunk -mfunction-return=thunk -mindirect-branch-register -fpermissive -fPIC -fvisibility-inlines-hidden -Werror=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -Wdelete-non-virtual-dtor -Wno-comment -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -fno-strict-aliasing -pedantic -Wno-long-long -fPIC -MD -MT lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -MF lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o.d -o lib/Serialization/CMakeFiles/clangSerialization.dir/ASTReader.cpp.o -c /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization/ASTReader.cpp In file included from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/APValue.h:20:0, from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:17, from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/DeclObjC.h:17, from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/Serialization/ASTReader.h:17, from /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/lib/Serialization/ASTReader.cpp:14: /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘struct llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >’: /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:56:57: required from ‘PointerTy llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::getPointer() const [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’ /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:2859:39: required from here /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:132:3: error: static assertion failed: PointerIntPair with integer size too large for pointer static_assert(IntBits <= PtrTraits::NumLowBitsAvailable, ^~~~~~~~~~~~~ /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:147:42: warning: left shift count >= width of type [-Wshift-count-overflow] ShiftedIntMask = (uintptr_t)(IntMask << IntShift) ~~~~~~~~~^~~~~~~~~~~~ /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:147:42: warning: right operand of shift expression ‘(3 << 4294967295)’ is >= than the precision of the left operand [-fpermissive] /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘static intptr_t llvm::PointerIntPairInfo<PointerT, IntBits, PtrTraits>::getInt(intptr_t) [with PointerT = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; intptr_t = int]’: /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:58:56: required from ‘IntType llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::getInt() const [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’ /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:2897:32: required from here /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:156:19: warning: right shift count >= width of type [-Wshift-count-overflow] return (Value >> IntShift) & IntMask; ~~~~~~~^~~~~~~~~~~~ /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h: In instantiation of ‘static intptr_t llvm::PointerIntPairInfo<PointerT, IntBits, PtrTraits>::updateInt(intptr_t, intptr_t) [with PointerT = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; intptr_t = int]’: /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:73:28: required from ‘void llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::setPointerAndInt(PointerTy, IntType) [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’ /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:51:21: required from ‘llvm::PointerIntPair<PointerTy, IntBits, IntType, PtrTraits, Info>::PointerIntPair(PointerTy, IntType) [with PointerTy = llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>; unsigned int IntBits = 2; IntType = unsigned int; PtrTraits = llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> >; Info = llvm::PointerIntPairInfo<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*>, 2, llvm::PointerLikeTypeTraits<llvm::PointerUnion<clang::TypeSourceInfo*, std::pair<clang::TypeSourceInfo*, clang::QualType>*> > >]’ /var/tmp/portage/sys-devel/clang-9999/work/x/y/clang-9999/include/clang/AST/Decl.h:2831:33: required from here /usr/lib/llvm/7/include/llvm/ADT/PointerIntPair.h:173:52: warning: left shift count >= width of type [-Wshift-count-overflow] return (OrigValue & ~ShiftedIntMask) | IntWord << IntShift; ~~~~~~~~^~~~~~~~~~~
(In reply to Mike Lothian from comment #47) > With the patch you've committed to gcc master, applied on top of GCC 7.3 I'm > now seeing the following error building Clang: I don't think this is related to the original bugreport involving BMI instructions. Could be Clang's source incompatibility with 32bit target. In any case, please open a new PR.
Author: uros Date: Mon Jan 29 16:03:17 2018 New Revision: 257154 URL: https://gcc.gnu.org/viewcvs?rev=257154&root=gcc&view=rev Log: Backport from mainline 2018-01-26 Uros Bizjak <ubizjak@gmail.com> PR target/81763 * config/i386/i386.md (*andndi3_doubleword): Add earlyclobber to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives. Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/i386/i386.md
Fixed for 7.4+.
*** Bug 84377 has been marked as a duplicate of this bug. ***