80148 – [7 Regression] operand has impossible constraints

Bug 80148 - [7 Regression] operand has impossible constraints

Summary: [7 Regression] operand has impossible constraints

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	target (show other bugs)
Version:	7.0.1

Importance:	P1 normal
Target Milestone:	7.0
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:	80160
	Show dependency tree / graph

Reported:	2017-03-22 10:37 UTC by Dmitry Vyukov
Modified:	2017-03-25 04:30 UTC (History)
CC List:	4 users (show)

See Also:
Host:
Target:	i?86--
Build:
Known to work:
Known to fail:
Last reconfirmed:	2017-03-23 00:00:00

Attachments
source code (475.01 KB, application/zip) 2017-03-22 10:39 UTC, Dmitry Vyukov	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dmitry Vyukov 2017-03-22 10:37:59 UTC

This pop up in kernel code. We have:

#define __cmpxchg_double(pfx, p1, p2, o1, o2, n1, n2)			\
({									\
	bool __ret;							\
	__typeof__(*(p1)) __old1 = (o1), __new1 = (n1);			\
	__typeof__(*(p2)) __old2 = (o2), __new2 = (n2);			\
	asm volatile(pfx "cmpxchg%c4b %2; sete %0"			\
		     : "=a" (__ret), "+d" (__old2),			\
		       "+m" (*(p1)), "+m" (*(p2))			\
		     : "i" (2 * sizeof(long)), "a" (__old1),		\
		       "b" (__new1), "c" (__new2));			\
	__ret;								\
})

#define arch_cmpxchg_double(p1, p2, o1, o2, n1, n2) \
	__cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)

#define cmpxchg_double(p1, p2, o1, o2, n1, n2)				\
({									\
	__typeof__(p1) ____p1 = (p1);					\
	kasan_check_write(____p1, 2 * sizeof(*____p1));			\
	arch_cmpxchg_double(____p1, (p2), (o1), (o2), (n1), (n2));	\
})

And this is invoked as:

		if (cmpxchg_double(&page->freelist, &page->counters,
				   freelist_old, counters_old,
				   freelist_new, counters_new))

This fails with:
error: ‘asm’ operand has impossible constraints

However if I change cmpxchg_double as:

-	arch_cmpxchg_double(____p1, (p2), (o1), (o2), (n1), (n2));	\
+	arch_cmpxchg_double((p1), (p2), (o1), (o2), (n1), (n2));	\

It works.

I've tested with gcc version 7.0.1 20170307 (experimental) (GCC). Arnd Bergmann reported that the problem happens with 4.9 through 7.0.1 for him. I've also tried with gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) and both versions work.

It's unclear to me why introducing a local variable as "__typeof__(p1) ____p1 = (p1)" and then using it fails. Whereas using p1 works. Essentially we have:

asm volatile("" "cmpxchg%c4b %2; sete %0" : "=a" (__ret), "+d" (__old2), "+m" (*((&page->freelist))), "+m" (*((&page->counters))) : "i" (2 * sizeof(long)), "a" (__old1), "b" (__new1), "c" (__new2));

vs:

__typeof__(&page->freelist) ____p1 = (&page->freelist);
...
asm volatile("" "cmpxchg%c4b %2; sete %0" : "=a" (__ret), "+d" (__old2), "+m" (*(____p1)), "+m" (*((&page->counters))) : "i" (2 * sizeof(long)), "a" (__old1), "b" (__new1), "c" (__new2));


It seems to me that both versions should work the same way.

I will attach preprocessed sources for both versions. They can be compiled as:

gcc slub1.c -fno-strict-aliasing -fno-common -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mpreferred-stack-boundary=2 -march=i686 -mtune=pentium3 -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -Wno-frame-address -O2 --param=allow-store-data-races=0 -Wframe-larger-than=1024 -fno-stack-protector -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -c

Comment 1 Dmitry Vyukov 2017-03-22 10:39:59 UTC

Created attachment 41016 [details]
source code

The archive contains slub2.c that works, and slub1.c that fails with the error.

Comment 2 Jakub Jelinek 2017-03-22 14:13:54 UTC

With just -m32 slub1.c -O2 -mpreferred-stack-boundary=2 -march=i686 this started with r246059.  Why doesn't the kernel use atomic builtins instead?
The asm uses %eax/%ebx/%ecx/%edx and needs to use up to 2 other registers to hold the addresses of mems (unless any of them can be sp relative), if %ebp is used for frame pointer then there are no other registers left, but it still ought to be reloadable.

Comment 3 Dmitry Vyukov 2017-03-22 14:26:38 UTC

> Why doesn't the kernel use atomic builtins instead?

There was a recent discussion here:
https://groups.google.com/forum/#!topic/kasan-dev/3sNHjjb4GCI

In short:
---
Trivially, The C++ model doesn't feature I/O ordering [1]...
+
https://lwn.net/Articles/691295/
+
rmb()/wmb() are not remotely similar to
atomic_thread_fenc_{acquire,release}, even if you restrict ordering to
coherent CPUs (i.e. the smp_* variants). Please don't do that :)
+
I'm also terrified of the optimisations that the compiler is theoretically
allowed to make to C11 atomics given the assumptions of the language
virtual machine, which are not necessarily valid in the kernel environment.
We would at least need well-supported compiler options to disable these
options, and also to allow data races with things like READ_ONCE.

Comment 4 Vladimir Makarov 2017-03-23 19:09:11 UTC

Thank you for reporting this.

Something is wrong with processing insns for reloads.  The asm-insn hash 2 the same operands mem[r263+12].  R263 is spilled for a reload.  The mem becomes invalid and r263 should be reloaded too.  Instead, LRA goes to a spill sub-pass changing the operands on mem[mem[sp+offset]], then it generates 2 reloads for the both operands:

p1=mem[sp+offset]
p2=mem[sp+offset]

LRA can not figure out that p1 and p2 do not conflict and should be the same (in LRA it is done on pseudo bases).

So the solution would be a generation of reloads for R263 before the spill sub-pass.

When I find why does not it happen, I could say how much time will it take to fix.  If it is simple, the patch will be probably ready tomorrow.

Comment 5 Vladimir Makarov 2017-03-23 21:20:58 UTC

The fix proposed by Bernd for PR80160 does not solve the problem.  So I am continuing to work on the patch.

Comment 6 Vladimir Makarov 2017-03-24 18:48:09 UTC

Author: vmakarov
Date: Fri Mar 24 18:47:38 2017
New Revision: 246467

URL: https://gcc.gnu.org/viewcvs?rev=246467&root=gcc&view=rev
Log:
2017-03-24  Vladimir Makarov  <vmakarov@redhat.com>

	PR target/80148
	* lra-assigns.c (assign_by_spills): Add spilled non-reload pseudos
	to consider in curr_insn_transform.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/lra-assigns.c

Comment 7 Jeffrey A. Law 2017-03-25 04:30:39 UTC

Fixed by Vlad's commit on the trunk.