Bug 38201 - -mfma/-mavx and -msse5/-msse4a don't work together
Status: RESOLVED WONTFIX
Alias: None
Product: gcc
Classification: Unclassified
Component: target
Version: 4.4.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: http://gcc.gnu.org/ml/gcc-patches/200...
Reported: 2008-11-20 14:45 UTC by H.J. Lu
Modified: 2011-01-16 15:56 UTC

Last reconfirmed: 2008-12-09 21:58:04


Description H.J. Lu 2008-11-20 14:45:24 UTC
Both Intel FMA and AMD SSE5 support FMA. For -mfma, which enables
Intel FMA and is a dummy at the moment, or -msse5, we will
generate FMA instructions for

double f;

void
foo (double x, double y, double z)
{
  f = x * y + z;
}

What FMA should "-mfma -msse5" generate? Also currently, with
"-O2 -mavx -msse5", we generate

foo:
	fmaddsd	%xmm2, %xmm1, %xmm0, %xmm0
	vmovsd	%xmm0, f(%rip)
	ret

which won't run on any machines. For "-mfma -msse5" and
"-mavx -msse5",

1. Should these combinations be allowed? If allowed,
2. Should the last option turn off the first one?
Comment 1 Dwarak Rajagopal 2008-11-20 16:48:05 UTC
1) -msse5 includes the -mfma switch (because FMA is part of the SSE5 instructions). So "-msse5 -mfma" is the same as just "-msse5", though you can use -mfma alone (without -msse5).

2) "-mavx -msse5" => Yes. This would not make sense since no machine can run this.

- Dwarak


(In reply to comment #0)
> Both Intel FMA and AMD SSE5 support FMA. For -mfma, which enables
> Intel FMA and is a dummy at the moment, or -msse5, we will
> generate FMA instructions for
> 
> double f;
> 
> void
> foo (double x, double y, double z)
> {
>   f = x * y + z;
> }
> 
> What FMA should "-mfma -msse5" generate? Also currently, with
> "-O2 -mavx -msse5", we generate
> 
> foo:
>         fmaddsd %xmm2, %xmm1, %xmm0, %xmm0
>         vmovsd  %xmm0, f(%rip)
>         ret
> 
> which won't run on any machines. For "-mfma -msse5" and
> "-mavx -msse5",
> 
> 1. Should these combinations be allowed? If allowed,
> 2. Should the last option turn off the first one?
> 

Comment 2 H.J. Lu 2008-11-20 16:57:46 UTC
(In reply to comment #1)
> 1) -msse5 includes -mfma switch (because fma is a part of sse5 instructions).
> So having "-msse5 -mfma" is same as having just "-msse5", though you can just
> have -mfma (without -msse5).

Please look closely. I added -mfma to i386.opt:

---
mfma
Target Report Mask(ISA_FMA) Var(ix86_isa_flags) VarExists
Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX and FMA built-in functions and code generation
---

It has nothing to do with SSE5. SSE5 never implies TARGET_FMA.
Comment 3 H.J. Lu 2008-11-20 18:52:49 UTC
Since -mfma implies -mavx, we got

[hjl@gnu-26 gcc]$ cat f.c
double f;

void
foo (double x, double y, double z)
{
  f = x * y + z;
}
[hjl@gnu-26 gcc]$ ./xgcc -B./ -O2 -mfma -msse5 f.c -S -fno-asynchronous-unwind-tables
[hjl@gnu-26 gcc]$ cat f.s
	.file	"f.c"
	.text
	.p2align 4,,15
.globl foo
	.type	foo, @function
foo:
	fmaddsd	%xmm2, %xmm1, %xmm0, %xmm0
	vmovsd	%xmm0, f(%rip)
	ret
	.size	foo, .-foo
	.comm	f,8,8
	.ident	"GCC: (GNU) 4.4.0 20081120 (experimental) [trunk revision 142045]"
	.section	.note.GNU-stack,"",@progbits
[hjl@gnu-26 gcc]$ 
Comment 4 Dwarak Rajagopal 2008-11-20 19:35:09 UTC
Yes, you are right. "-mfma -msse5" does not make sense. I mistook -mfma for -mfused-madd and hence the confusion.

Hence these combinations (1 and 2) do not make sense.

Thanks,
Dwarak
Comment 5 H.J. Lu 2008-11-20 19:46:53 UTC
(In reply to comment #4)
> Yes, you are right. "-mfma -msse5" does not make sense. I mistook -mfma for
> -mfused-madd and hence the confusion.
> 
> Hence these combinations (1 and 2) do not make sense.
> 

Should we disallow such combinations?
Comment 6 Dwarak Rajagopal 2008-11-20 19:49:52 UTC
> Should we disallow such combinations?
> 
Yes.
- Dwarak
Comment 7 H.J. Lu 2008-11-20 21:29:07 UTC
We have the same issue with -m3dnow, -m3dnowa and -msse4a.
Comment 8 Joey Ye 2008-11-21 12:00:56 UTC
In short, set A={-favx, -ffma}, set B={-f3dnow, -f3dnowa, -fsse4a, -fsse5}. Any option combination from both sets should be prohibited.

Please add more options to these sets in case I missed any.
Comment 9 H.J. Lu 2008-11-21 13:35:34 UTC
(In reply to comment #8)
> In short, set A={-favx, -ffma}, set B={-f3dnow, -f3dnowa, -fsse4a, -fsse5}. Any

It is -mXXX, not -fXXX.

> option combination from both sets should be prohibited.
> 

That is correct.
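
For illustration only, the rule agreed on above (no mixing of an option from set A with an option from set B) could be sketched as a bitmask check. This is a minimal sketch, not GCC's option-handling code: the `OPT_*` flags, `SET_A`, `SET_B` and `isa_options_conflict` are hypothetical names, and the bit values are not GCC's real ISA masks.

```c
#include <stdbool.h>

/* Hypothetical bit flags, one per option. NOT GCC's real masks. */
enum isa_opt {
    OPT_AVX    = 1u << 0,  /* -mavx    */
    OPT_FMA    = 1u << 1,  /* -mfma    */
    OPT_3DNOW  = 1u << 2,  /* -m3dnow  */
    OPT_3DNOWA = 1u << 3,  /* -m3dnowa */
    OPT_SSE4A  = 1u << 4,  /* -msse4a  */
    OPT_SSE5   = 1u << 5   /* -msse5   */
};

/* Set A and set B as listed in comment 8 (with -m spelling). */
#define SET_A (OPT_AVX | OPT_FMA)
#define SET_B (OPT_3DNOW | OPT_3DNOWA | OPT_SSE4A | OPT_SSE5)

/* True when at least one option from each set is enabled at once. */
static bool isa_options_conflict(unsigned enabled)
{
    return (enabled & SET_A) != 0 && (enabled & SET_B) != 0;
}
```

With these assumed flags, `isa_options_conflict(OPT_AVX | OPT_SSE5)` reports a conflict, while two options from the same set, such as `OPT_AVX | OPT_FMA`, do not.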

Comment 10 Richard Biener 2008-11-22 15:03:09 UTC
We should have -mfma to enable a fused multiply-add instruction that is available
when enabling either -msse5 or -mavx.  -mfma should not itself enable any
of the instruction set enabling features.

HJ, why did you need -mfma and could not have used -mfused-madd for this?
IMHO -mfma is confusing and should be removed.
Comment 11 H.J. Lu 2008-11-22 15:09:29 UTC
(In reply to comment #10)
> We should have -mfma to enable a fused multiply-add instruction that is
> available
> when enabling either -msse5 or -mavx.  -mfma should not itself enable any
> of the instruction set enabling features.
> 
> HJ, why did you need -mfma and could not have used -mfused-madd for this?
> IMHO -mfma is confusing and should be removed.
> 

Intel FMA is a separate instruction set with its own feature bit in CPUID.
Using -mfused-madd -mavx to enable an instruction set doesn't look
appropriate to me.
Comment 12 H.J. Lu 2008-11-22 15:15:33 UTC
Richard asked:

Why should it (-mavx -msse5) be disallowed if a user asks for it?  Do we
disallow -msse4a -mssse4?

Reply:

-msse4a -mssse4 can generate code which runs if you check the feature
bit in CPUID before calling an appropriate function. But -mavx -msse5
will generate code which won't run on any machine.
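
As a rough sketch of the "check the feature bit before calling an appropriate function" approach, assuming the `__builtin_cpu_init` and `__builtin_cpu_supports` built-ins that later GCC releases provide on x86 (they are not part of the 4.4 tree discussed here). The `madd_*` functions and `select_madd` are hypothetical names, and SSSE3 is used only as a stand-in for the Intel-side option; in a real build each variant would live in a file compiled with the matching -m option.

```c
typedef double (*madd_fn)(double, double, double);

/* Portable fallback; runs on any CPU. */
static double madd_generic(double x, double y, double z) { return x * y + z; }

/* Stand-ins for ISA-specific variants. Here they are plain C so the
   sketch builds anywhere; real variants would be compiled separately
   with the corresponding -m option. */
static double madd_ssse3(double x, double y, double z) { return x * y + z; }
static double madd_sse4a(double x, double y, double z) { return x * y + z; }

/* Pick a variant at run time based on what the CPU reports. */
static madd_fn select_madd(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4a"))
        return madd_sse4a;
    if (__builtin_cpu_supports("ssse3"))
        return madd_ssse3;
#endif
    return madd_generic;
}
```

A caller would fetch the pointer once, for example `madd_fn f = select_madd();`, and then call `f(x, y, z)`. No such selection can help when a single function mixes instructions that no one CPU supports.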
Comment 13 Richard Biener 2008-11-22 15:31:38 UTC
I see.
Comment 14 Andrew Pinski 2008-11-23 18:28:11 UTC
>But -mavx -msse5 will generate code which won't run on any machine.

It could run on a simulator that has both (or a new processor which has not come out yet).  Or are there conflicts with the opcodes themselves?  So I don't see why this is really a bug.
Comment 15 H.J. Lu 2008-12-03 16:50:15 UTC
A simulator is fine. AVX executables can only run on a simulator. If
there is a simulator which can run both SSE5 and AVX, we will add
a new switch for it.
Comment 16 H.J. Lu 2008-12-09 21:58:04 UTC
A patch is posted at

http://gcc.gnu.org/ml/gcc-patches/2008-12/msg00585.html
Comment 17 H.J. Lu 2011-01-16 15:56:59 UTC
-msse5 is gone.