Bug 100758

Summary: __builtin_cpu_supports does not (always) detect "sse2"
Product: gcc Reporter: Erich Eckner <gcc>
Component: targetAssignee: Martin Liška <marxin>
Status: RESOLVED FIXED    
Severity: normal CC: gcc, iam, jakub, marxin, Mayshao-oc, sjames
Priority: P3    
Version: 11.1.0   
Target Milestone: ---   
Host: i686 Target: i686
Build: Known to work:
Known to fail: Last reconfirmed: 2021-05-26 00:00:00
Attachments: test.c: probe for sse2
cpuid probing

Description Erich Eckner 2021-05-25 18:37:24 UTC
Created attachment 50868 [details]
test.c: probe for sse2

I use the attached snippet to detect, whether the cpu supports sse2. This works most of the time, but fails to detect sse2 on two machines, which actually support sse2 (according to /proc/cpuinfo).

The affected machines run archlinux32. /proc/cpuinfo shows:

processor	: 0
vendor_id	: CentaurHauls
cpu family	: 6
model		: 15
model name	: VIA Nano U3400@800MHz
stepping	: 10
cpu MHz		: 798.016
cache size	: 2048 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush acpi mmx fxsr sse sse2 ss tm syscall nx lm constant_tsc arch_perfmon rep_good cpuid pni monitor vmx est tm2 ssse3 cx16 xtpr sse4_1 popcnt rng rng_en ace ace_en ace2 phe phe_en pmm pmm_en lahf_lm tpr_shadow vnmi vpid ida
vmx flags	: vnmi tsc_offset vtpr
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips	: 1596.53
clflush size	: 64
cache_alignment	: 128
address sizes	: 36 bits physical, 48 bits virtual
power management:

and
processor	: 0
vendor_id	: CentaurHauls
cpu family	: 6
model		: 13
model name	: VIA C7-D Processor 1800MHz
stepping	: 0
cpu MHz		: 1596.326
cache size	: 128 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fdiv_bug	: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge cmov pat clflush acpi mmx fxsr sse sse2 tm nx cpuid pni est tm2 xtpr rng rng_en ace ace_en ace2 ace2_en phe phe_en pmm pmm_en
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips	: 3193.67
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 32 bits virtual
power management:

respectively. On these machines, the output is "sse2: 0" instead of some value unequal 0.

Regards,
Erich

P.S.: I hope, I reported this in the correct category. Please let me know, if I did not.
Comment 1 Martin Liška 2021-05-26 07:57:22 UTC
Thank you for the report. You are right, GCC does not detect features for VIA (Centaur) CPUs:

https://github.com/gcc-mirror/gcc/blob/403bb89bd7f4ec03d4dcbdf8668d0187358631a0/gcc/common/config/i386/cpuinfo.h#L925-L926

Well, I can theoretically add it, but we speak about pretty legacy hardware. I don't know if it worth doing? What's your use case Erich?
Comment 2 Erich Eckner 2021-05-26 09:12:31 UTC
We use this in archlinux32 to detect, if we can install packages, that have sse2 opcodes:

If one sets "Architecture = auto" in /etc/pacman.conf, uname only gives "i686" in both cases (this is how archlinux does/did the probing), and then, we probe for sse2 to check, if we really are "i686" or "pentium4" (our nomenclature for "i686" + sse2). This works well on amd and intel cpus, but fails on via cpus.

Details about expected features: https://archlinux32.org/architecture/

We can add some cumbersome code which probes for sse2, but I'd really prefer some compiler builtin instead :-)
Comment 3 Martin Liška 2021-05-26 09:23:09 UTC
(In reply to Erich Eckner from comment #2)
> We use this in archlinux32 to detect, if we can install packages, that have
> sse2 opcodes:
> 
> If one sets "Architecture = auto" in /etc/pacman.conf, uname only gives
> "i686" in both cases (this is how archlinux does/did the probing), and then,
> we probe for sse2 to check, if we really are "i686" or "pentium4" (our
> nomenclature for "i686" + sse2). This works well on amd and intel cpus, but
> fails on via cpus.
> 
> Details about expected features: https://archlinux32.org/architecture/

I see.

> 
> We can add some cumbersome code which probes for sse2, but I'd really prefer
> some compiler builtin instead :-)

I can implement it for GCC 12.1 if you want (May 2022 release date). As I don't have access to the VIA machines, can you please paste the content of %eax, %ebx, %ecx and %edx for 'cpuid' instruction?
Comment 4 Erich Eckner 2021-05-26 09:46:27 UTC
Created attachment 50871 [details]
cpuid probing

Does the attached program yield, what you need? (Sry, I'm quite unfamiliar with asm in gcc)

It gives:

00000001 746e6543 736c7561 48727561 
000006d0 00000800 00004181 a7c9bbff 

and

0000000a 746e6543 736c7561 48727561 
000006fa 00010800 008863a9 afc9fbff 

on the two machines, respectively.
Comment 5 Richard Biener 2021-05-26 09:54:07 UTC
(In reply to Erich Eckner from comment #2)
> We use this in archlinux32 to detect, if we can install packages, that have
> sse2 opcodes:
> 
> If one sets "Architecture = auto" in /etc/pacman.conf, uname only gives
> "i686" in both cases (this is how archlinux does/did the probing), and then,
> we probe for sse2 to check, if we really are "i686" or "pentium4" (our
> nomenclature for "i686" + sse2). This works well on amd and intel cpus, but
> fails on via cpus.
> 
> Details about expected features: https://archlinux32.org/architecture/
> 
> We can add some cumbersome code which probes for sse2, but I'd really prefer
> some compiler builtin instead :-)

Eh, including cpuid.h and using that to check for SSE2 shouldn't be too hard
(or parsing /proc/cpuinfo).
Comment 6 Martin Liška 2021-05-26 10:09:17 UTC
(In reply to Richard Biener from comment #5)
> (In reply to Erich Eckner from comment #2)
> > We use this in archlinux32 to detect, if we can install packages, that have
> > sse2 opcodes:
> > 
> > If one sets "Architecture = auto" in /etc/pacman.conf, uname only gives
> > "i686" in both cases (this is how archlinux does/did the probing), and then,
> > we probe for sse2 to check, if we really are "i686" or "pentium4" (our
> > nomenclature for "i686" + sse2). This works well on amd and intel cpus, but
> > fails on via cpus.
> > 
> > Details about expected features: https://archlinux32.org/architecture/
> > 
> > We can add some cumbersome code which probes for sse2, but I'd really prefer
> > some compiler builtin instead :-)
> 
> Eh, including cpuid.h and using that to check for SSE2 shouldn't be too hard
> (or parsing /proc/cpuinfo).

Yeah, the following should work for you:

#include <cpuid.h>

int main()
{
  unsigned int eax, ebx, ecx, edx;
  __get_cpuid(0, &eax, &ebx, &ecx, &edx);

  if (edx & bit_SSE2)
    __builtin_printf ("has SSE2\n");
}
Comment 7 Erich Eckner 2021-05-26 10:13:07 UTC
Yes, we could do

#include <cpuinfo.h>
unsigned int regs[4];
__get_cpuid(0, regs, regs+1, regs+2, regs+3);
if (regs[3] & bit_SSE2) {
  ...
}

The "cumbersome" in my first comment referred to doing all that by hand, which was the alternative, we saw so far.

Ok, I'll switch to that code and stop bothering you :-)

Thanks for looking into this!
Comment 8 Martin Liška 2021-06-01 13:26:06 UTC
Closing as wontfix.
Comment 9 ValdikSS 2023-01-27 20:48:10 UTC
May I ask why was is closed as WONTFIX?
Just got hit by this bug in Debian 12, which uses special "sse2-support" package to detect SSE2 support by using __builtin_cpu_supports. It fails on VIA Eden Eshter.
/proc/cpuinfo returns SSE2 support.
Comment 10 Martin Liška 2023-02-01 14:03:05 UTC
(In reply to ValdikSS from comment #9)
> May I ask why was is closed as WONTFIX?

Because we're not planning to support such legacy hardware.

> It fails on VIA Eden Eshter.

Is it critical that the feature is not detected? We speak here about 15 years old hardware if I see correctly.
Comment 11 ValdikSS 2023-02-01 14:23:45 UTC
Well, the function is called __builtin_cpu_supports, for which one might expect that it just checks CPUID and returns reliable results, while in reality it operates using the build-in CPU support list. The function does not return an error if it's unable to detect the feature, resulting in incorrect results, which makes the function unreliable. The fact that it may not always detect features which are present in the CPU is not documented, which brings the confusion among developers such as this bug.

VIA processors have progressed into Zhaoxin CPU family which is fairly recent (2014-2019, with plans to release new processors soon). They both share almost the same CPUID data, that's why adding support for either of these CPU automatically implements the other one.
Comment 12 Martin Liška 2023-02-01 14:32:32 UTC
(In reply to ValdikSS from comment #11)
> Well, the function is called __builtin_cpu_supports, for which one might
> expect that it just checks CPUID and returns reliable results, while in
> reality it operates using the build-in CPU support list. The function does
> not return an error if it's unable to detect the feature, resulting in
> incorrect results, which makes the function unreliable. The fact that it may
> not always detect features which are present in the CPU is not documented,
> which brings the confusion among developers such as this bug.

Sure, it's not perfect, I can confirm that. I'm going to introduce a documentation entry for this limitation. 

> 
> VIA processors have progressed into Zhaoxin CPU family which is fairly
> recent (2014-2019, with plans to release new processors soon).

Yep, and we support these (vendor == signature_SHANGHAI_ebx) correctly since
r13-713-ga239aff82c3771.

Maybe the author of the commit will be interested in old VIA CPUs?
Comment 13 GCC Commits 2023-02-09 10:50:01 UTC
The master branch has been updated by Martin Liska <marxin@gcc.gnu.org>:

https://gcc.gnu.org/g:1189d1b38e2b9507488ea294cda771c79e972c1d

commit r13-5755-g1189d1b38e2b9507488ea294cda771c79e972c1d
Author: Martin Liska <mliska@suse.cz>
Date:   Thu Feb 9 11:33:31 2023 +0100

    docs: add caveat for __builtin_cpu_supports
    
    Document that the function does not work correctly for old
    VIA processors.
    
            PR target/100758
    
    gcc/ChangeLog:
    
            * doc/extend.texi: Document that the function
            does not work correctly for old VIA processors.
Comment 14 Jakub Jelinek 2023-02-09 11:40:00 UTC
Given the above dumps, wouldn't it be just a matter of:
--- gcc/common/config/i386/cpuinfo.h.jj	2023-01-16 11:52:15.910736614 +0100
+++ gcc/common/config/i386/cpuinfo.h	2023-02-09 12:30:31.430185107 +0100
@@ -601,8 +601,8 @@ get_intel_cpu (struct __processor_model
 
 static inline const char *
 get_zhaoxin_cpu (struct __processor_model *cpu_model,
-		struct __processor_model2 *cpu_model2,
-		unsigned int *cpu_features2)
+		 struct __processor_model2 *cpu_model2,
+		 unsigned int *cpu_features2)
 {
   const char *cpu = NULL;
   unsigned int family = cpu_model2->__cpu_family;
@@ -1057,24 +1057,27 @@ cpu_indicator_init (struct __processor_m
       cpu_model->__cpu_vendor = VENDOR_AMD;
     }
   else if (vendor == signature_CENTAUR_ebx && family < 0x07)
-    cpu_model->__cpu_vendor = VENDOR_CENTAUR;
+    {
+      /* Find available features. */
+      get_available_features (cpu_model, cpu_model2, cpu_features2,
+			      ecx, edx);
+      cpu_model->__cpu_vendor = VENDOR_CENTAUR;
+    }
   else if (vendor == signature_SHANGHAI_ebx
-		|| vendor == signature_CENTAUR_ebx)
+	   || vendor == signature_CENTAUR_ebx)
     {
       /* Adjust model and family for ZHAOXIN CPUS.  */
       if (family == 0x07)
-	{
-	  model += extended_model;
-	}
+	model += extended_model;
 
       cpu_model2->__cpu_family = family;
       cpu_model2->__cpu_model = model;
 
       /* Find available features.  */
       get_available_features (cpu_model, cpu_model2, cpu_features2,
-				  ecx, edx);
+			      ecx, edx);
       /* Get CPU type.  */
-      get_zhaoxin_cpu (cpu_model, cpu_model2,cpu_features2);
+      get_zhaoxin_cpu (cpu_model, cpu_model2, cpu_features2);
       cpu_model->__cpu_vendor = VENDOR_ZHAOXIN;
     }
   else if (vendor == signature_CYRIX_ebx)

The important part just adding get_available_features for the VENDOR_CENTAUR case (otherwise I've just fixed up the messed up formatting of the Zhaoxin stuff).
In fact, I wonder why get_available_features isn't called unconditionally for all CPUs that support at least max level >= 1.  Or is the worry that
some CPUs might misbehave on CPUID 0x80000000 leaf?
Comment 15 Martin Liška 2023-02-09 11:47:24 UTC
Your patch might work.

> In fact, I wonder why get_available_features isn't called unconditionally

Yes, I would also expect that, but it was not the case even before the big refactoring in g:1890f2f0e210ef515c39728c54151372d36dd187.

> for all CPUs that support at least max level >= 1.  Or is the worry that
> some CPUs might misbehave on CPUID 0x80000000 leaf?

Maybe H.J. can answer the question?
Comment 16 Jakub Jelinek 2023-02-09 12:22:56 UTC
(In reply to Erich Eckner from comment #4)
> Created attachment 50871 [details]
> cpuid probing
> 
> Does the attached program yield, what you need? (Sry, I'm quite unfamiliar
> with asm in gcc)
> 
> It gives:
> 
> 00000001 746e6543 736c7561 48727561 
> 000006d0 00000800 00004181 a7c9bbff 
> 
> and
> 
> 0000000a 746e6543 736c7561 48727561 
> 000006fa 00010800 008863a9 afc9fbff 
> 
> on the two machines, respectively.

Erich, if you still have access to those CPUs, could you retry with:
#include <stdio.h>

static void cpuid( unsigned int ax, unsigned int cx, unsigned int *p )
{
  unsigned int flags;

  __asm __volatile
    ("movl %%ebx, %%esi\n\t"
     "cpuid\n\t"
      "xchgl %%ebx, %%esi"
      : "=a" (p[0]), "=S" (p[1]),
      "=c" (p[2]), "=d" (p[3])
      : "0" (ax), "2" (cx));
}

int main() {
  unsigned int regs[4];
  for (int j=0; j<=18; j++) {
    int k = j > 10 ? -__INT_MAX__ - 1 + (j - 11) : j;
    cpuid( k, regs, 0 );
    printf("%d %d ", k, 0);
    for (int i=0; i<4; i++) {
      printf("%08x ", regs[i]);
    }
    printf("\n");
    cpuid( k, regs, 1 );
    printf("%d %d ", k, 1);
    for (int i=0; i<4; i++) {
      printf("%08x ", regs[i]);
    }
    printf("\n");
  }
  return 0;
}

Thanks.

I've posted a different patch to gcc-patches:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/611632.html
Comment 17 Erich Eckner 2023-02-09 16:22:20 UTC
With that, I get a segfault in cpuid():

(gdb) run
Starting program: /tmp/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0000555555555191 in cpuid ()
(gdb) bt
#0  0x0000555555555191 in cpuid ()
#1  0x00005555555551ef in main ()
(gdb) 

... same on x86_64 - maybe, there's a typo, somewhere?
Comment 18 Jakub Jelinek 2023-02-09 16:37:27 UTC
(In reply to Erich Eckner from comment #17)
> With that, I get a segfault in cpuid():
> 
> (gdb) run
> Starting program: /tmp/a.out 
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/libthread_db.so.1".
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555555191 in cpuid ()
> (gdb) bt
> #0  0x0000555555555191 in cpuid ()
> #1  0x00005555555551ef in main ()
> (gdb) 
> 
> ... same on x86_64 - maybe, there's a typo, somewhere?

Sorry, wrote it without actually trying to compile it, I've swapped the last 2 arguments of cpuid.

Use
static void cpuid( unsigned int ax, unsigned int cx, unsigned int *p )
{
  unsigned int flags;

  __asm __volatile
    ("movl %%ebx, %%esi\n\t"
     "cpuid\n\t"
      "xchgl %%ebx, %%esi"
      : "=a" (p[0]), "=S" (p[1]),
      "=c" (p[2]), "=d" (p[3])
      : "0" (ax), "2" (cx));
}

int main() {
  unsigned int regs[4];
  for (int j=0; j<=18; j++) {
    int k = j > 10 ? -__INT_MAX__ - 1 + (j - 11) : j;
    cpuid( k, 0, regs );
    __builtin_printf("%d %d ", k, 0);
    for (int i=0; i<4; i++) {
      __builtin_printf("%08x ", regs[i]);
    }
    __builtin_printf("\n");
    cpuid( k, 1, regs );
    __builtin_printf("%d %d ", k, 1);
    for (int i=0; i<4; i++) {
      __builtin_printf("%08x ", regs[i]);
    }
    __builtin_printf("\n");
  }
  return 0;
}
Comment 19 GCC Commits 2023-02-09 16:45:56 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:b24e9c083093a9e1b1007933a184c02f7ff058db

commit r13-5759-gb24e9c083093a9e1b1007933a184c02f7ff058db
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Feb 9 17:43:19 2023 +0100

    i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]
    
    get_available_features doesn't depend on cpu_model2->__cpu_{family,model}
    and just sets stuff up based on CPUID leaf 1, or some extended ones,
    so I wonder why are we calling it separately for Intel, AMD and Zhaoxin
    and not for all other CPUs too?  I think various programs in the wild
    which aren't using __builtin_cpu_{is,supports} just check the various CPUID
    leafs and query bits in there, without blacklisting unknown CPU vendors,
    so I think even __builtin_cpu_supports ("sse2") etc. should be reliable
    if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf
    1 or some extended ones.  Calling it for all CPUs also means it can be
    inlined because there will be just a single caller.
    
    I have tested it on Intel and Martin tested it on AMD, but can't test it
    on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in
    behavior.
    
    2023-02-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/100758
            * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Formatting fixes.
            (cpu_indicator_init): Call get_available_features for all CPUs with
            max_level >= 1, rather than just Intel, AMD or Zhaoxin.  Formatting
            fixes.
Comment 20 Erich Eckner 2023-02-09 20:37:20 UTC
Yeah, now it pulled some stuff :-)

0 0 0000000a 746e6543 736c7561 48727561 
0 1 0000000a 746e6543 736c7561 48727561 
1 0 000006fa 00010800 008863a9 afc9fbff 
1 1 000006fa 00010800 008863a9 afc9fbff 
2 0 02b3b001 00000000 00000000 2c04307d 
2 1 02b3b001 00000000 00000000 2c04307d 
3 0 00000000 00000000 00000000 00000000 
3 1 00000000 00000000 00000000 00000000 
4 0 00000000 00000000 00000000 00000000 
4 1 00000000 00000000 00000000 00000000 
5 0 00000040 00000040 00000003 00022220 
5 1 00000040 00000040 00000003 00022220 
6 0 00000002 00000000 00000000 00000000 
6 1 00000002 00000000 00000000 00000000 
7 0 00000000 00000000 00000000 00000000 
7 1 00000000 00000000 00000000 00000000 
8 0 00000000 00000000 00000000 00000000 
8 1 00000000 00000000 00000000 00000000 
9 0 00000000 00000000 00000000 00000000 
9 1 00000000 00000000 00000000 00000000 
10 0 06280202 00000000 00000000 00000503 
10 1 06280202 00000000 00000000 00000503 
-2147483648 0 80000008 00000000 00000000 00000000 
-2147483648 1 80000008 00000000 00000000 00000000 
-2147483647 0 00000000 00000000 00000001 20100800 
-2147483647 1 00000000 00000000 00000001 20100800 
-2147483646 0 20202020 20202020 20202020 20202020 
-2147483646 1 20202020 20202020 20202020 20202020 
-2147483645 0 20202020 20202020 49562020 614e2041 
-2147483645 1 20202020 20202020 49562020 614e2041 
-2147483644 0 55206f6e 30303433 30303840 007a484d 
-2147483644 1 55206f6e 30303433 30303840 007a484d 
-2147483643 0 00000000 08800880 40100140 40100140 
-2147483643 1 00000000 08800880 40100140 40100140 
-2147483642 0 00000000 00000000 04008140 00000000 
-2147483642 1 00000000 00000000 04008140 00000000 
-2147483641 0 00000000 00000000 00000000 00000000 
-2147483641 1 00000000 00000000 00000000 00000000

Thank you for your time!
Comment 21 ValdikSS 2023-02-09 21:35:29 UTC
VIA Eden Esther (32-bit core)

0 0 00000001 746e6543 736c7561 48727561 
0 1 00000001 746e6543 736c7561 48727561 
1 0 000006d0 00000800 00004181 a7c9bbff 
1 1 000006d0 00000800 00004181 a7c9bbff 
2 0 00000000 00000000 00000000 00000000 
2 1 00000000 00000000 00000000 00000000 
3 0 00000000 00000000 00000000 00000000 
3 1 00000000 00000000 00000000 00000000 
4 0 00000000 00000000 00000000 00000000 
4 1 00000000 00000000 00000000 00000000 
5 0 00000000 00000000 00000000 00000000 
5 1 00000000 00000000 00000000 00000000 
6 0 00000000 00000000 00000000 00000000 
6 1 00000000 00000000 00000000 00000000 
7 0 00000000 00000000 00000000 00000000 
7 1 00000000 00000000 00000000 00000000 
8 0 00000000 00000000 00000000 00000000 
8 1 00000000 00000000 00000000 00000000 
9 0 00000000 00000000 00000000 00000000 
9 1 00000000 00000000 00000000 00000000 
10 0 00000000 00000000 00000000 00000000 
10 1 00000000 00000000 00000000 00000000 
-2147483648 0 80000006 00000000 00000000 00000000 
-2147483648 1 80000006 00000000 00000000 00000000 
-2147483647 0 00000000 00000000 00000000 00100000 
-2147483647 1 00000000 00000000 00000000 00100000 
-2147483646 0 20202020 20202020 20202020 20202020 
-2147483646 1 20202020 20202020 20202020 20202020 
-2147483645 0 20202020 41495620 65644520 7250206e 
-2147483645 1 20202020 41495620 65644520 7250206e 
-2147483644 0 7365636f 20726f73 30303031 007a484d 
-2147483644 1 7365636f 20726f73 30303031 007a484d 
-2147483643 0 00000000 08800880 40040140 40040140 
-2147483643 1 00000000 08800880 40040140 40040140 
-2147483642 0 00000000 00000000 0080a140 00000000 
-2147483642 1 00000000 00000000 0080a140 00000000 
-2147483641 0 00000000 00000000 00000000 00000000 
-2147483641 1 00000000 00000000 00000000 00000000
Comment 22 Martin Liška 2023-02-10 08:27:50 UTC
Thank you Jakub, please revert my documentation patch if you are convinced enough the change works only on old VIA CPUs.
Comment 23 Jakub Jelinek 2023-02-10 09:45:04 UTC
I believe in the #c20 case you'd get
FEATURE_CMOV
FEATURE_MMX
FEATURE_SSE
FEATURE_SSE2
FEATURE_CMPXCHG8B
FEATURE_FXSAVE
FEATURE_POPCNT
FEATURE_SSE3
FEATURE_SSSE3
FEATURE_SSE4_1
FEATURE_CMPXCHG16B
FEATURE_LAHF_LM
FEATURE_LM
FEATURE_X86_64_BASELINE
set and in the #c21 case
FEATURE_CMOV
FEATURE_MMX
FEATURE_SSE
FEATURE_SSE2
FEATURE_CMPXCHG8B
FEATURE_FXSAVE
FEATURE_SSE3
If that matches what those CPUs provide (say compared to /proc/cpuinfo), then I think we are good.  The change has been committed to trunk already, so you can try it yourself (or apply the commit patch to say gcc 12).
Comment 24 Mayshao-oc 2023-02-20 09:50:44 UTC
Hi  Jakub:
                https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100758
                Thanks for your patch. We test it works on all zhaoxin platforms. We find the same bug still exist in previous  gcc releases , could you backport this patch to previous release?

BR
Mayshao
Comment 25 GCC Commits 2023-03-19 05:29:31 UTC
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:454bf9f4d55058589ac6a76261356cbda599e831

commit r12-9273-g454bf9f4d55058589ac6a76261356cbda599e831
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Feb 9 17:43:19 2023 +0100

    i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]
    
    get_available_features doesn't depend on cpu_model2->__cpu_{family,model}
    and just sets stuff up based on CPUID leaf 1, or some extended ones,
    so I wonder why are we calling it separately for Intel, AMD and Zhaoxin
    and not for all other CPUs too?  I think various programs in the wild
    which aren't using __builtin_cpu_{is,supports} just check the various CPUID
    leafs and query bits in there, without blacklisting unknown CPU vendors,
    so I think even __builtin_cpu_supports ("sse2") etc. should be reliable
    if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf
    1 or some extended ones.  Calling it for all CPUs also means it can be
    inlined because there will be just a single caller.
    
    I have tested it on Intel and Martin tested it on AMD, but can't test it
    on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in
    behavior.
    
    2023-02-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/100758
            * common/config/i386/cpuinfo.h (cpu_indicator_init): Call
            get_available_features for all CPUs with max_level >= 1, rather
            than just Intel or AMD.
    
    (cherry picked from commit b24e9c083093a9e1b1007933a184c02f7ff058db)
Comment 26 GCC Commits 2023-05-02 20:14:50 UTC
The releases/gcc-11 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7fce881241b4f6492d8d321322fcb3412dea2137

commit r11-10713-g7fce881241b4f6492d8d321322fcb3412dea2137
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Feb 9 17:43:19 2023 +0100

    i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]
    
    get_available_features doesn't depend on cpu_model2->__cpu_{family,model}
    and just sets stuff up based on CPUID leaf 1, or some extended ones,
    so I wonder why are we calling it separately for Intel, AMD and Zhaoxin
    and not for all other CPUs too?  I think various programs in the wild
    which aren't using __builtin_cpu_{is,supports} just check the various CPUID
    leafs and query bits in there, without blacklisting unknown CPU vendors,
    so I think even __builtin_cpu_supports ("sse2") etc. should be reliable
    if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf
    1 or some extended ones.  Calling it for all CPUs also means it can be
    inlined because there will be just a single caller.
    
    I have tested it on Intel and Martin tested it on AMD, but can't test it
    on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in
    behavior.
    
    2023-02-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/100758
            * common/config/i386/cpuinfo.h (cpu_indicator_init): Call
            get_available_features for all CPUs with max_level >= 1, rather
            than just Intel or AMD.
    
    (cherry picked from commit b24e9c083093a9e1b1007933a184c02f7ff058db)
Comment 27 GCC Commits 2023-05-03 15:21:40 UTC
The releases/gcc-10 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:be80a2e64a38540a371fa4f03513653d2741bc89

commit r10-11367-gbe80a2e64a38540a371fa4f03513653d2741bc89
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Thu Feb 9 17:43:19 2023 +0100

    i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]
    
    get_available_features doesn't depend on cpu_model2->__cpu_{family,model}
    and just sets stuff up based on CPUID leaf 1, or some extended ones,
    so I wonder why are we calling it separately for Intel, AMD and Zhaoxin
    and not for all other CPUs too?  I think various programs in the wild
    which aren't using __builtin_cpu_{is,supports} just check the various CPUID
    leafs and query bits in there, without blacklisting unknown CPU vendors,
    so I think even __builtin_cpu_supports ("sse2") etc. should be reliable
    if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf
    1 or some extended ones.  Calling it for all CPUs also means it can be
    inlined because there will be just a single caller.
    
    I have tested it on Intel and Martin tested it on AMD, but can't test it
    on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in
    behavior.
    
    2023-02-09  Jakub Jelinek  <jakub@redhat.com>
    
            PR target/100758
            * config/i386/cpuinfo.c (cpu_indicator_init): Call
            get_available_features for all CPUs with max_level >= 1, rather
            than just Intel or AMD.
    
    (cherry picked from commit b24e9c083093a9e1b1007933a184c02f7ff058db)