This is the mail archive of the mailing list for the GCC project.


[RFC] Design for flag bit outputs from asms

On 05/02/2015 05:39 AM, Peter Zijlstra wrote:
> static inline bool __test_and_clear_bit(long nr, volatile unsigned long *addr)
> {
> 	bool oldbit;
> 	asm volatile ("btr %2, %1"
> 		      : "CF" (oldbit), "+m" (*addr)
> 		      : "Ir" (nr));
> 	return oldbit;
> }
> Be the far better solution for this? Bug 59615 comment 7 states that
> they actually modeled the flags in the .md file, so the above should be
> possible to implement.
> Now GCC can decide to use "sbb %0, %0" to convert CF into a register
> value or use "jnc" / "jc" for branches, depending on what
> __test_and_clear_bit() was used for.
> We don't have to (ab)use asm goto for these things anymore; furthermore
> I think the above will naturally work with our __builtin_expect() hints,
> whereas the asm goto stuff has a hard time with that (afaik).
> That's not to say output operands for asm goto would not still be useful
> for other things (like your EXTABLE example).

(0) The C level output variable should be an integral type, from bool on up.

The flags are a scarce resource, easily clobbered.  We cannot allow user code
to keep data in the flags.  While x86 does have lahf/sahf, they don't exactly
perform well.  And other targets like arm don't even have that bad option.

Therefore, the language level semantics are that the output is a boolean store
into the variable with a condition specified by a magic constraint.

That said, just like the compiler should be able to optimize

        void bar(int y)
        {
          int x = (y <= 0);
          if (x) foo();
        }

such that we only use a single compare against y, the expectation is that
within a similarly constrained context the compiler will not require two tests
for these boolean outputs.


(1) Each target defines a set of constraint strings.

   E.g. for x86, wherein we're almost out of constraint letters,

     ja   aux carry flag
     jc   carry flag
     jo   overflow flag
     jp   parity flag
     js   sign flag
     jz   zero flag

   E.g. for arm/aarch64 (using "j" here, but other possibilities exist):

     jn   negative flag
     jc   carry flag
     jz   zero flag
     jv   overflow flag

   E.g. for s390x (I've thought less about what's useful here)

     j<m>  where m is a hex digit, and is the mask of CC values
           for which the condition is true; exactly corresponding
           to the M1 field in the branch on condition instruction.
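
   To spell out the s390x mask semantics, a small C model follows.  The
   helper name is hypothetical; the bit assignment is the one used by the
   branch on condition mask, where the leftmost mask bit (value 8) selects
   CC 0 and the rightmost (value 1) selects CC 3.  Under this reading, an
   output with constraint "=j8" would store 1 exactly when the asm leaves
   CC 0, but that is my interpretation of the proposal, not something the
   text above pins down.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the 4-bit mask selects the given
   condition code (cc must be in the range 0..3). */
static bool s390_mask_selects(unsigned mask, unsigned cc)
{
    /* Mask value 8 -> CC 0, 4 -> CC 1, 2 -> CC 2, 1 -> CC 3. */
    return (mask & (8u >> cc)) != 0;
}
```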

(2) A new target hook post-processes the asm_insn, looking for the
    new constraint strings.  The hook expands the condition prescribed
    by the string, adjusting the asm_insn as required.


  E.g.

    bool x, y, z;
    asm ("xyzzy" : "=jc"(x), "=jp"(y), "=jo"(z) : : );

  is originally expanded to

    (parallel [
            (set (reg:QI 83 [ x ])
                (asm_operands/v:QI ("xyzzy") ("=jc") 0 []
                     [] z.c:4))
            (set (reg:QI 84 [ y ])
                (asm_operands/v:QI ("xyzzy") ("=jp") 1 []
                     [] z.c:4))
            (set (reg:QI 85 [ z ])
                (asm_operands/v:QI ("xyzzy") ("=jo") 2 []
                     [] z.c:4))
            (clobber (reg:QI 18 fpsr))
            (clobber (reg:QI 17 flags))
        ])

  The hook would rewrite this as

    (parallel [
            (set (reg:CC 17 flags)
                (asm_operands/v:CC ("xyzzy") ("=j_") 0 []
                     [] z.c:4))
            (clobber (reg:QI 18 fpsr))
        ])
    (set (reg:QI 83 [ x ])
         (ne:QI (reg:CCC 17 flags) (const_int 0)))
    (set (reg:QI 84 [ y ])
         (ne:QI (reg:CCP 17 flags) (const_int 0)))
    (set (reg:QI 85 [ z ])
         (ne:QI (reg:CCO 17 flags) (const_int 0)))

  which ought to assemble to something like

    setc  %dl
    setp  %cl
    seto  %r15b

  Note that rtl level data flow is preserved via the flags hard register,
  and the lifetime of flags would not be extended any further than we would
  for a normal cstore pattern.
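
  To make the cstore analogy concrete, here is a C model of the
  post-processed form: one value stands in for the flags hard register,
  and each output is an independent "!= 0" test of a single bit of it.
  The struct and function names are hypothetical; the bit positions are
  the architectural EFLAGS ones (CF is bit 0, PF is bit 2, OF is bit 11).

```c
#include <stdbool.h>

/* Architectural EFLAGS bit masks for the three flags used in the example. */
enum {
    X86_CF = 1u << 0,   /* carry    */
    X86_PF = 1u << 2,   /* parity   */
    X86_OF = 1u << 11   /* overflow */
};

/* Hypothetical container for the three boolean outputs x, y, z. */
struct xyz { bool x, y, z; };

/* Models the three (set (reg:QI ...) (ne:QI flags 0)) stores: every
   output is derived from the same single flags value. */
static struct xyz split_flags(unsigned flags)
{
    struct xyz r;
    r.x = (flags & X86_CF) != 0;   /* "=jc" */
    r.y = (flags & X86_PF) != 0;   /* "=jp" */
    r.z = (flags & X86_OF) != 0;   /* "=jo" */
    return r;
}
```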

  Note that the output constraints are adjusted to a single internal "=j_"
  which would match the flags register in any mode.  We can collapse
  several output flags to a single set of the flags hard register.

(3) Note that ppc is both easier and more complicated.

  There we have 8 4-bit registers, although most of the integer
  non-comparisons only write to CR0.  And the vector non-comparisons
  only write to CR1, though of course that's of less interest in the
  context of kernel code.

  For the purposes of cr0, the same scheme could certainly work, although
  the hook would not insert a hard register use, but rather a pseudo to
  be allocated to cr0 (constraint "x").

  That said, it's my understanding that "dot insns", which set cr0, are
  expensive in current processor generations.  There's also a lot less
  of the x86-style "operate and set a flag based on something useful".
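
  For reference, a small C model of the "8 4-bit registers" layout.  It
  treats the condition register as a 32-bit value with CR0 in the most
  significant nibble, and decodes one field into its LT, GT, EQ and SO
  bits.  The struct and function names are hypothetical, and nothing here
  is part of the proposed hook; it only shows which bits a boolean output
  would have to test.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four bits of one condition register field. */
struct cr_field { bool lt, gt, eq, so; };

/* Hypothetical helper: extract field n (0..7) from a 32-bit CR image.
   CR0 is the most significant nibble, CR7 the least significant. */
static struct cr_field decode_cr_field(uint32_t cr, unsigned n)
{
    unsigned f = (cr >> (28 - 4 * n)) & 0xFu;
    struct cr_field r;
    r.lt = (f & 8u) != 0;   /* result negative / less than */
    r.gt = (f & 4u) != 0;   /* result positive / greater than */
    r.eq = (f & 2u) != 0;   /* result zero / equal */
    r.so = (f & 1u) != 0;   /* summary overflow */
    return r;
}
```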

Can anyone think of any drawbacks, pitfalls, or portability issues to less
popular targets that I haven't considered?

