[PATCH v2] libcpp: Implement -Wbidirectional for CVE-2021-42574 [PR103026]

Marek Polacek polacek@redhat.com
Mon Nov 8 21:33:43 GMT 2021


Ping, can we conclude on the name?   IMHO, -Wbidirectional is just fine,
but changing the name is a trivial operation. 

On Tue, Nov 02, 2021 at 03:52:05PM -0400, Marek Polacek wrote:
> On Tue, Nov 02, 2021 at 01:20:03PM -0600, Martin Sebor wrote:
> > On 11/2/21 11:18 AM, Marek Polacek via Gcc-patches wrote:
> > > On Mon, Nov 01, 2021 at 10:10:40PM +0000, Joseph Myers wrote:
> > > > On Mon, 1 Nov 2021, Marek Polacek via Gcc-patches wrote:
> > > > 
> > > > > +  /* We've read a bidi char, update the current vector as necessary.  */
> > > > > +  void on_char (kind k, bool ucn_p)
> > > > > +  {
> > > > > +    switch (k)
> > > > > +      {
> > > > > +      case kind::LRE:
> > > > > +      case kind::RLE:
> > > > > +      case kind::LRO:
> > > > > +      case kind::RLO:
> > > > > +	vec.push (ucn_p ? 3u : 1u);
> > > > > +	break;
> > > > > +      case kind::LRI:
> > > > > +      case kind::RLI:
> > > > > +      case kind::FSI:
> > > > > +	vec.push (ucn_p ? 2u : 0u);
> > > > > +	break;
> > > > > +      case kind::PDF:
> > > > > +	if (current_ctx () == kind::PDF)
> > > > > +	  pop ();
> > > > > +	break;
> > > > > +      case kind::PDI:
> > > > > +	if (current_ctx () == kind::PDI)
> > > > > +	  pop ();
> > > > 
> > > > My understanding is that PDI should pop all intermediate PDF contexts
> > > > outward to a PDI context, which it also pops.  (But if it's embedded only
> > > > in PDF contexts, with no PDI context containing it, it doesn't pop
> > > > anything.)
> > > > 
> > > > I think failing to handle that only means libcpp sometimes models there
> > > > as being more bidirectional contexts open than there should be, so it
> > > > might give spurious warnings when in fact all such contexts had been
> > > > closed by end of string or comment.
> > > 
> > > Ah, you're right.
> > > https://www.unicode.org/reports/tr9/#Terminating_Explicit_Directional_Isolates
> > > says that "[PDI] terminates the scope of the last LRI, RLI, or FSI whose
> > > scope has not yet been terminated, as well as the scopes of any subsequent
> > > LREs, RLEs, LROs, or RLOs whose scopes have not yet been terminated."
> > > but PDF doesn't have the latter quirk.
> > > 
> > > Fixed in the below: I added a suitable truncate into on_char.  The new test
> > > Wbidirectional-14.c exercises the handling of PDI.
> > > 
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > 
> > > -- >8 --
> > >  From a link below:
> > > "An issue was discovered in the Bidirectional Algorithm in the Unicode
> > > Specification through 14.0. It permits the visual reordering of
> > > characters via control sequences, which can be used to craft source code
> > > that renders different logic than the logical ordering of tokens
> > > ingested by compilers and interpreters. Adversaries can leverage this to
> > > encode source code for compilers accepting Unicode such that targeted
> > > vulnerabilities are introduced invisibly to human reviewers."
> > > 
> > > More info:
> > > https://nvd.nist.gov/vuln/detail/CVE-2021-42574
> > > https://trojansource.codes/
> > > 
> > > This is not a compiler bug.  However, to mitigate the problem, this patch
> > > implements -Wbidirectional=[none|unpaired|any] to warn about possibly
> > > misleading Unicode bidirectional characters the preprocessor may encounter.
> > 
> > Birectional sounds very general.  Can we come up with a name
> > that's a bit more descriptive of the problem the warning reports?
> > From skimming the docs and the tests it looks like the warning
> > points out uses of bidirectonal characters in the program source
> > code as well as comments.  Would -Wbidirectional-text be better?
> > Or -Wbidirectional-chars?  (If Clang is also adding a warning
> > for this, syncing up with them one way or the other might be
> > helpful.)
> 
> I dunno, I could go with -Wbidirectional-chars.  Does anyone else
> think I should rename the current name to -Wbidirectional-chars?
> 
> Other ideas: -Wunicode-bidi / -Wmultibyte-chars / -Wmisleading-bidirectional.
> 
> The patch for clang-tidy I saw called this misleading-bidirectional.
> 
> > ...
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index c5730228821..9dfb95dc24c 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -327,7 +327,9 @@ Objective-C and Objective-C++ Dialects}.
> > >   -Warith-conversion @gol
> > >   -Warray-bounds  -Warray-bounds=@var{n}  -Warray-compare @gol
> > >   -Wno-attributes  -Wattribute-alias=@var{n} -Wno-attribute-alias @gol
> > > --Wno-attribute-warning  -Wbool-compare  -Wbool-operation @gol
> > > +-Wno-attribute-warning  @gol
> > > +-Wbidirectional=@r{[}none@r{|}unpaired@r{|}any@r{]} @gol
> > > +-Wbool-compare  -Wbool-operation @gol
> > >   -Wno-builtin-declaration-mismatch @gol
> > >   -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
> > >   -Wc11-c2x-compat @gol
> > > @@ -7674,6 +7676,21 @@ Attributes considered include @code{alloc_align}, @code{alloc_size},
> > >   This is the default.  You can disable these warnings with either
> > >   @option{-Wno-attribute-alias} or @option{-Wattribute-alias=0}.
> > > +@item -Wbidirectional=@r{[}none@r{|}unpaired@r{|}any@r{]}
> > > +@opindex Wbidirectional=
> > > +@opindex Wbidirectional
> > > +@opindex Wno-bidirectional
> > > +Warn about UTF-8 bidirectional characters.
> > 
> > I suggest to mention where.  If everywhere, enumerate the most
> > common contexts to make it clear it means everywhere:
> > 
> >   Warn about UTF-8 bidirectional characters in source code,
> >   including string literals, identifiers, and comments.
> 
> OK, I've updated the text to:
> 
> Warn about possibly misleading UTF-8 bidirectional characters in comments,
> string literals, character constants, and identifiers.
> 
> Marek

Marek



More information about the Gcc-patches mailing list