[PATCH 1/2] Flag CPP_W_BIDIRECTIONAL so that source lines are escaped

David Malcolm dmalcolm@redhat.com
Tue Nov 2 21:07:25 GMT 2021


On Tue, 2021-11-02 at 16:58 -0400, David Malcolm wrote:
> Before:
> 
>   Wbidirectional-1.c: In function ‘main’:
>   Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
>       6 |     /*‮ } ⁦if (isAdmin)⁩ ⁦ begin admins only */
>         |                                           ^
>   Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
>       9 |     /* end admins only ‮ { ⁦*/
>         |                            ^
> 
>   Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
>       6 | int LRE_‪_PDF_\u202c;
>         |               ^
> 
> After setting rich_loc.set_escape_on_output (true):
> 
>   Wbidirectional-1.c:6:43: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
>       6 |     /*<U+202E> } <U+2066>if (isAdmin)<U+2069> <U+2066> begin admins only */
>         |                                                                           ^
>   Wbidirectional-1.c:9:28: warning: unpaired UTF-8 bidirectional character detected [-Wbidirectional=]
>       9 |     /* end admins only <U+202E> { <U+2066>*/
>         |                                            ^
> 
>   Wbidirectional-11.c:6:15: warning: UTF-8 vs UCN mismatch when closing a context by "U+202C (POP DIRECTIONAL FORMATTING)" [-Wbidirectional=]
>       6 | int LRE_<U+202A>_PDF_\u202c;
>         |                       ^
> 
> libcpp/ChangeLog:
>         * lex.c (maybe_warn_bidi_on_close): Use a rich_location
>         and call set_escape_on_output (true) on it.
>         (maybe_warn_bidi_on_char): Likewise.
> 
> Signed-off-by: David Malcolm <dmalcolm@redhat.com>

[...snip...]

To be more explicit: part of the benefit of escaping non-ASCII bytes in
the source line is that it further mitigates CVE-2021-42574, since it
"defangs" the bidi control characters: everything is turned into ASCII,
so the user can see the logical ordering of the characters directly.
A similar consideration applies to homoglyph attacks.
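
As a rough illustration of the "defanging" idea (not the code in the
patch, and not GCC's actual implementation), here is a minimal
standalone sketch in C that rewrites a UTF-8 source line so that every
non-ASCII code point appears as <U+XXXX>.  The names decode_utf8 and
escape_non_ascii are hypothetical, and printing ill-formed bytes as
<xx> is an assumption made for this example only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Decode one UTF-8 sequence at S (N bytes available).  Returns the
   number of bytes consumed and stores the code point in *CP, or
   returns 0 if the bytes are not a well-formed sequence.  */
static size_t
decode_utf8 (const unsigned char *s, size_t n, unsigned long *cp)
{
  size_t len, i;
  unsigned long min;

  if (s[0] < 0x80)                { *cp = s[0]; return 1; }
  else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; min = 0x10000; }
  else return 0;

  if (len > n)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      *cp = (*cp << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates and out-of-range values.  */
  if (*cp < min || *cp > 0x10FFFF || (*cp >= 0xD800 && *cp <= 0xDFFF))
    return 0;
  return len;
}

/* Write an ASCII-only copy of SRC into DST (capacity DSTLEN bytes).
   Returns 0 on success, -1 if DST is too small.  */
int
escape_non_ascii (const char *src, char *dst, size_t dstlen)
{
  const unsigned char *s = (const unsigned char *) src;
  size_t n = strlen (src), out = 0;

  assert (dst != NULL && dstlen > 0);
  while (n > 0)
    {
      unsigned long cp;
      char buf[16];
      int w;
      size_t used = decode_utf8 (s, n, &cp);

      if (used == 0)
        {
          /* Ill-formed byte: show it in hex (assumed format).  */
          w = snprintf (buf, sizeof buf, "<%02x>", (unsigned int) s[0]);
          used = 1;
        }
      else if (cp < 0x80)
        {
          /* Plain ASCII is copied through unchanged.  */
          buf[0] = (char) cp;
          w = 1;
        }
      else
        w = snprintf (buf, sizeof buf, "<U+%04lX>", cp);

      if (out + (size_t) w >= dstlen)
        return -1;
      memcpy (dst + out, buf, (size_t) w);
      out += (size_t) w;
      s += used;
      n -= used;
    }
  dst[out] = '\0';
  return 0;
}
```

Feeding it line 9 of Wbidirectional-1.c from the example above should
give the same escaped text as the "After" output, with the override and
isolate characters spelled out in logical order rather than being
interpreted by the terminal.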

Dave



