Bug 103027 - Implement warning for homoglyphs in identifiers
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: preprocessor
Version: 12.0
Importance: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL: https://gcc.gnu.org/pipermail/gcc-pat...
Keywords: diagnostic, patch
Depends on:
Blocks: new-warning, new_warning
Reported: 2021-11-01 15:05 UTC by David Malcolm
Modified: 2024-11-19 03:26 UTC (History)
CC List: 6 users

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2021-11-01 00:00:00


Description David Malcolm 2021-11-01 15:05:16 UTC
An issue was discovered in the character definitions of the Unicode Specification through 14.0. The specification allows an adversary to produce source code identifiers such as function names using homoglyphs that render visually identical to a target identifier. Adversaries can leverage this to inject code via adversarial identifier definitions in upstream software dependencies invoked deceptively in downstream software.

We ought to have a diagnostic that warns about such problematic identifiers.

More info:
https://nvd.nist.gov/vuln/detail/CVE-2021-42694
https://trojansource.codes/
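
To illustrate the problem, here is a minimal Python sketch (not taken from the patch or from GCC). The identifier `scope` and its lookalike are made-up examples; the lookalike swaps Latin "o" for Cyrillic U+043E, written as an escape so the difference is visible in the source.

```python
import unicodedata

# Hypothetical example identifiers: plain ASCII versus a lookalike
# whose third letter is CYRILLIC SMALL LETTER O (U+043E).
latin = "scope"
lookalike = "sc\u043epe"

def describe(identifier):
    """List each code point of an identifier with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in identifier]

if __name__ == "__main__":
    # Both strings are valid identifiers and usually render alike,
    # yet they compare unequal, even after NFKC normalization.
    print(latin == lookalike)
    print(unicodedata.normalize("NFKC", lookalike) == latin)
    for line in describe(lookalike):
        print(line)
```

Because normalization does not merge the two spellings, a compiler treats them as distinct identifiers, which is why a dedicated diagnostic is needed.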
Comment 1 David Malcolm 2021-11-01 15:17:01 UTC
I have a work-in-progress patch for this, though it has some issues that need discussion; I hope to post it soon.
Comment 2 David Malcolm 2021-11-01 21:15:01 UTC
Initial version of patch posted for discussion to:
  https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583039.html
Comment 3 David Malcolm 2021-11-02 14:12:46 UTC
For reference, here's a patch to clang-tidy for this (currently under review):
  https://reviews.llvm.org/D112916
Comment 4 Reini Urban 2022-02-20 15:33:20 UTC
Just checking confusables.txt and ignoring the official TR39 Unicode security guidelines for identifiers won't get you very far. It's merely fighting a tiny symptom of a huge attack space.

I suggest properly implementing TR39, as I did in libu8ident and proposed to the C++/C working groups. Latest here: https://github.com/rurban/libu8ident/blob/master/doc/P2528R1.md

confusables.txt itself is almost useless. I used it only to restrict some Greek letters so that they cannot be confused with their Latin counterparts. Checking mixed scripts is much more secure.

Note that the TR31 XID lists are also still pretty insecure, even if C23 will restrict the XIDs to the official TR31 XID lists.
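
As a rough illustration of the mixed-script idea (this is not libu8ident's implementation and not a conforming TR39 check), the Python sketch below flags identifiers that draw letters from more than one script. It assumes the first word of a character's Unicode name approximates its script, which is a crude stand-in for the real Script property; all function names are hypothetical.

```python
import unicodedata

def rough_script(ch):
    """Approximate a character's script from the first word of its Unicode name.

    Assumption: this heuristic stands in for the Unicode Script property,
    which the Python standard library does not expose.
    """
    if ch in "_0123456789":
        return "COMMON"  # treat ASCII digits and underscore as script-neutral
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(identifier):
    """Return the set of (approximate) scripts used, ignoring neutral characters."""
    return {rough_script(ch) for ch in identifier} - {"COMMON"}

def is_mixed_script(identifier):
    """True if the identifier mixes letters from more than one script."""
    return len(scripts_in(identifier)) > 1

if __name__ == "__main__":
    for name in ["scope_1", "sc\u043epe", "\u0441\u043e\u0440\u0435"]:
        print(ascii(name), sorted(scripts_in(name)), is_mixed_script(name))
```

In this sketch an all-Cyrillic identifier such as `"\u0441\u043e\u0440\u0435"` is not flagged even though it can resemble Latin text, so a mixed-script check alone does not cover whole-script lookalikes.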
Comment 5 Eric Gallager 2022-04-13 13:18:24 UTC
Example bug that this warning flag could have found, if the string involved were a C string: https://twitter.com/nyt_first_said/status/1513148451210637313
Comment 6 Sundeep KOKKONDA 2024-11-19 03:03:40 UTC
This bug is still not in Confirmed status. Is it still active and is there a fix planned?
Comment 7 Andrew Pinski 2024-11-19 03:26:05 UTC
(In reply to Sundeep KOKKONDA from comment #6)
> This bug is still not in Confirmed status. Is it still active and is there a
> fix planned?

In Bugzilla, NEW is the confirmed status; there is a separate UNCONFIRMED status.
I have not looked into why the patch has not been included yet, though.