Bug 114461 - [C++26] P3034R1 - Disallow module declarations to be macros
Summary: [C++26] P3034R1 - Disallow module declarations to be macros
Status: NEW
Alias: None
Product: gcc
Classification: Unclassified
Component: c++ (show other bugs)
Version: 14.0
: P3 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords: accepts-invalid
Depends on:
Blocks: c++-core-issues c++26-core
  Show dependency treegraph
 
Reported: 2024-03-25 11:15 UTC by Jakub Jelinek
Modified: 2024-11-01 18:48 UTC (History)
8 users (show)

See Also:
Host:
Target:
Build:
Known to work:
Known to fail:
Last reconfirmed: 2024-03-25 00:00:00


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelinek 2024-03-25 11:15:48 UTC
See <https://wg21.link/P3034R1>.

DR for C++20 and up.
Comment 1 Andrew Pinski 2024-03-25 18:07:56 UTC
.
Comment 2 Jakub Jelinek 2024-08-07 11:34:57 UTC
Isn't the newly added http://eel.is/c++draft/cpp.module#2 in conflict with
http://eel.is/c++draft/cpp.module#3 ?
I mean if one is supposed to scan for
export module a.b.c : d.e.f
or similar without worrying about preprocessing, I think it should change the 3rd paragraph to talk about any preprocessing tokens after module and if present after tokens from pp-module-name and pp-module-partition if any.
Either trhe pp-module-name and pp-module-partition tokens if any have disabled macro expansion and then one can safely parse it without bothering about it, or they are macro expanded, but then the "No identifier in the pp-module-name or pp-module-partition shall currently be defined as an object-like macro."
doesn't make much sense as object-like macros were expanded already.
Things to consider:
#define DOT .
#define COLON :
#define FN(x) x
export module foo DOT bar;
export module baz COLON qux;
export module FN (;)
If the preprocessor should look through following tokens and as long as they match the pp-module-{name,partition} grammar have disabled expansion (tok->flags |= NO_EXPAND) it makes perfect sense to check for identifiers which are
          if (_cpp_defined_macro_p (node)
              && _cpp_maybe_notify_macro_use (pfile, node, tok->src_loc)
              && !cpp_fun_like_macro_p (node))
and error on those and perhaps the "shall not begin with a ( preprocessing token" is about preventing function-like macros in there.
Comment 3 Jakub Jelinek 2024-08-07 12:07:56 UTC
Oh, and
#define SEMICOLON ;
module SEMICOLON
or
#define EMTPY
module EMPTY ;
etc.
From what I understand, ; coming from macro was meant to be certainly ok since wg21.link/p1857r3 and the optional attribute after it supposedly too, including the [[s.
But if the module name and partition are supposed to be parseable without preprocessing, the semicolon alone can't come from object-like macro.
Comment 4 Jakub Jelinek 2024-08-07 12:54:44 UTC
Perhaps given http://eel.is/c++draft/cpp.pre#7 makes the default for preprocessing directives not macro expanded unless otherwise specified, change
http://eel.is/c++draft/cpp.module#3
"Any preprocessing tokens after the module preprocessing token in the module directive are processed just as in normal text."
to
"If pp-module-partition appears in pp-tokens, any preprocessing tokens after the last pp-module-partition token are processed just as in normal text; otherwise if pp-module-name appears in pp-tokens, any preprocessing tokens after the last pp-module-name token are processed just as in normal text."
That would make
#define EMPTY
export module EMPTY ;
and
#define SEMI
export module SEMI
;
and
#define DOT .
export module foo DOT bar ;
etc. invalid.
Comment 5 Jakub Jelinek 2024-08-07 15:47:05 UTC
Oh, and I think there is another problem with that paper.
module : private ;
is valid, but the addition of https://eel.is/c++draft/cpp.module#2 stands in a way,
the pp-tokens in that case don't match pp-module-name followed by something, the
first token is :.
So, : pp-tokens[opt] should be another valid form IMHO and in that case tokens after the : should be macro expanded.
#define PRIVATE :private
module PRIVATE;
would be invalid as that clashes with the other form.

I'm playing with:
--- libcpp/lex.cc.jj	2024-08-07 09:38:00.950805926 +0200
+++ libcpp/lex.cc	2024-08-07 17:23:22.987862447 +0200
@@ -3538,6 +3538,72 @@ cpp_maybe_module_directive (cpp_reader *
       /* Maybe tell the tokenizer we expect a header-name down the
 	 road.  */
       pfile->state.directive_file_token = header_count;
+
+      /* According to P3034R1, pp-module-name and pp-module-partition tokens
+	 if any shouldn't be macro expanded and identifiers shouldn't be
+	 defined as object-like macro.  */
+      if (!header_count && peek->type == CPP_NAME)
+	{
+	  int state = 0;
+	  do
+	    {
+	      cpp_token *tok = peek;
+	      if (tok->type == CPP_NAME)
+		{
+		  cpp_hashnode *node = tok->val.node.node;
+		  /* Don't attempt to expand the token.  */
+		  tok->flags |= NO_EXPAND;
+		  if (_cpp_defined_macro_p (node)
+		      && _cpp_maybe_notify_macro_use (pfile, node,
+						      tok->src_loc)
+		      && !cpp_fun_like_macro_p (node))
+		    {
+		      if (state == 0)
+			cpp_error_with_line (pfile, CPP_DL_ERROR,
+					     tok->src_loc, 0,
+					     "module name \"%s\" cannot "
+					     "be an object-like macro",
+					     NODE_NAME (node));
+		      else
+			cpp_error_with_line (pfile, CPP_DL_ERROR,
+					     tok->src_loc, 0,
+					     "module partition \"%s\" cannot "
+					     "be an object-like macro",
+					     NODE_NAME (node));
+		    }
+		}
+	      peek = _cpp_lex_direct (pfile);
+	      backup++;
+	      if (tok->type == CPP_NAME)
+		{
+		  if (peek->type == CPP_DOT)
+		    continue;
+		  else if (peek->type == CPP_COLON && state == 0)
+		    {
+		      ++state;
+		      continue;
+		    }
+		  else if (peek->type == CPP_OPEN_PAREN)
+		    {
+		      if (state == 0)
+			cpp_error_with_line (pfile, CPP_DL_ERROR,
+					     peek->src_loc, 0,
+					     "module name followed by \"(\"");
+		      else
+			cpp_error_with_line (pfile, CPP_DL_ERROR,
+					     peek->src_loc, 0,
+					     "module partition followed by "
+					     "\"(\"");
+		      break;
+		    }
+		  else
+		    break;
+		}
+	      else if (peek->type != CPP_NAME)
+		break;
+	    }
+	  while (true);
+	}
     }
   else
     {
but need to write some testcases.
Comment 6 GCC Commits 2024-11-01 18:48:30 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:1ae24f7e0bdbdeaef9265a053a737af11f8393d2

commit r15-4848-g1ae24f7e0bdbdeaef9265a053a737af11f8393d2
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Nov 1 19:42:28 2024 +0100

    c++: Attempt to implement C++26 P3034R1 - Module Declarations Shouldn't be Macros [PR114461]
    
    This is an attempt to implement the https://wg21.link/p3034r1 paper,
    but I'm afraid the wording in the paper is bad for multiple reasons.
    I think I understand the intent, that the module name and partition
    if any shouldn't come from macros so that they can be scanned for
    without preprocessing, but on the other side doesn't want to disable
    macro expansion in pp-module altogether, because e.g. the optional
    attribute in module-declaration would be nice to come from macros
    as which exact attribute is needed might need to be decided based on
    preprocessor checks.
    The paper added https://eel.is/c++draft/cpp.module#2
    which uses partly the wording from https://eel.is/c++draft/cpp.module#1
    
    The first issue I see is that using that "defined as an object-like macro"
    from there means IMHO something very different in those 2 paragraphs.
    As per https://eel.is/c++draft/cpp.pre#7.sentence-1 preprocessing tokens
    in preprocessing directives aren't subject to macro expansion unless
    otherwise stated, and so the export and module tokens aren't expanded
    and so the requirement that they aren't defined as an object-like macro
    makes perfect sense.  The problem with the new paragraph is that
    https://eel.is/c++draft/cpp.module#3.sentence-1 says that the rest of
    the tokens are macro expanded and after macro expansion none of the
    tokens can be defined as an object-like macro, if they would be, they'd
    be expanded to that.  So, I think either the wording needs to change
    such that not all preprocessing tokens after module are macro expanded,
    only those which are after the pp-module-name and if any pp-module-partition
    tokens, or all tokens after module are macro expanded but none of the tokens in
    pp-module-name and pp-module-partition if any must come from macro
    expansion.  The patch below implements it as if the former would be
    specified (but see later), so essentially scans the preprocessing tokens
    after module without expansion, if the first one is an identifier, it
    disables expansion for it and then if followed by . or : expects another
    such identifier (again with disabled expansion), but stops after second
    : is seen.
    
    Second issue is that while the global-module-fragment start is fine, matches
    the syntax of the new paragraph where the pp-tokens[opt] aren't present,
    there is also private-module-fragment in the syntax where module is
    followed by : private ; and in that case the colon doesn't match the
    pp-module-name grammar and appears now to be invalid.  I think the
    https://eel.is/c++draft/cpp.module#2
    paragraph needs to change so that it allows also that pp-tokens of
    a pp-module may also be : pp-tokens[opt] (and in that case, I think
    the colon shouldn't come from a macro and private and/or ; can).
    
    Third issue is that there are too many pp-tokens in
    https://eel.is/c++draft/cpp.module , one is all the tokens between
    module keyword and the semicolon and one is the optional extra tokens
    after pp-module-partition (if any, if missing, after pp-module).
    Perhaps introducing some other non-terminal would help talking about it?
    So in "where the pp-tokens (if any) shall not begin with a ( preprocessing
    token" it isn't obvious which pp-tokens it is talking about (my assumption
    is the latter) and also whether ( can't appear there just before macro
    expansion or also after expansion.  The patch expects only before expansion,
    so
     #define F ();
     export module foo F
    would be valid during preprocessing but obviously invalid during
    compilation, but
     #define foo(n) n;
     export module foo (3)
    would be invalid already during preprocessing.
    
    The last issue applies only if the first issue is resolved to allow
    expansion of tokens after : if first token, or after pp-module-partition
    if present or after pp-module-name if present.  When non-preprocessing
    scanner sees
     export module foo.bar:baz.qux;
    it knows nothing can come from preprocessing macros and is ok, but if it
    sees
     export module foo.bar:baz qux
    then it can't know whether it will be
     export module foo.bar:baz;
    or
     export module foo.bar:baz [[]];
    or
     export module foo.bar:baz.freddy.garply;
    because qux could be validly a macro, which expands to ; or [[]];
    or .freddy.garply; etc.  So, either the non-preprocessing scanner would
    need to note it as possible export of foo.bar:baz* module partitions
    and preprocess if it needs to know the details or just compile, or if that
    is not ok, the wording would need to rule out that the expansion of (the
    second) pp-tokens if any can't start with . or : (colon would be only
    problematic if it isn't present in the tokens before it already).
    So, if e.g. defining qux above to . whatever is invalid, then the scanner
    can rely it sees the whole module name and partition.
    
    The patch below implements what is above described as the first variant
    of the first issue resolution, i.e. disables expansion of as many tokens
    as could be in the valid module name and module partition syntax, but
    as soon as it e.g. sees two adjacent identifiers, the second one can be
    macro expanded.  If it is macro expanded though, the expansion can't
    start with . or :, and if it expands to nothing, tokens after it (whether
    they come from macro expansion or not) can't start with . or :.
    So, effectively:
     #define SEMI ;
     export module SEMI
    used to be valid and isn't anymore,
     #define FOO bar
     export module FOO;
    isn't valid,
     #define COLON :
     export module COLON private;
    isn't valid,
     #define BAR baz
     export module foo.bar:baz.qux.BAR;
    isn't valid,
     #define BAZ .qux
     export module foo BAZ;
    isn't valid,
     #define FREDDY :garply
     export module foo FREDDY;
    isn't valid,
    while
     #define QUX [[]]
     export module foo QUX;
    or
     #define GARPLY private
     module : GARPLY;
    etc. is.
    
    2024-11-01  Jakub Jelinek  <jakub@redhat.com>
    
            PR c++/114461
    libcpp/
            * include/cpplib.h: Implement C++26 P3034R1
            - Module Declarations Shouldnât be Macros (or more precisely
            its expected intent).
            (NO_DOT_COLON): Define.
            * internal.h (struct cpp_reader): Add diagnose_dot_colon_from_macro_p
            member.
            * lex.cc (cpp_maybe_module_directive): For pp-module, if
            module keyword is followed by CPP_NAME, ensure all CPP_NAME
            tokens possibly matching module name and module partition
            syntax aren't expanded and aren't defined as object-like macros.
            Verify first token after that doesn't start with open paren.
            If the next token after module name/partition is CPP_NAME defined
            as macro, set NO_DOT_COLON flag on it.
            * macro.cc (cpp_get_token_1): Set
            pfile->diagnose_dot_colon_from_macro_p if token to be expanded has
            NO_DOT_COLON bit set in flags.  Before returning, if
            pfile->diagnose_dot_colon_from_macro_p is true and not returning
            CPP_PADDING or CPP_COMMENT and not during macro expansion preparation,
            set pfile->diagnose_dot_colon_from_macro_p to false and diagnose
            if returning CPP_DOT or CPP_COLON.
    gcc/testsuite/
            * g++.dg/modules/cpp-7.C: New test.
            * g++.dg/modules/cpp-8.C: New test.
            * g++.dg/modules/cpp-9.C: New test.
            * g++.dg/modules/cpp-10.C: New test.
            * g++.dg/modules/cpp-11.C: New test.
            * g++.dg/modules/cpp-12.C: New test.
            * g++.dg/modules/cpp-13.C: New test.
            * g++.dg/modules/cpp-14.C: New test.
            * g++.dg/modules/cpp-15.C: New test.
            * g++.dg/modules/cpp-16.C: New test.
            * g++.dg/modules/cpp-17.C: New test.
            * g++.dg/modules/cpp-18.C: New test.
            * g++.dg/modules/cpp-19.C: New test.
            * g++.dg/modules/cpp-20.C: New test.
            * g++.dg/modules/pmp-4.C: New test.
            * g++.dg/modules/pmp-5.C: New test.
            * g++.dg/modules/pmp-6.C: New test.
            * g++.dg/modules/token-6.C: New test.
            * g++.dg/modules/token-7.C: New test.
            * g++.dg/modules/token-8.C: New test.
            * g++.dg/modules/token-9.C: New test.
            * g++.dg/modules/token-10.C: New test.
            * g++.dg/modules/token-11.C: New test.
            * g++.dg/modules/token-12.C: New test.
            * g++.dg/modules/token-13.C: New test.
            * g++.dg/modules/token-14.C: New test.
            * g++.dg/modules/token-15.C: New test.
            * g++.dg/modules/token-16.C: New test.
            * g++.dg/modules/dir-only-3.C: Expect an error.
            * g++.dg/modules/dir-only-4.C: Expect an error.
            * g++.dg/modules/dir-only-5.C: New test.
            * g++.dg/modules/atom-preamble-2_a.C: In export module malcolm;
            replace malcolm with kevin.  Don't define malcolm macro.
            * g++.dg/modules/atom-preamble-4.C: Expect an error.
            * g++.dg/modules/atom-preamble-5.C: New test.