rustc_ast: Visit tokens stored in AST nodes in mutable visitor
After #77271 token visiting is enabled only for one visitor in `rustc_expand\src\mbe\transcribe.rs` which applies hygiene marks to tokens produced by declarative macros (`macro_rules` or `macro`), so this change doesn't affect anything else.
When a macro has some interpolated token from an outer macro in its output
```rust
macro inner() {
$interpolated
}
```
we can use the usual interpretation of interpolated tokens in token-based model - a None-delimited group - to write this macro in an equivalent form
```rust
macro inner() {
⟪ a b c d ⟫
}
```
When we are expanding the macro `inner` we need to apply hygiene marks to all tokens produced by it, including the tokens inside the group.
Before this PR we did this by visiting the AST piece inside the interpolated token and applying marks to all spans in it.
I'm not sure this is 100% correct (ideally we should apply the marks to tokens and then re-parse the AST from tokens), but it's a very good approximation at least.
We didn't however apply the marks to actual tokens stored in the nonterminal, so if we used the nonterminal as a token rather than as an AST piece (e.g. passed it to a proc macro), then we got hygiene bugs.
This PR applies the marks to tokens in addition to the AST pieces thus fixing the issue.
r? `@Aaron1011`
Originally, there has been a dedicated pass for renumbering
AST NodeIds to have actual values. This pass had been added by
commit a5ad4c3794.
Then, later, this step was moved to where it resides now,
macro expansion. See commit c86c8d41a2
or PR #36438.
The comment snippet, added by the original commit, has
survived the times without any change, becoming outdated
at removal of the dedicated pass.
Nowadays, grepping for the next_node_id function will show up
multiple places in the compiler that call it, but the main
rewriting that the comment talks about is still done in the
expansion step, inside an innocious looking visit_id function
that's called during macro invocation collection.
Treat trailing semicolon as a statement in macro call
See #61733 (comment)
We now preserve the trailing semicolon in a macro invocation, even if
the macro expands to nothing. As a result, the following code no longer
compiles:
```rust
macro_rules! empty {
() => { }
}
fn foo() -> bool { //~ ERROR mismatched
{ true } //~ ERROR mismatched
empty!();
}
```
Previously, `{ true }` would be considered the trailing expression, even
though there's a semicolon in `empty!();`
This makes macro expansion more token-based.
See https://github.com/rust-lang/rust/issues/61733#issuecomment-716188981
We now preserve the trailing semicolon in a macro invocation, even if
the macro expands to nothing. As a result, the following code no longer
compiles:
```rust
macro_rules! empty {
() => { }
}
fn foo() -> bool { //~ ERROR mismatched
{ true } //~ ERROR mismatched
empty!();
}
```
Previously, `{ true }` would be considered the trailing expression, even
though there's a semicolon in `empty!();`
This makes macro expansion more token-based.
Suggest that expressions that look like const generic arguments should be enclosed in brackets
I pulled out the changes for const expressions from https://github.com/rust-lang/rust/pull/71592 (without the trait object diagnostic changes) and made some small changes; the implementation is `@estebank's.`
We're also going to want to make some changes separately to account for trait objects (they result in poor diagnostics, as is evident from one of the test cases here), such as an adaption of https://github.com/rust-lang/rust/pull/72273.
Fixes https://github.com/rust-lang/rust/issues/70753.
r? `@petrochenkov`
Split out statement attributes changes from #78306
This is the same as PR https://github.com/rust-lang/rust/pull/78306, but `unused_doc_comments` is modified to explicitly ignore statement items (which preserves the current behavior).
This shouldn't have any user-visible effects, so it can be landed without lang team discussion.
---------
When the 'early' and 'late' visitors visit an attribute target, they
activate any lint attributes (e.g. `#[allow]`) that apply to it.
This can affect warnings emitted on sibiling attributes. For example,
the following code does not produce an `unused_attributes` for
`#[inline]`, since the sibiling `#[allow(unused_attributes)]` suppressed
the warning.
```rust
trait Foo {
#[allow(unused_attributes)] #[inline] fn first();
#[inline] #[allow(unused_attributes)] fn second();
}
```
However, we do not do this for statements - instead, the lint attributes
only become active when we visit the struct nested inside `StmtKind`
(e.g. `Item`).
Currently, this is difficult to observe due to another issue - the
`HasAttrs` impl for `StmtKind` ignores attributes for `StmtKind::Item`.
As a result, the `unused_doc_comments` lint will never see attributes on
item statements.
This commit makes two interrelated fixes to the handling of inert
(non-proc-macro) attributes on statements:
* The `HasAttr` impl for `StmtKind` now returns attributes for
`StmtKind::Item`, treating it just like every other `StmtKind`
variant. The only place relying on the old behavior was macro
which has been updated to explicitly ignore attributes on item
statements. This allows the `unused_doc_comments` lint to fire for
item statements.
* The `early` and `late` lint visitors now activate lint attributes when
invoking the callback for `Stmt`. This ensures that a lint
attribute (e.g. `#[allow(unused_doc_comments)]`) can be applied to
sibiling attributes on an item statement.
For now, the `unused_doc_comments` lint is explicitly disabled on item
statements, which preserves the current behavior. The exact locatiosn
where this lint should fire are being discussed in PR #78306
When the 'early' and 'late' visitors visit an attribute target, they
activate any lint attributes (e.g. `#[allow]`) that apply to it.
This can affect warnings emitted on sibiling attributes. For example,
the following code does not produce an `unused_attributes` for
`#[inline]`, since the sibiling `#[allow(unused_attributes)]` suppressed
the warning.
```rust
trait Foo {
#[allow(unused_attributes)] #[inline] fn first();
#[inline] #[allow(unused_attributes)] fn second();
}
```
However, we do not do this for statements - instead, the lint attributes
only become active when we visit the struct nested inside `StmtKind`
(e.g. `Item`).
Currently, this is difficult to observe due to another issue - the
`HasAttrs` impl for `StmtKind` ignores attributes for `StmtKind::Item`.
As a result, the `unused_doc_comments` lint will never see attributes on
item statements.
This commit makes two interrelated fixes to the handling of inert
(non-proc-macro) attributes on statements:
* The `HasAttr` impl for `StmtKind` now returns attributes for
`StmtKind::Item`, treating it just like every other `StmtKind`
variant. The only place relying on the old behavior was macro
which has been updated to explicitly ignore attributes on item
statements. This allows the `unused_doc_comments` lint to fire for
item statements.
* The `early` and `late` lint visitors now activate lint attributes when
invoking the callback for `Stmt`. This ensures that a lint
attribute (e.g. `#[allow(unused_doc_comments)]`) can be applied to
sibiling attributes on an item statement.
For now, the `unused_doc_comments` lint is explicitly disabled on item
statements, which preserves the current behavior. The exact locatiosn
where this lint should fire are being discussed in PR #78306
This allows us to avoid synthesizing tokens in `prepend_attr`, since we
have the original tokens available.
We still need to synthesize tokens when expanding `cfg_attr`,
but this is an unavoidable consequence of the syntax of `cfg_attr` -
the user does not supply the `#` and `[]` tokens that a `cfg_attr`
expands to.
Rewrite `collect_tokens` implementations to use a flattened buffer
Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.
The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.
This implementation has a number of advantages over the previous one:
* It is significantly simpler, with no edge cases around capturing the
start/end of a delimited group.
* It can be easily extended to allow replacing tokens an an arbitrary
'depth' by just using `Vec::splice` at the proper position. This is
important for PR #76130, which requires us to track information about
attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily
parse an AST struct, and then decide after the fact whether we need a
`TokenStream`. This will be useful when we start collecting tokens for
`Attribute` - we can discard the `LazyTokenStream` if the parsed
attribute doesn't need tokens (e.g. is a builtin attribute).
The performance impact seems to be neglibile (see
https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.
The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.
This implementation has a number of advantages over the previous one:
* It is significantly simpler, with no edge cases around capturing the
start/end of a delimited group.
* It can be easily extended to allow replacing tokens an an arbitrary
'depth' by just using `Vec::splice` at the proper position. This is
important for PR #76130, which requires us to track information about
attributes along with tokens.
* The lazy approach to `TokenStream` construction allows us to easily
parse an AST struct, and then decide after the fact whether we need a
`TokenStream`. This will be useful when we start collecting tokens for
`Attribute` - we can discard the `LazyTokenStream` if the parsed
attribute doesn't need tokens (e.g. is a builtin attribute).
The performance impact seems to be neglibile (see
https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
Remove unused code
Rustc has a builtin lint for detecting unused code inside a crate, but when an item is marked `pub`, the code, even if unused inside the entire workspace, is never marked as such. Therefore, I've built [warnalyzer](https://github.com/est31/warnalyzer) to detect unused items in a cross-crate setting.
Closes https://github.com/est31/warnalyzer/issues/2
Prevent stack overflow in deeply nested types.
Related issue #75577 (?)
Unfortunately, I am unable to test whether this actually solves the problem because apparently, 12GB RAM + 2GB swap is not enough to compile the (admittedly toy) source file.
use sort_unstable to sort primitive types
It's not important to retain original order if we have &[1, 1, 2, 3] for example.
clippy::stable_sort_primitive
We currently only attach tokens when parsing a `:stmt` matcher for a
`macro_rules!` macro. Proc-macro attributes on statements are still
unstable, and need additional work.
Allow try blocks as the argument to return expressions
Fixes#76271
I don't think this needs to be edition-aware (phew) since `return try` in 2015 is also the start of an expression, just with a struct literal instead of a block (`return try { x: 4, y: 5 }`).
Account for version number in NtIdent hack
Issue #74616 tracks a backwards-compatibility hack for certain macros.
This has is implemented by hard-coding the filenames and macro names of
certain code that we want to continue to compile.
However, the initial implementation of the hack was based on the
directory structure when building the crate from its repository (e.g.
`js-sys/src/lib.rs`). When the crate is build as a dependency, it will
include a version number from the clone from the cargo registry (e.g.
`js-sys-0.3.17/src/lib.rs`), which would fail the check.
This commit modifies the backwards-compatibility hack to check that
desired crate name (`js-sys` or `time-macros-impl`) is a prefix of the
proper part of the path.
See https://github.com/rust-lang/rust/issues/76070#issuecomment-687215646
for more details.