This PR modifies the macro expansion infrastructure to handle attributes
in a fully token-based manner. As a result:
* Derives macros no longer lose spans when their input is modified
by eager cfg-expansion. This is accomplished by performing eager
cfg-expansion on the token stream that we pass to the derive
proc-macro
* Inner attributes now preserve spans in all cases, including when we
have multiple inner attributes in a row.
This is accomplished through the following changes:
* New structs `AttrAnnotatedTokenStream` and `AttrAnnotatedTokenTree` are introduced.
These are very similar to a normal `TokenTree`, but they also track
the position of attributes and attribute targets within the stream.
They are built when we collect tokens during parsing.
An `AttrAnnotatedTokenStream` is converted to a regular `TokenStream` when
we invoke a macro.
* Token capturing and `LazyTokenStream` are modified to work with
`AttrAnnotatedTokenStream`. A new `ReplaceRange` type is introduced, which
is created during the parsing of a nested AST node to make the 'outer'
AST node aware of the attributes and attribute target stored deeper in the token stream.
* When we need to perform eager cfg-expansion (either due to `#[derive]` or `#[cfg_eval]`),
we tokenize and reparse our target, capturing additional information about the locations of
`#[cfg]` and `#[cfg_attr]` attributes at any depth within the target.
This is a performance optimization, allowing us to perform less work
in the typical case where captured tokens never have eager cfg-expansion run.
Avoid `;` -> `,` recovery and unclosed `}` recovery from being too verbose
Those two recovery attempts have a very bad interaction that causes too
unnecessary output. Add a simple gate to avoid interpreting a `;` as a
`,` when there are unclosed braces.
Fix#83498.
Those two recovery attempts have a very bad interaction that causes too
unnecessary output. Add a simple gate to avoid interpreting a `;` as a
`,` when there are unclosed braces.
Previously, we would silently remove any `None`-delimiters when
capturing a `TokenStream`, 'flattenting' them to their inner tokens.
This was not normally visible, since we usually have
`TokenKind::Interpolated` (which gets converted to a `None`-delimited
group during macro invocation) instead of an actual `None`-delimited
group.
However, there are a couple of cases where this becomes visible to
proc-macros:
1. A cross-crate `macro_rules!` macro has a `None`-delimited group
stored in its body (as a result of being produced by another
`macro_rules!` macro). The cross-crate `macro_rules!` invocation
can then expand to an attribute macro invocation, which needs
to be able to see the `None`-delimited group.
2. A proc-macro can invoke an attribute proc-macro with its re-collected
input. If there are any nonterminals present in the input, they will
get re-collected to `None`-delimited groups, which will then get
captured as part of the attribute macro invocation.
Both of these cases are incredibly obscure, so there hopefully won't be
any breakage. This change will allow more agressive 'flattenting' of
nonterminals in #82608 without losing `None`-delimited groups.
ast/hir: Rename field-related structures
I always forget what `ast::Field` and `ast::StructField` mean despite working with AST for long time, so this PR changes the naming to less confusing and more consistent.
- `StructField` -> `FieldDef` ("field definition")
- `Field` -> `ExprField` ("expression field", not "field expression")
- `FieldPat` -> `PatField` ("pattern field", not "field pattern")
Various visiting and other methods working with the fields are renamed correspondingly too.
The second commit reduces the size of `ExprKind` by boxing fields of `ExprKind::Struct` in preparation for https://github.com/rust-lang/rust/pull/80080.
StructField -> FieldDef ("field definition")
Field -> ExprField ("expression field", not "field expression")
FieldPat -> PatField ("pattern field", not "field pattern")
Also rename visiting and other methods working on them.
Avoid sorting predicates by `DefId`
Fixes issue #82920
Even if an item does not change between compilation sessions, it may end
up with a different `DefId`, since inserting/deleting an item affects
the `DefId`s of all subsequent items. Therefore, we use a `DefPathHash`
in the incremental compilation system, which is stable in the face of
changes to unrelated items.
In particular, the query system will consider the inputs to a query to
be unchanged if any `DefId`s in the inputs have their `DefPathHash`es
unchanged. Queries are pure functions, so the query result should be
unchanged if the query inputs are unchanged.
Unfortunately, it's possible to inadvertantly make a query result
incorrectly change across compilations, by relying on the specific value
of a `DefId`. Specifically, if the query result is a slice that gets
sorted by `DefId`, the precise order will depend on how the `DefId`s got
assigned in a particular compilation session. If some definitions end up
with different `DefId`s (but the same `DefPathHash`es) in a subsequent
compilation session, we will end up re-computing a *different* value for
the query, even though the query system expects the result to unchanged
due to the unchanged inputs.
It turns out that we have been sorting the predicates computed during
`astconv` by their `DefId`. These predicates make their way into the
`super_predicates_that_define_assoc_type`, which ends up getting used to
compute the vtables of trait objects. This, re-ordering these predicates
between compilation sessions can lead to undefined behavior at runtime -
the query system will re-use code built with a *differently ordered*
vtable, resulting in the wrong method being invoked at runtime.
This PR avoids sorting by `DefId` in `astconv`, fixing the
miscompilation. However, it's possible that other instances of this
issue exist - they could also be easily introduced in the future.
To fully fix this issue, we should
1. Turn on `-Z incremental-verify-ich` by default. This will cause the
compiler to ICE whenver an 'unchanged' query result changes between
compilation sessions, instead of causing a miscompilation.
2. Remove the `Ord` impls for `CrateNum` and `DefId`. This will make it
difficult to introduce ICEs in the first place.
or-patterns: disallow in `let` bindings
~~Blocked on https://github.com/rust-lang/rust/pull/81869~~
Disallows top-level or-patterns before type ascription. We want to reserve this syntactic space for possible future generalized type ascription.
r? ``@petrochenkov``
Suggest character encoding is incorrect when encountering random null bytes
This adds a note whenever null bytes are seen at the start of a token unexpectedly, since those tend to come from UTF-16 encoded files without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) (if a UTF-16 BOM appears it won't be valid UTF-8, but if there is no BOM it be both valid UTF-16 and valid but garbled UTF-8). This approach was suggested in https://github.com/rust-lang/rust/issues/73979#issuecomment-653976451.
Closes#73979.
This adds recovery when in array type syntax user writes
[X; Y<Z, ...>]
instead of
[X; Y::<Z, ...>]
Fixes#82566
Note that whenever we parse an expression and know that the next token
cannot be `,`, we should be calling
check_mistyped_turbofish_with_multiple_type_params for this recovery.
Previously we only did this for statement parsing (e.g. `let x = f<a,
b>;`). We now also do it when parsing the length field in array type
syntax.
check_mistyped_turbofish_with_multiple_type_params was previously
expecting type arguments between angle brackets, which is not right, as
we can also see const expressions. We now use generic argument parser
instead of type parser.
Test with one, two, and three generic arguments added to check
consistentcy between
1. check_no_chained_comparison: Called after parsing a nested binop
application like `x < A > ...` where angle brackets are interpreted as
binary operators and `A` is an expression.
2. check_mistyped_turbofish_with_multiple_type_params: called by
`parse_full_stmt` when we expect to see a semicolon after parsing an
expression but don't see it.
(In `T2<1, 2>::C;`, the expression is `T2 < 1`)
When token-based attribute handling is implemeneted in #80689,
we will need to access tokens from `HasAttrs` (to perform
cfg-stripping), and we will to access attributes from `HasTokens` (to
construct a `PreexpTokenStream`).
This PR merges the `HasAttrs` and `HasTokens` traits into a new
`AstLike` trait. The previous `HasAttrs` impls from `Vec<Attribute>` and `AttrVec`
are removed - they aren't attribute targets, so the impls never really
made sense.
Improve suggestion for tuple struct pattern matching errors.
Closes#80174
This change allows numbers to be parsed as field names when pattern matching on structs, which allows us to provide better error messages when tuple structs are matched using a struct pattern.
r? ``@estebank``
ast: Keep expansion status for out-of-line module items
I.e. whether a module `mod foo;` is already loaded from a file or not.
This is a pre-requisite to correctly treating inner attributes on such modules (https://github.com/rust-lang/rust/issues/81661).
With this change AST structures for `mod` items diverge even more for AST structure for the crate root, which previously used `ast::Mod`.
Therefore this PR removes `ast::Mod` from `ast::Crate` in the first commit, these two things are sufficiently different from each other, at least at syntactic level.
Customization points for visiting a "`mod` item or crate root" were also removed from AST visitors (`fn visit_mod`).
`ast::Mod` itself was refactored away in the second commit in favor of `ItemKind::Mod(Unsafe, ModKind)`.
Crate root is sufficiently different from `mod` items, at least at syntactic level.
Also remove customization point for "`mod` item or crate root" from AST visitors.
Along the way, we also implement a handful of diagnostics improvements
and fixes, particularly with respect to the special handling of `||` in
place of `|` and when there are leading verts in function params, which
don't allow top-level or-patterns anyway.
Rollup of 11 pull requests
Successful merges:
- #80523 (#[doc(inline)] sym_generated)
- #80920 (Visit more targets when validating attributes)
- #81720 (Updated smallvec version due to RUSTSEC-2021-0003)
- #81891 ([rustdoc-json] Make `header` a vec of modifiers, and FunctionPointer consistent)
- #81912 (Implement the precise analysis pass for lint `disjoint_capture_drop_reorder`)
- #81914 (Fixing bad suggestion for `_` in `const` type when a function #81885)
- #81919 (BTreeMap: fix internal comments)
- #81927 (Add a regression test for #32498)
- #81965 (Fix MIR pretty printer for non-local DefIds)
- #82029 (Use debug log level for developer oriented logs)
- #82056 (fix ice (#82032))
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
This is a pure refactoring split out from #80689.
It represents the most invasive part of that PR, requiring changes in
every caller of `parse_outer_attributes`
In order to eagerly expand `#[cfg]` attributes while preserving the
original `TokenStream`, we need to know the range of tokens that
corresponds to every attribute target. This is accomplished by making
`parse_outer_attributes` return an opaque `AttrWrapper` struct. An
`AttrWrapper` must be converted to a plain `AttrVec` by passing it to
`collect_tokens_trailing_token`. This makes it difficult to accidentally
construct an AST node with attributes without calling `collect_tokens_trailing_token`,
since AST nodes store an `AttrVec`, not an `AttrWrapper`.
As a result, we now call `collect_tokens_trailing_token` for attribute
targets which only support inert attributes, such as generic arguments
and struct fields. Currently, the constructed `LazyTokenStream` is
simply discarded. Future PRs will record the token range corresponding
to the attribute target, allowing those tokens to be removed from an
enclosing `collect_tokens_trailing_token` call if necessary.
The panic happens when in recovery parsing a full `impl`
(`parse_item_impl`) fails and we drop the `DiagnosticBuilder` for the
recovery suggestion and return the `parse_item_impl` error.
We now raise the original error "expected identifier found `impl`" when
parsing the `impl` fails.
Note that the regression test is slightly simplified version of the
original repro in #81806, to make the error output smaller and more
resilient to unrelated changes in parser error messages.
Fixes#81806
Box the biggest ast::ItemKind variants
This PR is a different approach on https://github.com/rust-lang/rust/pull/81400, aiming to save memory in humongous ASTs.
The three affected item kind enums are:
- `ast::ItemKind` (208 -> 112 bytes)
- `ast::AssocItemKind` (176 -> 72 bytes)
- `ast::ForeignItemKind` (176 -> 72 bytes)
Improve handling of spans around macro result parse errors
Fixes#81543
After we expand a macro, we try to parse the resulting tokens as a AST
node. This commit makes several improvements to how we handle spans when
an error occurs:
* Only ovewrite the original `Span` if it's a dummy span. This preserves
a more-specific span if one is available.
* Use `self.prev_token` instead of `self.token` when emitting an error
message after encountering EOF, since an EOF token always has a dummy
span
* Make `SourceMap::next_point` leave dummy spans unused. A dummy span
does not have a logical 'next point', since it's a zero-length span.
Re-using the span span preserves its 'dummy-ness' for other checks
Fixes#81543
After we expand a macro, we try to parse the resulting tokens as a AST
node. This commit makes several improvements to how we handle spans when
an error occurs:
* Only ovewrite the original `Span` if it's a dummy span. This preserves
a more-specific span if one is available.
* Use `self.prev_token` instead of `self.token` when emitting an error
message after encountering EOF, since an EOF token always has a dummy
span
* Make `SourceMap::next_point` leave dummy spans unused. A dummy span
does not have a logical 'next point', since it's a zero-length span.
Re-using the span span preserves its 'dummy-ness' for other checks
Clone entire `TokenCursor` when collecting tokens
Reverts PR #80830Fixestaiki-e/pin-project#312
We can have an arbitrary number of `None`-delimited group frames pushed
on the stack due to proc-macro invocations, which can legally be exited.
Attempting to account for this would add a lot of complexity for a tiny
performance gain, so let's just use the original strategy.
Reverts PR #80830Fixestaiki-e/pin-project#312
We can have an arbitrary number of `None`-delimited group frames pushed
on the stack due to proc-macro invocations, which can legally be exited.
Attempting to account for this would add a lot of complexity for a tiny
performance gain, so let's just use the original strategy.
Improve diagnostics when parsing angle args
https://github.com/rust-lang/rust/pull/79266 introduced parsing of generic arguments in associated type constraints, this however resulted in possibly very confusing error messages in cases in which closing angle brackets were missing such as in `Vec<(u32, _, _) = vec![]`, which outputs an incorrectly parsed equality constraint error, as noted by `@cynecx.`
This PR tries to provide better error messages in such cases.
r? `@petrochenkov`
Refactor token collection to capture trailing token immediately
Split out from https://github.com/rust-lang/rust/pull/80689 - when we start capturing more information about attribute targets, we'll need to know in advance if we're capturing a trailing token or not.
r? `@ghost`
Currently, when a user uses a struct pattern to pattern match on
a tuple struct, the errors we emit generally suggest adding fields
using their field names, which are numbers. However, numbers are
not valid identifiers, so the suggestions, which use the shorthand
notation, are not valid syntax. This commit changes those errors
to suggest using the actual tuple struct pattern syntax instead,
which is a more actionable suggestion.
Fixes#81007
Previously, we would fail to collect tokens in the proper place when
only builtin attributes were present. As a result, we would end up with
attribute tokens in the collected `TokenStream`, leading to duplication
when we attempted to prepend the attributes from the AST node.
We now explicitly track when token collection must be performed due to
nomterminal parsing.
Set tokens on AST node in `collect_tokens`
A new `HasTokens` trait is introduced, which is used to move logic from
the callers of `collect_tokens` into the body of `collect_tokens`.
In addition to reducing duplication, this paves the way for PR #80689,
which needs to perform additional logic during token collection.
A new `HasTokens` trait is introduced, which is used to move logic from
the callers of `collect_tokens` into the body of `collect_tokens`.
In addition to reducing duplication, this paves the way for PR #80689,
which needs to perform additional logic during token collection.
Rework diagnostics for wrong number of generic args (fixes#66228 and #71924)
This PR reworks the `wrong number of {} arguments` message, so that it provides more details and contextual hints.
Suggest async {} for async || {}
Fixes#76011
This adds support for adding help diagnostics to the feature gating checks and
then uses it for the async_closure gate to add the extra bit of help
information as described in the issue.