Add flag to forbid recovery in the parser
To start the effort of fixing #103534, this adds a new flag to the parser, which forbids the parser from doing recovery, which it shouldn't do in macros.
This doesn't add any new checks for recoveries yet and is just here to bikeshed the names for the functions here before doing more.
r? `@compiler-errors`
Rollup of 6 pull requests
Successful merges:
- #101293 (Recover when unclosed char literal is parsed as a lifetime in some positions)
- #101908 (Suggest let for assignment, and some code refactor)
- #103192 (rustdoc: Eliminate uses of `EarlyDocLinkResolver::all_traits`)
- #103226 (Check `needs_infer` before `needs_drop` during HIR generator analysis)
- #103249 (resolve: Revert "Set effective visibilities for imports more precisely")
- #103305 (Move some tests to more reasonable places)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Flatten diagnostic slug modules
This makes it easier to grep for the slugs in the code.
See https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Localization.20infra.20interferes.20with.20grepping.20for.20error for more discussion about it.
This was mostly done with a few regexes and a bunch of manual work. This also exposes a pretty annoying inconsistency for the extra labels. Some of the extra labels are defined as additional properties in the fluent message (which makes them not prefixed with the crate name) and some of them are new fluent messages themselves (which makes them prefixed with the crate name). I don't know whether we want to clean this up at some point but it's useful to know.
r? `@davidtwco`
Fix `let` keyword removal suggestion in structs
(1.) Fixes a bug where, given this code:
```rust
struct Foo {
let x: i32,
}
```
We were parsing the field name as `let` instead of `x`, which causes issues later on in the type-checking phase.
(2.) Also, suggestions for `let: i32` as a field regressed, displaying this extra `help:` which is removed by this PR
```
help: remove the let, the `let` keyword is not allowed in struct field definitions
|
2 - let: i32,
2 + : i32,
```
(3.) Makes the suggestion text a bit more succinct, since we don't need to re-explain that `let` is not allowed in this position (since it's in a note that follows). This causes the suggestion to render inline as well.
cc `@gimbles,` this addresses a few nits I mentioned in your PR.
It's now only used in one function. Also, the "should we glue the
tokens?" check is only necessary when pushing a `TokenTree::Token`, not
when pushing a `TokenTree::Delimited`.
As part of this, we now do the "should we glue the tokens?" check
immediately, which avoids having look back at the previous token. It
also puts all the logic dealing with token gluing in a single place.
Remove `expr_parentheses_needed` from `ParseSess`
Not sure why this method needed to exist on `ParseSess`, but we can achieve the same behavior by just inlining it everywhere.
Group together more size assertions.
Also add a few more assertions for some relevant token-related types.
And fix an erroneous comment in `rustc_errors`.
r? `@lqd`
Improve errors for incomplete functions in struct definitions
Given the following code:
```rust
fn main() {}
struct Foo {
fn
}
```
[playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=29139f870511f6918324be5ddc26c345)
The current output is:
```
Compiling playground v0.0.1 (/playground)
error: functions are not allowed in struct definitions
--> src/main.rs:4:5
|
4 | fn
| ^^
|
= help: unlike in C++, Java, and C#, functions are declared in `impl` blocks
= help: see https://doc.rust-lang.org/book/ch05-03-method-syntax.html for more information
error: could not compile `playground` due to previous error
```
In this case, rustc should suggest escaping `fn` to use it as an identifier.
Migrate more of rustc_parse to SessionDiagnostic
Still far from complete, but I thought I'd add a checkpoint here because rebasing was starting to get annoying.
Rollup of 5 pull requests
Successful merges:
- #102143 (Recover from struct nested in struct)
- #102178 (bootstrap: the backtrace feature is stable, no need to allow it any more)
- #102197 (Stabilize const `BTree{Map,Set}::new`)
- #102267 (Don't set RUSTC in the bootstrap build script)
- #102270 (Remove benches from `rustc_middle`)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
`Cursor` is currently hidden, and the main tokenization path uses
`rustc_lexer::first_token` which involves constructing a new `Cursor`
for every single token, which is weird. Also, `first_token` also can't
handle empty input, so callers have to check for that first.
This commit makes `Cursor` public, so `StringReader` can contain a
`Cursor`, which results in a simpler structure. The commit also changes
`StringReader::advance_token` so it returns an `Option<Token>`,
simplifying the the empty input case.
`TokenTreesReader` wraps a `StringReader`, but the `into_token_trees`
function obscures this. This commit moves to a more straightforward
control flow.
The spacing computation is done in two parts. In the first part
`next_token` and `bump` use `Spacing::Alone` to mean "preceded by
whitespace" and `Spacing::Joint` to mean the opposite. In the second
part `parse_token_tree_other` then adjusts the `spacing` value to mean
the usual thing (i.e. "is the following token joinable punctuation?").
This shift in meaning is very confusing and it took me some time to
understand what was going on.
This commit changes the first part to use a bool, and adds some
comments, which makes things much clearer.
`parse_token_tree` is basically a match with four arms: `Eof`,
`OpenDelim`, `CloseDelim`, and "other". It has two call sites, and at
each call site one of the arms is unreachable. It's also not inlined.
This commit removes `parse_token_tree` by splitting it into four
functions and inlining them. This avoids some repeated conditional
tests and also some non-inlined function calls on the hot path.
FIX - ambiguous Diagnostic link in docs
UPDATE - rename diagnostic_items to IntoDiagnostic and AddToDiagnostic
[Gardening] FIX - formatting via `x fmt`
FIX - rebase conflicts. NOTE: Confirm wheather or not we want to handle TargetDataLayoutErrorsWrapper this way
DELETE - unneeded allow attributes in Handler method
FIX - broken test
FIX - Rebase conflict
UPDATE - rename residual _SessionDiagnostic and fix LintDiag link
On later stages, the feature is already stable.
Result of running:
rg -l "feature.let_else" compiler/ src/librustdoc/ library/ | xargs sed -s -i "s#\\[feature.let_else#\\[cfg_attr\\(bootstrap, feature\\(let_else\\)#"
make `mk_attr_id` part of `ParseSess`
Updates #48685
The current `mk_attr_id` uses the `AtomicU32` type, which is not very efficient and adds a lot of lock contention in a parallel environment.
This PR refers to the task list in #48685, uses `mk_attr_id` as a method of the `AttrIdGenerator` struct, and adds a new field `attr_id_generator` to `ParseSess`.
`AttrIdGenerator` uses the `WorkerLocal`, which has two advantages: 1. `Cell` is more efficient than `AtomicU32`, and does not increase any lock contention. 2. We put the index of the work thread in the first few bits of the generated `AttrId`, so that the `AttrId` generated in different threads can be easily guaranteed to be unique.
cc `@cjgillot`
Initial implementation of dyn*
This PR adds extremely basic and incomplete support for [dyn*](https://smallcultfollowing.com/babysteps//blog/2022/03/29/dyn-can-we-make-dyn-sized/). The goal is to get something in tree behind a flag to make collaboration easier, and also to make sure the implementation so far is not unreasonable. This PR does quite a few things:
* Introduce `dyn_star` feature flag
* Adds parsing for `dyn* Trait` types
* Defines `dyn* Trait` as a sized type
* Adds support for explicit casts, like `42usize as dyn* Debug`
* Including const evaluation of such casts
* Adds codegen for drop glue so things are cleaned up properly when a `dyn* Trait` object goes out of scope
* Adds codegen for method calls, at least for methods that take `&self`
Quite a bit is still missing, but this gives us a starting point. Note that this is never intended to become stable surface syntax for Rust, but rather `dyn*` is planned to be used as an implementation detail for async functions in dyn traits.
Joint work with `@nikomatsakis` and `@compiler-errors.`
r? `@bjorn3`
The primary purpose of this commit is to introduce the
dyn_star flag so we can begin experimenting with implementation.
In order to have something to do in the feature gate test, we also add
parser support for `dyn* Trait` objects. These are currently treated
just like `dyn Trait` objects, but this will change in the future.
Note that for now `dyn* Trait` is experimental syntax to enable
implementing some of the machinery needed for async fn in dyn traits
without fully supporting the feature.
`To` is better than `Create` for indicating that this is a non-consuming
conversion, rather than creating something out of nothing.
And the addition of `Attr` is because the current names makes them sound
like they relate to `TokenStream`, but really they relate to
`AttrTokenStream`.
These two type names are long and have long matching prefixes. I find
them hard to read, especially in combinations like
`AttrAnnotatedTokenStream::new(vec![AttrAnnotatedTokenTree::Token(..)])`.
This commit renames them as `AttrToken{Stream,Tree}`.
Recover from using `;` as separator between fields
Partially fixes#101440 (only for record structs).
Doing that for tuple structs is harder as their parsing passes through a bunch of helper functions. I don't know how to do that. But [their error message is better anyway](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cc6ee8bb2593596c0cea89d49e79bcb4) and suggests using a comma, even if it doesn't suggest replacing the semicolon with it.
Update `SessionDiagnostic::into_diagnostic` to take `Handler` instead of `ParseSess`
Suggested by the team in [this Zulip Topic](https://rust-lang.zulipchat.com/#narrow/stream/336883-i18n/topic/.23100717.20SessionDiagnostic.20on.20Handler).
`Handler` already has almost all the capabilities of `ParseSess` when it comes to diagnostic emission, in this migration we only needed to add the ability to access `source_map` from the emitter in order to get a `Snippet` and the `start_point`. Not sure if adding these two methods [`span_to_snippet_from_emitter` and `span_start_point_from_emitter`] is the best way to address this gap.
P.S. If this goes in the right direction, then we probably may want to move `SessionDiagnostic` to `rustc_errors` and rename it to `DiagnosticHandler` or something similar.
r? `@davidtwco`
r? `@compiler-errors`
Suggest removing unnecessary prefix let in patterns
Helps with #101291, though I think `@estebank` probably wants this:
> Finally, I think it'd be nice if we could detect that we don't know for sure and "just" swallow the rest of the expression (find the next ; accounting for nested braces) or the end of the item (easier).
... to be implemented before we close that issue out completely.
`BindingAnnotation` refactor
* `ast::BindingMode` is deleted and replaced with `hir::BindingAnnotation` (which is moved to `ast`)
* `BindingAnnotation` is changed from an enum to a tuple struct e.g. `BindingAnnotation(ByRef::No, Mutability::Mut)`
* Associated constants added for convenience `BindingAnnotation::{NONE, REF, MUT, REF_MUT}`
One goal is to make it more clear that `BindingAnnotation` merely represents syntax `ref mut` and not the actual binding mode. This was especially confusing since we had `ast::BindingMode`->`hir::BindingAnnotation`->`thir::BindingMode`.
I wish there were more symmetry between `ByRef` and `Mutability` (variant) naming (maybe `Mutable::Yes`?), and I also don't love how long the name `BindingAnnotation` is, but this seems like the best compromise. Ideas welcome.
Suggested by the team in this Zulip Topic https://rust-lang.zulipchat.com/#narrow/stream/336883-i18n/topic/.23100717.20SessionDiagnostic.20on.20Handler
Handler already has almost all the capabilities of ParseSess when it comes to diagnostic emission, in this migration we only needed to add the ability to access source_map from the emitter in order to get a Snippet and the start_point. Not sure if this is the best way to address this gap
Replace `rustc_data_structures::thin_vec::ThinVec` with `thin_vec::ThinVec`
`rustc_data_structures::thin_vec::ThinVec` looks like this:
```
pub struct ThinVec<T>(Option<Box<Vec<T>>>);
```
It's just a zero word if the vector is empty, but requires two
allocations if it is non-empty. So it's only usable in cases where the
vector is empty most of the time.
This commit removes it in favour of `thin_vec::ThinVec`, which is also
word-sized, but stores the length and capacity in the same allocation as
the elements. It's good in a wider variety of situation, e.g. in enum
variants where the vector is usually/always non-empty.
The commit also:
- Sorts some `Cargo.toml` dependency lists, to make additions easier.
- Sorts some `use` item lists, to make additions easier.
- Changes `clean_trait_ref_with_bindings` to take a
`ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this
avoid some unnecessary allocations.
r? `@spastorino`
This PR will fix some typos detected by [typos].
I only picked the ones I was sure were spelling errors to fix, mostly in
the comments.
[typos]: https://github.com/crate-ci/typos
`rustc_data_structures::thin_vec::ThinVec` looks like this:
```
pub struct ThinVec<T>(Option<Box<Vec<T>>>);
```
It's just a zero word if the vector is empty, but requires two
allocations if it is non-empty. So it's only usable in cases where the
vector is empty most of the time.
This commit removes it in favour of `thin_vec::ThinVec`, which is also
word-sized, but stores the length and capacity in the same allocation as
the elements. It's good in a wider variety of situation, e.g. in enum
variants where the vector is usually/always non-empty.
The commit also:
- Sorts some `Cargo.toml` dependency lists, to make additions easier.
- Sorts some `use` item lists, to make additions easier.
- Changes `clean_trait_ref_with_bindings` to take a
`ThinVec<TypeBinding>` rather than a `&[TypeBinding]`, because this
avoid some unnecessary allocations.
Convert diagnostics in parser/expr to SessionDiagnostic
This migrates all the easy cases in `rustc_parse::parser::expr` to `SessionDiagnostic`s, I've left things such as `multipart_suggestion`s out for now in the hopes of a derive API being developed soon.
Recover keywords in trait bounds
(_this pr was inspired by [this tweet](https://twitter.com/Azumanga/status/1552982326409367561)_)
Recover keywords in trait bound, motivational example:
```rust
fn f(_: impl fn()) {} // mistyped, meant `Fn`
```
<details><summary>Current nightly (3 needless and confusing errors!)</summary>
<p>
```text
error: expected identifier, found keyword `fn`
--> ./t.rs:1:15
|
1 | fn _f(_: impl fn()) {}
| ^^ expected identifier, found keyword
|
help: escape `fn` to use it as an identifier
|
1 | fn _f(_: impl r#fn()) {}
| ++
error: expected one of `:` or `|`, found `)`
--> ./t.rs:1:19
|
1 | fn _f(_: impl fn()) {}
| ^ expected one of `:` or `|`
error: expected one of `!`, `(`, `)`, `,`, `?`, `for`, `~`, lifetime, or path, found keyword `fn`
--> ./t.rs:1:15
|
1 | fn _f(_: impl fn()) {}
| -^^ expected one of 9 possible tokens
| |
| help: missing `,`
error: at least one trait must be specified
--> ./t.rs:1:10
|
1 | fn _f(_: impl fn()) {}
| ^^^^
```
</p>
</details>
This PR:
```text
error: expected identifier, found keyword `fn`
--> ./t.rs:1:15
|
1 | fn _f(_: impl fn()) {}
| ^^ expected identifier, found keyword
|
help: escape `fn` to use it as an identifier
|
1 | fn _f(_: impl r#fn()) {}
| ++
error[E0405]: cannot find trait `r#fn` in this scope
--> ./t.rs:1:15
|
1 | fn _f(_: impl fn()) {}
| ^^ help: a trait with a similar name exists (notice the capitalization): `Fn`
|
::: /home/waffle/projects/repos/rust/library/core/src/ops/function.rs:74:1
|
74 | pub trait Fn<Args>: FnMut<Args> {
| ------------------------------- similarly named trait `Fn` defined here
```
It would be nice to have suggestion in the first error like "have you meant `Fn` trait", instead of a separate error, but the recovery is deep inside ident parsing, which makes it a lot harder to do.
r? `@compiler-errors`
In some places we use `Vec<Attribute>` and some places we use
`ThinVec<Attribute>` (a.k.a. `AttrVec`). This results in various points
where we have to convert between `Vec` and `ThinVec`.
This commit changes the places that use `Vec<Attribute>` to use
`AttrVec`. A lot of this is mechanical and boring, but there are
some interesting parts:
- It adds a few new methods to `ThinVec`.
- It implements `MapInPlace` for `ThinVec`, and introduces a macro to
avoid the repetition of this trait for `Vec`, `SmallVec`, and
`ThinVec`.
Overall, it makes the code a little nicer, and has little effect on
performance. But it is a precursor to removing
`rustc_data_structures::thin_vec::ThinVec` and replacing it with
`thin_vec::ThinVec`, which is implemented more efficiently.
Migrate "invalid variable declaration" errors to SessionDiagnostic
After seeing the great blog post on Inside Rust, I decided to try my hand at this. Just one diagnostic for now to get used to the workflow and to check if this is the way to do it or if there are any problems.
Fix documentation of rustc_parse::parser::Parser::parse_stmt_without_recovery
Something seems to have gotten out of sync during the creation of #81177, where both the argument and comment were introduced.
- Rename `ast::Lit::token` as `ast::Lit::token_lit`, because its type is
`token::Lit`, which is not a token. (This has been confusing me for a
long time.)
reasonable because we have an `ast::token::Lit` inside an `ast::Lit`.
- Rename `LitKind::{from,to}_lit_token` as
`LitKind::{from,to}_token_lit`, to match the above change and
`token::Lit`.
Adjust span of fn argument declaration
Span of a fn argument declaration goes from:
```
fn foo(i : i32 , ...)
^^^^^^^^
```
to:
```
fn foo(i : i32 , ...)
^^^^^^^
```
That is, we don't include the extra spacing up to the trailing comma, which I think is more correct.
cc https://github.com/rust-lang/rust/pull/99646#discussion_r944568074
r? ``@estebank``
---
The two tests that had dramatic changes in their rendering I think actually are improved, though they are kinda poor spans both before and after the changes. 🤷 Thoughts?
`Parser::parse_bottom_expr` currently constructs an empty `attrs` and
then passes it to a large number of other functions. This makes the code
harder to read than it should be, because it's not clear that many
`attrs` arguments are always empty.
This commit removes `attrs` and the passing, simplifying a lot of
functions. The commit also renames `Parser::mk_expr` (which takes an
`attrs` argument) as `mk_expr_with_attrs`, and introduces a new
`mk_expr` which creates an expression with no attributes, which is the
more common case.
Stringify non-shorthand visibility correctly
This makes `stringify!(pub(in crate))` evaluate to `pub(in crate)` rather than `pub(crate)`, matching the behavior before the `crate` shorthand was removed. Further, this changes `stringify!(pub(in super))` to evaluate to `pub(in super)` rather than the current `pub(super)`. If the latter is not desired (it is _technically_ breaking), it can be undone.
Fixes#99981
`@rustbot` label +C-bug +regression-from-stable-to-beta +T-compiler
Use `&mut Diagnostic` instead of `&mut DiagnosticBuilder` unless needed
This seems to be the established convention (02ff9e0) when `DiagnosticBuilder` was first added. I am guilty of introducing some of these.
Improve diagnostics for `const a: = expr;`
Adds a suggestion to write a type when there is a colon, but the type is not present.
I've also shrunk spans a little, so the suggestions are a little nicer.
Resolves#100146
r? `@compiler-errors`
Use Parser's `restrictions` instead of `let_expr_allowed`
This also means that the `ALLOW_LET` flag is reset properly for subexpressions, so we can properly deny things like `a && (b && let c = d)`. Also the parser is a tiny bit smaller now.
It doesn't reject _all_ bad `let` expr usages, just a bit more.
cc `@c410-f3r`
From 72 bytes to 12 bytes (on x86-64).
There are two parts to this:
- Changing various source code offsets from 64-bit to 32-bit. This is
not a problem because the rest of rustc also uses 32-bit source code
offsets. This means `Token` is no longer `Copy` but this causes no
problems.
- Removing the `RawStrError` from `LiteralKind`. Raw string literal
invalidity is now indicated by a `None` value within
`RawStr`/`RawByteStr`, and the new `validate_raw_str` function can be
used to re-lex an invalid raw string literal to get the `RawStrError`.
There is one very small change in behaviour. Previously, if a raw string
literal matched both the `InvalidStarter` and `TooManyHashes` cases,
the latter would override the former. This has now changed, because
`raw_double_quoted_string` now uses `?` and so returns immediately upon
detecting the `InvalidStarter` case. I think this is a slight
improvement to report the earlier-detected error, and it explains the
change in the `test_too_many_hashes` test.
The commit also removes a couple of comments that refer to #77629 and
say that the size of these types don't affect performance. These
comments are wrong, though the performance effect is small.