Use restricted Damerau-Levenshtein distance for diagnostics
This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a _restricted_ implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account.
This was first brought up [on IRLO](https://internals.rust-lang.org/t/18227) when it was noticed that the diagnostic for `prinltn!` (transposed L and T) was `print!` and not `println!`. Only a single existing UI test was effected, with the result being an objective improvement.
~~I have left the method name and various other references to the Levenshtein algorithm untouched, as the exact manner in which the edit distance is calculated should not be relevant to the caller.~~
r? ``@estebank``
``@rustbot`` label +A-diagnostics +C-enhancement
Don't recover lifetimes/labels containing emojis as character literals
Fixes#108019.
Note that at the time of this commit, `unic-emoji-char` seems to have data tables only up to Unicode 5.0, but Unicode is already newer than this.
A newer emoji such as `🥺` will not be recognized as an emoji but older emojis such as `🐱` will.
This PR leaves a couple of FIXMEs where `unic_emoji_char::is_emoji` is used.
Suggest fix for misplaced generic params on fn item #103366fixes#103366
This still has some work to go, but works for 2/3 of the initial base cases described in #1033366
simple fn:
```
error: expected identifier, found `<`
--> shreys/test_1.rs:1:3
|
1 | fn<T> id(x: T) -> T { x }
| ^ expected identifier
|
help: help: place the generic parameter list after the function name:
|
1 | fn id<T>(x: T) -> T { x }
| ~~~~
```
Complicated bounds
```
error: expected identifier, found `<`
--> spanishpear/test_2.rs:1:3
|
1 | fn<'a, B: 'a + std::ops::Add<Output = u32>> f(_x: B) { }
| ^ expected identifier
|
help: help: place the generic parameter list after the function name:
|
1 | fn f<'a, B: 'a + std::ops::Add<Output = u32>>(_x: B) { }
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Opening a draft PR for comments on approach, particularly I have the following questions:
- [x] Is it okay to be using `err.span_suggestion` over struct derives? I struggled to get the initial implementation (particularly the correct suggestion message) on struct derives, although I think given what I've learned since starting, I could attempt re-doing it with that approach.
- [x] in the case where the snippet cannot be obtained from a span, is the `help` but no suggestion okay? I think yes (also, when does this case occur?)
- [x] are there any red flags for the generalisation of this work for relevant item kinds (i.e. `struct`, `enum`, `trait`, and `union`). My basic testing indicates it does work for those types except the help tip is currently hardcoded to `after the function name` - which should change dependent on the item.
- [x] I am planning to not show the suggestion if there is already a `<` after the item identifier, (i.e. if there are already generics, as after a function name per the original issue). Any major objections?
- [x] Is the style of error okay? I wasn't sure if there was a way to make it display nicer, or if thats handled by span_suggestion
These aren't blocking questions, and I will keep working on:
- check if there is a `<` after the ident (and if so, not showing the suggestion)
- generalize the help message
- figuring out how to write/run/etc ui tests (including reading the docs for them)
- logic cleanups
Note that at the time of this commit, `unic-emoji-char` seems to have
data tables only up to Unicode 5.0, but Unicode is already newer than
this.
A newer emoji such as `🥺` will not be recognized as an emoji
but older emojis such as `🐱` will.
The motivation here is to eliminate the `Option<(Delimiter,
DelimSpan)>`, which is `None` for the outermost token stream and `Some`
for all other token streams.
We are already treating the innermost frame specially -- this is the
`frame` vs `stack` distinction in `TokenCursor`. We can push that
further so that `frame` only contains the cursor, and `stack` elements
contain the delimiters for their children. When we are in the outermost
token stream `stack` is empty, so there are no stored delimiters, which
is what we want because the outermost token stream *has* no delimiters.
This change also shrinks `TokenCursor`, which shrinks `Parser` and
`LazyAttrTokenStreamImpl`, which is nice.
Sometimes the parser needs to desugar a doc comment into `#[doc =
r"foo"]`. Currently it does this in a hacky way: by pushing a "fake" new
frame (one without a delimiter) onto the `TokenCursor` stack.
This commit changes things so that the token stream itself is modified
in place. The nice thing about this is that it means
`TokenCursorFrame::delim_sp` is now only `None` for the outermost frame.
Improve diagnostic for missing space in range pattern
Improves the diagnostic in #107425 by turning it into a note explaining the parsing issue.
r? `@compiler-errors`
Revert "review comment: Remove AST AnonTy"
This reverts commit 020cca8d36.
Revert "Ensure macros are not affected"
This reverts commit 12d18e4031.
Revert "Emit fewer errors on patterns with possible type ascription"
This reverts commit c847a01a3b.
Revert "Teach parser to understand fake anonymous enum syntax"
This reverts commit 2d82420665.
The check previously matched this, and suggested adding a missing
`struct`:
pub Foo(...):
It was probably intended to match this instead (semicolon instead of
colon):
pub Foo(...);
Fix invalid float literal suggestions when recovering an integer
Only suggest adding a zero to integers with a preceding dot when the change will result in a valid floating point literal.
For example, `.0x0` should not be turned into `0.0x0`.
r? nnethercote
Only suggest adding a zero to integers with a preceding dot when the change will
result in a valid floating point literal.
For example, `.0x0` should not be turned into `0.0x0`.
Improve unexpected close and mismatch delimiter hint in TokenTreesReader
Fixes#103882Fixes#68987Fixes#69259
The inner indentation mismatching will be covered by outer block, the new added function `report_error_prone_delim_block` will find out the error prone candidates for reporting.
Recover from more const arguments that are not wrapped in curly braces
Recover from some array, borrow, tuple & arithmetic expressions in const argument positions that lack curly braces and provide a suggestion to fix the issue continuing where #92884 left off. Examples of such expressions: `[]`, `[0]`, `[1, 2]`, `[0; 0xff]`, `&9`, `("", 0)` and `(1 + 2) * 3` (we previously did not recover from them).
I am not entirely happy with my current solution because the code that recovers from `[0]` (coinciding with a malformed slice type) and `[0; 0]` (coinciding with a malformed array type) is quite fragile as the aforementioned snippets are actually successfully parsed as types by `parse_ty` since it itself already recovers from them (returning `[⟨error⟩]` and `[⟨error⟩; 0]` respectively) meaning I have to manually look for `TyKind::Err`s and construct a separate diagnostic for the suggestion to attach to (thereby emitting two diagnostics in total).
Fixes#81698.
`@rustbot` label A-diagnostics
r? diagnostics
Teach parser to understand fake anonymous enum syntax
Parse `Ty | OtherTy` in function argument and return types.
Parse type ascription in top level patterns.
Minimally address #100741.
Adds an additional hint to failures where we encounter an else keyword
while we're parsing an if-let block.
This is likely that the user has accidentally mixed if-let and let...else
together.
--wip-- [skip ci]
get the generic text and put it int he suggestion, but suggestion not working on derive subdiagnostic
refactor away from derives and use span_suggestion() instead. Show's the correct(?) generic contents, but overwrites the fn name :(
x fmt
drop commented code and s/todo/fixme
get the correct diagnostic for functions, at least
x fmt
remove some debugs
remove format
remove debugs
remove useless change
remove useless change
remove legacy approach
correct lookahead + error message contains the ident name
fmt
refactor code
tests
add tests
remoev debug
remove comment
Recognise double-equals homoglyph
Recognise `⩵` as a homoglyph for `==`.
The first commit switches `char` to `&str`, as all previous homoglyphs corresponded to a single ASCII character, while the second implements the fix.
`@rustbot` label +A-diagnostics +A-parser
make error emitted on `impl &Trait` nicer
Fixes#106694
Turned out to be simpler than I thought, also added UI test.
Before: ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=9bda53271ef3a8886793cf427b8cea91))
```text
error: expected one of `:`, ``@`,` or `|`, found `)`
--> src/main.rs:2:22
|
2 | fn foo(_: impl &Trait) {}
| ^ expected one of `:`, ``@`,` or `|`
|
= note: anonymous parameters are removed in the 2018 edition (see RFC 1685)
help: if this is a parameter name, give it a type
|
2 | fn foo(_: impl Trait: &TypeName) {}
| ~~~~~~~~~~~~~~~~
help: if this is a type, explicitly ignore the parameter name
|
2 | fn foo(_: impl _: &Trait) {}
| ++
error: expected one of `!`, `(`, `)`, `,`, `?`, `for`, `~`, lifetime, or path, found `&`
--> src/main.rs:2:16
|
2 | fn foo(_: impl &Trait) {}
| -^ expected one of 9 possible tokens
| |
| help: missing `,`
error: expected one of `!`, `(`, `,`, `=`, `>`, `?`, `for`, `~`, lifetime, or path, found `&`
--> src/main.rs:3:11
|
3 | fn bar<T: &Trait>(_: T) {}
| ^ expected one of 10 possible tokens
```
After:
```text
error: expected a trait, found type
--> <anon>:2:16
|
2 | fn foo(_: impl &Trait) {}
| -^^^^^
| |
| help: consider removing the indirection
error: expected a trait, found type
--> <anon>:3:11
|
3 | fn bar<T: &Trait>(_: T) {}
| -^^^^^
| |
| help: consider removing the indirection
```
Emit only one nbsp error per file
Fixes#106101.
See https://github.com/rust-lang/rust/issues/106098 for an explanation of how someone would end up with a large number of these nbsp characters in their source code, which is why I think rustc needs to handle this specific case in a friendlier way.
Emit a single error for contiguous sequences of unknown tokens
Closes#106101
On encountering a sequence of identical source characters which are unknown tokens, note the amount of subsequent characters and advance past them silently. The old behavior was to emit an error and 'help' note for every single one.
`@rustbot` label +A-diagnostics +A-parser
Recover from where clauses placed before tuple struct bodies
Open to any suggestions regarding the phrasing of the diagnostic.
Fixes#100790.
`@rustbot` label A-diagnostics
r? diagnostics
Rollup of 9 pull requests
Successful merges:
- #104531 (Provide a better error and a suggestion for `Fn` traits with lifetime params)
- #105899 (`./x doc library --open` opens `std`)
- #106190 (Account for multiple multiline spans with empty padding)
- #106202 (Trim more paths in obligation types)
- #106234 (rustdoc: simplify settings, help, and copy button CSS by not reusing)
- #106236 (docs/test: add docs and a UI test for `E0514` and `E0519`)
- #106259 (Update Clippy)
- #106260 (Fix index out of bounds issues in rustdoc)
- #106263 (Formatter should not try to format non-Rust files)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Currently, given `Fn`-family traits with lifetime params like
`Fn<'a>(&'a str) -> bool`, many unhelpful errors show up. These are a
bit confusing.
This commit allows these situations to suggest simply using
higher-ranked trait bounds like `for<'a> Fn(&'a str) -> bool`.
Properly calculate best failure in macro matching
Previously, we used spans. This was not good. Sometimes, the span of the token that failed to match may come from a position later in the file which has been transcribed into a token stream way earlier in the file. If precisely this token fails to match, we think that it was the best match because its span is so high, even though other arms might have gotten further in the token stream.
We now try to properly use the location in the token stream.
This needs a little cleanup as the `best_failure` field is getting out of hand but it should be mostly good to go. I hope I didn't violate too many abstraction boundaries..
Always suggest as `MachineApplicable` in `recover_intersection_pat`
This resolves one FIXME in `recover_intersection_pat` by always applying `MachineApplicable` when suggesting, as `bindings_after_at` is now stable.
This also separates a test to apply `// run-rustfix`.
Signed-off-by: Yuki Okushi <jtitor@2k36.org>
Previously, we used spans. This was not good. Sometimes, the span of the
token that failed to match may come from a position later in the file
which has been transcribed into a token stream way earlier in the file.
If precisely this token fails to match, we think that it was the best
match because its span is so high, even though other arms might have
gotten further in the token stream.
We now try to properly use the location in the token stream.
Remove `token::Lit` from `ast::MetaItemLit`.
Currently `ast::MetaItemLit` represents the literal kind twice. This PR removes that redundancy. Best reviewed one commit at a time.
r? `@petrochenkov`
propagate the error from parsing enum variant to the parser and emit out
While parsing enum variant, the error message always disappear
Because the error message that emit out is from main error of parser
The information of enum variant disappears while parsing enum variant with error
We only check the syntax of expecting token, i.e, in case https://github.com/rust-lang/rust/issues/103869
It will error it without telling the message that this error is from pasring enum variant.
Propagate the sub-error from parsing enum variant to the main error of parser by chaining it with map_err
Check the sub-error before emitting the main error of parser and attach it.
Fix https://github.com/rust-lang/rust/issues/103869
These two methods both produce a `MetaItemLit`, and then some of the
call sites convert the `MetaItemLit` to a `token::Lit` with
`as_token_lit`.
This commit parameterises these two methods with a `mk_lit_char`
closure, which can be used to produce either `MetaItemLit` or
`token::Lit` directly as necessary.
Fix ICE on invalid variable declarations in macro calls
This fixes ICE that happens with invalid variable declarations in macro calls like:
```rust
macro_rules! m { ($s:stmt) => {} }
m! { var x }
m! { auto x }
m! { mut x }
```
Found this is because of not collecting tokens on recovery, so I changed to force collect them.
Fixes https://github.com/rust-lang/rust/issues/103529.
Previously, the `recover_local_after_let` function was called from the
body of the `recover_stmt_local` function. Unifying these two functions
make it more simple and more readable.
Because the error message that emit out is from main error of parser
The information of enum variant disappears while parsing enum variant with error
We only check the syntax of expecting token, i.e, in case #103869
It will error it without telling the message that this error is from pasring enum variant.
Propagate the sub-error from parsing enum variant to the main error of parser by chaining it with map_err
Check the sub-error before emitting the main error of parser and attach it.
Fix#103869
`token::Lit` contains a `kind` field that indicates what kind of literal
it is. `ast::MetaItemLit` currently wraps a `token::Lit` but also has
its own `kind` field. This means that `ast::MetaItemLit` encodes the
literal kind in two different ways.
This commit changes `ast::MetaItemLit` so it no longer wraps
`token::Lit`. It now contains the `symbol` and `suffix` fields from
`token::Lit`, but not the `kind` field, eliminating the redundancy.
Lower them into a single item with multiple resolutions instead.
This also allows to remove additional `NodId`s and `DefId`s related to those additional items.
`check_builtin_attribute` calls `parse_meta` to convert an `Attribute`
to a `MetaItem`, which it then checks. However, many callers of
`check_builtin_attribute` start with a `MetaItem`, and then convert it
to an `Attribute` by calling `cx.attribute(meta_item)`. This `MetaItem`
to `Attribute` to `MetaItem` conversion is silly.
This commit adds a new function `check_builtin_meta_item`, which can be
called instead from these call sites. `check_builtin_attribute` also now
calls it. The commit also renames `check_meta` as `check_attr` to better
match its arguments.
Use `as_deref` in compiler (but only where it makes sense)
This simplifies some code :3
(there are some changes that are not exacly `as_deref`, but more like "clever `Option`/`Result` method use")
Split `MacArgs` in two.
`MacArgs` is an enum with three variants: `Empty`, `Delimited`, and `Eq`. It's used in two ways:
- For representing attribute macro arguments (e.g. in `AttrItem`), where all three variants are used.
- For representing function-like macros (e.g. in `MacCall` and `MacroDef`), where only the `Delimited` variant is used.
In other words, `MacArgs` is used in two quite different places due to them having partial overlap. I find this makes the code hard to read. It also leads to various unreachable code paths, and allows invalid values (such as accidentally using `MacArgs::Empty` in a `MacCall`).
This commit splits `MacArgs` in two:
- `DelimArgs` is a new struct just for the "delimited arguments" case. It is now used in `MacCall` and `MacroDef`.
- `AttrArgs` is a renaming of the old `MacArgs` enum for the attribute macro case. Its `Delimited` variant now contains a `DelimArgs`.
Various other related things are renamed as well.
These changes make the code clearer, avoids several unreachable paths, and disallows the invalid values.
r? `@petrochenkov`
`MacArgs` is an enum with three variants: `Empty`, `Delimited`, and `Eq`. It's
used in two ways:
- For representing attribute macro arguments (e.g. in `AttrItem`), where all
three variants are used.
- For representing function-like macros (e.g. in `MacCall` and `MacroDef`),
where only the `Delimited` variant is used.
In other words, `MacArgs` is used in two quite different places due to them
having partial overlap. I find this makes the code hard to read. It also leads
to various unreachable code paths, and allows invalid values (such as
accidentally using `MacArgs::Empty` in a `MacCall`).
This commit splits `MacArgs` in two:
- `DelimArgs` is a new struct just for the "delimited arguments" case. It is
now used in `MacCall` and `MacroDef`.
- `AttrArgs` is a renaming of the old `MacArgs` enum for the attribute macro
case. Its `Delimited` variant now contains a `DelimArgs`.
Various other related things are renamed as well.
These changes make the code clearer, avoids several unreachable paths, and
disallows the invalid values.
Rollup of 8 pull requests
Successful merges:
- #101162 (Migrate rustc_resolve to use SessionDiagnostic, part # 1)
- #103386 (Don't allow `CoerceUnsized` into `dyn*` (except for trait upcasting))
- #103405 (Detect incorrect chaining of if and if let conditions and recover)
- #103594 (Fix non-associativity of `Instant` math on `aarch64-apple-darwin` targets)
- #104006 (Add variant_name function to `LangItem`)
- #104494 (Migrate GUI test to use functions)
- #104516 (rustdoc: clean up sidebar width CSS)
- #104550 (fix a typo)
Failed merges:
- #104554 (Use `ErrorGuaranteed::unchecked_claim_error_was_emitted` less)
r? `@ghost`
`@rustbot` modify labels: rollup
Use `token::Lit` in `ast::ExprKind::Lit`.
Instead of `ast::Lit`.
Literal lowering now happens at two different times. Expression literals are lowered when HIR is crated. Attribute literals are lowered during parsing.
r? `@petrochenkov`
Instead of `ast::Lit`.
Literal lowering now happens at two different times. Expression literals
are lowered when HIR is crated. Attribute literals are lowered during
parsing.
This commit changes the language very slightly. Some programs that used
to not compile now will compile. This is because some invalid literals
that are removed by `cfg` or attribute macros will no longer trigger
errors. See this comment for more details:
https://github.com/rust-lang/rust/pull/102944#issuecomment-1277476773
Recover from function pointer types with generic parameter list
Give a more helpful error when encountering function pointer types with a generic parameter list like `fn<'a>(&'a str) -> bool` or `fn<T>(T) -> T` and suggest moving lifetime parameters to a `for<>` parameter list.
I've added a bunch of extra code to properly handle (unlikely?) corner cases like `for<'a> fn<'b>()` (where there already exists a `for<>` parameter list) correctly suggesting `for<'a, 'b> fn()` (merging the lists). If you deem this useless, I can simplify the code by suggesting nothing at all in this case.
I am quite open to suggestions regarding the wording of the diagnostic messages.
Fixes#103487.
``@rustbot`` label A-diagnostics
r? diagnostics
Delay `include_bytes` to AST lowering
Hopefully addresses #65818.
This PR introduces a new `ExprKind::IncludedBytes` which stores the path and bytes of a file included with `include_bytes!()`. We can then create a literal from the bytes during AST lowering, which means we don't need to escape the bytes into valid UTF8 which is the cause of most of the overhead of embedding large binary blobs.
Recover wrong-cased keywords that start items
(_this pr was inspired by [this tweet](https://twitter.com/Azumanga/status/1552982326409367561)_)
r? `@estebank`
We've talked a bit about this recovery, but I just wanted to make sure that this is the right approach :)
For now I've only added the case insensitive recovery to `use`s, since most other items like `impl` blocks, modules, functions can start with multiple keywords which complicates the matter.
Rollup of 9 pull requests
Successful merges:
- #102763 (Some diagnostic-related nits)
- #103443 (Parser: Recover from using colon as path separator in imports)
- #103675 (remove redundent "<>" for ty::Slice with reference type)
- #104046 (bootstrap: add support for running Miri on a file)
- #104115 (Migrate crate-search element to CSS variables)
- #104190 (Ignore "Change InferCtxtBuilder from enter to build" in git blame)
- #104201 (Add check in GUI test for file loading failure)
- #104211 (⬆️ rust-analyzer)
- #104231 (Update mailmap)
Failed merges:
- #104169 (Migrate `:target` rules to use CSS variables)
r? `@ghost`
`@rustbot` modify labels: rollup
Parser: Recover from using colon as path separator in imports
I don't know if this is the right approach, any feedback is welcome.
r? ```@compiler-errors```
Fixes#103269
Some diagnostic-related nits
1. Use `&mut Diagnostic` instead of `&mut DiagnosticBuilder<'_, T>`
2. Make `diag.span_suggestions` take an `IntoIterator` instead of `Iterator`, just to remove some `.into_iter` calls on the caller.
idk if I should add a lint to make sure people use `&mut Diagnostic` instead of `&mut DiagnosticBuilder<'_, T>` in cases where we're just, e.g., adding subdiagnostics to the diagnostic... maybe a followup.