Replace parse_[sth]_expr with parse_expr_[sth] function names
This resolves an inconsistency in naming style for functions on the parser, where:
* functions parsing specific kinds of items are named `parse_item_[sth]` and
* functions parsing specific kinds of *expressions* are named `parse_[sth]_expr`
favoring the style used by functions for items. There are multiple advantages of that style:
* functions of both categories are collected in the same place in the [rustdoc output](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html).
* it helps with autocompletion, as you can narrow down your search for a function to those about expressions.
* it mirrors rust's path syntax where less specific things come first, then it gets more specific, i.e. `std::collections::hash_map::Entry`.
The disadvantage is that it doesn't "read like a sentence" any more. But I think the advantages weigh more greatly.
This change was mostly application of this command:
```
sed -i -E 's/(fn |\.)parse_([[:alnum:]_]+)_expr/\1parse_expr_\2/' compiler/rustc_parse/src/parser/*.rs
```
Plus very minor fixes outside of `rustc_parse`, and an invocation of `x fmt`.
This resolves an inconsistency in naming style for functions
on the parser, between functions parsing specific kinds of items
and those for expressions, favoring the parse_item_[sth] style
used by functions for items. There are multiple advantages
of that style:
* functions of both categories are collected in the same place
in the rustdoc output.
* it helps with autocompletion, as you can narrow down your
search for a function to those about expressions.
* it mirrors rust's path syntax where less specific things
come first, then it gets more specific, i.e.
std::collections::hash_map::Entry
The disadvantage is that it doesn't "read like a sentence"
any more, but I think the advantages weigh more greatly.
This change was mostly application of this command:
sed -i -E 's/(fn |\.)parse_([[:alnum:]_]+)_expr/\1parse_expr_\2/' compiler/rustc_parse/src/parser/*.rs
Plus very minor fixes outside of rustc_parse, and an invocation
of x fmt.
We currently provide wrong suggestions and unhelpful errors on closure
bodies with braces missing. For example, given the following code:
```
fn main() {
let _x = Box::new(|x|x+1;);
}
```
the current output is like this:
```
error: expected expression, found `)`
--> ./main.rs:2:30
|
2 | let _x = Box::new(|x|x+1;);
| ^ expected expression
error: closure bodies that contain statements must be surrounded by braces
--> ./main.rs:2:25
|
2 | let _x = Box::new(|x|x+1;);
| ^
3 | }
| ^
|
...
help: try adding braces
|
2 ~ let _x = Box::new(|x| {x+1;);
3 ~ }}
...
error: expected `;`, found `}`
--> ./main.rs:2:32
|
2 | let _x = Box::new(|x|x+1;);
| ^ help: add `;` here
3 | }
| - unexpected token
error: aborting due to 3 previous errors
```
This commit allows outputting correct suggestions and errors. The above
code would output like this:
```
error: closure bodies that contain statements must be surrounded by braces
--> ./main.rs:2:25
|
2 | let _x = Box::new(|x|x+1;);
| ^ ^
|
note: statement found outside of a block
--> ./main.rs:2:29
|
2 | let _x = Box::new(|x|x+1;);
| ---^ this `;` turns the preceding closure into a statement
| |
| this expression is a statement because of the trailing semicolon
note: the closure body may be incorrectly delimited
--> ./main.rs:2:23
|
2 | let _x = Box::new(|x|x+1;);
| ^^^^^^ - ...but likely you meant the closure to end here
| |
| this is the parsed closure...
help: try adding braces
|
2 | let _x = Box::new(|x| {x+1;});
| + +
error: aborting due to previous error
```
Instead of loading the Fluent resources for every crate in
`rustc_error_messages`, each crate generates typed identifiers for its
own diagnostics and creates a static which are pulled together in the
`rustc_driver` crate and provided to the diagnostic emitter.
Signed-off-by: David Wood <david.wood@huawei.com>
Use restricted Damerau-Levenshtein distance for diagnostics
This replaces the existing Levenshtein algorithm with the Damerau-Levenshtein algorithm. This means that "ab" to "ba" is one change (a transposition) instead of two (a deletion and insertion). More specifically, this is a _restricted_ implementation, in that "ca" to "abc" cannot be performed as "ca" → "ac" → "abc", as there is an insertion in the middle of a transposition. I believe that errors like that are sufficiently rare that it's not worth taking into account.
This was first brought up [on IRLO](https://internals.rust-lang.org/t/18227) when it was noticed that the diagnostic for `prinltn!` (transposed L and T) was `print!` and not `println!`. Only a single existing UI test was effected, with the result being an objective improvement.
~~I have left the method name and various other references to the Levenshtein algorithm untouched, as the exact manner in which the edit distance is calculated should not be relevant to the caller.~~
r? ``@estebank``
``@rustbot`` label +A-diagnostics +C-enhancement
Don't recover lifetimes/labels containing emojis as character literals
Fixes#108019.
Note that at the time of this commit, `unic-emoji-char` seems to have data tables only up to Unicode 5.0, but Unicode is already newer than this.
A newer emoji such as `🥺` will not be recognized as an emoji but older emojis such as `🐱` will.
This PR leaves a couple of FIXMEs where `unic_emoji_char::is_emoji` is used.
Suggest fix for misplaced generic params on fn item #103366fixes#103366
This still has some work to go, but works for 2/3 of the initial base cases described in #1033366
simple fn:
```
error: expected identifier, found `<`
--> shreys/test_1.rs:1:3
|
1 | fn<T> id(x: T) -> T { x }
| ^ expected identifier
|
help: help: place the generic parameter list after the function name:
|
1 | fn id<T>(x: T) -> T { x }
| ~~~~
```
Complicated bounds
```
error: expected identifier, found `<`
--> spanishpear/test_2.rs:1:3
|
1 | fn<'a, B: 'a + std::ops::Add<Output = u32>> f(_x: B) { }
| ^ expected identifier
|
help: help: place the generic parameter list after the function name:
|
1 | fn f<'a, B: 'a + std::ops::Add<Output = u32>>(_x: B) { }
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Opening a draft PR for comments on approach, particularly I have the following questions:
- [x] Is it okay to be using `err.span_suggestion` over struct derives? I struggled to get the initial implementation (particularly the correct suggestion message) on struct derives, although I think given what I've learned since starting, I could attempt re-doing it with that approach.
- [x] in the case where the snippet cannot be obtained from a span, is the `help` but no suggestion okay? I think yes (also, when does this case occur?)
- [x] are there any red flags for the generalisation of this work for relevant item kinds (i.e. `struct`, `enum`, `trait`, and `union`). My basic testing indicates it does work for those types except the help tip is currently hardcoded to `after the function name` - which should change dependent on the item.
- [x] I am planning to not show the suggestion if there is already a `<` after the item identifier, (i.e. if there are already generics, as after a function name per the original issue). Any major objections?
- [x] Is the style of error okay? I wasn't sure if there was a way to make it display nicer, or if thats handled by span_suggestion
These aren't blocking questions, and I will keep working on:
- check if there is a `<` after the ident (and if so, not showing the suggestion)
- generalize the help message
- figuring out how to write/run/etc ui tests (including reading the docs for them)
- logic cleanups
Note that at the time of this commit, `unic-emoji-char` seems to have
data tables only up to Unicode 5.0, but Unicode is already newer than
this.
A newer emoji such as `🥺` will not be recognized as an emoji
but older emojis such as `🐱` will.
The motivation here is to eliminate the `Option<(Delimiter,
DelimSpan)>`, which is `None` for the outermost token stream and `Some`
for all other token streams.
We are already treating the innermost frame specially -- this is the
`frame` vs `stack` distinction in `TokenCursor`. We can push that
further so that `frame` only contains the cursor, and `stack` elements
contain the delimiters for their children. When we are in the outermost
token stream `stack` is empty, so there are no stored delimiters, which
is what we want because the outermost token stream *has* no delimiters.
This change also shrinks `TokenCursor`, which shrinks `Parser` and
`LazyAttrTokenStreamImpl`, which is nice.