Reduce unnecessary escaping in proc_macro::Literal::character/string
I noticed that https://doc.rust-lang.org/proc_macro/struct.Literal.html#method.character is producing unreadable literals that make macro-expanded code unnecessarily hard to read. Since the proc macro server was using `escape_unicode()`, every char is escaped using `\u{…}` regardless of whether there is any need to do so. For example `Literal::character('=')` would previously produce `'\u{3d}'` which unnecessarily obscures the meaning when reading the macro-expanded code.
I've changed Literal::string also in this PR because `str`'s `Debug` impl is also smarter than just calling `escape_debug` on every char. For example `Literal::string("ferris's")` would previously produce `"ferris\'s"` but will now produce `"ferris's"`.
A new matcher representation for use in `parse_tt`
By transforming the matcher into a different form, `parse_tt` can run faster and be easier to understand.
r? `@petrochenkov`
Rework `undocumented_unsafe_blocks`
fixes: #8264fixes: #8449
One thing came up while working on this. Currently comments on the same line are supported like so:
```rust
/* SAFETY: reason */ unsafe {}
```
Is this worth supporting at all? Anything other than a couple of words doesn't really fit well.
edit: [zulip topic](https://rust-lang.zulipchat.com/#narrow/stream/257328-clippy/topic/.60undocumented_unsafe_blocks.60.20same.20line.20comment)
changelog: Don't lint `undocumented_unsafe_blocks` when the unsafe block comes from a proc-macro.
changelog: Don't lint `undocumented_unsafe_blocks` when the preceding line has a safety comment and the unsafe block is a sub-expression.
Improve method name suggestions
Attempts to improve method name suggestions when a matching method name
is not found. The approach taken is use the Levenshtein distance and
account for substrings having a high distance but can sometimes be very
close to the intended method (eg. empty vs is_empty).
resolves#94747
add `empty_structs_with_brackets`
<!-- Thank you for making Clippy better!
We're collecting our changelog from pull request descriptions.
If your PR only includes internal changes, you can just write
`changelog: none`. Otherwise, please write a short comment
explaining your change. Also, it's helpful for us that
the lint name is put into brackets `[]` and backticks `` ` ` ``,
e.g. ``[`lint_name`]``.
If your PR fixes an issue, you can add "fixes #issue_number" into this
PR description. This way the issue will be automatically closed when
your PR is merged.
If you added a new lint, here's a checklist for things that will be
checked during review or continuous integration.
- \[ ] Followed [lint naming conventions][lint_naming]
- \[ ] Added passing UI tests (including committed `.stderr` file)
- \[ ] `cargo test` passes locally
- \[ ] Executed `cargo dev update_lints`
- \[ ] Added lint documentation
- \[ ] Run `cargo dev fmt`
[lint_naming]: https://rust-lang.github.io/rfcs/0344-conventions-galore.html#lints
Note that you can skip the above if you are just opening a WIP PR in
order to get feedback.
Delete this line and everything above before opening your PR.
--
*Please write a short comment explaining your change (or "none" for internal only changes)*
-->
Closes#8591
I'm already sorry for the massive diff 😅
changelog: New lint [`empty_structs_with_brackets`]
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
Do not use `ParamEnv::and` when building a cache key from a param-env and trait eval candidate
Do not use `ParamEnv::and` to cache a param-env with a selection/evaluation candidate.
This is because if the param-env is `RevealAll` mode, and the candidate looks global (i.e. it has erased regions, which can show up when we normalize a projection type under a binder<sup>1</sup>), then when we use `ParamEnv::and` to pair the candidate and the param-env for use as a cache key, we will throw away the param-env's caller bounds, and we'll end up caching a candidate that we inferred from the param-env with a empty param-env, which may cause cache-hit later when we have an empty param-env, and possibly mess with normalization like we see in the referenced issue during codegen.
Not sure how to trigger this with a more structured test, but changing `check-pass` to `build-pass` triggers the case that https://github.com/rust-lang/rust/issues/94903 detected.
<sup>1.</sup> That is, we will replace the late-bound region with a placeholder, which gets canonicalized and turned into an infererence variable, which gets erased during region freshening right before we cache the result. Sorry, it's quite a few steps.
Fixes#94903
r? `@Aaron1011` (or reassign as you see fit)
Mark Location::caller() as #[inline]
This function gets compiled to a single register move as it actually gets it's return value passed in as argument.
Fix &mut invalidation in ptr::swap doctest
Under Stacked Borrows with raw pointer tagging, the previous code was UB
because the code which creates the the second pointer borrows the array
through a tag in the borrow stacks below the Unique tag that our first
pointer is based on, thus invalidating the first pointer.
This is not definitely a bug and may never be real UB, but I desperately
want people to write code that conforms to SB with raw pointer tagging
so that I can write good diagnostics. The alternative aliasing models
aren't possible to diagnose well due to state space explosion.
Therefore, it would be super cool if the standard library nudged people
towards writing code that is valid with respect to SB with raw pointer
tagging.
The diagnostics that I want to write are implemented in a branch of Miri and the one for this case is below:
```
error: Undefined Behavior: attempting a read access using <2170> at alloc1068[0x0], but that tag does not exist in the borrow stack for this location
--> /home/ben/rust/library/core/src/intrinsics.rs:2103:14
|
2103 | unsafe { copy_nonoverlapping(src, dst, count) }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| attempting a read access using <2170> at alloc1068[0x0], but that tag does not exist in the borrow stack for this location
| this error occurs as part of an access at alloc1068[0x0..0x8]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <2170> was created due to a retag at offsets [0x0..0x10]
--> ../libcore/src/ptr/mod.rs:640:9
|
8 | let x = array[0..].as_mut_ptr() as *mut [u32; 2]; // this is `array[0..2]`
| ^^^^^^^^^^^^^^^^^^^^^^^
help: <2170> was later invalidated due to a retag at offsets [0x0..0x10]
--> ../libcore/src/ptr/mod.rs:641:9
|
9 | let y = array[2..].as_mut_ptr() as *mut [u32; 2]; // this is `array[2..4]`
| ^^^^^
= note: inside `std::intrinsics::copy_nonoverlapping::<[u32; 2]>` at /home/ben/rust/library/core/src/intrinsics.rs:2103:14
= note: inside `std::ptr::swap::<[u32; 2]>` at /home/ben/rust/library/core/src/ptr/mod.rs:685:9
note: inside `main::_doctest_main____libcore_src_ptr_mod_rs_635_0` at ../libcore/src/ptr/mod.rs:12:5
--> ../libcore/src/ptr/mod.rs:644:5
|
12 | ptr::swap(x, y);
| ^^^^^^^^^^^^^^^
note: inside `main` at ../libcore/src/ptr/mod.rs:15:3
--> ../libcore/src/ptr/mod.rs:647:3
|
15 | } _doctest_main____libcore_src_ptr_mod_rs_635_0() }
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace
error: aborting due to previous error
```
Don't emit non-asm contents error for naked function composed of errors
## Motivation
For naked functions an error is emitted when they are composed of anything other than a single asm!() block. However, this error triggers in a couple situations in which it adds no additional information or is actively misleading.
One example is if you do have an asm!() block but simply one with a syntax error:
```rust
#[naked]
unsafe extern "C" fn compiler_errors() {
asm!(invalid_syntax)
}
```
This results in two errors, one for the syntax error itself and another telling you that you need an asm block in your function:
```rust
error[E0787]: naked functions must contain a single asm block
--> src/main.rs:6:1
|
6 | / unsafe extern "C" fn naked_compile_error() {
7 | | asm!(blah)
8 | | }
| |_^
```
This issue also comes up when [utilizing `compile_error!()` for improving your diagnostics](https://twitter.com/steveklabnik/status/1509538243020218372), such as raising a compiler error when compiling for an unsupported target.
## Implementation
The rules this PR implements are as follows:
1. If any non-erroneous non-asm statement is included, an error will still occur
2. If multiple asm statements are included, an error will still occur
3. If 0 or 1 asm statements are present, as well as any non-zero number of erroneous statements, then this error will *not* be raised as it is likely either redundant or incorrect
The rule of thumb is effectively "if an error is present and its correction could change things, don't raise an error".
Reduce the cost of loading all built-ins targets
This PR started by measuring the exact slowdown of checking of well known conditional values.
Than this PR implemented some technics to reduce the cost of loading all built-ins targets.
cf. https://github.com/rust-lang/rust/issues/82450#issuecomment-1073992323
libc::prctl and the prctl definitions in glibc, musl, and the kernel
headers are C variadic functions. Therefore, all the arguments (except
for the first) are untyped. It is only the Linux man page which says
that prctl takes 4 unsigned long arguments. I have no idea why it says
this.
In any case, the upshot is that we don't need to cast the pointer to an
integer and confuse Miri.
Under Stacked Borrows with raw pointer tagging, the previous code was UB
because the code which creates the the second pointer borrows the array
through a tag in the borrow stacks below the Unique tag that our first
pointer is based on, thus invalidating the first pointer.
This is not definitely a bug and may never be real UB, but I desperately
want people to write code that conforms to SB with raw pointer tagging
so that I can write good diagnostics. The alternative aliasing models
aren't possible to diagnose well due to state space explosion.
Therefore, it would be super cool if the standard library nudged people
towards writing code that is valid with respect to SB with raw pointer
tagging.