self-profile: Support arguments for generic_activities.
This PR adds support for recording arguments of "generic activities". The most notable use case is LLVM module names, which should be very interesting for `crox` profiles. In the future it might be interesting to add more fine-grained events for pre-query passes like macro expansion.
I tried to judiciously de-duplicate existing self-profile events with `extra_verbose_generic_activity`, now that the latter also generates self-profile events.
r? @wesleywiser
Add long error code explanation message for E0637
Reference issue [#61137](https://github.com/rust-lang/rust/issues/61137)
To incorporate a long error description for E0637, I have made the necessary modification to error_codes.rs and added error_codes/E0637.md, and blessed the relevant .stderror files. ~~, however when I build rustc stage 1, I am unable to make `$ rustc --explain E0637` work even though rustc appears to be able to call up the long error explanations for other errors. I wanted to guarantee this would work before moving on the blessing the various ui tests that have been affected. @GuillaumeGomez Do you know the most likely reason(s) why this would be the case?~~
Update: `$ rustc --explain E0637` works now.
Remove problematic specialization from RangeInclusive
Fixes#67194 using the approach [outlined by Mark-Simulacrum](https://github.com/rust-lang/rust/issues/67194#issuecomment-581669549).
> I believe the property we want is that if `PartialEq(&self, &other) == true`, then `self.next() == other.next()`. It is true that this is satisfied by removing the specialization and always doing `is_empty.unwrap_or_default()`; the "wrong" behavior there arises from calling `next()` having an effect on initially empty ranges, as we should be in `is_empty = true` but are not (yet) there. It might be possible to detect that the current state is always empty (i.e., `start > end`) and then not fill in the empty slot. I think this might solve the problem without regressing tests; however, this could have performance implications.
> That approach essentially states that we only use the `is_empty` slot for cases where `start <= end`. That means that `Idx: !Step` and `start > end` would both behave the same, and correctly -- we do not need the boolean if we're not ever going to emit any values from the iterator.
This is implemented here by replacing the `is_empty: Option<bool>` slot with an `exhausted: bool` slot. This flag is
- `false` upon construction,
- `false` when iteration has not yielded an element -- importantly, this means it is always `false` for an iterator empty by construction,
- `false` when iteration has yielded an element and the iterator is not exhausted, and
- only `true` when iteration has been used to exhaust the iterator.
For completeness, this also adds a note to the `Debug` representation to note when the range is exhausted.
Rollup of 6 pull requests
Successful merges:
- #68694 (Reduce the number of `RefCell`s in `InferCtxt`.)
- #68966 (Improve performance of coherence checks)
- #68976 (Make `num::NonZeroX::new` an unstable `const fn`)
- #68992 (Correctly parse `mut a @ b`)
- #69005 (Small graphviz improvements for the new dataflow framework)
- #69006 (parser: Keep current and previous tokens precisely)
Failed merges:
r? @ghost
The current code in `SipHasher128::short_write` is inefficient. It uses
`u8to64_le` (which is complex and slow) to extract just the right number of
bytes of the input into a u64 and pad the result with zeroes. It then
left-shifts that value in order to bitwise-OR it with `self.tail`.
For example, imagine we have a u32 input 0xIIHH_GGFF and only need three bytes
to fill up `self.tail`. The current code uses `u8to64_le` to construct
0x0000_0000_00HH_GGFF, which is just 0xIIHH_GGFF with the 0xII removed and
zero-extended to a u64. The code then left-shifts that value by five bytes --
discarding the 0x00 byte that replaced the 0xII byte! -- to give
0xHHGG_FF00_0000_0000. It then then ORs that value with self.tail.
There's a much simpler way to do it: zero-extend to u64 first, then left shift.
E.g. 0xIIHH_GGFF is zero-extended to 0x0000_0000_IIHH_GGFF, and then
left-shifted to 0xHHGG_FF00_0000_0000. We don't have to take time to exclude
the unneeded 0xII byte, because it just gets shifted out anyway! It also avoids
multiple occurrences of `unsafe`.
There's a similar story with the setting of `self.tail` at the method's end.
The current code uses `u8to64_le` to extract the remaining part of the input,
but the same effect can be achieved more quickly with a right shift on the
zero-extended input.
All that works on little-endian. It doesn't work for big-endian, but we
can just do a `to_le` before calling `short_write` and then it works.
This commit changes `SipHasher128` to use the simpler shift-based approach. The
code is also smaller, which means that `short_write` is now inlined where
previously it wasn't, which makes things faster again. This gives big
speed-ups for all incremental builds, especially "baseline" incremental
builds.
parser: Keep current and previous tokens precisely
...including their unnormalized forms.
Add more documentation for them.
Hopefully, this will help to eliminate footguns like https://github.com/rust-lang/rust/pull/68728#discussion_r373787486.
I'll try to address the FIXMEs in separate PRs during the next week.
r? @Centril
Make `num::NonZeroX::new` an unstable `const fn`
cc #53718
These require `#[feature(const_if_match)]`, meaning they must remain unstable for the time being.
Reduce the number of `RefCell`s in `InferCtxt`.
`InferCtxt` contains six structures within `RefCell`s. Every time we
create and dispose of (commit or rollback) a snapshot we have to
`borrow_mut` each one of them.
This commit moves the six structures under a single `RefCell`, which
gives significant speed-ups by reducing the number of `borrow_mut`
calls. To avoid runtime errors I had to reduce the lifetimes of dynamic
borrows in a couple of places.
r? @varkor
Fixes#67844
Previously, opaque types would only get parent generics if they
a return-position-impl-trait (e.g. `fn foo<A>() -> impl MyTrait<A>`).
However, it's possible for opaque types to be nested inside one another:
```rust
trait WithAssoc { type AssocType; }
trait WithParam<A> {}
type Return<A> = impl WithAssoc<AssocType = impl WithParam<A>>;
```
When this occurs, we need to ensure that the nested opaque types
properly inherit generic parameters from their parent opaque type.
This commit fixes the `generics_of` query to take the parent item
into account when determining the generics for an opaque type.
This was previously a future-compat warning that has since been turned into
hard error, but without actually removing all the code.
Avoids a call to `traits::overlapping_impls`, which is expensive.
This has negligible perf impact, but it does improve the code a bit.
* Only query the specialization graph of any trait once instead of once per
impl
* Loop over impls only once, precomputing impl DefId and TraitRef
Improve reporting errors and suggestions for trait bounds
Fix#66802
- When printing errors for unsized function parameter, properly point at the parameter instead of function's body.
- Improve `consider further restricting this bound` (and related) messages by separating human-oriented hints from the machine-oriented ones.
`InferCtxt` contains six structures within `RefCell`s. Every time we
create and dispose of (commit or rollback) a snapshot we have to
`borrow_mut` each one of them.
This commit moves the six structures under a single `RefCell`, which
gives significant speed-ups by reducing the number of `borrow_mut`
calls. To avoid runtime errors I had to reduce the lifetimes of dynamic
borrows in a couple of places.
Rollup of 5 pull requests
Successful merges:
- #68738 (Derive Clone + Eq for std::string::FromUtf8Error)
- #68742 (implement AsMut<str> for String)
- #68881 (rustc_codegen_llvm: always set AlwaysPreserve on all debuginfo variables)
- #68911 (Speed up the inherent impl overlap check)
- #68913 (Pretty-print generic params and where clauses on associated types)
Failed merges:
r? @ghost
Now the line for each statement will show the diff resulting from the
combination of `before_statement_effect` and `statement_effect`. It's
still possible to observe each in isolation via
`borrowck_graphviz_format = "two_phase"`.
rustc_codegen_llvm: always set AlwaysPreserve on all debuginfo variables
Making this depend on the optimization level appears to have been a copy-paste mistake (other LLVM functions called in this module also take a `bool` argument, but there it means something unrelated).
Also see https://github.com/rust-lang/rust/pull/8855#discussion_r374392128.
I don't believe we have any reason to let LLVM omit user variables from DWARF, and we were already setting this to `true` when LLVM *could* optimize them away, so this PR should have no effect anyway.
r? @michaelwoerister or @nagisa cc @hanna-kruppe @nikomatsakis
Derive Clone + Eq for std::string::FromUtf8Error
Implement `Clone` and `Eq` for `std::string::FromUtf8Error`.
Both the inner `Vec<u8>` and `std::str::Utf8Error` are also `Clone + Eq`, so I don't see why we shouldn't derive them on `FromUtf8Error` as well.
(impl are insta-stable, requiring FCP from T-libs.)
Add an option to use LLD to link the compiler on Windows platforms
Based on https://github.com/rust-lang/rust/pull/68609.
Using LLD is good way to improve compile times on Windows since `link.exe` is quite slow. The time for `x.py build --stage 1 src/libtest` goes from 0:12:00 to 0:08:29. Compile time for `rustc_driver` goes from 226.34s to 18.5s. `rustc_macros` goes from 28.69s to 7.7s. The size of `rustc_driver` is also reduced from 83.3 MB to 78.7 MB.
r? @Mark-Simulacrum