Cached stable hash cleanups
r? `@nnethercote`
Add a sanity assertion in debug mode to check that the cached hashes are actually the ones we get if we compute the hash each time.
Add a new data structure that bundles all the hash-caching work to make it easier to re-use it for different interned data structures
Enforce well formedness for type alias impl trait's hidden type
fixes#84657
This was not an issue with return-position-impl-trait because the generic bounds of the function are the same as those of the opaque type, and the hidden type must already be well formed within the function.
With type-alias-impl-trait the hidden type could be defined in a function that has *more* lifetime bounds than the type alias. This is fine, but the hidden type must still be well formed without those additional bounds.
This allows profiling costly arguments to be recorded only when `-Zself-profile-events=args` is on: using a closure that takes an `EventArgRecorder` and call its `record_arg` or `record_args` methods.
Add debug assertions to some unsafe functions
As suggested by https://github.com/rust-lang/rust/issues/51713
~~Some similar code calls `abort()` instead of `panic!()` but aborting doesn't work in a `const fn`, and the intrinsic for doing dispatch based on whether execution is in a const is unstable.~~
This picked up some invalid uses of `get_unchecked` in the compiler, and fixes them.
I can confirm that they do in fact pick up invalid uses of `get_unchecked` in the wild, though the user experience is less-than-awesome:
```
Running unittests (target/x86_64-unknown-linux-gnu/debug/deps/rle_decode_fast-04b7918da2001b50)
running 6 tests
error: test failed, to rerun pass '--lib'
Caused by:
process didn't exit successfully: `/home/ben/rle-decode-helper/target/x86_64-unknown-linux-gnu/debug/deps/rle_decode_fast-04b7918da2001b50` (signal: 4, SIGILL: illegal instruction)
```
~~As best I can tell these changes produce a 6% regression in the runtime of `./x.py test` when `[rust] debug = true` is set.~~
Latest commit (6894d559bd) brings the additional overhead from this PR down to 0.5%, while also adding a few more assertions. I think this actually covers all the places in `core` that it is reasonable to check for safety requirements at runtime.
Thoughts?
Rollup of 5 pull requests
Successful merges:
- #95294 (Document Linux kernel handoff in std::io::copy and std::fs::copy)
- #95443 (Clarify how `src/tools/x` searches for python)
- #95452 (fix since field version for termination stabilization)
- #95460 (Spellchecking compiler code)
- #95461 (Spellchecking some comments)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Lazy type-alias-impl-trait take two
### user visible change 1: RPIT inference from recursive call sites
Lazy TAIT has an insta-stable change. The following snippet now compiles, because opaque types can now have their hidden type set from wherever the opaque type is mentioned.
```rust
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return 42
}
let x: u32 = bar(false); // this errors on stable
99
}
```
The return type of `bar` stays opaque, you can't do `bar(false) + 42`, you need to actually mention the hidden type.
### user visible change 2: divergence between RPIT and TAIT in return statements
Note that `return` statements and the trailing return expression are special with RPIT (but not TAIT). So
```rust
#![feature(type_alias_impl_trait)]
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
return vec![42];
}
std::iter::empty().collect() //~ ERROR `Foo` cannot be built from an iterator
}
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return vec![42]
}
std::iter::empty().collect() // Works, magic (accidentally stabilized, not intended)
}
```
But when we are working with the return value of a recursive call, the behavior of RPIT and TAIT is the same:
```rust
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
return vec![];
}
let mut x = foo(false);
x = std::iter::empty().collect(); //~ ERROR `Foo` cannot be built from an iterator
vec![]
}
fn bar(b: bool) -> impl std::fmt::Debug {
if b {
return vec![];
}
let mut x = bar(false);
x = std::iter::empty().collect(); //~ ERROR `impl Debug` cannot be built from an iterator
vec![]
}
```
### user visible change 3: TAIT does not merge types across branches
In contrast to RPIT, TAIT does not merge types across branches, so the following does not compile.
```rust
type Foo = impl std::fmt::Debug;
fn foo(b: bool) -> Foo {
if b {
vec![42_i32]
} else {
std::iter::empty().collect()
//~^ ERROR `Foo` cannot be built from an iterator over elements of type `_`
}
}
```
It is easy to support, but we should make an explicit decision to include the additional complexity in the implementation (it's not much, see a721052457cf513487fb4266e3ade65c29b272d2 which needs to be reverted to enable this).
### PR formalities
previous attempt: #92007
This PR also includes #92306 and #93783, as they were reverted along with #92007 in #93893fixes#93411fixes#88236fixes#89312fixes#87340fixes#86800fixes#86719fixes#84073fixes#83919fixes#82139fixes#77987fixes#74282fixes#67830fixes#62742fixes#54895
These debug assertions are all implemented only at runtime using
`const_eval_select`, and in the error path they execute
`intrinsics::abort` instead of being a normal debug assertion to
minimize the impact of these assertions on code size, when enabled.
Of all these changes, the bounds checks for unchecked indexing are
expected to be most impactful (case in point, they found a problem in
rustc).
`Layout` is another type that is sometimes interned, sometimes not, and
we always use references to refer to it so we can't take any advantage
of the uniqueness properties for hashing or equality checks.
This commit renames `Layout` as `LayoutS`, and then introduces a new
`Layout` that is a newtype around an `Interned<LayoutS>`. It also
interns more layouts than before. Previously layouts within layouts
(via the `variants` field) were never interned, but now they are. Hence
the lifetime on the new `Layout` type.
Unlike other interned types, these ones are in `rustc_target` instead of
`rustc_middle`. This reflects the existing structure of the code, which
does layout-specific stuff in `rustc_target` while `TyAndLayout` is
generic over the `Ty`, allowing the type-specific stuff to occur in
`rustc_middle`.
The commit also adds a `HashStable` impl for `Interned`, which was
needed. It hashes the contents, unlike the `Hash` impl which hashes the
pointer.
Currently some `Allocation`s are interned, some are not, and it's very
hard to tell at a use point which is which.
This commit introduces `ConstAllocation` for the known-interned ones,
which makes the division much clearer. `ConstAllocation::inner()` is
used to get the underlying `Allocation`.
In some places it's natural to use an `Allocation`, in some it's natural
to use a `ConstAllocation`, and in some places there's no clear choice.
I've tried to make things look as nice as possible, while generally
favouring `ConstAllocation`, which is the type that embodies more
information. This does require quite a few calls to `inner()`.
The commit also tweaks how `PartialOrd` works for `Interned`. The
previous code was too clever by half, building on `T: Ord` to make the
code shorter. That caused problems with deriving `PartialOrd` and `Ord`
for `ConstAllocation`, so I changed it to build on `T: PartialOrd`,
which is slightly more verbose but much more standard and avoided the
problems.
Always include global target features in function attributes
This ensures that information about target features configured with
`-C target-feature=...` or detected with `-C target-cpu=native` is
retained for subsequent consumers of LLVM bitcode.
This is crucial for linker plugin LTO, since this information is not
conveyed to the plugin otherwise.
<details><summary>Additional test case demonstrating the issue</summary>
```rust
extern crate core;
#[inline]
#[target_feature(enable = "aes")]
unsafe fn f(a: u128, b: u128) -> u128 {
use core::arch::x86_64::*;
use core::mem::transmute;
transmute(_mm_aesenc_si128(transmute(a), transmute(b)))
}
pub fn g(a: u128, b: u128) -> u128 {
unsafe { f(a, b) }
}
fn main() {
let mut args = std::env::args();
let _ = args.next().unwrap();
let a: u128 = args.next().unwrap().parse().unwrap();
let b: u128 = args.next().unwrap().parse().unwrap();
println!("{}", g(a, b));
}
```
```console
$ rustc --edition=2021 a.rs -Clinker-plugin-lto -Clink-arg=-fuse-ld=lld -Ctarget-feature=+aes -O
...
= note: LLVM ERROR: Cannot select: intrinsic %llvm.x86.aesni.aesenc
```
</details>
r? `@nagisa`
Rollup of 9 pull requests
Successful merges:
- #94464 (Suggest adding a new lifetime parameter when two elided lifetimes should match up for traits and impls.)
- #94476 (7 - Make more use of `let_chains`)
- #94478 (Fix panic when handling intra doc links generated from macro)
- #94482 (compiler: fix some typos)
- #94490 (Update books)
- #94496 (tests: accept llvm intrinsic in align-checking test)
- #94498 (9 - Make more use of `let_chains`)
- #94503 (Provide C FFI types via core::ffi, not just in std)
- #94513 (update Miri)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid query cache sharding code in single-threaded mode
In non-parallel compilers, this is just adding needless overhead at compilation time (since there is only one shard statically anyway). This amounts to roughly ~10 seconds reduction in bootstrap time, with overall neutral (some wins, some losses) performance results.
Parallel compiler performance should be largely unaffected by this PR; sharding is kept there.
Avoid exhausting stack space in dominator compression
Doesn't add a test case -- I ended up running into this while playing with the generated example from #43578, which we could do with a run-make test (to avoid checking a large code snippet into tree), but I suspect we don't want to wait for it to compile (locally it takes ~14s -- not terrible, but doesn't seem worth it to me). In practice stack space exhaustion is difficult to test for, too, since if we set the bound too low a different call structure above us (e.g., a nearer ensure_sufficient_stack call) would let the test pass even with the old impl, most likely.
Locally it seems like this manages to perform approximately equivalently to the recursion, but will run perf to confirm.
Introduce `ChunkedBitSet` and use it for some dataflow analyses.
This reduces peak memory usage significantly for some programs with very
large functions.
r? `@ghost`
This reduces peak memory usage significantly for some programs with very
large functions, such as:
- `keccak`, `unicode_normalization`, and `match-stress-enum`, from
the `rustc-perf` benchmark suite;
- `http-0.2.6` from crates.io.
The new type is used in the analyses where the bitsets can get huge
(e.g. 10s of thousands of bits): `MaybeInitializedPlaces`,
`MaybeUninitializedPlaces`, and `EverInitializedPlaces`.
Some refactoring was required in `rustc_mir_dataflow`. All existing
analysis domains are either `BitSet` or a trivial wrapper around
`BitSet`, and access in a few places is done via `Borrow<BitSet>` or
`BorrowMut<BitSet>`. Now that some of these domains are `ClusterBitSet`,
that no longer works. So this commit replaces the `Borrow`/`BorrowMut`
usage with a new trait `BitSetExt` containing the needed bitset
operations. The impls just forward these to the underlying bitset type.
This required fiddling with trait bounds in a few places.
The commit also:
- Moves `static_assert_size` from `rustc_data_structures` to
`rustc_index` so it can be used in the latter; the former now
re-exports it so existing users are unaffected.
- Factors out some common "clear excess bits in the final word"
functionality in `bit_set.rs`.
- Uses `fill` in a few places instead of loops.
Allow inlining of `ensure_sufficient_stack()`
This functions is monomorphized a lot and allowing the compiler to inline it improves instructions count and max RSS significantly in my local tests.
In particular, there's now more protection against incorrect usage,
because you can only create one via `Interned::new_unchecked`, which
makes it more obvious that you must be careful.
There are also some tests.
Compress amount of hashed bytes for `isize` values in StableHasher
This is another attempt to land https://github.com/rust-lang/rust/pull/92103, this time hopefully with a correct implementation w.r.t. stable hashing guarantees. The previous PR was [reverted](https://github.com/rust-lang/rust/pull/93014) because it could produce the [same hash](https://github.com/rust-lang/rust/pull/92103#issuecomment-1014625442) for different values even in quite simple situations. I have since added a basic [test](https://github.com/rust-lang/rust/pull/93193) that should guard against that situation, I also added a new test in this PR, specialised for this optimization.
## Why this optimization helps
Since the original PR, I have tried to analyze why this optimization even helps (and why it especially helps for `clap`). I found that the vast majority of stable-hashing `i64` actually comes from hashing `isize` (which is converted to `i64` in the stable hasher). I only found a single place where is this datatype used directly in the compiler, and this place has also been showing up in traces that I used to find out when is `isize` being hashed. This place is `rustc_span::FileName::DocTest`, however, I suppose that isizes also come from other places, but they might not be so easy to find (there were some other entries in the trace). `clap` hashes about 8.5 million `isize`s, and all of them fit into a single byte, which is why this optimization has helped it [quite a lot](https://github.com/rust-lang/rust/pull/92103#issuecomment-1005711861).
Now, I'm not sure if special casing `isize` is the correct solution here, maybe something could be done with that `isize` inside `DocTest` or in other places, but that's for another discussion I suppose. In this PR, instead of hardcoding a special case inside `SipHasher128`, I instead put it into `StableHasher`, and only used it for `isize` (I tested that for `i64` it doesn't help, or at least not for `clap` and other few benchmarks that I was testing).
## New approach
Since the most common case is a single byte, I added a fast path for hashing `isize` values which positive value fits within a single byte, and a cold path for the rest of the values.
To avoid the previous correctness problem, we need to make sure that each unique `isize` value will produce a unique hash stream to the hasher. By hash stream I mean a sequence of bytes that will be hashed (a different sequence should produce a different hash, but that is of course not guaranteed).
We have to distinguish different values that produce the same bit pattern when we combine them. For example, if we just simply skipped the leading zero bytes for values that fit within a single byte, `(0xFF, 0xFFFFFFFFFFFFFFFF)` and `(0xFFFFFFFFFFFFFFFF, 0xFF)` would send the same hash stream to the hasher, which must not happen.
To avoid this situation, values `[0, 0xFE]` are hashed as a single byte. When we hash a larger (treating `isize` as `u64`) value, we first hash an additional byte `0xFF`. Since `0xFF` cannot occur when we apply the single byte optimization, we guarantee that the hash streams will be unique when hashing two values `(a, b)` and `(b, a)` if `a != b`:
1) When both `a` and `b` are within `[0, 0xFE]`, their hash streams will be different.
2) When neither `a` and `b` are within `[0, 0xFE]`, their hash streams will be different.
3) When `a` is within `[0, 0xFE]` and `b` isn't, when we hash `(a, b)`, the hash stream will definitely not begin with `0xFF`. When we hash `(b, a)`, the hash stream will definitely begin with `0xFF`. Therefore the hash streams will be different.
r? `@the8472`
Make `Fingerprint::combine_commutative` associative
The previous implementation swapped lower and upper 64-bits of a result
of modular addition, so the function was non-associative.
r? `@Aaron1011`
Add test for stable hash uniqueness of adjacent field values
This PR adds a simple test to check that stable hash will produce a different hash if the order of two values that have the same combined bit pattern changes.
r? `@the8472`
`Decoder` has two impls:
- opaque: this impl is already partly infallible, i.e. in some places it
currently panics on failure (e.g. if the input is too short, or on a
bad `Result` discriminant), and in some places it returns an error
(e.g. on a bad `Option` discriminant). The number of places where
either happens is surprisingly small, just because the binary
representation has very little redundancy and a lot of input reading
can occur even on malformed data.
- json: this impl is fully fallible, but it's only used (a) for the
`.rlink` file production, and there's a `FIXME` comment suggesting it
should change to a binary format, and (b) in a few tests in
non-fundamental ways. Indeed #85993 is open to remove it entirely.
And the top-level places in the compiler that call into decoding just
abort on error anyway. So the fallibility is providing little value, and
getting rid of it leads to some non-trivial performance improvements.
Much of this commit is pretty boring and mechanical. Some notes about
a few interesting parts:
- The commit removes `Decoder::{Error,error}`.
- `InternIteratorElement::intern_with`: the impl for `T` now has the same
optimization for small counts that the impl for `Result<T, E>` has,
because it's now much hotter.
- Decodable impls for SmallVec, LinkedList, VecDeque now all use
`collect`, which is nice; the one for `Vec` uses unsafe code, because
that gave better perf on some benchmarks.
Update rayon and rustc-rayon
This updates rayon for various tools and rustc-rayon for the compiler's parallel mode.
- rayon v1.3.1 -> v1.5.1
- rayon-core v1.7.1 -> v1.9.1
- rustc-rayon v0.3.1 -> v0.3.2
- rustc-rayon-core v0.3.1 -> v0.3.2
... and indirectly, this updates all of crossbeam-* to their latest versions.
Fixes#92677 by removing crossbeam-queue, but there's still a lingering question about how tidy discovers "runtime" dependencies. None of this is truly in the standard library's dependency tree at all.
Replace usages of vec![].into_iter with [].into_iter
`[].into_iter` is idiomatic over `vec![].into_iter` because its simpler and faster (unless the vec is optimized away in which case it would be the same)
So we should change all the implementation, documentation and tests to use it.
I skipped:
* `src/tools` - Those are copied in from upstream
* `src/test/ui` - Hard to tell if `vec![].into_iter` was used intentionally or not here and not much benefit to changing it.
* any case where `vec![].into_iter` was used because we specifically needed a `Vec::IntoIter<T>`
* any case where it looked like we were intentionally using `vec![].into_iter` to test it.
Fixes#92266
In some `HashStable` impls, we use a cache to avoid re-computing
the same `Fingerprint` from the same structure (e.g. an `AdtDef`).
However, the `StableHashingContext` used can be configured to
perform hashing in different ways (e.g. skipping `Span`s). This
configuration information is not included in the cache key,
which will cause an incorrect `Fingerprint` to be used if
we hash the same structure with different `StableHashingContext`
settings.
To fix this, the configuration settings of `StableHashingContext`
are split out into a separate `HashingControls` struct. This
struct is used as part of the cache key, ensuring that our caches
always produce the correct result for the given settings.
With this in place, we now turn off `Span` hashing during the
entire process of computing the hash included in legacy symbols.
This current has no effect, but will matter when a future PR
starts hashing more `Span`s that we currently skip.
Implement StableHash for BitSet and BitMatrix via Hash
This fixes an issue where bit sets / bit matrices the same word
content but a different domain size would receive the same hash.
Avoid sorting in hash map stable hashing
Suggested by `@the8472` [here](https://github.com/rust-lang/rust/pull/89404#issuecomment-991813333). I hope that I understood it right, I replaced the sort with modular multiplication, which should be commutative.
Can I ask for a perf. run? However, locally it didn't help at all. Creating the `StableHasher` all over again is probably slowing it down quite a lot. And using `FxHasher` is not straightforward, because the keys and values only implement `HashStable` (and probably they shouldn't be just hashed via `Hash` anyway for it to actually be stable).
Maybe the `StableHash` interface could be changed somehow to better suppor these scenarios where the hasher is short-lived. Or the `StableHasher` implementation could have variants with e.g. a shorter buffer for these scenarios.
Slightly optimize hash map stable hashing
I was profiling some of the `rustc-perf` benchmarks locally and noticed that quite some time is spent inside the stable hash of hashmaps. I tried to use a `SmallVec` instead of a `Vec` there, which helped very slightly.
Then I tried to remove the sorting, which was a bottleneck, and replaced it with insertion into a binary heap. Locally, it yielded nice improvements in instruction counts and RSS in several benchmarks for incremental builds. The implementation could probably be much nicer and possibly extended to other stable hashes, but first I wanted to test the perf impact properly.
Can I ask someone to do a perf run? Thank you!
This largely avoids remapping from and to the 'real' indices, with the exception
of predecessor lookup and the final merge back, and is conceptually better.
As the paper indicates, the unprocessed vertices in the DFS tree and processed
vertices are disjoint, and we can use them in the same space, tracking only the index
of the split.
This replaces the previous implementation with the simple variant of
Lengauer-Tarjan, which performs better in the general case. Performance on the
keccak benchmark is about equivalent between the two, but we don't see
regressions (and indeed see improvements) on other benchmarks, even on a
partially optimized implementation.
The implementation here follows that of the pseudocode in "Linear-Time
Algorithms for Dominators and Related Problems" thesis by Loukas Georgiadis. The
next few commits will optimize the implementation as suggested in the thesis.
Several related works are cited in the comments within the implementation, as
well.
Implement the simple Lengauer-Tarjan algorithm
This replaces the previous implementation (from #34169), which has not been
optimized since, with the simple variant of Lengauer-Tarjan which performs
better in the general case. A previous attempt -- not kept in commit history --
attempted a replacement with a bitset-based implementation, but this led to
regressions on perf.rust-lang.org benchmarks and equivalent wins for the keccak
benchmark, so was rejected.
The implementation here follows that of the pseudocode in "Linear-Time
Algorithms for Dominators and Related Problems" thesis by Loukas Georgiadis. The
next few commits will optimize the implementation as suggested in the thesis.
Several related works are cited in the comments within the implementation, as
well.
On the keccak benchmark, we were previously spending 15% of our cycles computing
the NCA / intersect function; this function is quite expensive, especially on
modern CPUs, as it chases pointers on every iteration in a tight loop. With this
commit, we spend ~0.05% of our time in dominator computation.
There's a conversation in the tracking issue about possibly unaccepting `in_band_lifetimes`, but it's used heavily in the compiler, and thus there'd need to be a bunch of PRs like this if that were to happen.
So here's one to see how much of an impact it has.
(Oh, and I removed `nll` while I was here too, since it didn't seem needed. Let me know if I should put that back.)
Implement write() method for Box<MaybeUninit<T>>
This adds method similar to `MaybeUninit::write` main difference being
it returns owned `Box`. This can be used to elide copy from stack
safely, however it's not currently tested that the optimization actually
occurs.
Analogous methods are not provided for `Rc` and `Arc` as those need to
handle the possibility of sharing. Some version of them may be added in
the future.
This was discussed in #63291 which this change extends.
This adds method similar to `MaybeUninit::write` main difference being
it returns owned `Box`. This can be used to elide copy from stack
safely, however it's not currently tested that the optimization actually
occurs.
Analogous methods are not provided for `Rc` and `Arc` as those need to
handle the possibility of sharing. Some version of them may be added in
the future.
This was discussed in #63291 which this change extends.
Revert "Add rustc lint, warning when iterating over hashmaps"
Fixes perf regressions introduced in https://github.com/rust-lang/rust/pull/90235 by temporarily reverting the relevant PR.
Particularly for ctfe-stress-4, the hashing of byte slices as part of the
MIR Allocation is quite hot. Previously, we were falling back on byte-by-byte
copying of the slice into the SipHash buffer (64 bytes long) before hashing a 64
byte chunk, and then doing that again and again.
This should hopefully be an improvement for that code.
Stabilize `const_panic`
Closes#51999
FCP completed in #89006
```@rustbot``` label +A-const-eval +A-const-fn +T-lang
cc ```@oli-obk``` for review (not `r?`'ing as not on lang team)
Rollup of 12 pull requests
Successful merges:
- #88795 (Print a note if a character literal contains a variation selector)
- #89015 (core::ascii::escape_default: reduce struct size)
- #89078 (Cleanup: Remove needless reference in ParentHirIterator)
- #89086 (Stabilize `Iterator::map_while`)
- #89096 ([bootstrap] Improve the error message when `ninja` is not found to link to installation instructions)
- #89113 (dont `.ensure()` the `thir_abstract_const` query call in `mir_build`)
- #89114 (Fixes a technicality regarding the size of C's `char` type)
- #89115 (⬆️ rust-analyzer)
- #89126 (Fix ICE when `indirect_structural_match` is allowed)
- #89141 (Impl `Error` for `FromSecsError` without foreign type)
- #89142 (Fix match for placeholder region)
- #89147 (add case for checking const refs in check_const_value_eq)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Migrate in-tree crates to 2021
This replaces #89075 (cherry picking some of the commits from there), and closes#88637 and fixes#89074.
It excludes a migration of the library crates for now (see tidy diff) because we have some pending bugs around macro spans to fix there.
I instrumented bootstrap during the migration to make sure all crates moved from 2018 to 2021 had the compatibility warnings applied first.
Originally, the intent was to support cargo fix --edition within bootstrap, but this proved fairly difficult to pull off. We'd need to architect the check functionality to support running cargo check and cargo fix within the same x.py invocation, and only resetting sysroots on check. Further, it was found that cargo fix doesn't behave too well with "not quite workspaces", such as Clippy which has several crates. Bootstrap runs with --manifest-path ... for all the tools, and this makes cargo fix only attempt migration for that crate. We can't use e.g. --workspace due to needing to maintain sysroots for different phases of compilation appropriately.
It is recommended to skip the mass migration of Cargo.toml's to 2021 for review purposes; you can also use `git diff d6cd2c6c87 -I'^edition = .20...$'` to ignore the edition = 2018/21 lines in the diff.
Rework DepthFirstSearch API
This expands the API to be more flexible, allowing for more visitation patterns
on graphs. This will be useful to avoid extra datasets (and allocations) in
cases where the expanded DFS API is sufficient.
This also fixes a bug with the previous DFS constructor, which left the start
node not marked as visited (even though it was immediately returned).
Commit written by ```@nikomatsakis``` originally, cherry picked from several commits in work on never type stabilization, but stands alone.
This expands the API to be more flexible, allowing for more visitation patterns
on graphs. This will be useful to avoid extra datasets (and allocations) in
cases where the expanded DFS API is sufficient.
This also fixes a bug with the previous DFS constructor, which left the start
node not marked as visited (even though it was immediately returned).
Remove SmallVector mention
SmallVector is long gone, as it's been first replaced
by OneVector in commit e5e6375352,
which then has been removed entirely in favour of SmallVec in
commit 130a32fa72.
SmallVector is long gone, as it's been first replaced
by OneVector in commit e5e6375352,
which then has been removed entirely in favour of SmallVec in
commit 130a32fa72.
Since RFC 3052 soft deprecated the authors field anyway, hiding it from
crates.io, docs.rs, and making Cargo not add it by default, and it is
not generally up to date/useful information, we should remove it from
crates in this repo.
Make mir borrowck's use of opaque types independent of the typeck query's result
fixes#87218fixes#86465
we used to use the typeck results only to generate an obligation for the mir borrowck type to be equal to the typeck result.
When i removed the `fixup_opaque_types` function in #87200, I exposed a bug that showed that mir borrowck can't doesn't get enough information from typeck in order to build the correct lifetime mapping from opaque type usage to the actual concrete type. We therefor now fully compute the information within mir borrowck (we already did that, but we only used it to verify the typeck result) and stop using the typeck information.
We will likely be able to remove most opaque type information from the borrowck results in the future and just have all current callers use the mir borrowck result instead.
r? `@spastorino`
Profile incremental compilation hashing fingerprints
Adds profiling instrumentation for the hashing of incremental compilation fingerprints per query.
This will eventually feed into the `measureme` and `rustc-perf` infrastructure for tracking if computing hashes changes over time.
TODOs:
* [x] Address the FIXME where we are including node interning in the hash timing.
* [ ] Update measureme/summarize to handle this new data: https://github.com/rust-lang/measureme/pull/166
* [ ] ~Update rustc-perf to handle the new data from measureme~ (will be done at a later time)
r? `@ghost`
cc `@michaelwoerister`
rustc_data_structures has a dependency on crossbeam-utils but never uses
it. It appears to have originally had this dependency in order to set
the "nightly" feature; however, its other dependencies use a different
version of crossbeam-utils, so this doesn't actually affect anything.
Furthermore, in current crossbeam-utils, the "nightly" feature has
become a no-op.
Don't use a generator for BoxedResolver
The generator is non-trivial and requires unsafe code anyway. Using regular unsafe code without a generator is much easier to follow.
Based on #85810 as it touches rustc_interface too.
Remove unused feature gates
The first commit removes a usage of a feature gate, but I don't expect it to be controversial as the feature gate was only used to workaround a limitation of rust in the past. (closures never being `Clone`)
The second commit uses `#[allow_internal_unstable]` to avoid leaking the `trusted_step` feature gate usage from inside the index newtype macro. It didn't work for the `min_specialization` feature gate though.
The third commit removes (almost) all feature gates from the compiler that weren't used anyway.
don't enable parking_lot nightly features
Having the compiler itself depend on external libraries that use nightly features can lead to "fun" bootstrap situations. Within the rustc repo we use `cfg(bootstrap)` to resolve those, but that is not a reasonable option for external dependencies.
So I propose we stop enabling the "nightly" feature of `parking_lot` here. In my experiments, this then indeed leads to the feature not being enabled (i.e., nothing else enables it), and everything still builds. However, this means parking_lot's `RwLock` will no longer have hardware lock elision for readers -- I hope that is okay to lose in exchange for less bootstrap brain twisting. ;)
Cc `@Amanieu`
With the arrival of min const generics, many alt-vec libraries have
updated to use it in some way and arrayvec is no exception. Use the
latest with minor refactoring.
Also, rustc_workspace_hack is the only user of smallvec 0.6 in the
entire tree, so drop it.
Add `FromIterator` and `IntoIterator` impls for `ThinVec`
These should make using `ThinVec` feel much more like using `Vec`.
They will allow users of `Vec` to switch to `ThinVec` while continuing
to use `collect()`, `for` loops, and other parts of the iterator API.
I don't know if there were use cases before for using the iterator API
with `ThinVec`, but I would like to start using `ThinVec` in rustdoc,
and having it conform to the iterator API would make the transition
*a lot* easier.
I added a `FromIterator` impl, an `IntoIterator` impl that yields owned
elements, and `IntoIterator` impls that yield immutable or mutable
references to elements. I also added some unit tests for `ThinVec`.
These should make using `ThinVec` feel much more like using `Vec`.
They will allow users of `Vec` to switch to `ThinVec` while continuing
to use `collect()`, `for` loops, and other parts of the iterator API.
I don't know if there were use cases before for using the iterator API
with `ThinVec`, but I would like to start using `ThinVec` in rustdoc,
and having it conform to the iterator API would make the transition
*a lot* easier.
I added a `FromIterator` impl, an `IntoIterator` impl that yields owned
elements, and `IntoIterator` impls that yield immutable or mutable
references to elements. I also added some unit tests for `ThinVec`.
Add an Mmap wrapper to rustc_data_structures
This wrapper implements StableAddress and falls back to directly reading the file on wasm32.
Taken from #83640, which I will close due to the perf regression.
There isn't currently a good reviewer for these, and I don't want to
remove things that will just be added again. I plan to make a separate
PR for these changes so the rest of the cleanup can land.
- Add back `HirIdVec`, with a comment that it will soon be used.
- Add back `*_region` functions, with a comment they may soon be used.
- Remove `-Z borrowck_stats` completely. It didn't do anything.
- Remove `make_nop` completely.
- Add back `current_loc`, which is used by an out-of-tree tool.
- Fix style nits
- Remove `AtomicCell` with `cfg(parallel_compiler)` for consistency.
Found with https://github.com/est31/warnalyzer.
Dubious changes:
- Is anyone else using rustc_apfloat? I feel weird completely deleting
x87 support.
- Maybe some of the dead code in rustc_data_structures, in case someone
wants to use it in the future?
- Don't change rustc_serialize
I plan to scrap most of the json module in the near future (see
https://github.com/rust-lang/compiler-team/issues/418) and fixing the
tests needed more work than I expected.
TODO: check if any of the comments on the deleted code should be kept.
Allow for reading raw bytes from rustc_serialize::Decoder without unsafe code
The current `read_raw_bytes` method requires using `MaybeUninit` and `unsafe`. I don't think this is necessary. Let's see if a safe interface has any performance drawbacks.
This is a followup to #83273 and will make it easier to rebase #82183.
r? `@cjgillot`
Update to rustc-rayon 0.3.1
This pulls in rust-lang/rustc-rayon#8 to fix#81425. (h/t `@ammaraskar)`
That revealed weak constraints on `rustc_arena::DropArena`, because its
`DropType` was holding type-erased raw pointers to generic `T`. We can
implement `Send` for `DropType` (under `cfg(parallel_compiler)`) by
requiring all `T: Send` before they're type-erased.
This pulls in rust-lang/rustc-rayon#8 to fix#81425. (h/t @ammaraskar)
That revealed weak constraints on `rustc_arena::DropArena`, because its
`DropType` was holding type-erased raw pointers to generic `T`. We can
implement `Send` for `DropType` (under `cfg(parallel_compiler)`) by
requiring all `T: Send` before they're type-erased.
Let a portion of DefPathHash uniquely identify the DefPath's crate.
This allows to directly map from a `DefPathHash` to the crate it originates from, without constructing side tables to do that mapping -- something that is useful for incremental compilation where we deal with `DefPathHash` instead of `DefId` a lot.
It also allows to reliably and cheaply check for `DefPathHash` collisions which allows the compiler to gracefully abort compilation instead of running into a subsequent ICE at some random place in the code.
The following new piece of documentation describes the most interesting aspects of the changes:
```rust
/// A `DefPathHash` is a fixed-size representation of a `DefPath` that is
/// stable across crate and compilation session boundaries. It consists of two
/// separate 64-bit hashes. The first uniquely identifies the crate this
/// `DefPathHash` originates from (see [StableCrateId]), and the second
/// uniquely identifies the corresponding `DefPath` within that crate. Together
/// they form a unique identifier within an entire crate graph.
///
/// There is a very small chance of hash collisions, which would mean that two
/// different `DefPath`s map to the same `DefPathHash`. Proceeding compilation
/// with such a hash collision would very probably lead to an ICE and, in the
/// worst case, to a silent mis-compilation. The compiler therefore actively
/// and exhaustively checks for such hash collisions and aborts compilation if
/// it finds one.
///
/// `DefPathHash` uses 64-bit hashes for both the crate-id part and the
/// crate-internal part, even though it is likely that there are many more
/// `LocalDefId`s in a single crate than there are individual crates in a crate
/// graph. Since we use the same number of bits in both cases, the collision
/// probability for the crate-local part will be quite a bit higher (though
/// still very small).
///
/// This imbalance is not by accident: A hash collision in the
/// crate-local part of a `DefPathHash` will be detected and reported while
/// compiling the crate in question. Such a collision does not depend on
/// outside factors and can be easily fixed by the crate maintainer (e.g. by
/// renaming the item in question or by bumping the crate version in a harmless
/// way).
///
/// A collision between crate-id hashes on the other hand is harder to fix
/// because it depends on the set of crates in the entire crate graph of a
/// compilation session. Again, using the same crate with a different version
/// number would fix the issue with a high probability -- but that might be
/// easier said then done if the crates in questions are dependencies of
/// third-party crates.
///
/// That being said, given a high quality hash function, the collision
/// probabilities in question are very small. For example, for a big crate like
/// `rustc_middle` (with ~50000 `LocalDefId`s as of the time of writing) there
/// is a probability of roughly 1 in 14,750,000,000 of a crate-internal
/// collision occurring. For a big crate graph with 1000 crates in it, there is
/// a probability of 1 in 36,890,000,000,000 of a `StableCrateId` collision.
```
Given the probabilities involved I hope that no one will ever actually see the error messages. Nonetheless, I'd be glad about some feedback on how to improve them. Should we create a GH issue describing the problem and possible solutions to point to? Or a page in the rustc book?
r? `@pnkfelix` (feel free to re-assign)
Update measureme dependency to the latest version
This version adds the ability to use `rdpmc` hardware-based performance
counters instead of wall-clock time for measuring duration. This also
introduces a dependency on the `perf-event-open-sys` crate on Linux
which is used when using hardware counters.
r? ```@oli-obk```
Replace const_cstr with cstr crate
This PR replaces the `const_cstr` macro inside `rustc_data_structures` with `cstr` macro from [cstr](https://crates.io/crates/cstr) crate.
The two macros basically serve the same purpose, which is to generate `&'static CStr` from a string literal. `cstr` is better because it validates the literal at compile time, while the existing `const_cstr` does it at runtime when `debug_assertions` is enabled. In addition, the value `cstr` generates can be used in constant context (which is seemingly not needed anywhere currently, though).
This version adds the ability to use `rdpmc` hardware-based performance
counters instead of wall-clock time for measuring duration. This also
introduces a dependency on the `perf-event-open-sys` crate on Linux
which is used when using hardware counters.
Check the result cache before the DepGraph when ensuring queries
Split out of https://github.com/rust-lang/rust/pull/70951
Calling `ensure` on already forced queries is a common operation.
Looking at the results cache first is faster than checking the DepGraph for a green node.
Indicate change in RSS from start to end of pass in time-passes output
Previously, this was omitted because it could be misleading, but the
functionality seems too useful not to include.
r? ``@oli-obk``
This allows to directly map from a DefPathHash to the crate it
originates from, without constructing side tables to do that mapping.
It also allows to reliably and cheaply check for DefPathHash collisions.
Indicate both start and end of pass RSS in time-passes output
Previously, only the end of pass RSS was indicated. This could easily
lead one to believe that the change in RSS from one pass to the next was
attributable to the second pass, when in fact it occurred between the
end of the first pass and the start of the second.
Also, improve alignment of columns.
Sample of output:
```
time: 0.739; rss: 607MB -> 637MB item_types_checking
time: 8.429; rss: 637MB -> 775MB item_bodies_checking
time: 11.063; rss: 470MB -> 775MB type_check_crate
time: 0.232; rss: 775MB -> 777MB match_checking
time: 0.139; rss: 777MB -> 779MB liveness_and_intrinsic_checking
time: 0.372; rss: 775MB -> 779MB misc_checking_2
time: 8.188; rss: 779MB -> 1019MB MIR_borrow_checking
time: 0.062; rss: 1019MB -> 1021MB MIR_effect_checking
```
Previously, only the end of pass RSS was indicated. This could easily
lead one to believe that the change in RSS from one pass to the next was
attributable to the second pass, when in fact it occurred between the
end of the first pass and the start of the second.
Also, improve alignment of columns.
Enforce that query results implement Debug
Currently, we require that query keys implement `Debug`, but we do not do the same for query values. This can make incremental compilation bugs difficult to debug - there isn't a good place to print out the result loaded from disk.
This PR adds `Debug` bounds to several query-related functions, allowing us to debug-print the query value when an 'unstable fingerprint' error occurs. This required adding `#[derive(Debug)]` to a fairly large number of types - hopefully, this doesn't have much of an impact on compiler bootstrapping times.
Before:
```
thread 'rustc' panicked at 'attempt to read from stolen value', /home/joshua/rustc/compiler/rustc_data_structures/src/steal.rs:43:15
```
After:
```
thread 'rustc' panicked at 'attempt to steal from stolen value', compiler/rustc_mir/src/transform/mod.rs:423:25
```
Reduce a large memory spike that happens during serialization by writing
the incr comp structures to file by way of a fixed-size buffer, rather
than an unbounded vector.
Effort was made to keep the instruction count close to that of the
previous implementation. However, buffered writing to a file inherently
has more overhead than writing to a vector, because each write may
result in a handleable error. To reduce this overhead, arrangements are
made so that each LEB128-encoded integer can be written to the buffer
with only one capacity and error check. Higher-level optimizations in
which entire composite structures can be written with one capacity and
error check are possible, but would require much more work.
The performance is mostly on par with the previous implementation, with
small to moderate instruction count regressions. The memory reduction is
significant, however, so it seems like a worth-while trade-off.