Don't leak the function that is called on drop
It probably wasn't causing problems anyway, but still, a `// this leaks, please don't pass anything that owns memory` is not sustainable.
I could implement a version which does not require `Option`, but it would require `unsafe`, at which point it's probably not worth it.
Use dynamic dispatch for queries
This replaces most concrete query values `V` with `MaybeUninit<[u8; { size_of::<V>() }]>` reducing the code instantiated by queries. The compile time of `rustc_query_impl` is reduced by 27%. It is an alternative to https://github.com/rust-lang/rust/pull/107937 which uses unstable const generics while this uses a `EraseType` trait which maps query values to their erased variant.
This is achieved by introducing an `Erased` type which does sanity check with `cfg(debug_assertions)`. The query caches gets instantiated with these erased types leaving the code in `rustc_query_system` unaware of them. `rustc_query_system` is changed to use instances of `QueryConfig` so that `rustc_query_impl` can pass in `DynamicConfig` which holds a pointer to a virtual table.
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7055s</td><td align="right">1.6949s</td><td align="right"> -0.62%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2547s</td><td align="right">0.2528s</td><td align="right"> -0.73%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9590s</td><td align="right">0.9553s</td><td align="right"> -0.39%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5457s</td><td align="right">1.5440s</td><td align="right"> -0.11%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9092s</td><td align="right">5.9009s</td><td align="right"> -0.14%</td></tr><tr><td>Total</td><td align="right">10.3741s</td><td align="right">10.3479s</td><td align="right"> -0.25%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9960s</td><td align="right"> -0.40%</td></tr></table>
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check:initial</td><td align="right">2.0605s</td><td align="right">2.0575s</td><td align="right"> -0.15%</td></tr><tr><td>🟣 <b>hyper</b>:check:initial</td><td align="right">0.3218s</td><td align="right">0.3216s</td><td align="right"> -0.07%</td></tr><tr><td>🟣 <b>regex</b>:check:initial</td><td align="right">1.1848s</td><td align="right">1.1839s</td><td align="right"> -0.07%</td></tr><tr><td>🟣 <b>syn</b>:check:initial</td><td align="right">1.9409s</td><td align="right">1.9376s</td><td align="right"> -0.17%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check:initial</td><td align="right">7.3105s</td><td align="right">7.2928s</td><td align="right"> -0.24%</td></tr><tr><td>Total</td><td align="right">12.8185s</td><td align="right">12.7935s</td><td align="right"> -0.20%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9986s</td><td align="right"> -0.14%</td></tr></table>
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check:unchanged</td><td align="right">0.4606s</td><td align="right">0.4617s</td><td align="right"> 0.24%</td></tr><tr><td>🟣 <b>hyper</b>:check:unchanged</td><td align="right">0.1335s</td><td align="right">0.1336s</td><td align="right"> 0.08%</td></tr><tr><td>🟣 <b>regex</b>:check:unchanged</td><td align="right">0.3324s</td><td align="right">0.3346s</td><td align="right"> 0.65%</td></tr><tr><td>🟣 <b>syn</b>:check:unchanged</td><td align="right">0.6268s</td><td align="right">0.6307s</td><td align="right"> 0.64%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check:unchanged</td><td align="right">1.8248s</td><td align="right">1.8508s</td><td align="right">💔 1.43%</td></tr><tr><td>Total</td><td align="right">3.3779s</td><td align="right">3.4113s</td><td align="right"> 0.99%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">1.0061s</td><td align="right"> 0.61%</td></tr></table>
It's based on https://github.com/rust-lang/rust/pull/108167.
r? `@cjgillot`
Currently a `{D,Subd}iagnosticMessage` can be created from any type that
impls `Into<String>`. That includes `&str`, `String`, and `Cow<'static,
str>`, which are reasonable. It also includes `&String`, which is pretty
weird, and results in many places making unnecessary allocations for
patterns like this:
```
self.fatal(&format!(...))
```
This creates a string with `format!`, takes a reference, passes the
reference to `fatal`, which does an `into()`, which clones the
reference, doing a second allocation. Two allocations for a single
string, bleh.
This commit changes the `From` impls so that you can only create a
`{D,Subd}iagnosticMessage` from `&str`, `String`, or `Cow<'static,
str>`. This requires changing all the places that currently create one
from a `&String`. Most of these are of the `&format!(...)` form
described above; each one removes an unnecessary static `&`, plus an
allocation when executed. There are also a few places where the existing
use of `&String` was more reasonable; these now just use `clone()` at
the call site.
As well as making the code nicer and more efficient, this is a step
towards possibly using `Cow<'static, str>` in
`{D,Subd}iagnosticMessage::{Str,Eager}`. That would require changing
the `From<&'a str>` impls to `From<&'static str>`, which is doable, but
I'm not yet sure if it's worthwhile.
Remove `QueryEngine` trait
This removes the `QueryEngine` trait and `Queries` from `rustc_query_impl` and replaced them with function pointers and fields in `QuerySystem`. As a side effect `OnDiskCache` is moved back into `rustc_middle` and the `OnDiskCache` trait is also removed.
This has a couple of benefits.
- `TyCtxt` is used in the query system instead of the removed `QueryCtxt` which is larger.
- Function pointers are more flexible to work with. A variant of https://github.com/rust-lang/rust/pull/107802 is included which avoids the double indirection. For https://github.com/rust-lang/rust/pull/108938 we can name entry point `__rust_end_short_backtrace` to avoid some overhead. For https://github.com/rust-lang/rust/pull/108062 it avoids the duplicate `QueryEngine` structs.
- `QueryContext` now implements `DepContext` which avoids many `dep_context()` calls in `rustc_query_system`.
- The `rustc_driver` size is reduced by 0.33%, hopefully that means some bootstrap improvements.
- This avoids the unsafe code around the `QueryEngine` trait.
r? `@cjgillot`
Currently it creates an `Option` and then does `map`/`unwrap_or` and
`map_or_else` on it, which is hard to read.
This commit simplifies things by moving more code into the two arms of
the if/else.
Rewrite MemDecoder around pointers not a slice
This is basically https://github.com/rust-lang/rust/pull/109910 but I'm being a lot more aggressive. The pointer-based structure means that it makes a lot more sense to absorb more complexity into `MemDecoder`, most of the diff is just complexity moving from one place to another.
The primary argument for this structure is that we only incur a single bounds check when doing multi-byte reads from a `MemDecoder`. With the slice-based implementation we need to do those with `data[position..position + len]` , which needs to account for `position + len` wrapping. It would be possible to dodge the first bounds check if we stored a slice that starts at `position`, but that would require updating the pointer and length on every read.
This PR also embeds the failure path in a separate function, which means that this PR should subsume all the perf wins observed in https://github.com/rust-lang/rust/pull/109867.
Add `rustc_fluent_macro` to decouple fluent from `rustc_macros`
Fluent, with all the icu4x it brings in, takes quite some time to compile. `fluent_messages!` is only needed in further downstream rustc crates, but is blocking more upstream crates like `rustc_index`. By splitting it out, we allow `rustc_macros` to be compiled earlier, which speeds up `x check compiler` by about 5 seconds (and even more after the needless dependency on `serde_json` is removed from `rustc_data_structures`).
Encode hashes as bytes, not varint
In a few places, we store hashes as `u64` or `u128` and then apply `derive(Decodable, Encodable)` to the enclosing struct/enum. It is more efficient to encode hashes directly than try to apply some varint encoding. This PR adds two new types `Hash64` and `Hash128` which are produced by `StableHasher` and replace every use of storing a `u64` or `u128` that represents a hash.
Distribution of the byte lengths of leb128 encodings, from `x build --stage 2` with `incremental = true`
Before:
```
( 1) 373418203 (53.7%, 53.7%): 1
( 2) 196240113 (28.2%, 81.9%): 3
( 3) 108157958 (15.6%, 97.5%): 2
( 4) 17213120 ( 2.5%, 99.9%): 4
( 5) 223614 ( 0.0%,100.0%): 9
( 6) 216262 ( 0.0%,100.0%): 10
( 7) 15447 ( 0.0%,100.0%): 5
( 8) 3633 ( 0.0%,100.0%): 19
( 9) 3030 ( 0.0%,100.0%): 8
( 10) 1167 ( 0.0%,100.0%): 18
( 11) 1032 ( 0.0%,100.0%): 7
( 12) 1003 ( 0.0%,100.0%): 6
( 13) 10 ( 0.0%,100.0%): 16
( 14) 10 ( 0.0%,100.0%): 17
( 15) 5 ( 0.0%,100.0%): 12
( 16) 4 ( 0.0%,100.0%): 14
```
After:
```
( 1) 372939136 (53.7%, 53.7%): 1
( 2) 196240140 (28.3%, 82.0%): 3
( 3) 108014969 (15.6%, 97.5%): 2
( 4) 17192375 ( 2.5%,100.0%): 4
( 5) 435 ( 0.0%,100.0%): 5
( 6) 83 ( 0.0%,100.0%): 18
( 7) 79 ( 0.0%,100.0%): 10
( 8) 50 ( 0.0%,100.0%): 9
( 9) 6 ( 0.0%,100.0%): 19
```
The remaining 9 or 10 and 18 or 19 are `u64` and `u128` respectively that have the high bits set. As far as I can tell these are coming primarily from `SwitchTargets`.
Fluent, with all the icu4x it brings in, takes quite some time to
compile. `fluent_messages!` is only needed in further downstream rustc
crates, but is blocking more upstream crates like `rustc_index`. By
splitting it out, we allow `rustc_macros` to be compiled earlier, which
speeds up `x check compiler` by about 5 seconds (and even more after the
needless dependency on `serde_json` is removed from
`rustc_data_structures`).
Various minor Idx-related tweaks
Nothing particularly exciting here, but a couple of things I noticed as I was looking for more index conversions to simplify.
cc https://github.com/rust-lang/compiler-team/issues/606
r? `@WaffleLapkin`
incr.comp.: Make sure dependencies are recorded when feeding queries during eval-always queries.
This PR makes sure we don't drop dependency edges when feeding queries during an eval-always query.
Background: During eval-always queries, no dependencies are recorded because the system knows to unconditionally re-evaluate them regardless of any actual dependencies. This works fine for these queries themselves but leads to a problem when feeding other queries: When queries are fed, we set up their dependency edges by copying the current set of dependencies of the feeding query. But because this set is empty for eval-always queries, we record no edges at all -- which has the effect that the fed query instances always look "green" to the system, although they should always be "red".
The fix is to explicitly add a dependency on the artificial "always red" dep-node when feeding during eval-always queries.
Fixes https://github.com/rust-lang/rust/issues/108481
Maybe also fixes issue https://github.com/rust-lang/rust/issues/88488.
cc `@jyn514`
r? `@cjgillot` or `@oli-obk`
Use Rayon's TLV directly
This accesses Rayon's `TLV` thread local directly avoiding wrapper functions. This makes rustc work with https://github.com/rust-lang/rustc-rayon/pull/10.
r? `@cuviper`
Refactor `try_execute_query`
This merges `JobOwner::try_start` into `try_execute_query`, removing `TryGetJob` in the processes. 3 new functions are extracted from `try_execute_query`: `execute_job`, `cycle_error` and `wait_for_query`. This makes the control flow a bit clearer and improves performance.
Based on https://github.com/rust-lang/rust/pull/109046.
<table><tr><td rowspan="2">Benchmark</td><td colspan="1"><b>Before</b></th><td colspan="2"><b>After</b></th></tr><tr><td align="right">Time</td><td align="right">Time</td><td align="right">%</th></tr><tr><td>🟣 <b>clap</b>:check</td><td align="right">1.7134s</td><td align="right">1.7061s</td><td align="right"> -0.43%</td></tr><tr><td>🟣 <b>hyper</b>:check</td><td align="right">0.2519s</td><td align="right">0.2510s</td><td align="right"> -0.35%</td></tr><tr><td>🟣 <b>regex</b>:check</td><td align="right">0.9517s</td><td align="right">0.9481s</td><td align="right"> -0.38%</td></tr><tr><td>🟣 <b>syn</b>:check</td><td align="right">1.5389s</td><td align="right">1.5338s</td><td align="right"> -0.33%</td></tr><tr><td>🟣 <b>syntex_syntax</b>:check</td><td align="right">5.9488s</td><td align="right">5.9258s</td><td align="right"> -0.39%</td></tr><tr><td>Total</td><td align="right">10.4048s</td><td align="right">10.3647s</td><td align="right"> -0.38%</td></tr><tr><td>Summary</td><td align="right">1.0000s</td><td align="right">0.9962s</td><td align="right"> -0.38%</td></tr></table>
r? `@cjgillot`
Split `execute_job` into `execute_job_incr` and `execute_job_non_incr`
`execute_job` was a bit large, so this splits it in 2. Performance was neutral locally, but this may affect bootstrap times.
Optimize dep node backtrace and ignore fatal errors
This attempts to optimize https://github.com/rust-lang/rust/pull/91742 while also passing through fatal errors.
r? `@cjgillot`