Use tracing spans in rustc_trait_selection
Spans are very helpful when debugging this code. It's also hot enough to make a good benchmark.
r? `@oli-obk`
Rollup of 10 pull requests
Successful merges:
- #77550 (add shims for WithOptConstParam query calls)
- #77699 (Add word wrap for short descriptions)
- #77724 (Implement `AsRawFd` for `StdinLock` etc. on WASI.)
- #77746 (Fix `x.py setup` sets `changelog-seen`)
- #77784 (Fix intra-docs link in core::ffi::VaList)
- #77811 (rustdoc: Make some functions private that don't need to be public)
- #77818 (Mono collector: replace pair of ints with Range)
- #77831 (Use std methods on char instead of open coding them)
- #77852 (update url in bootstrap README (gcc-rs -> cc-rs))
- #77863 (Remove `mark-i-m` from rustc-dev-guide maintainers)
Failed merges:
r? `@ghost`
Mono collector: replace pair of ints with Range
I found the initial PR (#33171) that introduced this piece of code but I didn't find any information about why a tuple was preferred over a `Range<usize>`.
I'm hoping there are no technical reasons to not do this.
Implement `AsRawFd` for `StdinLock` etc. on WASI.
WASI implements `AsRawFd` for `Stdin`, `Stdout`, and `Stderr`, so
implement it for `StdinLock`, `StdoutLock`, and `StderrLock` as well.
r? @alexcrichton
build-manifest: stop generating numbered channel names except for stable
This fixes numbered channel names being created for the nightly channel, and once the root cause of this rides the trains, for beta.
r? `@Mark-Simulacrum`
Remove unnecessary unsafe block around calls to discriminant_value
Since 63793 the discriminant_value intrinsic is safe to call. Remove
unnecessary unsafe block around calls to this intrinsic in built-in
derive macros.
Promote aarch64-pc-windows-msvc to Tier 2 Development Platform
Adds a GitHub Actions CI build for `aarch64-pc-windows-msvc` via cross-compilation on an x86_64 host.
This promotes `aarch64-pc-windows-msvc` from a Tier 2 Compilation Target (std) to a Tier 2 Development Platform (std+rustc+cargo+tools).
Fixes#72881
r? `@pietroalbini`
Fix -Clinker-plugin-lto with opt-levels s and z
Pass s and z as `-plugin-opt=O2` to the linker. This is what `-Os` and `-Oz` correspond to, apparently.
Fixes https://github.com/rust-lang/rust/issues/75940
Use llvm::computeLTOCacheKey to determine post-ThinLTO CGU reuse
During incremental ThinLTO compilation, we attempt to re-use the
optimized (post-ThinLTO) bitcode file for a module if it is 'safe' to do
so.
Up until now, 'safe' has meant that the set of modules that our current
modules imports from/exports to is unchanged from the previous
compilation session. See PR #67020 and PR #71131 for more details.
However, this turns out be insufficient to guarantee that it's safe
to reuse the post-LTO module (i.e. that optimizing the pre-LTO module
would produce the same result). When LLVM optimizes a module during
ThinLTO, it may look at other information from the 'module index', such
as whether a (non-imported!) global variable is used. If this
information changes between compilation runs, we may end up re-using an
optimized module that (for example) had dead-code elimination run on a
function that is now used by another module.
Fortunately, LLVM implements its own ThinLTO module cache, which is used
when ThinLTO is performed by a linker plugin (e.g. when clang is used to
compile a C proect). Using this cache directly would require extensive
refactoring of our code - but fortunately for us, LLVM provides a
function that does exactly what we need.
The function `llvm::computeLTOCacheKey` is used to compute a SHA-1 hash
from all data that might influence the result of ThinLTO on a module.
In addition to the module imports/exports that we manually track, it
also hashes information about global variables (e.g. their liveness)
which might be used during optimization. By using this function, we
shouldn't have to worry about new LLVM passes breaking our module re-use
behavior.
In LLVM, the output of this function forms part of the filename used to
store the post-ThinLTO module. To keep our current filename structure
intact, this PR just writes out the mapping 'CGU name -> Hash' to a
file. To determine if a post-LTO module should be reused, we compare
hashes from the previous session.
This should unblock PR #75199 - by sheer chance, it seems to have hit
this issue due to the particular CGU partitioning and optimization
decisions that end up getting made.
More print statementsstatements lol
Solved the basic case of eliminating check_version ifk_version if subcommand = setup
Finished v1
checking out old bootstrap.py
checked out old irrelevant files
fixed tidy
Moved VERSION from bin/main.rs to lib.rs
Fixed semicolon return issue
x.py fmt
Recognize discriminant reads as no-ops in RemoveNoopLandingPads
The cleanup blocks often contain read of discriminants. Teach
RemoveNoopLandingPads to recognize them as no-ops to remove
additional no-op landing pads.
Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches
This patch went through a couple iterations but the end result is replacing a pattern where an `AtomicUsize` (updated with many SeqCst ops) guards a `static mut` with a single `AtomicU64` that is known to use 0 as a value indicating that it is not initialized.
The code in both places exists to cache values used in the conversion of Instants to Durations on macOS, iOS, and Windows.
I have no numbers to prove that this improves performance (It seems a little futile to benchmark something like this), but it's much simpler, safer, and in practice we'd expect it to be faster everywhere where Relaxed operations on AtomicU64 are cheaper than SeqCst operations on AtomicUsize, which is a lot of places.
Anyway, it also removes a bunch of unsafe code and greatly simplifies the logic, so IMO that alone would be worth it unless it was a regression.
If you want to take a look at the assembly output though, see https://godbolt.org/z/rbr6vn for x86_64, https://godbolt.org/z/cqcbqv for aarch64 (Note that this just the output of the mac side, but i'd expect the windows part to be the same and don't feel like doing another godbolt for it). There are several versions of this function in the godbolt:
- `info_new`: version in the current patch
- `info_less_new`: version in initial PR
- `info_original`: version currently in the tree
- `info_orig_but_better_orderings`: a version that just tries to change the original code's orderings from SeqCst to the (probably) minimal orderings required for soundness/correctness.
The biggest concern I have here is if we can use AtomicU64, or if there are targets that dont have it that this code supports. AFAICT: no. (If that changes in the future, it's easy enough to do something different for them)
r? `@Amanieu` because he caught a couple issues last time I tried to do a patch reducing orderings 😅
---
<details>
<summary>I rewrote this whole message so the original is inside here</summary>
I happened to notice the code we use for caching the result of mach_timebase_info uses SeqCst exclusively.
However, thinking a little more, it's actually pretty easy to avoid the static mut by packing the timebase info into an AtomicU64.
This entirely avoids needing to do the compare_exchange. The AtomicU64 can be read/written using Relaxed ops, which on current macos/ios platforms (x86_64/aarch64) have no overhead compared to direct loads/stores. This simplifies the code and makes it a lot safer too.
I have no numbers to prove that this improves performance (It seems a little futile to benchmark something like this), although it should do that on both targets it applies to.
That said, it also removes a bunch of unsafe code and simplifies the logic (arguably at least — there are only two states now, initialized or not), so I think it's a net win even without concrete numbers.
If you want to take a look at the assembly output though, see below. It has the new version, the original, and a version of the original with lower Orderings (which is still worse than the version in this PR)
- godbolt.org/z/obfqf9 x86_64-apple-darwin
- godbolt.org/z/Wz5cWc aarch64-unknown-linux-gnu (godbolt can't do aarch64-apple-ios but that doesn't matter here)
A different (and more efficient) option than this would be to just use the AtomicU64 and use the knowledge that after initialization the denominator should be nonzero... That felt like it's relying on too many things I'm not confident in, so I didn't want to do that.
</details>
Clean up in intra-doc link collector
This PR makes the following changes in intra-doc links:
- clean up hard to follow closure-based logic in `check_full_res`
- fix a FIXME comment by figuring out that `true` and `false` need to be resolved separately
- refactor path resolution by extracting common code to a helper method and trying to reduce the number of unnecessary early returns
- primitive types are now defined by their symbols, not their name strings
- re-enables a commented-out test case
Closes#77267 (cc `@Stupremee)`
r? `@jyn514`
Add -Z codegen-backend dylib to deps
When the codegen-backend dylib changes, the program should be rebuilt.
---
Unfortunately I was unable to test this works locally due to running into a TLS issue when running the custom backend, `thread 'rustc' panicked at 'no ImplicitCtxt stored in tls', compiler/rustc_middle/src/ty/context.rs:1750:54`, which seems similar to https://github.com/rust-lang/rust/issues/62717 but has a completely different cause and backtrace.
`@eddyb` said to ping `@Mark-Simulacrum` about what they think about this, so, ping!
Since 63793 the discriminant_value intrinsic is safe to call. Remove
unnecessary unsafe block around calls to this intrinsic in built-in
derive macros.