Rollup of 7 pull requests
Successful merges:
- #90383 (Extend check for UnsafeCell in consts to cover unions)
- #91375 (config.rs: Add support for a per-target default_linker option.)
- #91480 (rustdoc: use smaller number of colors to distinguish items)
- #92338 (Add try_reserve and try_reserve_exact for OsString)
- #92405 (Add a couple needs-asm-support headers to tests)
- #92435 (Sync rustc_codegen_cranelift)
- #92440 (Fix mobile toggles position)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Sync rustc_codegen_cranelift
The main highlight this sync is enforcing rustfmt and lack of warnings on cg_clif's CI. I will open a separate PR to remove the cg_clif exceptions for them from this repo.
r? `@ghost`
`@rustbot` label +A-codegen +A-cranelift +T-compiler
Add try_reserve and try_reserve_exact for OsString
Add `try_reserve` and `try_reserve_exact` for OsString.
Part of https://github.com/rust-lang/rust/issues/91789
I will squash the commits after PR is ready to merge.
Signed-off-by: Xuanwo <github@xuanwo.io>
config.rs: Add support for a per-target default_linker option.
* src/bootstrap/config.rs (Target) <default_linker>: New field.
(TomlTarget): Likewise.
* src/bootstrap/compile.rs (rustc_cargo_env): Prefer a
target-specified default_linker over a global one if available.
* config.toml.example: Adjust doc.
Store liveness in interval sets for region inference
On the 100,000 line test case from https://github.com/rust-lang/rust/issues/90445, this reduces memory usage from 35 GB to 444 MB at peak (based on DHAT results, though with regular malloc), and yields a 9.4x speedup, with wall time going from 14.5 seconds to 1.5s. Performance results show that for the majority of real-world code this has little to no impact, but it's expected to generally scale better for auto-generated functions and other cases which stress this area of the compiler, as results on #90445 illustrate.
There may also be further room for improvement in future PRs making use of this data structures benefits over raw bitsets (which, at some level, are a less perfect fit for representing liveness, which is almost always composed of contiguous ranges, not point locations).
Fixes#90445.
Import `SourceFile`s from crate before decoding foreign `Span`
Fixes#92163Fixes#92014
When writing to the incremental cache, we encode all `Span`s
we encounter, regardless of whether or not their `SourceFile`
comes from the local crate, or from a foreign crate.
When we decode a `Span`, we use the `StableSourceFileId` we encoded
to locate the matching `SourceFile` in the current session. If this
id corresponds to a `SourceFile` from another crate, then we need to
have already imported that `SourceFile` into our current session.
This usually happens automatically during resolution / macro expansion,
when we try to resolve definitions from other crates. In certain cases,
however, we may try to load a `Span` from a transitive dependency
without having ever imported the `SourceFile`s from that crate, leading
to an ICE.
This PR fixes the issue by enconding the `SourceFile`'s `CrateNum`
when we encode a `Span`. During decoding, we call `imported_source_files()`
when we encounter a foreign `CrateNum`, which ensure that all
`SourceFile`s from that crate are imported into the current session.
Prevent spurious build failures and other bugs caused by parallel runs of
x.py. We back the lock with sqlite, so that we have a cross-platform locking
strategy, and which can be done nearly first in the build process (from Python),
which helps move the lock as early as possible.
Region inference contains several bitsets which are filled with large intervals
representing liveness. These can cause excessive memory usage, and are
relatively slow when growing to large sizes compared to the IntervalSet.
This is a compact, fast storage for variable-sized sets, typically consisting of
larger ranges. It is less efficient than a bitset if ranges are both small and
the domain size is small, but will still perform acceptably. With enormous
domain sizes and large ranges, the interval set performs much better, as it can
be much more densely packed in memory than the uncompressed bit set alternative.
It appears `find_max_slow` comes from the BinaryHeap docs, where the
try_reserve example is a slow implementation of find_max. It has no
relevance to this code in OsString though.