This commit is a refactoring of the LTO backend in Rust to support compilations
with multiple codegen units. The immediate result of this PR is to remove the
artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer
term this is intended to lay the groundwork for LTO with incremental compilation
and ultimately be the underpinning of ThinLTO support.
The problem here that needed solving is that when rustc is producing multiple
codegen units in one compilation LTO needs to merge them all together.
Previously only upstream dependencies were merged and it was inherently relied
on that there was only one local codegen unit. Supporting this involved
refactoring the optimization backend architecture for rustc, namely splitting
the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM
module has been optimized it may be blocked and queued up for LTO, and only
after LTO are modules code generated.
Non-LTO compilations should look the same as they do today backend-wise, we'll
spin up a thread for each codegen unit and optimize/codegen in that thread. LTO
compilations will, however, send the LLVM module back to the coordinator thread
once optimizations have finished. When all LLVM modules have finished optimizing
the coordinator will invoke the LTO backend, producing a further list of LLVM
modules. Currently this is always a list of one LLVM module. The coordinator
then spawns further work to run LTO and code generation passes over each module.
In the course of this refactoring a number of other pieces were refactored:
* Management of the bytecode encoding in rlibs was centralized into one module
instead of being scattered across LTO and linking.
* Some internal refactorings on the link stage of the compiler was done to work
directly from `CompiledModule` structures instead of lists of paths.
* The trans time-graph output was tweaked a little to include a name on each
bar and inflate the size of the bars a little
rustc: Default 32 codegen units at O0
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
Point at signature on unused lint
```
warning: struct is never used: `Struct`
--> $DIR/unused-warning-point-at-signature.rs:22:1
|
22 | struct Struct {
| ^^^^^^^^^^^^^
```
Fix#33961.
Allow replacing HashMap entries
This is an obvious API hole. At the moment the only way to retrieve an entry from a `HashMap` is to get an entry to it, remove it, and then insert a new entry. This PR allows entries to be replaced.
TrustedRandomAccess specialisation for Iterator::cloned when Item: Copy.
This should fix#44424. It also provides a potential fix for more iterators using `Iterator::cloned`.
Add aarch64-unknown-linux-musl target
This adds support for the aarch64-unknown-linux-musl target in the build and CI systems.
This addresses half of issue #42520.
The new file `aarch64_unknown_linux_musl.rs` is a copy of `aarch64_unknown_linux_gnu.rs` with "gnu" replaced by "musl", and the added logic in `build-arm-musl.sh` is similarly a near-copy of the arches around it, so overall the changes were straightforward.
Testing:
```
$ sudo ./src/ci/docker/run.sh cross
...
Dist std stage2 (x86_64-unknown-linux-gnu -> aarch64-unknown-linux-musl)
Building stage2 test artifacts (x86_64-unknown-linux-gnu -> aarch64-unknown-linux-musl)
Compiling getopts v0.2.14
Compiling term v0.0.0 (file:///checkout/src/libterm)
Compiling test v0.0.0 (file:///checkout/src/libtest)
Finished release [optimized] target(s) in 16.91 secs
Copying stage2 test from stage2 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / aarch64-unknown-linux-musl)
...
Build completed successfully in 0:55:22
```
```
$ rustup toolchain link local obj/build/x86_64-unknown-linux-gnu/stage2
$ rustup default local
```
After setting the local toolchain as default, and adding this in ~/.cargo/config:
```
[target.aarch64-unknown-linux-musl]
linker = "aarch64-linux-musl-gcc"
```
...then the toolchain was able to build a working ripgrep as a test:
```
$ readelf -a target/aarch64-unknown-linux-musl/debug/rg | grep -i interpreter
$ readelf -a target/aarch64-unknown-linux-musl/debug/rg | grep NEEDED
$ file target/aarch64-unknown-linux-musl/debug/rg
target/aarch64-unknown-linux-musl/debug/rg: ELF 64-bit LSB executable, ARM aarch64, version 1 (GNU/Linux), statically linked, BuildID[sha1]=be11036b0988fac5dccc9f6487eb780b05186582, not stripped
```
Point at parameter type on E0301
On "the parameter type `T` may not live long enough" error, point to the
parameter type suggesting lifetime bindings:
```
error[E0310]: the parameter type `T` may not live long enough
--> $DIR/lifetime-doesnt-live-long-enough.rs:28:5
|
27 | struct Foo<T> {
| - help: consider adding an explicit lifetime bound `T: 'static`...
28 | foo: &'static T
| ^^^^^^^^^^^^^^^
|
note: ...so that the reference type `&'static T` does not outlive the data it points at
--> $DIR/lifetime-doesnt-live-long-enough.rs:28:5
|
28 | foo: &'static T
| ^^^^^^^^^^^^^^^
```
Fix#36700.
Initial support for `..=` syntax
#28237
This PR adds `..=` as a synonym for `...` in patterns and expressions.
Since `...` in expressions was never stable, we now issue a warning.
cc @durka
r? @aturon
Allow unused extern crate again
This is a partial revert of #42588. There is a usability concern reported in #44294 that was not considered in the discussion of the PR, so I would like to back this out of 1.21. As is, I think users would have a worse and more confusing experience with this lint enabled by default. We can re-enabled once there are better diagnostics or the case in #44294 does not trigger the lint.
Fix capacity comparison in reserve
You can otherwise end up in a situation where you don't actually resize
but still call into handle_cap_increase which then corrupts head/tail.
Closes#44800
Not totally sure the right way to write a test for this - there are some debug asserts the old bad behavior will hit but we don't build the stdlib with debug assertions by default.
r? @Gankro
use param_env on the trait_cache key
We bailed from making trans_fulfill_obligation return `Option` or `Result`, just made it less prone to crashing outside trans
r? @nikomatsakis
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
Some fixes to mir-borrowck
Make the code more closely match the NLL RFC (updated description).
(The biggest visible fix the addition of the Shallow/Deep distinction, which means mir-borrowck stops falsely thinking that StorageDeads need deep access to their input L-value.)
Add suggestions for misspelled method names
Use the syntax::util::lev_distance module to provide suggestions when a
named method cannot be found.
Part of #30197
Require rlibs for dependent crates when linking static executables
This handles the case for `CrateTypeExecutable` and `+crt_static`. I reworked the match block to avoid duplicating the `attempt_static` and error checking code again (this case would have been a copy of the `CrateTypeCdylib`/`CrateTypeStaticlib` case).
On `linux-musl` targets where `std` was built with `crt_static = false` in `config.toml`, this change brings the test suite from entirely failing to mostly passing.
This change should not affect behavior for other crate types, or for targets which do not respect `+crt_static`.
Allow writing metadata without llvm
# Todo:
* [x] Rebase
* [x] Fix eventual errors
* [x] <strike>Find some crate to write elf files</strike> (will do it later)
Cc #43842
encode region::Scope using fewer bytes
Now that region::Scope is no longer interned, its size is more important. This PR encodes region::Scope in 8 bytes instead of 12, which should speed up region inference somewhat (perf testing needed) and should improve the margins on #36799 by 64MB (that's not a lot, I did this PR mostly to speed up region inference).
This is a perf-sensitive PR. Please don't roll me up.
r? @eddyb
This is based on #44743 so I could get more accurate measurements on #36799.
In particular:
* introduce the shallow/deep distinction for read/write accesses
* use the notions of prefixes, shallow prefixes, and supporting prefixes
rather than trying to recreate the restricted sets from ast-borrowck.
* Add shallow reads of Discriminant and ArrayLength, and treat them
as artificial fields when doing prefix traversals.