Run translation and LLVM in parallel when compiling with multiple CGUs
This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.
This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between either translation or running LLVM, which allows to reduce peak memory usage since not all LLVM module have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with `-Ccodegen-units=N`.
In order to help tuning and debugging the work scheduler, the PR adds the `-Ztrans-time-graph` flag which spits out html files that show how work packages where scheduled:
![Building regex](https://user-images.githubusercontent.com/1825894/28679272-f6752bd8-72f2-11e7-8a6c-56207855ce95.png)
(red is translation, green is llvm)
One side effect here is that `-Ztime-passes` might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.
One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.
TODO:
- [x] Restore `-Z time-passes` output for LLVM.
- [x] Update documentation, esp. for work package scheduling.
- [x] Tune the scheduling algorithm.
cc @alexcrichton @rust-lang/compiler
Three small fixes for save-analysis
First commit does some naive deduplication of macro uses. We end up with lots of duplication here because of the weird way we get this data (we extract a use for every span generated by a macro use).
Second commit is basically a typo fix.
Third commit is a bit interesting, it partially reverts a change from #40939 where temporary variables in format! (and thus println!) got a span with the primary pointing at the value stored into the temporary (e.g., `x` in `println!("...", x)`). If `format!` had a definition it should point at the temporary in the macro def, but since it is built-in, that is not possible (for now), so `DUMMY_SP` is the best we can do (using the span in the callee really breaks save-analysis because it thinks `x` is a definition as well as a reference).
There aren't a test for this stuff because: the deduplication is filtered by any of the users of save-analysis, so it is purely an efficiency change. I couldn't actually find an example for the second commit that we have any machinery to test, and the third commit is tested by the RLS, so there will be a test once I update the RLS version and and uncomment the previously failing tests).
r? @jseyfried
Rework Rustbuild to an eagerly compiling approach
This introduces a new dependency on `serde`; I don't believe that's a problem since bootstrap is compiled with nightly/beta always so proc macros are available. Compile times are slightly longer -- about 2-3x (30 seconds vs. 10 seconds). I don't think this is too big a problem, especially since recompiling bootstrap is somewhat rare. I think we can remove the dependency on Serde if necessary, though, so let me know.
r? @alexcrichton
Update the `cargo` submodule
Notably pull in an update to the `jobserver` crate to have Cargo set the
`CARGO_MAKEFLAGS` environment variable instead of the `MAKEFLAGS` environment
variable.
cc https://github.com/rust-lang/rust/issues/42635
Notably pull in an update to the `jobserver` crate to have Cargo set the
`CARGO_MAKEFLAGS` environment variable instead of the `MAKEFLAGS` environment
variable.
Switch to rust-lang-nursery/compiler-builtins
This commit migrates the in-tree `libcompiler_builtins` to the upstream version
at https://github.com/rust-lang-nursery/compiler-builtins. The upstream version
has a number of intrinsics written in Rust and serves as an in-progress rewrite
of compiler-rt into Rust. Additionally it also contains all the existing
intrinsics defined in `libcompiler_builtins` for 128-bit integers.
It's been the intention since the beginning to make this transition but
previously it just lacked the manpower to get done. As this PR likely shows it
wasn't a trivial integration! Some highlight changes are:
* The PR rust-lang-nursery/compiler-builtins#166 contains a number of fixes
across platforms and also some refactorings to make the intrinsics easier to
read. The additional testing added there also fixed a number of integration
issues when pulling the repository into this tree.
* LTO with the compiler-builtins crate was fixed to link in the entire crate
after the LTO process as these intrinsics are excluded from LTO.
* Treatment of hidden symbols was updated as previously the
`#![compiler_builtins]` crate would mark all symbol *imports* as hidden
whereas it was only intended to mark *exports* as hidden.
rustc: Implement the #[global_allocator] attribute
This PR is an implementation of [RFC 1974] which specifies a new method of
defining a global allocator for a program. This obsoletes the old
`#![allocator]` attribute and also removes support for it.
[RFC 1974]: https://github.com/rust-lang/rfcs/pull/1974
The new `#[global_allocator]` attribute solves many issues encountered with the
`#![allocator]` attribute such as composition and restrictions on the crate
graph itself. The compiler now has much more control over the ABI of the
allocator and how it's implemented, allowing much more freedom in terms of how
this feature is implemented.
cc #27389
This PR is an implementation of [RFC 1974] which specifies a new method of
defining a global allocator for a program. This obsoletes the old
`#![allocator]` attribute and also removes support for it.
[RFC 1974]: https://github.com/rust-lang/rfcs/pull/197
The new `#[global_allocator]` attribute solves many issues encountered with the
`#![allocator]` attribute such as composition and restrictions on the crate
graph itself. The compiler now has much more control over the ABI of the
allocator and how it's implemented, allowing much more freedom in terms of how
this feature is implemented.
cc #27389
This commit migrates the in-tree `libcompiler_builtins` to the upstream version
at https://github.com/rust-lang-nursery/compiler-builtins. The upstream version
has a number of intrinsics written in Rust and serves as an in-progress rewrite
of compiler-rt into Rust. Additionally it also contains all the existing
intrinsics defined in `libcompiler_builtins` for 128-bit integers.
It's been the intention since the beginning to make this transition but
previously it just lacked the manpower to get done. As this PR likely shows it
wasn't a trivial integration! Some highlight changes are:
* The PR rust-lang-nursery/compiler-builtins#166 contains a number of fixes
across platforms and also some refactorings to make the intrinsics easier to
read. The additional testing added there also fixed a number of integration
issues when pulling the repository into this tree.
* LTO with the compiler-builtins crate was fixed to link in the entire crate
after the LTO process as these intrinsics are excluded from LTO.
* Treatment of hidden symbols was updated as previously the
`#![compiler_builtins]` crate would mark all symbol *imports* as hidden
whereas it was only intended to mark *exports* as hidden.
When writing LLVM IR output demangled fn name in comments
`--emit=llvm-ir` looks like this now:
```
; <alloc::vec::Vec<T> as core::ops::index::IndexMut<core::ops::range::RangeFull>>::index_mut
; Function Attrs: inlinehint uwtable
define internal { i8*, i64 } @"_ZN106_$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..ops..index..IndexMut$LT$core..ops..range..RangeFull$GT$$GT$9index_mut17h7f7b576609f30262E"(%"alloc::vec::Vec<u8>"* dereferenceable(24)) unnamed_addr #0 {
start:
...
```
cc https://github.com/integer32llc/rust-playground/issues/15
Integrate jobserver support to parallel codegen
This commit integrates the `jobserver` crate into the compiler. The crate was
previously integrated in to Cargo as part of rust-lang/cargo#4110. The purpose
here is to two-fold:
* Primarily the compiler can cooperate with Cargo on parallelism. When you run
`cargo build -j4` then this'll make sure that the entire build process between
Cargo/rustc won't use more than 4 cores, whereas today you'd get 4 rustc
instances which may all try to spawn lots of threads.
* Secondarily rustc/Cargo can now integrate with a foreign GNU `make` jobserver.
This means that if you call cargo/rustc from `make` or another
jobserver-compatible implementation it'll use foreign parallelism settings
instead of creating new ones locally.
As the number of parallel codegen instances in the compiler continues to grow
over time with the advent of incremental compilation it's expected that this'll
become more of a problem, so this is intended to nip concurrent concerns in the
bud by having all the tools to cooperate!
Note that while rustc has support for itself creating a jobserver it's far more
likely that rustc will always use the jobserver configured by Cargo. Cargo today
will now set a jobserver unconditionally for rustc to use.
This commit integrates the `jobserver` crate into the compiler. The crate was
previously integrated in to Cargo as part of rust-lang/cargo#4110. The purpose
here is to two-fold:
* Primarily the compiler can cooperate with Cargo on parallelism. When you run
`cargo build -j4` then this'll make sure that the entire build process between
Cargo/rustc won't use more than 4 cores, whereas today you'd get 4 rustc
instances which may all try to spawn lots of threads.
* Secondarily rustc/Cargo can now integrate with a foreign GNU `make` jobserver.
This means that if you call cargo/rustc from `make` or another
jobserver-compatible implementation it'll use foreign parallelism settings
instead of creating new ones locally.
As the number of parallel codegen instances in the compiler continues to grow
over time with the advent of incremental compilation it's expected that this'll
become more of a problem, so this is intended to nip concurrent concerns in the
bud by having all the tools to cooperate!
Note that while rustc has support for itself creating a jobserver it's far more
likely that rustc will always use the jobserver configured by Cargo. Cargo today
will now set a jobserver unconditionally for rustc to use.
This commit deletes the in-tree `getopts` crate in favor of the crates.io-based
`getopts` crate. The main difference here is with a new builder-style API, but
otherwise everything else remains relatively standard.
Update cargo/rls submodules and dependencies
Brings in a few regression fixes on the Cargo side, updates the rls to work
with the newer Cargo, and also updates other crates.io dependencies to pull in
various bug fixes and such.
Implement lazy loading of external crates' sources. Fixes#38875Fixes#38875. This is a follow-up to #42507. When a (now correctly translated) span from an external crate is referenced in a error, warning or info message, we still don't have the source code being referenced.
Since stuffing the source in the serialized metadata of an rlib is extremely wasteful, the following scheme has been implemented:
* File maps now contain a source hash that gets serialized as well.
* When a span is rendered in a message, the source hash in the corresponding file map(s) is used to try and load the source from the corresponding file on disk. If the file is not found or the hashes don't match, the failed attempt is recorded (and not retried).
* The machinery fetching source lines from file maps is augmented to use the lazily loaded external source as a secondary fallback for file maps belonging to external crates.
This required a small change to the expected stderr of one UI test (it now renders a span, where previously was none).
Further work can be done based on this - some of the machinery previously used to hide external spans is possibly obsolete and the hashing code can be reused in different places as well.
r? @eddyb