rustc_driver: expose a way to override query providers in CompileController.
This API has been a long-time coming and will probably become the main method for custom drivers (that is, binaries other than `rustc` itself that use `librustc_driver`) to adapt the compiler's behavior.
Miscellaneous changes for CI, Docker and compiletest.
This PR contains 7 independent commits that improves interaction with CI, Docker and compiletest.
1. a4e5c91cb8 — Forces a newline every 100 dots when testing in quiet mode. Prevents spurious timeouts when abusing the CI to test Android jobs.
2. 1b5aaf22e8 — Use vault.centos.org for dist-powerpc64le-linux, see #45744.
3. 33400fbbcd — Modify `src/ci/docker/run.sh` so that the docker images can be run from Docker Toolbox for Windows on Windows 7. I haven't checked the behavior of the newer Docker for Windows on Windows 10. Also, "can run" does not mean all the test can pass successfully (the UDP tests failed last time I checked)
4. d517668a08 — Don't emit a real warning the linker segfault, which affects UI tests like https://github.com/rust-lang/rust/pull/45489#issuecomment-340134944. Log it instead.
5. 51e2247948 — During run-pass, trim the output if stdout/stderr exceeds 416 KB (top 160 KB + bottom 256 KB). This is an attempt to avoid spurious failures like https://github.com/rust-lang/rust/pull/45384#issuecomment-341755788
6. 9cfdabaf3c — Force `gem update --system` before deploy. This is an attempt to prevent spurious error #44159.
7. eee10cc482 — Tries to print the crash log on macOS on failure. This is an attempt to debug #45230.
incr.comp.: Verify stability of incr. comp. hashes and clean up various other things.
The main contribution of this PR is that it adds the `-Z incremental-verify-ich` functionality. Normally, when the red-green tracking system determines that a certain query result has not changed, it does not re-compute the incr. comp. hash (ICH) for that query result because that hash is already known. `-Z incremental-verify-ich` tells the compiler to re-hash the query result and compare the new hash against the cached hash. This is a rather thorough way of
- testing hashing implementation stability,
- finding missing `[input]` annotations on `DepNodes`, and
- finding missing read-edges,
since both a missed read and a missing `[input]` annotation can lead to something being marked as green instead of red and thus will have a different hash than it should have.
Case in point, implementing this verification logic and activating it for all `src/test/incremental` tests has revealed several such oversights, all of which are fixed in this PR.
r? @nikomatsakis
Allow overriding the TLS model
This PR adds the ability to override the default "global-dynamic" TLS model with a more specific one through a target json option or a command-line option. This allows for better code generation in certain situations.
This is similar to the `-ftls-model=` option in GCC and Clang.
Shorten paths to auxiliary files created by tests
I'm hitting issues with long file paths to object files created by the test suite, similar to https://github.com/rust-lang/rust/issues/45103#issuecomment-335622075.
If we look at the object file path in https://github.com/rust-lang/rust/issues/45103 we can see that the patch contains of few components:
```
specialization-cross-crate-defaults.stage2-x86_64-pc-windows-gnu.run-pass.libaux\specialization_cross_crate_defaults.specialization_cross_crate_defaults0.rust-cgu.o
```
=>
1. specialization-cross-crate-defaults // test name, required
2. stage2 // stage disambiguator, required
3. x86_64-pc-windows-gnu // target disambiguator, required
4. run-pass // mode disambiguator, rarely required
5. libaux // suffix, can be shortened
6. specialization_cross_crate_defaults // required, there may be several libraries in the directory
7. specialization_cross_crate_defaults0 // codegen unit name, can be shortened?
8. rust-cgu // suffix, can be shortened?
9. o // object file extension
This patch addresses items `4`, `5` and `8`.
`libaux` is shortened to `aux`, `rust-cgu` is shortened to `rcgu`, mode disambiguator is omitted unless it's necessary (for pretty-printing and debuginfo tests, see 38d26d811a)
I haven't touched names of codegen units though (`specialization_cross_crate_defaults0`).
Is it useful for them to have descriptive names including the crate name, as opposed to just `0` or `cgu0` or something?
Right now symbol exports, particularly in a cdylib, are handled by
assuming that `pub extern` combined with `#[no_mangle]` means "export
this". This isn't actually what we want for some symbols that the
standard library uses to implement itself, for example symbols related
to allocation. Additionally other special symbols like
`rust_eh_personallity` have no need to be exported from cdylib crate
types (only needed in dylib crate types).
This commit updates how rustc handles these special symbols by adding to
the hardcoded logic of symbols like `rust_eh_personallity` but also
adding a new attribute, `#[rustc_std_internal_symbol]`, which forces the
export level to be considered the same as all other Rust functions
instead of looking like a C function.
The eventual goal here is to prevent functions like `__rdl_alloc` from
showing up as part of a Rust cdylib as it's just an internal
implementation detail. This then further allows such symbols to get gc'd
by the linker when creating a cdylib.
This commit moves compression of the bytecode from the `link` module to the
`write` module, namely allowing it to be (a) cached by incremental compilation
and (b) produced in parallel. The parallelization may show up as some nice wins
during normal compilation and the caching in incremental mode should be
beneficial for incremental compiles! (no more need to recompress the entire
crate's bitcode on all builds)
On MSVC targets rustc will add symbols prefixed with `_imp_` to LLVM modules to
"emulate" dllexported statics as that workaround is still in place after #27438
hasn't been solved otherwise. These statics, however, were getting gc'd by
ThinLTO accidentally which later would cause linking failures.
This commit updates the location we add such symbols to happen just before
codegen to ensure that (a) they're not eliminated by the optimizer and (b) the
optimizer doesn't even worry about them.
Closes#45347
First the `addPreservedGUID` function forgot to take care of "alias" summaries.
I'm not 100% sure what this is but the current code now matches upstream. Next
the `computeDeadSymbols` return value wasn't actually being used, but it needed
to be used! Together these should...
Closes#45195
incr.comp.: Bring back output of -Zincremental-info.
This got kind lost during the transition to red/green.
I also switched back from `eprintln!()` to `println!()` since the former never actually produced any output. I suspect this has to do with `libterm` somehow monopolizing `stderr`.
r? @nikomatsakis
Some targets, like msp430 and nvptx, don't work with multiple codegen units
right now for bugs or fundamental reasons. To expose this allow targets to
express a default.
Closes#45000
Don't panic in the coordinator thread, bubble up the failure
Fixes#43402 (take 2)
Followup to #45019, this makes the coordinator thread not panic on worker failures since they can be reported reasonably back in the main thread.
The output also now has no evidence of backtraces at all, unlike the previous PR:
```
$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc -o "" x.rs
error: could not write output to : No such file or directory
error: aborting due to previous error
```
r? @alexcrichton
This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
treat `#[inline]` functions by translating them into all codegen units that
they're needed within, marking the linkage as `internal`. This commit changes
the behavior so that in debug mode (compiling at `-O0`) rustc will instead only
translate `#[inline]` functions into *one* codegen unit, forcing all other
codegen units to reference this one copy.
The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overal work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example `String` would be basically
inlined into all codegen units if used). The strategy in this commit should
reduce the cost of `#[inline]` functions to being equivalent to one codegen
unit, which is only translating and codegen'ing inline functions once.
Collected [data] shows that this does indeed improve the situation from [before]
as the overall cpu-clock time increases at a much slower rate and when pinned to
one core rustc does not consume significantly more wall clock time than with one
codegen unit.
One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collideA so the crate name/disambiguator is mixed in to the symbol name
hash in these situations.
[data]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334880911
[before]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334583384
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows *incremental* LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:
* In one mode we'll *only* perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
* In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
* Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
* We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
* This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
* Finally this allows us to reuse some artifacts such as our `TargetMachine`
creation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!
Don't unwrap work item results as the panic trace is useless
Fixes#43402 now there's no multithreaded panic printouts
Also update a comment
--------
Likely regressed in #43506, where the code was changed to panic in worker threads on error.
Unwrapping gives zero extra information since the stack trace is so short, so we may as well just surface that there was an error and exit the thread properly. Because there are then no multithreaded printouts, I think it should mean the output of the test for #26199 is deterministic and not interleaved (thanks to @philipc https://github.com/rust-lang/rust/issues/43402#issuecomment-333835271 for a hint).
Sadly the output is now:
```
thread '<unnamed>' panicked at 'aborting due to worker thread panic', src/librustc_trans/back/write.rs:1643:20
note: Run with `RUST_BACKTRACE=1` for a backtrace.
error: could not write output to : No such file or directory
error: aborting due to previous error
```
but it's an improvement over the multi-panic situation before.
r? @alexcrichton
This commit is a refactoring of the LTO backend in Rust to support compilations
with multiple codegen units. The immediate result of this PR is to remove the
artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer
term this is intended to lay the groundwork for LTO with incremental compilation
and ultimately be the underpinning of ThinLTO support.
The problem here that needed solving is that when rustc is producing multiple
codegen units in one compilation LTO needs to merge them all together.
Previously only upstream dependencies were merged and it was inherently relied
on that there was only one local codegen unit. Supporting this involved
refactoring the optimization backend architecture for rustc, namely splitting
the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM
module has been optimized it may be blocked and queued up for LTO, and only
after LTO are modules code generated.
Non-LTO compilations should look the same as they do today backend-wise, we'll
spin up a thread for each codegen unit and optimize/codegen in that thread. LTO
compilations will, however, send the LLVM module back to the coordinator thread
once optimizations have finished. When all LLVM modules have finished optimizing
the coordinator will invoke the LTO backend, producing a further list of LLVM
modules. Currently this is always a list of one LLVM module. The coordinator
then spawns further work to run LTO and code generation passes over each module.
In the course of this refactoring a number of other pieces were refactored:
* Management of the bytecode encoding in rlibs was centralized into one module
instead of being scattered across LTO and linking.
* Some internal refactorings on the link stage of the compiler was done to work
directly from `CompiledModule` structures instead of lists of paths.
* The trans time-graph output was tweaked a little to include a name on each
bar and inflate the size of the bars a little
rustc: Default 32 codegen units at O0
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen enabling codegen units to get worked on even
more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has forever been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on build machines that typically have many available. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
incr.comp.: Move task result fingerprinting into DepGraph.
This PR
- makes the DepGraph store all `Fingerprints` of task results,
- allows `DepNode` to be marked as input nodes,
- makes HIR node hashing use the regular fingerprinting infrastructure,
- removes the now unused `IncrementalHashesMap`, and
- makes sure that `traits_in_scope_map` fingerprints are stable.
r? @nikomatsakis
cc @alexcrichton
Skip passing /natvis to lld-link until supported.
### Overview
Teaching rustc about MSVC's undocumented linker flag, /NATVIS, broke rustc's compatability with LLVM's `lld-link` frontend, as it does not recognize the flag. This pull request works around the problem by excluding `lld-link` by name. @retep998 discovered this regression.
### Possible Issues
- Other linkers that try to be compatible with the MSVC linker flavor may also be broken and in need of workarounds.
- Warning about the workaround may be overkill for a minor reduction in debug functionality.
- Depending on how long this workaround sticks around, it may eventually be preferred to version check `lld-link` instead of assuming all versions are incompatible.
### Relevant issues
* Broke in https://github.com/rust-lang/rust/pull/43221 Embed MSVC .natvis files into .pdbs and mangle debuginfo for &str, *T, and [T].
* LLVM patched in 27b9c42853 to ignore the flag instead of erroring.
r? @michaelwoerister
This commit moves the actual code generation in the compiler behind a query
keyed by a codegen unit's name. This ended up entailing quite a few internal
refactorings to enable this, along with a few cut corners:
* The `OutputFilenames` structure is now tracked in the `TyCtxt` as it affects a
whole bunch of trans and such. This is now behind a query and threaded into
the construction of the `TyCtxt`.
* The `TyCtxt` now has a channel "out the back" intended to send data to worker
threads in rustc_trans. This is used as a sort of side effect of the codegen
query but morally what's happening here is the return value of the query
(currently unit but morally a path) is only valid once the background threads
have all finished.
* Dispatching work items to the codegen threads was refactored to only rely on
data in `TyCtxt`, which mostly just involved refactoring where data was
stored, moving it from the translation thread to the controller thread's
`CodegenContext` or the like.
* A new thread locals was introduced in trans to work around the query
system. This is used in the implementation of `assert_module_sources` which
looks like an artifact of the old query system and will presumably go away
once red/green is up and running.
This commit attaches a channel to the LLVM workers to the `TyCtxt` which will
later be used during the codegen query to actually send work to LLVM workers.
Otherwise this commit is just plumbing this channel throughout the compiler to
ensure it reaches the right consumers.
This is a big map that ends up inside of a `CrateContext` during translation for
all codegen units. This means that any change to the map may end up causing an
incremental recompilation of a codegen unit! In order to reduce the amount of
dependencies here between codegen units and the actual input crate this commit
refactors dealing with exported symbols and such into various queries.
The new queries are largely based on existing queries with filled out
implementations for the local crate in addition to external crates, but the main
idea is that while translating codegen untis no unit needs the entire set of
exported symbols, instead they only need queries about particulare `DefId`
instances every now and then.
The linking stage, however, still generates a full list of all exported symbols
from all crates, but that's going to always happen unconditionally anyway, so no
news there!