Add a target option "merge-functions", and a corresponding -Z flag (works around #57356)
This commit adds a target option "merge-functions", which takes values in ("disabled", "trampolines", or "aliases" (default is "aliases")), to allow targets to opt out of the MergeFunctions LLVM pass. Additionally, the latest commit also adds an optional -Z flag, "merge-functions", which takes the same values and has precedence over the target option when both are specified.
This works around https://github.com/rust-lang/rust/issues/57356.
cc @eddyb @japaric @oli-obk @nox @nagisa
Also thanks to @denzp and @gnzlbg for discussing this on rust-cuda!
### Motivation
Basically, the problem is that the MergeFunctions pass, which rustc currently enables by default at -O2 and -O3 [1], and `extern "ptx-kernel"` functions (specific to the NVPTX target) are currently not compatible with each other. If the MergeFunctions pass is allowed to run, rustc can generate invalid PTX assembly (i.e. a PTX file that is not accepted by the native PTX assembler `ptxas`). Therefore we would like a way to opt out of the MergeFunctions pass, which is what our target option does.
### Related work
The current behavior of rustc is to enable MergeFunctions at -O2 and -O3 [1], and also to enable the use of function aliases within MergeFunctions [2] [3]. MergeFunctions seems to have some benefits, such as reducing code size and fixing a crash [4], which is why it is enabled. However, MergeFunctions both with and without function aliases is incompatible with the NVPTX target; a more detailed example for both cases is given below.
clang's "solution" is to have a "-fmerge-functions" flag that opts in to the MergeFunctions pass, but it is not enabled by default.
### Examples/more details
Consider an example Rust lib using `extern "ptx-kernel"` functions: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore.rs. If we try to compile this with nightly rustc, we get the following compiler error:
LLVM ERROR: Module has aliases, which NVPTX does not support.
This error happens because: (1) functions `foo` and `bar` have the same body, so are candidates to be merged by MergeFunctions; and (2) rustc configures MergeFunctions to generate function aliases using the "mergefunc-use-aliases" LLVM option [2] [3], but the NVPTX backend does not support those aliases.
Okay, so we can try omitting "mergefunc-use-aliases", and then rustc will happily emit PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-mergefunc-nousealiases-bad.ptx. However, this PTX is invalid! When we try to assemble it with `ptxas` (I'm on the CUDA 9.2 toolchain), we get an assembler error:
ptxas nocore-mergefunc-nousealiases-bad.ptx, line 38; error : Illegal call target, device function expected
ptxas fatal : Ptx assembly aborted due to errors
What's happening is that MergeFunctions rewrites the `bar` function to call `foo`. However, directly calling an `extern "ptx-kernel"` function from another `extern "ptx-kernel"` is wrong.
If we disable the MergeFunctions pass from running at all, rustc generates correct PTX assembly: https://github.com/peterhj/nvptx-mergefunc-bug/blob/master/nocore-nomergefunc-ok.ptx
[1] a36b960df6/src/librustc_codegen_ssa/back/write.rs (L155)
[2] a36b960df6/src/librustc_codegen_llvm/llvm_util.rs (L64)
[3] https://github.com/rust-lang/rust/pull/56358
[4] https://github.com/rust-lang/rust/pull/49479
rustc: Place wasm linker args first instead of last
This ensures that arguments passed via `-C link-arg` can override the
first ones on the command line, for example allowing configuring of the
stack size.
"trampolines", or "aliases (the default)) to allow targets to opt out of
the MergeFunctions LLVM pass. Also add a corresponding -Z option with
the same name and values.
This works around: https://github.com/rust-lang/rust/issues/57356
Motivation:
Basically, the problem is that the MergeFunctions pass, which rustc
currently enables by default at -O2 and -O3, and `extern "ptx-kernel"`
functions (specific to the NVPTX target) are currently not compatible
with each other. If the MergeFunctions pass is allowed to run, rustc can
generate invalid PTX assembly (i.e. a PTX file that is not accepted by
the native PTX assembler ptxas). Therefore we would like a way to opt
out of the MergeFunctions pass, which is what our target option does.
Related work:
The current behavior of rustc is to enable MergeFunctions at -O2 and -O3,
and also to enable the use of function aliases within MergeFunctions.
MergeFunctions both with and without function aliases is incompatible with
the NVPTX target.
clang's "solution" is to have a "-fmerge-functions" flag that opts in to
the MergeFunctions pass, but it is not enabled by default.
panic when calling MaybeUninhabited::into_inner on uninhabited type
I do this by adding an internal-only intrinsic `panic_if_uninhabited`. I have no idea what I am doing here, just mindlessly copying code around, so please review carefully!
Add support for trait-objects without a principal
The hard-error version of #56481 - should be merged after we do something about the `traitobject` crate.
Fixes#33140.
Fixes#57057.
r? @nikomatsakis
This ensures that arguments passed via `-C link-arg` can override the
first ones on the command line, for example allowing configuring of the
stack size.
Bump minimum required LLVM version to 6.0
Based on the discussion in #55842, while the overall position of Rust wrt LLVM continues to be contentious, there does seem to be a consensus that there is no need for continued support of LLVM 5. This PR bumps our version requirement to LLVM 6.0 and makes Travis run against that.
I hope that this is going to unblock #52694. If I understand correctly, while this issue still exists in LLVM 6, Ubuntu has backported the relevant patch.
r? @alexcrichton
Stabilize `linker-flavor` flag.
Part of #55396.
This commit moves the linker-flavor flag from a debugging option to a
codegen option, thus stabilizing it. There are no feature flags
associated with this flag.
r? @nagisa
This commit moves the linker-flavor flag from a debugging option to a
codegen option, thus stabilizing it. There are no feature flags
associated with this flag.
Instead of maybe storing its own sysroot and maybe deferring to the one
in `Session::opts`, just clone the latter when necessary so one is
always directly available. This removes the need for the getter.
Discard LLVM modules earlier when performing ThinLTO
Currently ThinLTO is performed by first compiling all modules (and keeping them in memory), and then serializing them into ThinLTO buffers in a separate, synchronized step. Modules are later read back from ThinLTO buffers when running the ThinLTO optimization pipeline.
We can also find the following comment in `lto.rs`:
// FIXME: right now, like with fat LTO, we serialize all in-memory
// modules before working with them and ThinLTO. We really
// shouldn't do this, however, and instead figure out how to
// extract a summary from an in-memory module and then merge that
// into the global index. It turns out that this loop is by far
// the most expensive portion of this small bit of global
// analysis!
I don't think that what is suggested here is the right approach: One of the primary benefits of using ThinLTO over ordinary LTO is that it's not necessary to keep all the modules (merged or not) in memory for the duration of the linking step.
However, we currently don't really make use of this (at least for crate-local ThinLTO), because we keep all modules in memory until the start of the LTO step. This PR changes the implementation to instead perform the serialization into ThinLTO buffers directly after the initial optimization step.
Most of the changes here are plumbing to separate out fat and thin lto handling in `write.rs`, as these now use different intermediate artifacts. For fat lto this will be in-memory modules, for thin lto it will be ThinLTO buffers.
r? @alexcrichton
Instead of only determining whether some form of LTO is necessary,
determine whether thin, fat or no LTO is necessary. I've rewritten
the conditions in a way that I think is more obvious, i.e. specified
LTO type + additional preconditions.
rustc 1.30.0's linker flavor inference is a non-backwards compat change to -Clinker
Part of #55396.
This commit modifies linker flavor inference to only remove the extension
to the linker when performing inference if that extension is a 'exe'.
r? @nagisa
cc @alexcrichton @japaric