This commit updates LLVM to fix #45511 (https://reviews.llvm.org/D39981) and
also reenables ThinLTO for libtest now that we shouldn't hit #45768. This also
opportunistically enables ThinLTO for libstd which was previously blocked
(#45661) on test failures related to debuginfo with a presumed cause of #45511.
Closes #45511
Enable TrapUnreachable in LLVM.
This patch enables LLVM's TrapUnreachable flag, which tells it to translate `unreachable` instructions into hardware trap instructions, rather than allowing control flow to "fall through" into whatever code happens to follow it in memory.
This follows up on https://github.com/rust-lang/rust/issues/28728#issuecomment-332581533. For example, for @zackw's testcase [here](https://github.com/rust-lang/rust/issues/42009#issue-228745924), the output function contains a `ud2` instead of no code, so it won't "fall through" into whatever happens to be next in memory.
(I'm also working on the problem of LLVM optimizing away infinite loops, but the patch here is useful independently.)
I tested this patch on a few different codebases, and the code size increase ranged from 0.0% to 0.1%.
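For illustration, here's a rough sketch in the spirit of the linked testcase (not a copy of it):
```rust
// Illustrative sketch, not the linked testcase. LLVM may optimize away the
// side-effect-free infinite loop; previously that could leave `spin_forever`
// with no code at all, so callers would fall through into whatever is next
// in memory. With TrapUnreachable the function instead ends in a trap
// instruction (`ud2` on x86_64).
#![crate_type = "lib"]

#[no_mangle]
pub fn spin_forever() -> ! {
    loop {}
}
```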
The return value in these tests is only used to generate extra code that
the test script can detect; the script simply counts lines in the
assembly output.
With rustc now emitting "ud2" on unreachable code, a "return nothing"
sequence may take the same number of assembly lines as an "unreachable"
sequence. This test verifies that intrinsics::unreachable() works by
checking that it reduces the number of lines in the assembly output. Fix
it by adding a return value, which requires an extra instruction in the
reachable case and gives the test the difference it's looking for.
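A rough sketch of the test's shape after the fix (the names and the constant are illustrative, not the actual test source):
```rust
// Illustrative sketch of the test's shape; the names and the constant are
// not the actual test source. With a return value, the reachable version
// needs an extra instruction to produce `42`, while the
// `intrinsics::unreachable()` version does not, so counting assembly lines
// still distinguishes the two even though a `ud2` is now emitted.
#![feature(core_intrinsics)]
#![crate_type = "lib"]

use std::intrinsics;

#[no_mangle]
pub fn reachable() -> u32 {
    42 // materializing the return value costs an extra instruction
}

#[no_mangle]
pub unsafe fn not_reachable() -> u32 {
    intrinsics::unreachable() // emits `ud2`, but no return-value code
}
```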
Shorten paths to auxiliary files created by tests
I'm hitting issues with long file paths to object files created by the test suite, similar to https://github.com/rust-lang/rust/issues/45103#issuecomment-335622075.
If we look at the object file path in https://github.com/rust-lang/rust/issues/45103 we can see that the path consists of a few components:
```
specialization-cross-crate-defaults.stage2-x86_64-pc-windows-gnu.run-pass.libaux\specialization_cross_crate_defaults.specialization_cross_crate_defaults0.rust-cgu.o
```
=>
1. specialization-cross-crate-defaults // test name, required
2. stage2 // stage disambiguator, required
3. x86_64-pc-windows-gnu // target disambiguator, required
4. run-pass // mode disambiguator, rarely required
5. libaux // suffix, can be shortened
6. specialization_cross_crate_defaults // required, there may be several libraries in the directory
7. specialization_cross_crate_defaults0 // codegen unit name, can be shortened?
8. rust-cgu // suffix, can be shortened?
9. o // object file extension
This patch addresses items `4`, `5` and `8`.
`libaux` is shortened to `aux`, `rust-cgu` is shortened to `rcgu`, mode disambiguator is omitted unless it's necessary (for pretty-printing and debuginfo tests, see 38d26d811a)
I haven't touched names of codegen units though (`specialization_cross_crate_defaults0`).
Is it useful for them to have descriptive names including the crate name, as opposed to just `0` or `cgu0` or something?
Right now symbol exports, particularly in a cdylib, are handled by
assuming that `pub extern` combined with `#[no_mangle]` means "export
this". This isn't actually what we want for some symbols that the
standard library uses to implement itself, for example symbols related
to allocation. Additionally other special symbols like
`rust_eh_personality` have no need to be exported from cdylib crate
types (only needed in dylib crate types).
This commit updates how rustc handles these special symbols by adding to
the hardcoded logic of symbols like `rust_eh_personality` but also
adding a new attribute, `#[rustc_std_internal_symbol]`, which forces the
export level to be considered the same as all other Rust functions
instead of looking like a C function.
The eventual goal here is to prevent functions like `__rdl_alloc` from
showing up as part of a Rust cdylib as it's just an internal
implementation detail. This then further allows such symbols to get gc'd
by the linker when creating a cdylib.
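A minimal sketch of the distinction, with illustrative signatures rather than the real libstd definitions (the attribute is unstable and internal to the compiler):
```rust
// Illustrative sketch: these are not the real libstd definitions, and
// `rustc_std_internal_symbol` is an unstable, compiler-internal attribute.
#![feature(rustc_attrs)]
#![crate_type = "cdylib"]

// Part of the cdylib's intended C API: stays exported.
#[no_mangle]
pub extern "C" fn my_public_api() -> u32 {
    0
}

// Internal plumbing in the style of `__rdl_alloc`: gets the same export
// level as ordinary Rust functions, so the linker can GC it from a cdylib.
#[rustc_std_internal_symbol]
pub extern "C" fn __example_internal(_size: usize) -> usize {
    0
}
```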
rustc: Handle #[inline(always)] at -O0
This commit updates the handling of `#[inline(always)]` functions at -O0 to
ensure that they're always inlined regardless of the number of codegen units used.
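A rough illustration of the case being fixed (module and function names are made up, not taken from the issue):
```rust
// Illustrative names, not taken from the issue. With multiple codegen units
// at -O0 these two modules typically land in different codegen units; this
// change ensures `always_inlined` is still inlined into `caller` regardless
// of that partitioning.
#![crate_type = "lib"]

pub mod helpers {
    #[inline(always)]
    pub fn always_inlined(x: u32) -> u32 {
        x.wrapping_mul(3)
    }
}

pub mod user {
    pub fn caller(x: u32) -> u32 {
        super::helpers::always_inlined(x)
    }
}
```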
Closes #45201
Fix data-layout field in x86_64-unknown-linux-gnu.json test file
The current data-layout causes the following error:
> rustc: /checkout/src/llvm/lib/CodeGen/MachineFunction.cpp:151: void llvm::MachineFunction::init(): Assertion `Target.isCompatibleDataLayout(getDataLayout()) && "Can't create a MachineFunction using a Module with a " "Target-incompatible DataLayout attached\n"' failed.
The new value was generated according to [this comment by @japaric](https://github.com/rust-lang/rust/issues/31367#issuecomment-213595571).
This commit tweaks the behavior of inlining functions into multiple codegen
units when rustc is compiling in debug mode. Today rustc will unconditionally
translate `#[inline]` functions into every codegen unit that needs them,
marking their linkage as `internal`. This commit changes
the behavior so that in debug mode (compiling at `-O0`) rustc will instead only
translate `#[inline]` functions into *one* codegen unit, forcing all other
codegen units to reference this one copy.
The goal here is to improve debug compile times by reducing the amount of
translation that happens on behalf of multiple codegen units. It was discovered
in #44941 that increasing the number of codegen units had the adverse side
effect of increasing the overall work done by the compiler, and the suspicion
here was that the compiler was inlining, translating, and codegen'ing more
functions with more codegen units (for example `String` would be basically
inlined into all codegen units if used). The strategy in this commit should
reduce the cost of `#[inline]` functions to being equivalent to one codegen
unit, which is only translating and codegen'ing inline functions once.
Collected [data] shows that this does indeed improve the situation from
[before]: the overall cpu-clock time increases at a much slower rate, and when
pinned to one core rustc does not consume significantly more wall-clock time
than with one codegen unit.
One caveat of this commit is that the symbol names for inlined functions that
are only translated once needed some slight tweaking. These inline functions
could be translated into multiple crates and we need to make sure the symbols
don't collide, so the crate name/disambiguator is mixed into the symbol name
hash in these situations.
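A rough illustration of the behavior described above (names are illustrative, not from the issue):
```rust
// Illustrative names, not from the issue. The two modules below typically
// end up in different codegen units. Previously `tiny_helper` was translated
// with `internal` linkage into both of them; at -O0 it is now translated
// once, and the other codegen unit only references that single copy.
#![crate_type = "lib"]

#[inline]
pub fn tiny_helper(x: u64) -> u64 {
    x.rotate_left(7)
}

pub mod a {
    pub fn use_it() -> u64 {
        super::tiny_helper(1)
    }
}

pub mod b {
    pub fn use_it() -> u64 {
        super::tiny_helper(2)
    }
}
```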
[data]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334880911
[before]: https://github.com/rust-lang/rust/issues/44941#issuecomment-334583384
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Today LTO works by merging all relevant LLVM modules into one and
then running optimization passes. "Thin" LTO operates differently by sharding
the work and allowing parallel optimization across
codegen units. Further down the road Thin LTO also allows *incremental* LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:
* In one mode we'll *only* perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
* In another mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There are plenty of comments about what the implementation is doing and where
it came from, but the tl;dr is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
* Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
* We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
* This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
* Finally this allows us to reuse some artifacts such as our `TargetMachine`
creation, where the options we use today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!
Fix native main() signature on 64bit
Hello,
in LLVM-IR produced by rustc on x86_64-linux-gnu, the native main() function had incorrect types for the function result and argc parameter: i64, while they should be i32 (really c_int). See also #20064, #29633.
So I've attempted a fix here. I tested it by checking the LLVM IR produced with --target x86_64-unknown-linux-gnu and i686-unknown-linux-gnu. Also I tried running the tests (`./x.py test`), however I'm getting two failures with and without the patch, which I'm guessing are unrelated.
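As a sketch of the types involved (the alias below is purely illustrative; the actual fix is in rustc's generated entry point, not in user code):
```rust
// Illustrative only: the fix lives in rustc's generated entry point, not in
// user code. The C runtime expects `int main(int argc, char **argv)`, so the
// result and `argc` are `c_int` (i32 even on 64-bit targets), not i64.
use std::os::raw::{c_char, c_int};

#[allow(dead_code)]
type NativeMain = unsafe extern "C" fn(argc: c_int, argv: *const *const c_char) -> c_int;

fn main() {}
```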
This commit changes the default of rustc to use 32 codegen units when compiling
in debug mode, typically an opt-level=0 compilation. Since their inception
codegen units have matured quite a bit, gaining features such as:
* Parallel translation and codegen, enabling codegen units to be worked on
  concurrently and finished more quickly.
* Deterministic and reliable partitioning through the same infrastructure as
incremental compilation.
* Global rate limiting through the `jobserver` crate to avoid overloading the
system.
The largest benefit of codegen units has always been faster compilation through
parallel processing of modules on the LLVM side of things, using all the cores
available on typical build machines, which often have many. Some downsides
have been fixed through the features above, but the major downside remaining is
that using codegen units reduces opportunities for inlining and optimization.
This, however, doesn't matter much during debug builds!
In this commit the default number of codegen units for debug builds has been
raised from 1 to 32. This should enable most `cargo build` compiles that are
bottlenecked on translation and/or code generation to immediately see speedups
through parallelization on available cores.
Work is being done to *always* enable multiple codegen units (and therefore
parallel codegen) but it requires #44841 at least to be landed and stabilized,
but stay tuned if you're interested in that aspect!
This commit removes the `dep_graph` field from the `Session` type according to
issue #44390. Most of the fallout here was relatively straightforward and the
`prepare_session_directory` function was rejiggered a bit to reuse the results
in the later-called `load_dep_graph` function.
Closes #44390
Add `TargetOptions::min_global_align`, with s390x at 16 bits
The SystemZ `LARL` instruction provides PC-relative addressing for globals,
but only to *even* addresses, so other compilers make sure that such
globals are always 2-byte aligned. In Clang, this is modeled with
`TargetInfo::MinGlobalAlign`, and `TargetOptions::min_global_align` now
serves the same purpose for rustc.
In Clang, the only targets that set this are SystemZ, Lanai, and NVPTX, and
the latter two don't have targets in rust master.
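A rough sketch of what opting in might look like (the field name comes from this change; the surrounding struct is illustrative, not rustc's real `TargetOptions`):
```rust
// Illustrative sketch: the field name comes from this change, but the struct
// here is not rustc's real `TargetOptions`. 16 bits = 2 bytes, i.e. the even
// alignment that `LARL` can address.
struct TargetOptions {
    min_global_align: Option<u64>,
}

fn s390x_options() -> TargetOptions {
    TargetOptions { min_global_align: Some(16) }
}

fn main() {
    assert_eq!(s390x_options().min_global_align, Some(16));
}
```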
Fixes #44411.
r? @eddyb
Fix sanitizer tests on buggy kernels
Travis recently pushed an update to the Linux environments, namely the kernels
that we're running on. This in turn caused some of the sanitizer tests we run to
fail. We also apparently weren't the first to hit these failures! As detailed
in google/sanitizers#837, these tests were failing due to a specific kernel
commit which has since been backed out. For now, work around the buggy kernel
that's deployed on Travis; eventually we should be able to remove these flags.
This commit adds logic to the compiler to attempt to handle super long linker
invocations by falling back to the `@`-file syntax if the invoked command is too
large. Each OS has a limit on how many arguments and how large the arguments can
be when spawning a new process, and linkers tend to be one of those programs
that can hit the limit!
The logic implemented here is to unconditionally attempt to spawn a linker,
and if it fails to spawn with an error from the OS indicating that the command
line is too big, we attempt a fallback. The fallback is roughly the same for all
linkers where an argument pointing to a file, prepended with `@`, is passed.
This file then contains all the various arguments that we want to pass to the
linker.
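A rough sketch of the fallback strategy (illustrative, not the actual rustc implementation; in particular, which spawn errors really mean "command line too big" is OS-specific):
```rust
// Illustrative sketch, not the actual rustc implementation. Try the direct
// invocation first; if the OS refuses to spawn it, write the arguments to a
// file and re-run the linker with the `@`-file syntax. Treating *any* spawn
// failure as "command line too big" is a simplification here.
use std::fs;
use std::process::{Command, ExitStatus};

fn run_linker(linker: &str, args: &[String]) -> std::io::Result<ExitStatus> {
    match Command::new(linker).args(args).status() {
        Ok(status) => Ok(status),
        Err(_) => {
            let path = std::env::temp_dir().join("linker-args");
            // Most linkers accept one argument per line in the @-file.
            fs::write(&path, args.join("\n"))?;
            Command::new(linker).arg(format!("@{}", path.display())).status()
        }
    }
}

fn main() -> std::io::Result<()> {
    let args: Vec<String> = std::env::args().skip(1).collect();
    run_linker("cc", &args).map(|_| ())
}
```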
Closes #41190
These fixes all have to do with the 64-bit PowerPC ELF ABI for big-endian
targets. The ELF v2 ABI for powerpc64le already worked well.
- Return after marking return aggregates indirect. Fixes#42757.
- Pass one-member float aggregates as direct argument values.
- Aggregate arguments smaller than 64 bits must be written in the least-
  significant bits of the parameter space.
- Larger aggregates are instead padded at the tail.
(i.e. filling MSBs, padding the remaining LSBs.)
New tests were also added for the single-float aggregate, and a 3-byte
aggregate to check that it's filled into LSBs. Overall, at least these
formerly-failing tests now pass on powerpc64:
- run-make/extern-fn-struct-passing-abi
- run-make/extern-fn-with-packed-struct
- run-pass/extern-pass-TwoU16s.rs
- run-pass/extern-pass-TwoU8s.rs
- run-pass/struct-return.rs
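A rough sketch of the two aggregate shapes mentioned above (the struct definitions are illustrative, not the actual test types):
```rust
// Illustrative definitions, not the actual test types. On big-endian
// powerpc64, `OneFloat` is now passed directly as a float value, and
// `ThreeBytes` must be placed in the least-significant bits of its 64-bit
// parameter slot.
#![crate_type = "lib"]

#[repr(C)]
#[derive(Clone, Copy)]
pub struct OneFloat {
    pub f: f32,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct ThreeBytes {
    pub a: u8,
    pub b: u8,
    pub c: u8,
}
```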
This is a follow-up to f189d7a6937 and 9d11b089ad1. While `-z ignore`
is what needs to be passed to the Solaris linker, gcc is used as the default
linker driver, so both that form and `-Wl,-z -Wl,ignore` (including extra
double quotes) need to be taken into account, which explains the more complex
regular expression.
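A rough sketch of the two argument shapes that have to be accepted (the helper below is illustrative; the real check is a regular expression in the test itself):
```rust
// Illustrative helper; the real check is a regular expression in the test,
// not this function. The Solaris linker takes `-z ignore` directly, but when
// gcc drives the link the same request appears as `-Wl,-z -Wl,ignore`,
// possibly with extra double quotes around each piece.
fn mentions_z_ignore(linker_args: &str) -> bool {
    // Strip quoting so `"-Wl,-z" "-Wl,ignore"` and `-Wl,-z -Wl,ignore`
    // look the same, then accept either spelling.
    let cleaned = linker_args.replace('"', "");
    cleaned.contains("-z ignore") || cleaned.contains("-Wl,-z -Wl,ignore")
}

fn main() {
    assert!(mentions_z_ignore("cc -o out -z ignore"));
    assert!(mentions_z_ignore(r#""-Wl,-z" "-Wl,ignore""#));
    assert!(!mentions_z_ignore("cc -o out"));
}
```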