Enable TrapUnreachable in LLVM.
This patch enables LLVM's TrapUnreachable flag, which tells it to translate `unreachable` instructions into hardware trap instructions, rather than allowing control flow to "fall through" into whatever code happens to follow it in memory.
This follows up on https://github.com/rust-lang/rust/issues/28728#issuecomment-332581533. For example, for @zackw's testcase [here](https://github.com/rust-lang/rust/issues/42009#issue-228745924), the output function contains a `ud2` instead of no code, so it won't "fall through" into whatever happens to be next in memory.
(I'm also working on the problem of LLVM optimizing away infinite loops, but the patch here is useful independently.)
I tested this patch on a few different codebases, and the code size increase ranged from 0.0% to 0.1%.
Enable LLVM's TrapUnreachable flag, which tells it to translate
`unreachable` instructions into hardware trap instructions, rather
than allowing control flow to "fall through" into whatever code
happens to follow it in memory.
First the `addPreservedGUID` function forgot to take care of "alias" summaries.
I'm not 100% sure what this is but the current code now matches upstream. Next
the `computeDeadSymbols` return value wasn't actually being used, but it needed
to be used! Together these should...
Closes#45195
Band-aid fix to stop race conditions in llvm errors
This is a big hammer, but should be effective at completely removing a
few issues, including inconsistent error messages and segfaults when
LLVM workers race to report results
`LLVM_THREAD_LOCAL` has been present in LLVM since 8 months before 3.7
(the earliest supported LLVM version that Rust can use)
Maybe fixes#43402 (third time lucky?)
r? @alexcrichton
------
You can see that in 5f578dfad0/src/librustc_trans/back/write.rs (L75-L100) there's a small window where the static global error message (made thread local in this PR) could be altered by another thread.
Note that we can't use `thread_local` because gcc 4.7 (permitted according to the readme) does not support it.
Maybe ideally all the functions should be modified to not use a global, but this PR makes things deterministic at least. My only hesitation is whether errors are checked in different threads to where they occur, but I figure that's probably unlikely (and is less bad than racing code).
As an aside, segfault evidence before this patch when I was doing some debugging:
```
$ while grep 'No such file or directory' log2; do RUST_LOG=debug ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc -o "" y.rs >log2 2>&1; done
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
Segmentation fault (core dumped)
error: could not write output to : No such file or directory
error: could not write output to : No such file or directory
```
This is a big hammer, but should be effective at completely removing a
few issues, including inconsistent error messages and segfaults when
LLVM workers race to report results
LLVM_THREAD_LOCAL has been present in LLVM since 8 months before 3.7
(the earliest supported LLVM version that Rust can use)
Maybe fixes#43402 (third time lucky?)
This commit is an implementation of LLVM's ThinLTO for consumption in rustc
itself. Currently today LTO works by merging all relevant LLVM modules into one
and then running optimization passes. "Thin" LTO operates differently by having
more sharded work and allowing parallelism opportunities between optimizing
codegen units. Further down the road Thin LTO also allows *incremental* LTO
which should enable even faster release builds without compromising on the
performance we have today.
This commit uses a `-Z thinlto` flag to gate whether ThinLTO is enabled. It then
also implements two forms of ThinLTO:
* In one mode we'll *only* perform ThinLTO over the codegen units produced in a
single compilation. That is, we won't load upstream rlibs, but we'll instead
just perform ThinLTO amongst all codegen units produced by the compiler for
the local crate. This is intended to emulate a desired end point where we have
codegen units turned on by default for all crates and ThinLTO allows us to do
this without performance loss.
* In anther mode, like full LTO today, we'll optimize all upstream dependencies
in "thin" mode. Unlike today, however, this LTO step is fully parallelized so
should finish much more quickly.
There's a good bit of comments about what the implementation is doing and where
it came from, but the tl;dr; is that currently most of the support here is
copied from upstream LLVM. This code duplication is done for a number of
reasons:
* Controlling parallelism means we can use the existing jobserver support to
avoid overloading machines.
* We will likely want a slightly different form of incremental caching which
integrates with our own incremental strategy, but this is yet to be
determined.
* This buys us some flexibility about when/where we run ThinLTO, as well as
having it tailored to fit our needs for the time being.
* Finally this allows us to reuse some artifacts such as our `TargetMachine`
creation, where all our options we used today aren't necessarily supported by
upstream LLVM yet.
My hope is that we can get some experience with this copy/paste in tree and then
eventually upstream some work to LLVM itself to avoid the duplication while
still ensuring our needs are met. Otherwise I fear that maintaining these
bindings may be quite costly over the years with LLVM updates!
This commit is a refactoring of the LTO backend in Rust to support compilations
with multiple codegen units. The immediate result of this PR is to remove the
artificial error emitted by rustc about `-C lto -C codegen-units-8`, but longer
term this is intended to lay the groundwork for LTO with incremental compilation
and ultimately be the underpinning of ThinLTO support.
The problem here that needed solving is that when rustc is producing multiple
codegen units in one compilation LTO needs to merge them all together.
Previously only upstream dependencies were merged and it was inherently relied
on that there was only one local codegen unit. Supporting this involved
refactoring the optimization backend architecture for rustc, namely splitting
the `optimize_and_codegen` function into `optimize` and `codegen`. After an LLVM
module has been optimized it may be blocked and queued up for LTO, and only
after LTO are modules code generated.
Non-LTO compilations should look the same as they do today backend-wise, we'll
spin up a thread for each codegen unit and optimize/codegen in that thread. LTO
compilations will, however, send the LLVM module back to the coordinator thread
once optimizations have finished. When all LLVM modules have finished optimizing
the coordinator will invoke the LTO backend, producing a further list of LLVM
modules. Currently this is always a list of one LLVM module. The coordinator
then spawns further work to run LTO and code generation passes over each module.
In the course of this refactoring a number of other pieces were refactored:
* Management of the bytecode encoding in rlibs was centralized into one module
instead of being scattered across LTO and linking.
* Some internal refactorings on the link stage of the compiler was done to work
directly from `CompiledModule` structures instead of lists of paths.
* The trans time-graph output was tweaked a little to include a name on each
bar and inflate the size of the bars a little
Commit c4710203c098b in #43492 make `LLVMRustHasFeature` "more robust"
by using `getFeatureTable()`. However, this function is specific to
Rust's own LLVM fork, not upstream LLVM-4.0, so we need to use
`#if LLVM_RUSTLLVM` to guard this call.
The function should accept feature strings that old LLVM might not
support.
Simplify the code using the same approach used by
LLVMRustPrintTargetFeatures.
Dummify the function for non 4.0 LLVM and update the tests accordingly.
Update Rust LLVM bindings for LLVM 5.0
This is the initial set of changes to update the rust llvm bindings for 5.0. The llvm commits necessitating these changes are linked from the tracking issue, #43370.
This still does not work on 32-bit archs because of an LLVM limitation,
but this is only an optimization, so let's push it on 64-bit only for now.
Fixes#37945
rustc: Implement the #[global_allocator] attribute
This PR is an implementation of [RFC 1974] which specifies a new method of
defining a global allocator for a program. This obsoletes the old
`#![allocator]` attribute and also removes support for it.
[RFC 1974]: https://github.com/rust-lang/rfcs/pull/1974
The new `#[global_allocator]` attribute solves many issues encountered with the
`#![allocator]` attribute such as composition and restrictions on the crate
graph itself. The compiler now has much more control over the ABI of the
allocator and how it's implemented, allowing much more freedom in terms of how
this feature is implemented.
cc #27389
This PR is an implementation of [RFC 1974] which specifies a new method of
defining a global allocator for a program. This obsoletes the old
`#![allocator]` attribute and also removes support for it.
[RFC 1974]: https://github.com/rust-lang/rfcs/pull/197
The new `#[global_allocator]` attribute solves many issues encountered with the
`#![allocator]` attribute such as composition and restrictions on the crate
graph itself. The compiler now has much more control over the ABI of the
allocator and how it's implemented, allowing much more freedom in terms of how
this feature is implemented.
cc #27389
So ARM had quite a few codegen bugs on LLVM 4.0 which are fixed on LLVM
trunk. This backports 5 of them:
r297871 - ARM: avoid clobbering register in v6 jump-table expansion.
- fixesrust-lang/rust#42248
r294949 - [Thumb-1] TBB generation: spot redefinitions of index
r295816 - [ARM] Fix constant islands pass.
r300870 - [Thumb-1] Fix corner cases for compressed jump tables
r302650 - [IfConversion] Add missing check in
IfConversion/canFallThroughTo
- unblocks rust-lang/rust#39409
Add RWPI/ROPI relocation model support
This PR adds support for using LLVM 4's ROPI and RWPI relocation models for ARM.
ROPI (Read-Only Position Independence) and RWPI (Read-Write Position Independence) are two new relocation models in LLVM for the ARM backend ([LLVM changset](https://reviews.llvm.org/rL278015)). The motivation is that these are the specific strategies we use in userspace [Tock](https://www.tockos.org) apps, so supporting this is an important step (perhaps the final step, but can't confirm yet) in enabling userspace Rust processes.
## Explanation
ROPI makes all code and immutable accesses PC relative, but not assumed to be overriden at runtime (so for example, jumps are always relative).
RWPI uses a base register (`r9`) that stores the addresses of the GOT in memory so the runtime (e.g. a kernel) only adjusts r9 tell running code where the GOT is.
## Complications adding support in Rust
While this landed in LLVM master back in August, the header files in `llvm-c` have not been updated yet to reflect it. Rust replicates that header file's version of the `LLVMRelocMode` enum as the Rust enum `llvm::RelocMode` and uses an implicit cast in the ffi to translate from Rust's notion of the relocation model to the LLVM library's notion.
My workaround for this currently is to replace the `LLVMRelocMode` argument to `LLVMTargetMachineRef` with an int and using the hardcoded int representation of the `RelocMode` enum. This is A Bad Idea(tm), but I think very nearly the right thing.
Would a better alternative be to patch rust-llvm to support these enum variants (also a fairly trivial change)?
When -Z profile is passed, the GCDAProfiling LLVM pass is added
to the pipeline, which uses debug information to instrument the IR.
After compiling with -Z profile, the $(OUT_DIR)/$(CRATE_NAME).gcno
file is created, containing initial profiling information.
After running the program built, the $(OUT_DIR)/$(CRATE_NAME).gcda
file is created, containing branch counters.
The created *.gcno and *.gcda files can be processed using
the "llvm-cov gcov" and "lcov" tools. The profiling data LLVM
generates does not faithfully follow the GCC's format for *.gcno
and *.gcda files, and so it will probably not work with other tools
(such as gcov itself) that consume these files.
Replaces the llvm-c exposed LLVMRelocMode, which does not include all
relocation model variants, with a LLVMRustRelocMode modeled after
LLVMRustCodeMode.