add a test case for issue #32031
I propose a test case to finish the fix for issue #32031. Please review this commit thoroughly, as I have never written a codegen test before.
r? @eddyb
This commit is an implementation of [RFC 1513] which allows applications to
alter the behavior of panics at compile time. A new compiler flag, `-C panic`,
is added and accepts the values `unwind` or `abort`, with the default being
`unwind`. This model affects how code is generated for the local crate, skipping
generation of landing pads with `-C panic=abort`.
[RFC 1513]: https://github.com/rust-lang/rfcs/blob/master/text/1513-less-unwinding.md
Panic implementations are then provided by crates tagged with
`#![panic_runtime]` and lazily required by crates with
`#![needs_panic_runtime]`. The panic strategy (`-C panic` value) of the panic
runtime must match the final product, and if the panic strategy is not `abort`
then the entire DAG must have the same panic strategy.
With the `-C panic=abort` strategy, users can expect a stable method to disable
generation of landing pads, improving optimization in niche scenarios,
decreasing compile time, and decreasing output binary size. With the `-C
panic=unwind` strategy users can expect the existing ability to isolate failure
in Rust code from the outside world.
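As a rough illustration (a sketch, not code from this change): with `-C panic=unwind`
the call below needs a landing pad so that `_guard` is dropped during unwinding,
while with `-C panic=abort` no landing pad is generated for it.
```rust
fn may_panic(x: u32) -> u32 {
    if x == 0 {
        panic!("zero"); // with -C panic=abort this aborts instead of unwinding
    }
    100 / x
}

fn main() {
    let _guard = String::from("dropped during unwinding");
    // Under -C panic=unwind this call gets a landing pad that drops `_guard`;
    // under -C panic=abort it is a plain call with no landing pad.
    let n = may_panic(std::env::args().count() as u32);
    println!("{}", n);
}
```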
Organizationally, this commit dismantles the `sys_common::unwind` module,
moving some of it into `libpanic_unwind` and the rest into the
`panicking` module in libstd. The custom panic runtime support is pretty similar
to the custom allocator support with the only major difference being how the
panic runtime is injected (takes the `-C panic` flag into account).
`fast`, a.k.a. UnsafeAlgebra, is the flag for enabling all "unsafe"
(according to LLVM) float optimizations.
See the LangRef for more information: http://llvm.org/docs/LangRef.html#fast-math-flags
Providing these operations with less precise associativity rules (for
example) is useful for numerical applications.
For example, the summation loop:
    let mut sum = 0.;
    for element in data {
        sum += *element;
    }
Using the default floating point semantics, this loop expresses that the
floats must be added in a sequence, one after another. This constraint
is usually completely unintended, and it means that no autovectorization
is possible.
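As a hedged sketch, the unstable `fadd_fast` intrinsic (nightly-only, behind
`#![feature(core_intrinsics)]`) is the kind of operation this enables: it tells
LLVM that the additions may be reassociated, so the loop can be vectorized.
```rust
#![feature(core_intrinsics)]
use std::intrinsics::fadd_fast;

fn sum_fast(data: &[f64]) -> f64 {
    let mut sum = 0.;
    for element in data {
        // unsafe: "fast" float math may reorder operations and lose precision
        sum = unsafe { fadd_fast(sum, *element) };
    }
    sum
}

fn main() {
    println!("{}", sum_fast(&[1.0, 2.0, 3.0]));
}
```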
LLVM's memory dependence analysis doesn't properly account for calls
that could unwind and thus effectively act as a branching point. This
can lead to stores that are only visible when the call unwinds being
removed, possibly leading to calls to drop() functions with corrupted
memory contents.
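A hedged illustration of the kind of code affected (not a test from this
change): the push below is a store through a `&mut` argument that must remain
visible to the caller if the call unwinds.
```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

fn push_then_panic(v: &mut Vec<i32>, fail: bool) {
    v.push(1); // store through a &mut argument
    if fail {
        panic!("unwind"); // the push above must stay visible after unwinding
    }
}

fn main() {
    let mut v = Vec::new();
    let _ = catch_unwind(AssertUnwindSafe(|| push_then_panic(&mut v, true)));
    assert_eq!(v, [1]); // observes the store even though the call unwound
}
```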
As there is no fix for this in LLVM yet and we want to keep
compatibility with current LLVM versions anyway, we have to work around
this bug by omitting the noalias attribute on &mut function arguments.
Benchmarks suggest that the performance loss by this change is very
small.
Thanks to @RalfJung for pushing me towards not removing too many
noalias annotations and @alexcrichton for helping out with the test for
this bug.
Fixes #29485
If a new cleanup is added to a cleanup scope, the cached exits for that
scope are cleared, so all previous cleanups have to be translated
again. In the worst case this means that we get N distinct landing pads
where the last one has N cleanups, then N-1 and so on.
As new cleanups are to be executed before older ones, we can instead
cache the number of already translated cleanups in addition to the
block that contains them, and then only translate the new ones, if any,
and jump to the cached ones, getting away with linear growth instead.
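A hedged sketch of the caching idea; the names (`CachedExit`, `get_exit`) are
illustrative stand-ins, not the actual rustc internals.
```rust
struct CachedExit {
    block: usize,         // stand-in for the previously emitted exit block
    cleanups_done: usize, // how many cleanups that block already covers
}

fn get_exit(cleanups: &[&str], cache: &mut Option<CachedExit>) -> usize {
    let done = cache.as_ref().map_or(0, |c| c.cleanups_done);
    if done == cleanups.len() {
        if let Some(c) = cache.as_ref() {
            return c.block; // nothing new was added: reuse the cached block
        }
    }
    // Translate only the cleanups added since the cached block was built,
    // newest first ...
    for c in cleanups[done..].iter().rev() {
        println!("translate cleanup: {}", c);
    }
    // ... and finish by jumping to the cached block holding the older ones.
    if let Some(c) = cache.as_ref() {
        println!("jump to cached block {}", c.block);
    }
    let block = cleanups.len(); // stand-in for the id of the new exit block
    *cache = Some(CachedExit { block, cleanups_done: cleanups.len() });
    block
}

fn main() {
    let mut cache = None;
    get_exit(&["drop a"], &mut cache);           // translates "drop a"
    get_exit(&["drop a", "drop b"], &mut cache); // translates only "drop b"
}
```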
For the crate in #31381 this reduces the compile time for an optimized
build from >20 minutes (I cancelled the build at that point) to about 11
seconds. Testing a few crates that come with rustc shows compile time
improvements somewhere between 1 and 8%. The "big" winner is
rustc_platform_intrinsics, which features code similar to that in #31381.
Fixes #31381
Since fat pointers do not qualify as structural types, they were copied
using load_ty and store_ty, which means that we load an FCA (first-class
aggregate) and use
extractvalue to get the components of the fat pointer. This breaks
certain optimizations in LLVM.
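For illustration (a sketch, not taken from the change): a plain fat-pointer
copy like the one below is the kind of operation that used to be translated as
an FCA load plus extractvalue.
```rust
fn main() {
    let src: &[u8] = b"abc";
    // Copying a fat pointer copies the (data pointer, length) pair.
    let dst: &[u8] = src;
    assert_eq!(dst.len(), 3);
}
```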
Found via apasel422/ref_count#13
For enum variants, the default alignment for a specific variant might be
lower than the alignment of the enum type itself. In such cases we, for
example, generate memcpy calls with an alignment that's higher than the
alignment of the constant we copy from.
To avoid that, we need to explicitly set the required alignment on
constants.
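A hedged illustration of the situation (the names are made up for the example):
the enum type below is 8-byte aligned because of the `u64` variant, while a
constant of the `Small` variant would naturally get a lower alignment.
```rust
enum Value {
    Small(u8),
    Big(u64),
}

// The constant only contains the `Small` variant, but copies of it may be
// performed with the alignment of `Value` as a whole, so the constant itself
// has to be emitted with that (higher) alignment.
const SMALL: Value = Value::Small(1);

fn main() {
    let copy = SMALL; // copying the constant into a local
    match copy {
        Value::Small(v) => println!("small: {}", v),
        Value::Big(v) => println!("big: {}", v),
    }
}
```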
Fixes #28912.
By putting an "unreachable" instruction into the default arm of a switch
instruction we can let LLVM know that the match is exhaustive, allowing
for better optimizations.
For example, this match:
```rust
pub enum Enum {
    One,
    Two,
    Three,
}

impl Enum {
    pub fn get_disc(self) -> u8 {
        match self {
            Enum::One => 0,
            Enum::Two => 1,
            Enum::Three => 2,
        }
    }
}
```
Currently compiles to this on x86_64:
```asm
    .cfi_startproc
    movzbl  %dil, %ecx
    cmpl    $1, %ecx
    setne   %al
    testb   %cl, %cl
    je      .LBB0_2
    incb    %al
    movb    %al, %dil
.LBB0_2:
    movb    %dil, %al
    retq
.Lfunc_end0:
```
But with this change we get:
```asm
    .cfi_startproc
    movb    %dil, %al
    retq
.Lfunc_end0:
```
Currently, we're generating adjustments, for example, to get from &[u8]
to &[u8], which is unneeded and kicks us out of trans_into() into
trans(), which means an additional stack slot and copy in the unoptimized
code.
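A hedged example of the kind of code involved (not from the change): `v`
already has the target type `&[u8]`, so no adjustment should be generated when
passing it on.
```rust
fn total_len(v: &[u8]) -> usize {
    v.len()
}

fn main() {
    let v: &[u8] = b"abc";
    // Passing `v` needs no &[u8] -> &[u8] adjustment at all; generating one
    // used to force an extra stack slot and copy in unoptimized builds.
    println!("{}", total_len(v));
}
```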
Unwinding across an FFI boundary is undefined behaviour, so we can mark
all external functions as nounwind. The obvious exceptions are those
functions that actually perform the unwinding.
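For illustration (a sketch, not from the change): a foreign declaration like
the one below can be marked `nounwind`, while the unwinding entry points
themselves cannot.
```rust
extern "C" {
    // `abs` comes from libc; it never unwinds back into Rust, so the
    // declaration can be marked nounwind.
    fn abs(input: i32) -> i32;
}

fn main() {
    println!("{}", unsafe { abs(-3) });
}
```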
This has a number of advantages compared to creating a copy in memory
and passing a pointer. The obvious one is that we don't have to put the
data into memory but can keep it in registers. Since we're currently
passing a pointer anyway (instead of using e.g. a known offset on the
stack, which is what the `byval` attribute would achieve), we only use a
single additional register for each fat pointer, but save at least two
pointers worth of stack in exchange (sometimes more because more than
one copy gets eliminated). On archs that pass arguments on the stack, we
save a pointer worth of stack even without considering the omitted
copies.
Additionally, LLVM can optimize the code a lot better, to a large degree
due to the fact that lots of copies are gone or can be optimized away.
Additionally, we can now emit attributes like nonnull on the data and/or
vtable pointers contained in the fat pointer, potentially allowing for
even more optimizations.
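A hedged illustration (not from the change): the slice argument below is a fat
pointer, and with this change it is passed as two immediate values (data
pointer and length) instead of as a pointer to a stack copy of the pair.
```rust
pub fn first_byte(s: &[u8]) -> Option<u8> {
    // `s` arrives as (data pointer, length) in two registers rather than as a
    // pointer to a (pointer, length) pair spilled to the caller's stack.
    s.first().cloned()
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
}
```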
This results in LLVM passes being about 3-7% faster (depending on the
crate), and the resulting code is also a few percent smaller, for
example:
   text      data   filename
5671479   3941461   before/librustc-d8ace771.so
5447663   3905745   after/librustc-d8ace771.so
1944425   2394024   before/libstd-d8ace771.so
1896769   2387610   after/libstd-d8ace771.so
I had to remove a call in the backtrace-debuginfo test, because LLVM can
now merge the tails of some blocks when optimizations are turned on,
which can't correctly preserve line info.
Fixes #22924
Cc #22891 (at least for fat pointers the code is good now)
Unlike coercing from reference to unsafe pointer, coercing between two
unsafe pointers doesn't need an AutoDerefRef, because there is no region
that regionck would need to know about.
In unoptimized libcore, this reduces the number of "auto_deref" allocas
from 174 to 4.
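For illustration (a sketch, not from the change): the `*mut u8` to `*const u8`
coercion below involves no region, unlike coercing a reference to an unsafe
pointer.
```rust
fn main() {
    let mut x = 0u8;
    let m: *mut u8 = &mut x;
    // Unsafe-pointer-to-unsafe-pointer coercion: no region is involved, so no
    // AutoDerefRef (and no "auto_deref" alloca) is needed.
    let c: *const u8 = m;
    unsafe {
        println!("{}", *c);
    }
}
```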
This commit updates the LLVM submodule in use to the current HEAD of the LLVM
repository. This is primarily being done to start picking up unwinding support
for MSVC, which is currently unimplemented in the revision of LLVM we are using.
Along the way a few changes had to be made:
* As usual, lots of C++ debuginfo bindings in LLVM changed, so there were some
  significant changes to our RustWrapper.cpp
* As usual, some pass management changed in LLVM, so clang was re-scrutinized to
  ensure that we're doing the same thing it does.
* Some optimization options are now passed directly into the
  `PassManagerBuilder` instead of through CLI switches to LLVM.
* The `NoFramePointerElim` option was removed from LLVM, favoring the
  `no-frame-pointer-elim` function attribute instead.
Additionally, LLVM has picked up some new optimizations which required fixing an
existing soundness hole in the IR we generate. It appears that the current LLVM
we use does not expose this hole. When an enum is moved, the previous slot in
memory is overwritten with a bit pattern corresponding to "dropped". When the
drop glue for this slot is run, however, the switch on the discriminant can
often start executing the `unreachable` block of the switch due to the
discriminant now being outside the normal range. This was patched over locally
for now by having the `unreachable` block just change to a `ret void`.
The current codegen tests only compare IR line counts between similar
Rust and C programs, the latter getting compiled with clang. That looked
like a good idea back then, but actually things like lifetime intrinsics
mean that less IR isn't always better, so the metric isn't really
helpful.
Instead, we can start doing tests that check specific aspects of the
generated IR, like attributes or metadata. To do that, we can use LLVM's
FileCheck tool which has a number of useful features for such tests.
To start off, I created some tests for a few things that were recently
added and/or broken.
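For reference, a hedged sketch of what such a FileCheck-based test can look
like (the flags and the exact CHECK pattern here are illustrative, not copied
from the tests added in this change):
```rust
// compile-flags: -C no-prepopulate-passes
#![crate_type = "lib"]

// The reference argument should get a `dereferenceable` attribute in the IR.
// CHECK: define{{.*}}@deref({{.*}}dereferenceable(8){{.*}})
#[no_mangle]
pub fn deref(x: &u64) -> u64 {
    *x
}
```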
This breaks code like:
    struct Foo {
        ...
    }

    pub fn make_foo() -> Foo {
        ...
    }
Change this code to:
    pub struct Foo { // note `pub`
        ...
    }

    pub fn make_foo() -> Foo {
        ...
    }
The `visible_private_types` lint has been removed, since it is now an
error to attempt to expose a private type in a public API. In its place
a `#[feature(visible_private_types)]` gate has been added.
Closes #16463.
RFC #48.
[breaking-change]
When there is only a single store to the ret slot that dominates the
load that gets the value for the "ret" instruction, we can elide the
ret slot and directly return the operand of the dominating store
instruction. This is the same thing that clang does, except for a
special case that doesn't seem to affect us.
Fixes #8238