The major change here is in the tiny commit at the end and makes it so that we no longer emit lifetime intrinsics for allocas for function arguments. They are live for the whole function anyway, so the intrinsics add no value. This makes the resulting IR more clear, and reduces the peak memory usage and LLVM times by about 1-4%, depending on the crate.
The remaining changes are just preparatory cleanups and fixes for missing lifetime intrinsics.
Combining them seemed like a good idea at the time, but turns out that
handling lifetimes separately makes it somewhat easier to handle cases
where we don't want the intrinsics, and let's you see more easily where
the start/end pairs are.
The functions is useful for all kinds of fat pointers, but get_len()
just feels so wrong for trait object fat pointers. Let's use get_meta()
because that's rather neutral.
Instead of the actual return type, we're currently passing the function
type to get_extern_fn(). The only reason this doesn't explode is because
get_extern_fn() actually doesn't care about the actual return type, just
about it being converging or not.
This is trickier than it sounds (because the DST code was written
assuming that one could divide the sized and unsized portions of a
type strictly into a sized prefix and unsized suffix, when it reality
it is more like a sized prefix and sized suffix that surround the
single unsized field).
I chose to put in a pretty hack-ish approach to this because
drop-flags are scheduled to go away anyway, so its not really worth
the effort to to make an infrastructure that sounds as general as the
previous paragraph indicates.
Also, I have written notes of other fixes that need to be made to
really finish fixing #27023, namely more work needs to be done to
account for alignment when computing the size of a value.
Added code to maintain these hints at runtime, and to conditionalize
drop-filling and calls to destructors.
In this early stage, we are using hints, so we are always free to
leave out a flag for a path -- then we just pass `None` as the
dropflag hint in the corresponding schedule cleanup call. But, once a
path has a hint, we must at least maintain it: i.e. if the hint
exists, we must ensure it is never set to "moved" if the data in
question might actually have been initialized. It remains sound to
conservatively set the hint to "initialized" as long as the true
drop-flag embedded in the value itself is up-to-date.
----
Here are some high-level details I want to point out:
* We maintain the hint in Lvalue::post_store, marking the lvalue as
moved. (But also continue drop-filling if necessary.)
* We update the hint on ExprAssign.
* We pass along the hint in once closures that capture-by-move.
* You only call `drop_ty` for state that does not have an associated hint.
If you have a hint, you must call `drop_ty_core` instead.
(Originally I passed the hint into `drop_ty` as well, to make the
connection to a hint more apparent, but the vast majority of
current calls to `drop_ty` are in contexts where no hint is
available, so it just seemed like noise in the resulting diff.)
This PR was originally going to be a "let's start running tests on MSVC" PR, but it didn't quite get to that point. It instead gets us ~80% of the way there! The steps taken in this PR are:
* Landing pads are turned on by default for 64-bit MSVC. The LLVM support is "good enough" with the caveat the destructor glue is now marked noinline. This was recommended [on the associated bug](https://llvm.org/bugs/show_bug.cgi?id=23884) as a stopgap until LLVM has a better representation for exception handling in MSVC. The consequence of this is that MSVC will have a bit of a perf hit, but there are possible routes we can take if this workaround sticks around for too long.
* The linker (`link.exe`) is now looked up in the Windows Registry if it's not otherwise available in the environment. This improves using the compiler outside of a VS shell (e.g. in a MSYS shell or in a vanilla cmd.exe shell). This also makes cross compiles via Cargo "just work" when crossing between 32 and 64 bit!
* TLS destructors were fixed to start running on MSVC (they previously weren't running at all)
* A few assorted `run-pass` tests were fixed.
* The dependency on the `rust_builtin` library was removed entirely for MSVC to try to prevent any `cl.exe` compiled objects get into the standard library. This should help us later remove any dependence on the CRT by the standard library.
* I re-added `rust_try_msvc_32.ll` for 32-bit MSVC and ensured that landing pads were turned off by default there as well.
Despite landing pads being enabled, there are still *many* failing tests on MSVC. The two major classes I've identified so far are:
* Spurious aborts. It appears that when optimizations are enabled that landing pads aren't always lined up properly, and sometimes an exception being thrown can't find the catch block down the stack, causing the program to abort. I've been working to reduce this test case but haven't been met with great success just yet.
* Parallel codegen does not work on MSVC. Our current strategy is to take the N object files emitted by the N codegen threads and use `ld -r` to assemble them into *one* object file. The MSVC linker, however, does not have this ability, and this will need to be rearchitected to work on MSVC.
I will fix parallel codegen in a future PR, and I'll also be watching LLVM closely to see if the aborts... disappear!
This commit turns on landing pads for MSVC by default, which means that we'll
now be running cleanups for values on the stack when an exception is thrown.
This commit "fixes" the previously seen LLVM abort by attaching the `noinline`
attribute to all generated drop glue to prevent landing pads from being inlined
into other landing pads.
The performance of MSVC is highly likely to decrease from this commit, but there
are various routes we can taken in the future if this ends up staying for quite
a while, such as generating a shim function only called from landing pads which
calls the actual drop glue, and this shim is marked noinline.
For now, however, this patch enables MSVC to successfully bootstrap itself!
This has a number of advantages compared to creating a copy in memory
and passing a pointer. The obvious one is that we don't have to put the
data into memory but can keep it in registers. Since we're currently
passing a pointer anyway (instead of using e.g. a known offset on the
stack, which is what the `byval` attribute would achieve), we only use a
single additional register for each fat pointer, but save at least two
pointers worth of stack in exchange (sometimes more because more than
one copy gets eliminated). On archs that pass arguments on the stack, we
save a pointer worth of stack even without considering the omitted
copies.
Additionally, LLVM can optimize the code a lot better, to a large degree
due to the fact that lots of copies are gone or can be optimized away.
Additionally, we can now emit attributes like nonnull on the data and/or
vtable pointers contained in the fat pointer, potentially allowing for
even more optimizations.
This results in LLVM passes being about 3-7% faster (depending on the
crate), and the resulting code is also a few percent smaller, for
example:
text data filename
5671479 3941461 before/librustc-d8ace771.so
5447663 3905745 after/librustc-d8ace771.so
1944425 2394024 before/libstd-d8ace771.so
1896769 2387610 after/libstd-d8ace771.so
I had to remove a call in the backtrace-debuginfo test, because LLVM can
now merge the tails of some blocks when optimizations are turned on,
which can't correctly preserve line info.
Fixes#22924
Cc #22891 (at least for fat pointers the code is good now)
* segfault due to not copying drop flag when coercing
* fat pointer casts
* segfault due to not checking drop flag properly
* debuginfo for DST smart pointers
* unreachable code in drop glue
This addresses to-do in my code, and simplifies this method a lot to boot.
(The necessary enum dispatch has now effectively been shifted entirely
into the scheduled cleanup code for the enum contents.)
without invoking the Drop::drop implementation.
This is necessary for dealing with an enum that switches own `self` to
a different variant while running its destructor.
Fix#23611.
We provide tools to tell what exact symbols to emit for any fn or static, but
don’t quite check if that won’t cause any issues later on. Some of the issues
include LLVM mangling our names again and our names pointing to wrong locations,
us generating dumb foreign call wrappers, linker errors, extern functions
resolving to different symbols altogether (extern {fn fail();} fail(); in some
cases calling fail1()), etc.
Before the commit we had a function called note_unique_llvm_symbol, so it is
clear somebody was aware of the issue at some point, but the function was barely
used, mostly in irrelevant locations.
Along with working on it I took liberty to start refactoring trans/base into
a few smaller modules. The refactoring is incomplete and I hope I will find some
motivation to carry on with it.
This is possibly a [breaking-change] because it makes dumbly written code
properly invalid.
Refactored code so that the drop-flag values for initialized
(`DTOR_NEEDED`) versus dropped (`DTOR_DONE`) are given explicit names.
Add `mem::dropped()` (which with `DTOR_DONE == 0` is semantically the
same as `mem::zeroed`, but the point is that it abstracts away from
the particular choice of value for `DTOR_DONE`).
Filling-drop needs to use something other than `ptr::read_and_zero`,
so I added such a function: `ptr::read_and_drop`. But, libraries
should not use it if they can otherwise avoid it.
Fixes to tests to accommodate filling-drop.