Resurrect: rustc_target: Add alignment to indirectly-passed by-value types, correcting the alignment of byval on x86 in the process.
Same as #111551, which I [accidentally closed](https://github.com/rust-lang/rust/pull/111551#issuecomment-1571222612) :/
---
This resurrects PR #103830, which has sat idle for a while.
Beyond #103830, this also:
- fixes byval alignment for types containing vectors on Darwin (see `tests/codegen/align-byval-vector.rs`)
- fixes byval alignment for overaligned types on x86 Windows (see `tests/codegen/align-byval.rs`)
- fixes the ABI for types with 128-bit requested alignment on ARM64 Linux (see `tests/codegen/aarch64-struct-align-128.rs`)
r? `@nikic`
---
`@pcwalton`'s original PR description is reproduced below:
Commit 88e4d2c291 from five years ago removed
support for alignment on indirectly-passed arguments because of problems with
the `i686-pc-windows-msvc` target. Unfortunately, the `memcpy` optimizations I
recently added to LLVM 16 depend on this to forward `memcpy`s. This commit
attempts to fix the problems with `byval` parameters on that target and now
correctly adds the `align` attribute.
The problem is summarized in [this comment] by `@eddyb`. Briefly, 32-bit x86 has
special alignment rules for `byval` parameters: for the most part, their
alignment is forced to 4. This is not well-documented anywhere but in the Clang
source. I looked at the logic in Clang `TargetInfo.cpp` and tried to replicate
it here. The relevant methods in that file are
`X86_32ABIInfo::getIndirectResult()` and
`X86_32ABIInfo::getTypeStackAlignInBytes()`. The `align` parameter attribute
for `byval` parameters in LLVM must match the platform ABI, or miscompilations
will occur. Note that this doesn't use the approach suggested by eddyb, because
I felt it was overkill to store the alignment in `on_stack` when special
handling is really only needed for 32-bit x86.
As a side effect, this should fix #80127, because it will make the `align`
parameter attribute for `byval` parameters match the platform ABI on LLVM
x86-64.
[this comment]: https://github.com/rust-lang/rust/pull/80822#issuecomment-829985417
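
For illustration, here is a minimal Rust sketch of the 32-bit x86 rule just described; the function, the `contains_vector` flag, and the constant are assumptions made for the example, not the actual rustc or Clang code:

```rust
/// Sketch of the i686 `byval` alignment rule: alignment is forced to 4 for
/// most types, while vector-containing types keep their natural alignment,
/// loosely mirroring Clang's X86_32ABIInfo::getTypeStackAlignInBytes.
const MIN_STACK_ALIGN: u64 = 4; // bytes

fn byval_align(natural_align: u64, contains_vector: bool) -> u64 {
    if natural_align <= MIN_STACK_ALIGN {
        MIN_STACK_ALIGN
    } else if contains_vector {
        // Types containing SIMD vectors keep their larger alignment
        // (e.g. 16 for SSE), matching the platform ABI.
        natural_align
    } else {
        // Everything else is clamped back down to 4 on i686.
        MIN_STACK_ALIGN
    }
}

fn main() {
    assert_eq!(byval_align(8, false), 4);  // overaligned struct: clamped
    assert_eq!(byval_align(16, true), 16); // vector-containing struct: kept
}
```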
Add Tests for native wasm exceptions
### Motivation
In PR #111322, I added support for native WASM exceptions. I was asked by `@davidtwco` to add some tests for it in a follow-up PR, which seems like a very good idea.
This PR adds three tests for this feature:
* codegen: ensure the correct LLVM instructions are used
* assembly: ensure the correct WASM instructions are used
* run-make: ensure the exception handling works; the WASM code is run using a small Node.js script that demonstrates the exception handling
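
To make the intent of these tests concrete, here is a hypothetical sketch (not one of the actual tests) of the kind of source such a codegen or run-make test exercises: a panic that crosses a call boundary and is caught again, which with native exception handling enabled should compile down to the WASM `throw`/`catch` instructions rather than an abort- or JS-based strategy.

```rust
use std::panic;

fn may_throw(fail: bool) -> u32 {
    if fail {
        // With `-Cpanic=unwind` and native wasm exceptions, this unwind is
        // implemented via the WASM exception-handling proposal.
        panic!("boom");
    }
    42
}

fn main() {
    // The unwind is caught here instead of tearing down the whole program.
    assert!(panic::catch_unwind(|| may_throw(true)).is_err());
    assert_eq!(panic::catch_unwind(|| may_throw(false)).unwrap(), 42);
}
```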
### Complications
There are a few necessary changes besides adding the tests:
* Tests for the wasm32-unknown-unknown target are (as far as I know) only run on `test-various`. Its Docker image uses Node.js 15, which is very old. Experimental support for wasm exceptions was added in Node.js 16; in Node.js 18.12 (LTS), they are stable.
- --> increase Node.js to 18.12 in `test-various`
* codegen/assembly tests are not performed for the wasm32-unknown-unknown target yet
- --> add those to `test-various` as well
Due to the last point, some tests now run which have not run before (assembly+codegen tests for wasm32-unknown-unknown). I added `// ignore-wasm32-bare` to those which failed.
### Local testing
I ran all tests locally using both `test-various` and `wasm32`. As far as I know, none of the other systems run any tests for wasm32 targets.
Various impl trait in assoc tys cleanups
r? `@compiler-errors`
All commits except for the last are pure refactorings. 274dab5bd658c97886a8987340bf50ae57900c39 allows struct fields to participate in deciding whether a function has an opaque in its signature.
best reviewed commit by commit
[libs] Simplify `unchecked_{shl,shr}`
There's no need for the `const_eval_select` dance here. And while I originally wrote the `.try_into().unwrap_unchecked()` implementation here, it's kinda a mess in MIR -- this new one is substantially simpler, as shown by the old one being above the inlining threshold but the new one being below it in the `mir-opt/inline/unchecked_shifts` tests.
We don't need `u32::checked_shl` doing a dance through both `Result` *and* `Option` 🙃
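As a rough illustration (not the standard-library code itself; the function names and the `u8`/shift-bound choices are made up for the example), the difference between the two shapes looks like this:

```rust
use std::convert::TryInto;

pub unsafe fn unchecked_shl_old(x: u8, rhs: u32) -> u8 {
    // Old shape: round-trip the shift amount through `try_into`, relying on
    // `unwrap_unchecked` and the optimizer to delete the impossible error arm.
    let rhs: u8 = unsafe { rhs.try_into().unwrap_unchecked() };
    x << rhs
}

pub unsafe fn unchecked_shl_new(x: u8, rhs: u32) -> u8 {
    // Simpler shape: a plain truncating cast, which produces far less MIR.
    x << (rhs as u8)
}

fn main() {
    // SAFETY: 3 < 8, so the "rhs is a valid shift amount" contract holds.
    unsafe {
        assert_eq!(unchecked_shl_old(1, 3), 8);
        assert_eq!(unchecked_shl_new(1, 3), 8);
    }
}
```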
Fix codegen test suite for bare-metal-like targets
For Ferrocene I needed to run the test suite for a custom target with no unwinding and static relocations. Running the tests uncovered ~20 failures due to the test suite not accounting for these options. This PR fixes them by:
* Fixing `CHECK`s to account for functions having extra LLVM IR attributes (in this case `nounwind`).
* Fixing `CHECK`s to account for the `dso_local` LLVM IR modifier, which is [added to every item when relocation is static](f3d597b31c/compiler/rustc_codegen_llvm/src/mono_item.rs (L139-L142)).
* Fixing `CHECK`s to account for missing `uwtable` attributes.
* Adding the `needs-unwind` directive to tests that are designed to check unwinding.
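
For example, a fixed `CHECK` line might be written along these lines; this is a hypothetical sketch, not a test from this PR, and the function name is made up:

```rust
// Tolerate the optional `dso_local` modifier and whichever numbered attribute
// group (possibly carrying `nounwind`) the target attaches to the function.
// CHECK: define{{.*}}i32 @add({{.*}}) unnamed_addr #{{[0-9]+}}
#[no_mangle]
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}
```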
Unfortunately there is no part of Rust CI that checks this, and testing whether the PR works locally is kinda hard because you need a target with std enabled but no unwinding and static relocations. Still, this works in my local testing, and if future PRs accidentally break it, Ferrocene will take care of sending follow-up PRs.
Enable ScalarReplacementOfAggregates in optimized builds
Like MatchBranchSimplification, this pass is known to produce significant runtime improvements in Cranelift artifacts, and based on the perf runs here I believe its primary effect is to empower MatchBranchSimplification. ScalarReplacementOfAggregates on its own has little effect on anything, but once this PR was rebased to include https://github.com/rust-lang/rust/pull/112001 we started seeing significant and majority-positive results.
Based on the fact that we see most of the regressions in debug builds (https://github.com/rust-lang/rust/pull/112002#issuecomment-1566270144) and some rather significant ones in cycles and wall time, I'm only enabling this in optimized builds at the moment.
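
For intuition, here is a small made-up example of the kind of code scalar replacement of aggregates helps with: the tuple never needs to exist in memory as a whole, so the pass can split it into independent locals, which in turn gives passes like MatchBranchSimplification something simpler to work on.

```rust
fn pick(flag: bool) -> u32 {
    // After scalar replacement, `pair.0` and `pair.1` become two separate
    // locals and the aggregate itself can disappear from the MIR.
    let pair = (1u32, 2u32);
    if flag { pair.0 } else { pair.1 }
}

fn main() {
    assert_eq!(pick(true), 1);
    assert_eq!(pick(false), 2);
}
```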
These tend to have special handling in a bunch of places anyway, so the variant helps remember that. And I think it's easier to grok than non-Scalar Aggregates sometimes being `Immediates` (like I got wrong and caused #109992). As a minor bonus, it means we don't need to generate poison LLVM values for them to pass around in `OperandValue::Immediate`s.
Only rewrite valtree-constants to patterns and keep other constants opaque
Now that we can reliably fall back to comparing constants to the match scrutinee with `PartialEq::eq`, we can:
1. eagerly try to convert constants to valtrees
2. then deeply convert the valtree to a pattern
3. if the to-valtree conversion failed, create an "opaque constant" pattern.
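
A hedged sketch of that flow, using hypothetical stand-in types and helpers rather than the real compiler-internal APIs (the deep valtree-to-pattern walk of step 2 is elided):

```rust
struct Constant;
struct ValTree;

enum Pattern {
    // Structural pattern built from a valtree (steps 1 and 2).
    Structural(ValTree),
    // Constant kept opaque and compared with `PartialEq::eq` at match time (step 3).
    Opaque(Constant),
}

// Placeholder: the eager to-valtree conversion is allowed to fail.
fn try_to_valtree(_c: &Constant) -> Option<ValTree> {
    None
}

fn const_to_pattern(c: Constant) -> Pattern {
    match try_to_valtree(&c) {
        Some(vt) => Pattern::Structural(vt),
        None => Pattern::Opaque(c),
    }
}

fn main() {
    match const_to_pattern(Constant) {
        Pattern::Opaque(_) => println!("kept as an opaque constant"),
        Pattern::Structural(_) => println!("lowered to a structural pattern"),
    }
}
```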
This PR specifically avoids any behavioral changes or major cleanups. What we can now do as follow-ups is:
* move the two remaining call sites of `destructure_mir_constant` off that query
* make valtree to pattern conversion infallible
* this needs to be done after careful analysis of the effects. There may be user-visible changes from that.
based on https://github.com/rust-lang/rust/pull/111768