simd_shuffle intrinsic: allow argument to be passed as vector
See https://github.com/rust-lang/rust/issues/128738 for context.
I'd like to get rid of [this hack](6c0b89dfac/compiler/rustc_codegen_ssa/src/mir/block.rs (L922-L935)). https://github.com/rust-lang/rust/pull/128537 almost lets us do that since constant SIMD vectors will then be passed as immediate arguments. However, simd_shuffle for some reason actually takes an *array* as argument, not a vector, so the hack is still required to ensure that the array becomes an immediate (which then later stages of codegen convert into a vector, as that's what LLVM needs).
This PR prepares simd_shuffle to also support a vector as the `idx` argument. Once this lands, stdarch can hopefully be updated to pass `idx` as a vector, and then support for arrays can be removed, which finally lets us get rid of that hack.
Shrink `TyKind::FnPtr`.
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and `FnHeader`, which can be packed more efficiently. This reduces the size of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms. This reduces peak memory usage by a few percent on some benchmarks. It also reduces cache misses and page faults similarly, though this doesn't translate to clear cycles or wall-time improvements on CI.
r? `@compiler-errors`
nontemporal_store: make sure that the intrinsic is truly just a hint
The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](https://github.com/llvm/llvm-project/issues/64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.
~~Blocked on https://github.com/rust-lang/stdarch/pull/1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~
Fixes https://github.com/rust-lang/rust/issues/114582
Cc `@Amanieu` `@workingjubilee`
I rarely need a debug build of cg_clif, and even if I actually need one,
using package overrides in Cargo.toml to only do a debug build for the
rustc_codegen_cranelift crate works much better.
By splitting the `FnSig` within `TyKind::FnPtr` into `FnSigTys` and
`FnHeader`, which can be packed more efficiently. This reduces the size
of the hot `TyKind` type from 32 bytes to 24 bytes on 64-bit platforms.
This reduces peak memory usage by a few percent on some benchmarks. It
also reduces cache misses and page faults similarly, though this doesn't
translate to clear cycles or wall-time improvements on CI.