Use `load`+`store` instead of `memcpy` for small integer arrays
I was inspired by #98892 to see whether, rather than making `mem::swap` do something smart in the library, we could update MIR assignments like `*_1 = *_2` to do something smarter than `memcpy` for sufficiently-small types that doing it inline is going to be better than a `memcpy` call in assembly anyway. After all, special code may help `mem::swap`, but if the "obvious" MIR can just result in the correct thing that helps everything -- other code like `mem::replace`, people doing it manually, and just passing around by value in general -- as well as makes MIR inlining happier since it doesn't need to deal with all the complicated library code if it just sees a couple assignments.
LLVM will turn the short, known-length `memcpy`s into direct instructions in the backend, but that's too late for it to be able to remove `alloca`s. In general, replacing `memcpy`s with typed instructions is hard in the middle-end -- even for `memcpy.inline` where it knows it won't be a function call -- is hard [due to poison propagation issues](https://rust-lang.zulipchat.com/#narrow/stream/187780-t-compiler.2Fwg-llvm/topic/memcpy.20vs.20load-store.20for.20MIR.20assignments/near/360376712). So because we know more about the type invariants -- these are typed copies -- rustc can emit something more specific, allowing LLVM to `mem2reg` away the `alloca`s in some situations.
#52051 previously did something like this in the library for `mem::swap`, but it ended up regressing during enabling mir inlining (cbbf06b0cd), so this has been suboptimal on stable for ≈5 releases now.
The code in this PR is narrowly targeted at just integer arrays in LLVM, but works via a new method on the [`LayoutTypeMethods`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/trait.LayoutTypeMethods.html) trait, so specific backends based on cg_ssa can enable this for more situations over time, as we find them. I don't want to try to bite off too much in this PR, though. (Transparent newtypes and simple things like the 3×usize `String` would be obvious candidates for a follow-up.)
Codegen demonstrations: <https://llvm.godbolt.org/z/fK8hT9aqv>
Before:
```llvm
define void `@swap_rgb48_old(ptr` noalias nocapture noundef align 2 dereferenceable(6) %x, ptr noalias nocapture noundef align 2 dereferenceable(6) %y) unnamed_addr #1 {
%a.i = alloca [3 x i16], align 2
call void `@llvm.lifetime.start.p0(i64` 6, ptr nonnull %a.i)
call void `@llvm.memcpy.p0.p0.i64(ptr` noundef nonnull align 2 dereferenceable(6) %a.i, ptr noundef nonnull align 2 dereferenceable(6) %x, i64 6, i1 false)
tail call void `@llvm.memcpy.p0.p0.i64(ptr` noundef nonnull align 2 dereferenceable(6) %x, ptr noundef nonnull align 2 dereferenceable(6) %y, i64 6, i1 false)
call void `@llvm.memcpy.p0.p0.i64(ptr` noundef nonnull align 2 dereferenceable(6) %y, ptr noundef nonnull align 2 dereferenceable(6) %a.i, i64 6, i1 false)
call void `@llvm.lifetime.end.p0(i64` 6, ptr nonnull %a.i)
ret void
}
```
Note it going to stack:
```nasm
swap_rgb48_old: # `@swap_rgb48_old`
movzx eax, word ptr [rdi + 4]
mov word ptr [rsp - 4], ax
mov eax, dword ptr [rdi]
mov dword ptr [rsp - 8], eax
movzx eax, word ptr [rsi + 4]
mov word ptr [rdi + 4], ax
mov eax, dword ptr [rsi]
mov dword ptr [rdi], eax
movzx eax, word ptr [rsp - 4]
mov word ptr [rsi + 4], ax
mov eax, dword ptr [rsp - 8]
mov dword ptr [rsi], eax
ret
```
Now:
```llvm
define void `@swap_rgb48(ptr` noalias nocapture noundef align 2 dereferenceable(6) %x, ptr noalias nocapture noundef align 2 dereferenceable(6) %y) unnamed_addr #0 {
start:
%0 = load <3 x i16>, ptr %x, align 2
%1 = load <3 x i16>, ptr %y, align 2
store <3 x i16> %1, ptr %x, align 2
store <3 x i16> %0, ptr %y, align 2
ret void
}
```
still lowers to `dword`+`word` operations, but has no stack traffic:
```nasm
swap_rgb48: # `@swap_rgb48`
mov eax, dword ptr [rdi]
movzx ecx, word ptr [rdi + 4]
movzx edx, word ptr [rsi + 4]
mov r8d, dword ptr [rsi]
mov dword ptr [rdi], r8d
mov word ptr [rdi + 4], dx
mov word ptr [rsi + 4], cx
mov dword ptr [rsi], eax
ret
```
And as a demonstration that this isn't just `mem::swap`, a `mem::replace` on a small array (since replace doesn't use swap since #83022), which used to be `memcpy`s in LLVM changes in IR
```llvm
define void `@replace_short_array(ptr` noalias nocapture noundef sret([3 x i32]) dereferenceable(12) %0, ptr noalias noundef align 4 dereferenceable(12) %r, ptr noalias nocapture noundef readonly dereferenceable(12) %v) unnamed_addr #0 {
start:
%1 = load <3 x i32>, ptr %r, align 4
store <3 x i32> %1, ptr %0, align 4
%2 = load <3 x i32>, ptr %v, align 4
store <3 x i32> %2, ptr %r, align 4
ret void
}
```
but that lowers to reasonable `dword`+`qword` instructions still
```nasm
replace_short_array: # `@replace_short_array`
mov rax, rdi
mov rcx, qword ptr [rsi]
mov edi, dword ptr [rsi + 8]
mov dword ptr [rax + 8], edi
mov qword ptr [rax], rcx
mov rcx, qword ptr [rdx]
mov edx, dword ptr [rdx + 8]
mov dword ptr [rsi + 8], edx
mov qword ptr [rsi], rcx
ret
```
Rollup of 6 pull requests
Successful merges:
- #112081 (Avoid ICE on `#![doc(test(...)]` with literal parameter)
- #112196 (Resolve vars in result from `scrape_region_constraints`)
- #112303 (Normalize in infcx instead of globally for `Option::as_deref` suggestion)
- #112316 (Ensure space is inserted after keyword in `unused_delims`)
- #112318 (Merge method, type and const object safety checks)
- #112322 (Don't mention `IMPLIED_BOUNDS_ENTAILMENT` if signatures reference error)
Failed merges:
- #112251 (rustdoc: convert `if let Some()` that always matches to variable)
r? `@ghost`
`@rustbot` modify labels: rollup
Merge method, type and const object safety checks
cc `@spastorino` and `@compiler-errors` on the first commit. I believe it to be correct, as the field is only `Some` for assoc types, so just checking the field without checking the assoc kind to be `Type` is fine.
The second commit avoids going through all associated items thrice and just goes over all of them once, running the object safety checks per assoc item kind.
Normalize in infcx instead of globally for `Option::as_deref` suggestion
fixes#112293
The projection may contain inference variables. These inference variables are local to the local inference context. Using `tcx.normalize_erasing_regions` doesn't work here because this method is global and does not have access to the inference context. It's therefore unable to deal with the inference variables. We normalize in the local inference context instead, which knowns about the inference variables.
The test looks a little different than the issue example, I made it more minimal and verified that it still ICEs on nightly.
Also contains a drive-by fix to properly compare the types.
r? `@compiler-errors`
Resolve vars in result from `scrape_region_constraints`
Since we perform `type_op::Normalize` in the local infcx when the new solver is enabled, vars aren't necessarily resolved, which triggers this ICE:
f85ab544df/compiler/rustc_infer/src/infer/nll_relate/mod.rs (L481)
There are more tests that go from ICE -> pass due to this change, but I just added revisions to a few for CI.
r? `@lcnr`
Group rfcs tests
This moves all RFC tests to `tests/ui/rfcs/rfc-NNNN-title-title-title/...`
I had to rename some tests due to conflicts, but otherwise this is just a move.
Ignore fluent message reordering in `git blame`
This commit reordered most of our Fluent message files. Since `git blame` can be useful in tracking mistakes made while adapting to translatable diagnostics, ignore this commit in `blame`.
r? `@jyn514`
Remove ExtendElement, ExtendWith, extend_with
Related to #104624, broken up into two commits. The first removes wrapper trait ExtendWith and its only implementer struct ExtendElement. The second may have perf issues so may be reverted/removed if no alternate fix is found; it should be profiled.
r? `@scottmcm`
Remove unneeded `Buffer` allocations when `&mut fmt::Write` can be used directly
With the recent changes, `wrap_item` can now directly take `&mut Write`, which makes some `Buffer` creations unneeded.
r? `@notriddle`
Makes the Python pretty printer library source'able from within
GDB after spawn.
This makes `rust-gdb` not the required approach. It also provides the possibility
for GUI:s that wrap GDB, to debug Rust using the pretty printer library; as well
as other projects such as for instance Pernosco, which previously was not possible.
This won't introduce any new unexpected behaviors for users of `rust-gdb`