Avoid using the `copy_nonoverlapping` wrapper through `mem::replace`.
This is a much simpler way to achieve the pre-#86003 behavior of `mem::replace` not needing dynamically-sized `memcpy`s (at least before inlining), than re-doing #81238 (which needs #86699 or something similar).
I didn't notice it until recently, but `ptr::write` already explicitly avoided using the wrapper, while `ptr::read` just called the wrapper (and was the reason for us observing any behavior change from #86003 in Rust-GPU).
<hr/>
The codegen test I've added fails without the change to `core::ptr::read` like this (ignore the `v0` mangling, I was using a worktree with it turned on by default, for this):
```llvm
13: ; core::intrinsics::copy_nonoverlapping::<u8>
14: ; Function Attrs: inlinehint nonlazybind uwtable
15: define internal void `@_RINvNtCscK5tvALCJol_4core10intrinsics19copy_nonoverlappinghECsaS4X3EinRE8_25mem_replace_direct_memcpy(i8*` %src, i8* %dst, i64 %count) unnamed_addr #0 {
16: start:
17: %0 = mul i64 %count, 1
18: call void `@llvm.memcpy.p0i8.p0i8.i64(i8*` align 1 %dst, i8* align 1 %src, i64 %0, i1 false)
not:17 !~~~~~~~~~~~~~~~~~~~~~ error: no match expected
19: ret void
20: }
```
With the `core::ptr::read` change, `core::intrinsics::copy_nonoverlapping` doesn't get instantiated and the test passes.
<hr/>
r? `@m-ou-se` cc `@nagisa` (codegen test) `@oli-obk` / `@RalfJung` (miri diagnostic changes)