rust/tests/codegen
bors 2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc
Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.

There are three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`).  This one's a larger commit, since doing it meant adding a new `pub(crate)` trait, but hopefully those changes are still straightforward.

(No libs-api changes; everything should be completely implementation-detail-internal.)
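
To make the third idea concrete, here is a minimal sketch of mapping an array when the element count is known up front, so the loop carries no `Option` or `Result` and the output is never wrapped in one. `map_sketch` is a hypothetical name, not the function in `core`, and for brevity it makes no attempt to drop elements if the closure panics (they are simply leaked).

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical sketch only; not the implementation in `core`.
pub fn map_sketch<T, U, const N: usize>(input: [T; N], mut f: impl FnMut(T) -> U) -> [U; N] {
    // Take over responsibility for the input elements so they are not
    // dropped a second time after being moved out below.
    let input = ManuallyDrop::new(input);
    // An array of `MaybeUninit` slots needs no initialization itself.
    let mut out: [MaybeUninit<U>; N] = unsafe { MaybeUninit::uninit().assume_init() };

    // Plain counted loop: no iterator state and no `Option` per step.
    for i in 0..N {
        // SAFETY: `i < N`, and each element is read exactly once.
        let item = unsafe { ptr::read(&input[i]) };
        out[i].write(f(item));
    }

    // SAFETY: all `N` slots were written above, and `[MaybeUninit<U>; N]`
    // has the same layout as `[U; N]`.
    unsafe { ptr::read(out.as_ptr().cast::<[U; N]>()) }
}
```

The final `ptr::read` plays the role of the `[MaybeUninit<T>; N] -> [T; N]` copy mentioned in this description.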

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before.  And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞

r? `@thomcc`

As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)

But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```
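
As a rough sanity check on those frame sizes, the array `alloca`s can be sized by hand. The sketch below only counts the two array-shaped `alloca`s in each listing; `array_alloca_bytes` is a hypothetical helper, and the gap up to 808 and 520 bytes is presumably the `Map<IntoIter<..>>` `alloca` plus padding and spill space, which is not verified here.

```rust
use std::mem::{size_of, ManuallyDrop};

/// Hypothetical helper: bytes taken by the array `alloca`s shown above,
/// as (nightly, with this PR).
pub fn array_alloca_bytes() -> (usize, usize) {
    // Nightly: [64 x i32] plus the surprising [65 x i32].
    let nightly = size_of::<[u32; 64]>() + size_of::<[u32; 65]>();
    // This PR: [64 x i32] plus ManuallyDrop<[u32; 64]>, which adds no size.
    let with_pr = size_of::<[u32; 64]>() + size_of::<ManuallyDrop<[u32; 64]>>();
    (nightly, with_pr)
}
```

That gives 516 bytes of arrays on nightly and 512 bytes with this PR, so the new frame is little more than one input-sized and one output-sized buffer.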

Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized
```nasm
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 64]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 328], ymm1
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 96]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
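
For readers less used to AVX2 listings: a `ymm` register is 256 bits, so each `vpmulld`/`vpaddd` pair handles 8 `u32` lanes, which matches the 32-byte stride between `[rsp + 64]` and `[rsp + 96]`. The sketch below spells out the same computation in scalar Rust, grouped by lane count. `map_in_lanes` and `LANES` are hypothetical names, and the wrapping operations are an assumption chosen to mirror what the vector instructions do on overflow.

```rust
/// Hypothetical illustration of the work each vector iteration performs.
pub fn map_in_lanes(x: [u32; 64]) -> [u32; 64] {
    const LANES: usize = 8; // 256-bit ymm register / 32-bit elements
    let mut out = [0u32; 64];
    for (src, dst) in x.chunks_exact(LANES).zip(out.chunks_exact_mut(LANES)) {
        // One `vpmulld` + `vpaddd` + `vmovdqu` group covers these 8 elements.
        for (s, d) in src.iter().zip(dst.iter_mut()) {
            *d = s.wrapping_mul(13).wrapping_add(7);
        }
    }
    out
}
```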
2023-02-13 10:18:48 +00:00
auxiliary
avr abi: add AddressSpace field to Primitive::Pointer 2023-01-22 23:41:39 -05:00
dllimports Add more codegen tests 2023-01-17 16:23:22 +01:00
instrument-xray Test XRay only for supported targets 2023-02-09 12:29:43 +09:00
intrinsics Add more codegen tests 2023-01-17 16:23:22 +01:00
non-terminate
remap_path_prefix
riscv-abi Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
simd-intrinsic
src-hash-algorithm
unwind-abis
abi-efiapi.rs Stabilize abi_efiapi feature 2023-01-11 20:42:13 -05:00
abi-main-signature-16bit-c-int.rs
abi-main-signature-32bit-c-int.rs
abi-repr-ext.rs
abi-sysv64.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
abi-x86_64_sysv.rs
abi-x86-interrupt.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
adjustments.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
align-enum.rs
align-fn.rs
align-struct.rs
alloc-optimisation.rs
array-clone.rs
array-equality.rs
array-map.rs Allow canonicalizing the array::map loop in trusted cases 2023-02-04 16:44:51 -08:00
asm-clobber_abi.rs
asm-clobbers.rs
asm-may_unwind.rs
asm-multiple-options.rs
asm-options.rs
asm-powerpc-clobbers.rs
asm-sanitize-llvm.rs
asm-target-clobbers.rs
async-fn-debug-awaitee-field.rs
async-fn-debug-msvc.rs
async-fn-debug.rs
atomic-operations.rs
autovectorize-f32x4.rs Add another autovectorization codegen test using array zip-map 2023-02-04 16:44:53 -08:00
binary-search-index-no-bound-check.rs
bool-cmp.rs
box-maybe-uninit-llvm14.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
box-maybe-uninit.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
bpf-alu32.rs
branch-protection.rs
c-variadic-copy.rs
c-variadic-opt.rs
c-variadic.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
call-llvm-intrinsics.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
call-metadata.rs
catch-unwind.rs
cdylib-external-inline-fns.rs
cf-protection.rs
cfguard-checks.rs
cfguard-disabled.rs
cfguard-nochecks.rs
cfguard-non-msvc.rs
codemodels.rs
coercions.rs
cold-call-declare-and-call.rs
comparison-operators-newtype.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
consts.rs
dealloc-no-unwind.rs
debug-alignment.rs
debug-column-msvc.rs
debug-column.rs
debug-compile-unit-path.rs
debug-linkage-name.rs
debug-vtable.rs Don't merge vtables when full debuginfo is enabled. 2023-01-27 15:29:04 +00:00
debuginfo-generic-closure-env-names.rs
deduced-param-attrs.rs
default-requires-uwtable.rs
drop.rs
dst-vtable-align-nonzero.rs
dst-vtable-size-range.rs
enum-bounds-check-derived-idx.rs
enum-bounds-check-issue-13926.rs
enum-bounds-check-issue-82871.rs
enum-bounds-check.rs
enum-debug-clike.rs
enum-debug-niche-2.rs
enum-debug-niche.rs
enum-debug-tagged.rs
enum-discriminant-value.rs
enum-match.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
export-no-mangle.rs
external-no-mangle-fns.rs
external-no-mangle-statics.rs
fastcall-inreg.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
fatptr.rs
fewer-names.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
ffi-const.rs
ffi-out-of-bounds-loads.rs
ffi-pure.rs
ffi-returns-twice.rs
float_math.rs
fn-impl-trait-self.rs
foo.s
force-frame-pointers.rs
force-no-unwind-tables.rs
force-unwind-tables.rs
frame-pointer.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
function-arguments-noopt.rs make PointerKind directly reflect pointer types 2023-02-06 11:46:32 +01:00
function-arguments.rs also do not add noalias on not-Unpin Box 2023-02-06 12:17:41 +01:00
gdb_debug_script_load.rs
generator-debug-msvc.rs
generator-debug.rs
generic-debug.rs
global_asm_include.rs
global_asm_x2.rs
global_asm.rs
i686-macosx-deployment-target.rs
i686-no-macosx-deployment-target.rs
inline-always-works-always.rs
inline-debuginfo.rs
inline-hint.rs
instrument-coverage.rs
instrument-mcount.rs
integer-cmp.rs
integer-overflow.rs
internalize-closures.rs
intrinsic-no-unnamed-attr.rs
issue-13018.rs
issue-15953.rs
issue-27130.rs
issue-32031.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
issue-32364.rs
issue-34634.rs
issue-34947-pow-i32.rs
issue-37945.rs
issue-44056-macos-tls-align.rs
issue-45222.rs
issue-45466.rs
issue-45964-bounds-check-slice-pos.rs replace manual ptr arithmetic with ptr_sub 2023-01-15 17:38:05 +01:00
issue-47278.rs
issue-47442.rs
issue-56267-2.rs
issue-56267.rs
issue-56927.rs
issue-58881.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
issue-59352.rs
issue-69101-bounds-check.rs
issue-73031.rs
issue-73338-effecient-cmp.rs
issue-73396-bounds-check-after-position.rs
issue-73827-bounds-check-index-in-subexpr.rs
issue-75525-bounds-checks.rs
issue-75546.rs
issue-75659.rs Support true and false as boolean flag params 2023-01-18 20:46:36 +01:00
issue-77812.rs
issue-81408-dllimport-thinlto-windows.rs
issue-84268.rs
issue-85872-multiple-reverse.rs
issue-86106.rs
issue-96274.rs
issue-96497-slice-size-nowrap.rs bump failing assembly & codegen tests from LLVM 14 to LLVM 15 2023-01-17 20:02:01 +01:00
issue-98156-const-arg-temp-lifetime.rs
issue-98294-get-mut-copy-from-slice-opt.rs
issue-103285-ptr-addr-overflow-check.rs
issue-103840.rs
issue-105386-ub-in-debuginfo.rs
iter-repeat-n-trivial-drop.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
layout-size-checks.rs
lifetime_start_end.rs
link_section.rs
link-dead-code.rs
loads.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
local-generics-in-exe-internalized.rs
lto-removes-invokes.rs
mainsubprogram.rs
mainsubprogramstart.rs
match-optimized.rs
match-optimizes-away.rs
match-unoptimized.rs
mem-replace-direct-memcpy.rs
merge-functions.rs
mir_zst_stores.rs
mir-inlined-line-numbers.rs
move-operands.rs Add a regression test for argument copies with DestinationPropagation 2023-01-11 10:27:06 -05:00
naked-functions.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
naked-nocoverage.rs
naked-noinline.rs
no-assumes-on-casts.rs
no-dllimport-w-cross-lang-lto.rs
no-jump-tables.rs
no-plt.rs
noalias-box-off.rs
noalias-box.rs
noalias-flag.rs
noalias-refcell.rs
noalias-rwlockreadguard.rs
noalias-unpin.rs
noreturn-uninhabited.rs
noreturnflag.rs
nounwind.rs
nrvo.rs
optimize-attr-1.rs
option-nonzero-eq.rs Implement SpecOptionPartialEq for cmp::Ordering 2023-01-18 19:19:28 -08:00
packed.rs
panic-abort-windows.rs
panic-in-drop-abort.rs
panic-unwind-default-uwtable.rs
personality_lifetimes.rs
pgo-counter-bias.rs
pgo-instrumentation.rs
pic-relocation-model.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
pie-relocation-model.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
README.md
refs.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
repeat-trusted-len.rs
repr-transparent-aggregates-1.rs
repr-transparent-aggregates-2.rs
repr-transparent-aggregates-3.rs
repr-transparent-sysv64.rs
repr-transparent.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
sanitizer_memtag_attr_check.rs
sanitizer_scs_attr_check.rs
sanitizer-cfi-add-canonical-jump-tables-flag.rs
sanitizer-cfi-emit-type-checks.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
sanitizer-cfi-emit-type-metadata-id-itanium-cxx-abi.rs
sanitizer-cfi-emit-type-metadata-itanium-cxx-abi.rs
sanitizer-kcfi-add-kcfi-flag.rs
sanitizer-kcfi-emit-kcfi-operand-bundle-itanium-cxx-abi.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
sanitizer-memory-track-orgins.rs
sanitizer-no-sanitize-inlining.rs
sanitizer-no-sanitize.rs
sanitizer-recover.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
scalar-pair-bool.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
set-discriminant-invalid.rs
simd_arith_offset.rs
simd-wide-sum.rs
slice_as_from_ptr_range.rs
slice-as_chunks.rs
slice-init.rs
slice-iter-len-eq-zero.rs
slice-position-bounds-check.rs
slice-ref-equality.rs
slice-reverse.rs
slice-windows-no-bounds-check.rs
some-abis-do-extend-params-to-32-bits.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
some-global-nonnull.rs
sparc-struct-abi.rs
sse42-implies-crc32.rs
stack-probes-call.rs
stack-probes-inline.rs
stack-protector.rs
static-relocation-model-msvc.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
staticlib-external-inline-fns.rs
stores.rs
swap-large-types.rs
swap-simd-types.rs
swap-small-types.rs
target-cpu-on-functions.rs
target-feature-overrides.rs
thread-local.rs
to_vec.rs
transmute-scalar.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
try_identity.rs
try_question_mark_nop.rs
tune-cpu-on-functions.rs
tuple-layout-opt.rs Add more codegen tests 2023-01-17 16:23:22 +01:00
unchecked_shifts.rs
unchecked-float-casts.rs
uninit-consts.rs
union-abi.rs
unpadded-simd.rs
unwind-and-panic-abort.rs
unwind-extern-exports.rs
unwind-extern-imports.rs
used_with_arg.rs
var-names.rs Put noundef on all scalars that don't allow uninit 2023-01-17 08:14:35 +01:00
vec-calloc-llvm14.rs
vec-calloc.rs Auto merge of #106989 - clubby789:is-zero-num, r=scottmcm 2023-01-19 08:04:26 +00:00
vec-in-place.rs
vec-iter-collect-len.rs
vec-optimizes-away.rs
vec-shrink-panik.rs
vecdeque_no_panic.rs
virtual-function-elimination-32bit.rs
virtual-function-elimination.rs
wasm_casts_trapping.rs
x86_64-macosx-deployment-target.rs
x86_64-no-macosx-deployment-target.rs
zip.rs
zst-offset.rs Add more codegen tests 2023-01-17 16:23:22 +01:00

The files here use the LLVM FileCheck framework, documented at https://llvm.org/docs/CommandGuide/FileCheck.html.

One extension worth noting is the use of revisions as custom prefixes for FileCheck. If your codegen test has different behavior based on the chosen target or different compiler flags that you want to exercise, you can use a revisions annotation, like so:

```rust
// revisions: aaa bbb
// [bbb] compile-flags: --flags-for-bbb
```

After specifying those variations, you can write different expected, or explicitly unexpected, output by using `<prefix>-SAME:` and `<prefix>-NOT:`, like so:

```rust
// CHECK: expected code
// aaa-SAME: emitted-only-for-aaa
// aaa-NOT:                        emitted-only-for-bbb
// bbb-NOT:  emitted-only-for-aaa
// bbb-SAME:                       emitted-only-for-bbb
```
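
Putting the two pieces together, a revision-based test file might look roughly like the sketch below. The revision names `noopt` and `opt`, the function `scale_and_offset`, and the specific check patterns are all illustrative assumptions. They have not been run through compiletest or FileCheck, so treat the patterns as a guess at plausible output rather than verified expectations.

```rust
// revisions: noopt opt
// [noopt] compile-flags: -C opt-level=0
// [opt] compile-flags: -C opt-level=3

// Tests in this directory usually also declare `#![crate_type = "lib"]` and
// mark the function `#[no_mangle]` so the symbol name is predictable; both
// are left out here so the snippet compiles on its own.

// Shared by both revisions: find the function, then its return.
// Only the `opt` revision additionally rejects stack slots in between.
// CHECK-LABEL: scale_and_offset
// opt-NOT: alloca
// CHECK: ret i32
pub fn scale_and_offset(x: u32) -> u32 {
    x.wrapping_mul(13).wrapping_add(7)
}
```

Plain `CHECK` lines apply to every revision, while lines prefixed with a revision name are only checked when that revision is being run.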