rust/codegen at bd176ee591cd391835bfbcb3409a743bac2128ca - rust

bors 2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc

Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.

There are three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems not to get SROA'd (broken into scalars by LLVM's scalar replacement of aggregates), meaning that every step writes things like loop counters to the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit, since doing it meant adding a new `pub(crate)` trait, but hopefully those changes are still straightforward.
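As a rough illustration of what the three ideas amount to, here is a minimal sketch. It is not the code in `core` (the PR goes through a new `pub(crate)` trait instead), and `map_array`, `Source`, and `Sink` are hypothetical names. It reads each input element out of a `ManuallyDrop<[T; N]>` by index rather than through `array::IntoIter` (idea 1), returns the array directly rather than via a `Result` (idea 2), and loops exactly `N` times, so there is no per-element `Option` to check (idea 3).

```rust
use std::mem::{ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical helper (not the `core` implementation): maps `[T; N]` to
/// `[U; N]` without `array::IntoIter`, without a `Result`, and without a
/// per-element `Option` check.
pub fn map_array<T, U, const N: usize>(input: [T; N], mut f: impl FnMut(T) -> U) -> [U; N] {
    // Owns the input; on drop, destroys only the elements not yet read.
    struct Source<T, const N: usize> {
        items: ManuallyDrop<[T; N]>,
        read: usize,
    }
    impl<T, const N: usize> Drop for Source<T, N> {
        fn drop(&mut self) {
            // SAFETY: elements `read..N` have not been moved out yet.
            unsafe { ptr::drop_in_place(&mut self.items[self.read..] as *mut [T]) };
        }
    }

    // Owns the output buffer; on drop, destroys only the elements written.
    struct Sink<U, const N: usize> {
        items: [MaybeUninit<U>; N],
        written: usize,
    }
    impl<U, const N: usize> Drop for Sink<U, N> {
        fn drop(&mut self) {
            for slot in &mut self.items[..self.written] {
                // SAFETY: slots `0..written` were initialized by the loop below.
                unsafe { slot.assume_init_drop() };
            }
        }
    }

    let mut src = Source { items: ManuallyDrop::new(input), read: 0 };
    let mut dst = Sink {
        // SAFETY: an array of `MaybeUninit` does not need to be initialized.
        items: unsafe { MaybeUninit::<[MaybeUninit<U>; N]>::uninit().assume_init() },
        written: 0,
    };

    // Exactly N iterations, known up front: there is no `Option` to check.
    for i in 0..N {
        // SAFETY: element `i` is read exactly once, and `read` is bumped
        // before `f` runs so a panic in `f` won't drop it a second time.
        let item = unsafe { ptr::read(&src.items[i] as *const T) };
        src.read = i + 1;
        dst.items[i].write(f(item));
        dst.written = i + 1;
    }

    // All N outputs are initialized: disarm the output guard, then
    // reinterpret the buffer as `[U; N]` (same layout as `[MaybeUninit<U>; N]`).
    dst.written = 0;
    // SAFETY: every slot was written, and ownership moves to the returned array.
    unsafe { ptr::read(&dst.items as *const [MaybeUninit<U>; N] as *const [U; N]) }
}
```

The two guard structs exist only for panic safety: if the closure panics partway through, `Source` drops the inputs not yet consumed and `Sink` drops the outputs already produced, so nothing is leaked or dropped twice.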

(No libs-api changes; everything should be completely implementation-detail-internal.)

It's still not completely fixed (I think it still needs pcwalton's `memcpy` optimizations from #103830 to get further), but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM 16 than the previous subobject copies 🤞
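To make "`transmute`-equivalent" concrete: both conversions leave the bytes exactly as they are, so the copies change nothing and could in principle be elided. The helpers below are hypothetical names for illustration, not functions from `core`; they rely only on the documented guarantees that `ManuallyDrop<T>` and `MaybeUninit<T>` have the same layout as `T`.

```rust
use std::mem::{self, ManuallyDrop, MaybeUninit};

/// `[T; N] -> ManuallyDrop<[T; N]>`. `ManuallyDrop<X>` is
/// `#[repr(transparent)]` over `X`, so this move copies the bytes unchanged
/// and only suppresses the destructor.
pub fn wrap<T, const N: usize>(a: [T; N]) -> ManuallyDrop<[T; N]> {
    ManuallyDrop::new(a)
}

/// `[MaybeUninit<T>; N] -> [T; N]`. `MaybeUninit<T>` has the same size and
/// alignment as `T`, so this reinterprets the same bytes.
///
/// # Safety
/// Every element of `a` must be initialized.
pub unsafe fn assume_init_array<T, const N: usize>(a: [MaybeUninit<T>; N]) -> [T; N] {
    // `mem::transmute` rejects these types at compile time because their size
    // depends on the generic `T` and `N`, so copy the bytes with
    // `transmute_copy`. `MaybeUninit` has no destructor, so dropping `a`
    // afterwards does nothing.
    unsafe { mem::transmute_copy(&a) }
}
```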

r? `@thomcc`

As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly (<https://rust.godbolt.org/z/xK7548TGj>) it takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)

But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized
```nasm
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 64]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 328], ymm1
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 96]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)

2023-02-13 10:18:48 +00:00

auxiliary

…

avr

abi: add AddressSpace field to Primitive::Pointer

2023-01-22 23:41:39 -05:00

dllimports

Add more codegen tests

2023-01-17 16:23:22 +01:00

instrument-xray

Test XRay only for supported targets

2023-02-09 12:29:43 +09:00

intrinsics

Add more codegen tests

2023-01-17 16:23:22 +01:00

non-terminate

…

remap_path_prefix

…

riscv-abi

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

simd-intrinsic

…

src-hash-algorithm

…

unwind-abis

…

abi-efiapi.rs

Stabilize abi_efiapi feature

2023-01-11 20:42:13 -05:00

abi-main-signature-16bit-c-int.rs

…

abi-main-signature-32bit-c-int.rs

…

abi-repr-ext.rs

…

abi-sysv64.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

abi-x86_64_sysv.rs

…

abi-x86-interrupt.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

adjustments.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

align-enum.rs

…

align-fn.rs

…

align-struct.rs

…

alloc-optimisation.rs

…

array-clone.rs

…

array-equality.rs

…

array-map.rs

Allow canonicalizing the array::map loop in trusted cases

2023-02-04 16:44:51 -08:00

asm-clobber_abi.rs

…

asm-clobbers.rs

…

asm-may_unwind.rs

…

asm-multiple-options.rs

…

asm-options.rs

…

asm-powerpc-clobbers.rs

…

asm-sanitize-llvm.rs

…

asm-target-clobbers.rs

…

async-fn-debug-awaitee-field.rs

…

async-fn-debug-msvc.rs

…

async-fn-debug.rs

…

atomic-operations.rs

…

autovectorize-f32x4.rs

Add another autovectorization codegen test using array zip-map

2023-02-04 16:44:53 -08:00

binary-search-index-no-bound-check.rs

…

bool-cmp.rs

…

box-maybe-uninit-llvm14.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

box-maybe-uninit.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

bpf-alu32.rs

…

branch-protection.rs

…

c-variadic-copy.rs

…

c-variadic-opt.rs

…

c-variadic.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

call-llvm-intrinsics.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

call-metadata.rs

…

catch-unwind.rs

…

cdylib-external-inline-fns.rs

…

cf-protection.rs

…

cfguard-checks.rs

…

cfguard-disabled.rs

…

cfguard-nochecks.rs

…

cfguard-non-msvc.rs

…

codemodels.rs

…

coercions.rs

…

cold-call-declare-and-call.rs

…

comparison-operators-newtype.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

consts.rs

…

dealloc-no-unwind.rs

…

debug-alignment.rs

…

debug-column-msvc.rs

…

debug-column.rs

…

debug-compile-unit-path.rs

…

debug-linkage-name.rs

…

debug-vtable.rs

Don't merge vtables when full debuginfo is enabled.

2023-01-27 15:29:04 +00:00

debuginfo-generic-closure-env-names.rs

…

deduced-param-attrs.rs

…

default-requires-uwtable.rs

…

drop.rs

…

dst-vtable-align-nonzero.rs

…

dst-vtable-size-range.rs

…

enum-bounds-check-derived-idx.rs

…

enum-bounds-check-issue-13926.rs

…

enum-bounds-check-issue-82871.rs

…

enum-bounds-check.rs

…

enum-debug-clike.rs

…

enum-debug-niche-2.rs

…

enum-debug-niche.rs

…

enum-debug-tagged.rs

…

enum-discriminant-value.rs

…

enum-match.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

export-no-mangle.rs

…

external-no-mangle-fns.rs

…

external-no-mangle-statics.rs

…

fastcall-inreg.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

fatptr.rs

…

fewer-names.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

ffi-const.rs

…

ffi-out-of-bounds-loads.rs

…

ffi-pure.rs

…

ffi-returns-twice.rs

…

float_math.rs

…

fn-impl-trait-self.rs

…

foo.s

…

force-frame-pointers.rs

…

force-no-unwind-tables.rs

…

force-unwind-tables.rs

…

frame-pointer.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

function-arguments-noopt.rs

make PointerKind directly reflect pointer types

2023-02-06 11:46:32 +01:00

function-arguments.rs

also do not add noalias on not-Unpin Box

2023-02-06 12:17:41 +01:00

gdb_debug_script_load.rs

…

generator-debug-msvc.rs

…

generator-debug.rs

…

generic-debug.rs

…

global_asm_include.rs

…

global_asm_x2.rs

…

global_asm.rs

…

i686-macosx-deployment-target.rs

…

i686-no-macosx-deployment-target.rs

…

inline-always-works-always.rs

…

inline-debuginfo.rs

…

inline-hint.rs

…

instrument-coverage.rs

…

instrument-mcount.rs

…

integer-cmp.rs

…

integer-overflow.rs

…

internalize-closures.rs

…

intrinsic-no-unnamed-attr.rs

…

issue-13018.rs

…

issue-15953.rs

…

issue-27130.rs

…

issue-32031.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

issue-32364.rs

…

issue-34634.rs

…

issue-34947-pow-i32.rs

…

issue-37945.rs

…

issue-44056-macos-tls-align.rs

…

issue-45222.rs

…

issue-45466.rs

…

issue-45964-bounds-check-slice-pos.rs

replace manual ptr arithmetic with ptr_sub

2023-01-15 17:38:05 +01:00

issue-47278.rs

…

issue-47442.rs

…

issue-56267-2.rs

…

issue-56267.rs

…

issue-56927.rs

…

issue-58881.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

issue-59352.rs

…

issue-69101-bounds-check.rs

…

issue-73031.rs

…

issue-73338-effecient-cmp.rs

…

issue-73396-bounds-check-after-position.rs

…

issue-73827-bounds-check-index-in-subexpr.rs

…

issue-75525-bounds-checks.rs

…

issue-75546.rs

…

issue-75659.rs

Support true and false as boolean flag params

2023-01-18 20:46:36 +01:00

issue-77812.rs

…

issue-81408-dllimport-thinlto-windows.rs

…

issue-84268.rs

…

issue-85872-multiple-reverse.rs

…

issue-86106.rs

…

issue-96274.rs

…

issue-96497-slice-size-nowrap.rs

bump failing assembly & codegen tests from LLVM 14 to LLVM 15

2023-01-17 20:02:01 +01:00

issue-98156-const-arg-temp-lifetime.rs

…

issue-98294-get-mut-copy-from-slice-opt.rs

…

issue-103285-ptr-addr-overflow-check.rs

…

issue-103840.rs

…

issue-105386-ub-in-debuginfo.rs

…

iter-repeat-n-trivial-drop.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

layout-size-checks.rs

…

lifetime_start_end.rs

…

link_section.rs

…

link-dead-code.rs

…

loads.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

local-generics-in-exe-internalized.rs

…

lto-removes-invokes.rs

…

mainsubprogram.rs

…

mainsubprogramstart.rs

…

match-optimized.rs

…

match-optimizes-away.rs

…

match-unoptimized.rs

…

mem-replace-direct-memcpy.rs

…

merge-functions.rs

…

mir_zst_stores.rs

…

mir-inlined-line-numbers.rs

…

move-operands.rs

Add a regression test for argument copies with DestinationPropagation

2023-01-11 10:27:06 -05:00

naked-functions.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

naked-nocoverage.rs

…

naked-noinline.rs

…

no-assumes-on-casts.rs

…

no-dllimport-w-cross-lang-lto.rs

…

no-jump-tables.rs

…

no-plt.rs

…

noalias-box-off.rs

…

noalias-box.rs

…

noalias-flag.rs

…

noalias-refcell.rs

…

noalias-rwlockreadguard.rs

…

noalias-unpin.rs

…

noreturn-uninhabited.rs

…

noreturnflag.rs

…

nounwind.rs

…

nrvo.rs

…

optimize-attr-1.rs

…

option-nonzero-eq.rs

Implement SpecOptionPartialEq for cmp::Ordering

2023-01-18 19:19:28 -08:00

packed.rs

…

panic-abort-windows.rs

…

panic-in-drop-abort.rs

…

panic-unwind-default-uwtable.rs

…

personality_lifetimes.rs

…

pgo-counter-bias.rs

…

pgo-instrumentation.rs

…

pic-relocation-model.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

pie-relocation-model.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

README.md

…

refs.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

repeat-trusted-len.rs

…

repr-transparent-aggregates-1.rs

…

repr-transparent-aggregates-2.rs

…

repr-transparent-aggregates-3.rs

…

repr-transparent-sysv64.rs

…

repr-transparent.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

sanitizer_memtag_attr_check.rs

…

sanitizer_scs_attr_check.rs

…

sanitizer-cfi-add-canonical-jump-tables-flag.rs

…

sanitizer-cfi-emit-type-checks.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

sanitizer-cfi-emit-type-metadata-id-itanium-cxx-abi.rs

…

sanitizer-cfi-emit-type-metadata-itanium-cxx-abi.rs

…

sanitizer-kcfi-add-kcfi-flag.rs

…

sanitizer-kcfi-emit-kcfi-operand-bundle-itanium-cxx-abi.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

sanitizer-memory-track-orgins.rs

…

sanitizer-no-sanitize-inlining.rs

…

sanitizer-no-sanitize.rs

…

sanitizer-recover.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

scalar-pair-bool.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

set-discriminant-invalid.rs

…

simd_arith_offset.rs

…

simd-wide-sum.rs

…

slice_as_from_ptr_range.rs

…

slice-as_chunks.rs

…

slice-init.rs

…

slice-iter-len-eq-zero.rs

…

slice-position-bounds-check.rs

…

slice-ref-equality.rs

…

slice-reverse.rs

…

slice-windows-no-bounds-check.rs

…

some-abis-do-extend-params-to-32-bits.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

some-global-nonnull.rs

…

sparc-struct-abi.rs

…

sse42-implies-crc32.rs

…

stack-probes-call.rs

…

stack-probes-inline.rs

…

stack-protector.rs

…

static-relocation-model-msvc.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

staticlib-external-inline-fns.rs

…

stores.rs

…

swap-large-types.rs

…

swap-simd-types.rs

…

swap-small-types.rs

…

target-cpu-on-functions.rs

…

target-feature-overrides.rs

…

thread-local.rs

…

to_vec.rs

…

transmute-scalar.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

try_identity.rs

…

try_question_mark_nop.rs

…

tune-cpu-on-functions.rs

…

tuple-layout-opt.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

unchecked_shifts.rs

…

unchecked-float-casts.rs

…

uninit-consts.rs

…

union-abi.rs

…

unpadded-simd.rs

…

unwind-and-panic-abort.rs

…

unwind-extern-exports.rs

…

unwind-extern-imports.rs

…

used_with_arg.rs

…

var-names.rs

Put noundef on all scalars that don't allow uninit

2023-01-17 08:14:35 +01:00

vec-calloc-llvm14.rs

…

vec-calloc.rs

Auto merge of #106989 - clubby789:is-zero-num, r=scottmcm

2023-01-19 08:04:26 +00:00

vec-in-place.rs

…

vec-iter-collect-len.rs

…

vec-optimizes-away.rs

…

vec-shrink-panik.rs

…

vecdeque_no_panic.rs

…

virtual-function-elimination-32bit.rs

…

virtual-function-elimination.rs

…

wasm_casts_trapping.rs

…

x86_64-macosx-deployment-target.rs

…

x86_64-no-macosx-deployment-target.rs

…

zip.rs

…

zst-offset.rs

Add more codegen tests

2023-01-17 16:23:22 +01:00

README.md

The files here use the LLVM FileCheck framework, documented at https://llvm.org/docs/CommandGuide/FileCheck.html.

One extension worth noting is the use of revisions as custom prefixes for FileCheck. If your codegen test has different behavior based on the chosen target or different compiler flags that you want to exercise, you can use a revisions annotation, like so:

```rust
// revisions: aaa bbb
// [bbb] compile-flags: --flags-for-bbb
```

After specifying those variations, you can write different expected (or explicitly unexpected) output by using `<prefix>-SAME:` and `<prefix>-NOT:`, like so:

```rust
// CHECK: expected code
// aaa-SAME: emitted-only-for-aaa
// aaa-NOT:                        emitted-only-for-bbb
// bbb-NOT:  emitted-only-for-aaa
// bbb-SAME:                       emitted-only-for-bbb
```