rust/codegen at 2d91939bb7130a8e6c092a290b7d37f654e3c23c - rust

bors 2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc	2023-02-13 10:18:48 +00:00

Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](`7c46fb2111/library/core/src/array/mod.rs (L865-L912)`) function, I had some ideas that ended up working better than I'd expected.

There's three main ideas in here, split over three commits:

1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.

(No libs-api changes; everything should be completely implementation-detail-internal.)

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞

r? `@thomcc`

As a simple example, this test

```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```

On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`

```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```

(and yes, that's a 65-element array `alloca` despite 64-element input and output)

But with this PR it's only `sub rsp, 520`

```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying

```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized

```nasm
        vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
        vpaddd  ymm1, ymm1, ymm2
        vmovdqu ymmword ptr [rsp + 328], ymm1
        vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
        vpaddd  ymm1, ymm1, ymm2
        vmovdqu ymmword ptr [rsp + 360], ymm1
```

(though sadly still stack-to-stack)
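The third idea in the commit message (skipping the `Option` handling when the item count is known up front) can be illustrated outside the standard library. The following is a minimal sketch under stated assumptions, not the code from the PR: `map_by_index` is a hypothetical name, and it fills a `[MaybeUninit<U>; N]` buffer by index instead of going through `array::IntoIter`.

```rust
use std::mem::{self, ManuallyDrop, MaybeUninit};
use std::ptr;

/// Hypothetical helper (not a std API): maps an array element by element
/// using a plain counted loop, so no iterator state or `Option` is involved.
pub fn map_by_index<T, U, const N: usize>(input: [T; N], mut f: impl FnMut(T) -> U) -> [U; N] {
    // Wrap the input so its elements are not dropped a second time after
    // being moved out one by one below.
    let input = ManuallyDrop::new(input);

    // An array of `MaybeUninit` needs no initialization, so this is sound.
    let mut out: [MaybeUninit<U>; N] = unsafe { MaybeUninit::uninit().assume_init() };

    for i in 0..N {
        // SAFETY: `i < N`, and each element is read exactly once.
        let item = unsafe { ptr::read(&input[i]) };
        out[i] = MaybeUninit::new(f(item));
    }

    // SAFETY: all `N` slots were written in the loop above, and
    // `[MaybeUninit<U>; N]` has the same layout as `[U; N]`.
    unsafe { mem::transmute_copy::<[MaybeUninit<U>; N], [U; N]>(&out) }
}
```

For instance, `map_by_index([1u32, 2, 3], |x| 13 * x + 7)` yields `[20, 33, 46]`. If the closure panics, this sketch leaks the unread inputs and the outputs written so far rather than dropping them; that is safe but wasteful, and a production version would want a drop guard.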
auxiliary
avr	abi: add `AddressSpace` field to `Primitive::Pointer`	2023-01-22 23:41:39 -05:00
dllimports	Add more codegen tests	2023-01-17 16:23:22 +01:00
instrument-xray	Test XRay only for supported targets	2023-02-09 12:29:43 +09:00
intrinsics	Add more codegen tests	2023-01-17 16:23:22 +01:00
non-terminate
remap_path_prefix
riscv-abi	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
simd-intrinsic
src-hash-algorithm
unwind-abis
abi-efiapi.rs	Stabilize `abi_efiapi` feature	2023-01-11 20:42:13 -05:00
abi-main-signature-16bit-c-int.rs
abi-main-signature-32bit-c-int.rs
abi-repr-ext.rs
abi-sysv64.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
abi-x86_64_sysv.rs
abi-x86-interrupt.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
adjustments.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
align-enum.rs
align-fn.rs
align-struct.rs
alloc-optimisation.rs
array-clone.rs
array-equality.rs
array-map.rs	Allow canonicalizing the `array::map` loop in trusted cases	2023-02-04 16:44:51 -08:00
asm-clobber_abi.rs
asm-clobbers.rs
asm-may_unwind.rs
asm-multiple-options.rs
asm-options.rs
asm-powerpc-clobbers.rs
asm-sanitize-llvm.rs
asm-target-clobbers.rs
async-fn-debug-awaitee-field.rs
async-fn-debug-msvc.rs
async-fn-debug.rs
atomic-operations.rs
autovectorize-f32x4.rs	Add another autovectorization codegen test using array zip-map	2023-02-04 16:44:53 -08:00
binary-search-index-no-bound-check.rs
bool-cmp.rs
box-maybe-uninit-llvm14.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
box-maybe-uninit.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
bpf-alu32.rs
branch-protection.rs
c-variadic-copy.rs
c-variadic-opt.rs
c-variadic.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
call-llvm-intrinsics.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
call-metadata.rs
catch-unwind.rs
cdylib-external-inline-fns.rs
cf-protection.rs
cfguard-checks.rs
cfguard-disabled.rs
cfguard-nochecks.rs
cfguard-non-msvc.rs
codemodels.rs
coercions.rs
cold-call-declare-and-call.rs
comparison-operators-newtype.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
consts.rs
dealloc-no-unwind.rs
debug-alignment.rs
debug-column-msvc.rs
debug-column.rs
debug-compile-unit-path.rs
debug-linkage-name.rs
debug-vtable.rs	Don't merge vtables when full debuginfo is enabled.	2023-01-27 15:29:04 +00:00
debuginfo-generic-closure-env-names.rs
deduced-param-attrs.rs
default-requires-uwtable.rs
drop.rs
dst-vtable-align-nonzero.rs
dst-vtable-size-range.rs
enum-bounds-check-derived-idx.rs
enum-bounds-check-issue-13926.rs
enum-bounds-check-issue-82871.rs
enum-bounds-check.rs
enum-debug-clike.rs
enum-debug-niche-2.rs
enum-debug-niche.rs
enum-debug-tagged.rs
enum-discriminant-value.rs
enum-match.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
export-no-mangle.rs
external-no-mangle-fns.rs
external-no-mangle-statics.rs
fastcall-inreg.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
fatptr.rs
fewer-names.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
ffi-const.rs
ffi-out-of-bounds-loads.rs
ffi-pure.rs
ffi-returns-twice.rs
float_math.rs
fn-impl-trait-self.rs
foo.s
force-frame-pointers.rs
force-no-unwind-tables.rs
force-unwind-tables.rs
frame-pointer.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
function-arguments-noopt.rs	make PointerKind directly reflect pointer types	2023-02-06 11:46:32 +01:00
function-arguments.rs	also do not add noalias on not-Unpin Box	2023-02-06 12:17:41 +01:00
gdb_debug_script_load.rs
generator-debug-msvc.rs
generator-debug.rs
generic-debug.rs
global_asm_include.rs
global_asm_x2.rs
global_asm.rs
i686-macosx-deployment-target.rs
i686-no-macosx-deployment-target.rs
inline-always-works-always.rs
inline-debuginfo.rs
inline-hint.rs
instrument-coverage.rs
instrument-mcount.rs
integer-cmp.rs
integer-overflow.rs
internalize-closures.rs
intrinsic-no-unnamed-attr.rs
issue-13018.rs
issue-15953.rs
issue-27130.rs
issue-32031.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
issue-32364.rs
issue-34634.rs
issue-34947-pow-i32.rs
issue-37945.rs
issue-44056-macos-tls-align.rs
issue-45222.rs
issue-45466.rs
issue-45964-bounds-check-slice-pos.rs	replace manual ptr arithmetic with ptr_sub	2023-01-15 17:38:05 +01:00
issue-47278.rs
issue-47442.rs
issue-56267-2.rs
issue-56267.rs
issue-56927.rs
issue-58881.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
issue-59352.rs
issue-69101-bounds-check.rs
issue-73031.rs
issue-73338-effecient-cmp.rs
issue-73396-bounds-check-after-position.rs
issue-73827-bounds-check-index-in-subexpr.rs
issue-75525-bounds-checks.rs
issue-75546.rs
issue-75659.rs	Support `true` and `false` as boolean flag params	2023-01-18 20:46:36 +01:00
issue-77812.rs
issue-81408-dllimport-thinlto-windows.rs
issue-84268.rs
issue-85872-multiple-reverse.rs
issue-86106.rs
issue-96274.rs
issue-96497-slice-size-nowrap.rs	bump failing assembly & codegen tests from LLVM 14 to LLVM 15	2023-01-17 20:02:01 +01:00
issue-98156-const-arg-temp-lifetime.rs
issue-98294-get-mut-copy-from-slice-opt.rs
issue-103285-ptr-addr-overflow-check.rs
issue-103840.rs
issue-105386-ub-in-debuginfo.rs
iter-repeat-n-trivial-drop.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
layout-size-checks.rs
lifetime_start_end.rs
link_section.rs
link-dead-code.rs
loads.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
local-generics-in-exe-internalized.rs
lto-removes-invokes.rs
mainsubprogram.rs
mainsubprogramstart.rs
match-optimized.rs
match-optimizes-away.rs
match-unoptimized.rs
mem-replace-direct-memcpy.rs
merge-functions.rs
mir_zst_stores.rs
mir-inlined-line-numbers.rs
move-operands.rs	Add a regression test for argument copies with DestinationPropagation	2023-01-11 10:27:06 -05:00
naked-functions.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
naked-nocoverage.rs
naked-noinline.rs
no-assumes-on-casts.rs
no-dllimport-w-cross-lang-lto.rs
no-jump-tables.rs
no-plt.rs
noalias-box-off.rs
noalias-box.rs
noalias-flag.rs
noalias-refcell.rs
noalias-rwlockreadguard.rs
noalias-unpin.rs
noreturn-uninhabited.rs
noreturnflag.rs
nounwind.rs
nrvo.rs
optimize-attr-1.rs
option-nonzero-eq.rs	Implement `SpecOptionPartialEq` for `cmp::Ordering`	2023-01-18 19:19:28 -08:00
packed.rs
panic-abort-windows.rs
panic-in-drop-abort.rs
panic-unwind-default-uwtable.rs
personality_lifetimes.rs
pgo-counter-bias.rs
pgo-instrumentation.rs
pic-relocation-model.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
pie-relocation-model.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
README.md
refs.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
repeat-trusted-len.rs
repr-transparent-aggregates-1.rs
repr-transparent-aggregates-2.rs
repr-transparent-aggregates-3.rs
repr-transparent-sysv64.rs
repr-transparent.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
sanitizer_memtag_attr_check.rs
sanitizer_scs_attr_check.rs
sanitizer-cfi-add-canonical-jump-tables-flag.rs
sanitizer-cfi-emit-type-checks.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
sanitizer-cfi-emit-type-metadata-id-itanium-cxx-abi.rs
sanitizer-cfi-emit-type-metadata-itanium-cxx-abi.rs
sanitizer-kcfi-add-kcfi-flag.rs
sanitizer-kcfi-emit-kcfi-operand-bundle-itanium-cxx-abi.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
sanitizer-memory-track-orgins.rs
sanitizer-no-sanitize-inlining.rs
sanitizer-no-sanitize.rs
sanitizer-recover.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
scalar-pair-bool.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
set-discriminant-invalid.rs
simd_arith_offset.rs
simd-wide-sum.rs
slice_as_from_ptr_range.rs
slice-as_chunks.rs
slice-init.rs
slice-iter-len-eq-zero.rs
slice-position-bounds-check.rs
slice-ref-equality.rs
slice-reverse.rs
slice-windows-no-bounds-check.rs
some-abis-do-extend-params-to-32-bits.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
some-global-nonnull.rs
sparc-struct-abi.rs
sse42-implies-crc32.rs
stack-probes-call.rs
stack-probes-inline.rs
stack-protector.rs
static-relocation-model-msvc.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
staticlib-external-inline-fns.rs
stores.rs
swap-large-types.rs
swap-simd-types.rs
swap-small-types.rs
target-cpu-on-functions.rs
target-feature-overrides.rs
thread-local.rs
to_vec.rs
transmute-scalar.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
try_identity.rs
try_question_mark_nop.rs
tune-cpu-on-functions.rs
tuple-layout-opt.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00
unchecked_shifts.rs
unchecked-float-casts.rs
uninit-consts.rs
union-abi.rs
unpadded-simd.rs
unwind-and-panic-abort.rs
unwind-extern-exports.rs
unwind-extern-imports.rs
used_with_arg.rs
var-names.rs	Put `noundef` on all scalars that don't allow uninit	2023-01-17 08:14:35 +01:00
vec-calloc-llvm14.rs
vec-calloc.rs	Auto merge of #106989 - clubby789:is-zero-num, r=scottmcm	2023-01-19 08:04:26 +00:00
vec-in-place.rs
vec-iter-collect-len.rs
vec-optimizes-away.rs
vec-shrink-panik.rs
vecdeque_no_panic.rs
virtual-function-elimination-32bit.rs
virtual-function-elimination.rs
wasm_casts_trapping.rs
x86_64-macosx-deployment-target.rs
x86_64-no-macosx-deployment-target.rs
zip.rs
zst-offset.rs	Add more codegen tests	2023-01-17 16:23:22 +01:00

README.md

The files here use the LLVM FileCheck framework, documented at https://llvm.org/docs/CommandGuide/FileCheck.html.

One extension worth noting is the use of revisions as custom prefixes for FileCheck. If your codegen test has different behavior based on the chosen target or different compiler flags that you want to exercise, you can use a revisions annotation, like so:

// revisions: aaa bbb
// [bbb] compile-flags: --flags-for-bbb
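
For context, a whole test file carrying such an annotation might look roughly like the sketch below. The revision names `aaa` and `bbb` are taken from the example above; the function `double_it`, the choice of `-C opt-level` as the per-revision flag, and the checked patterns are assumptions made purely for illustration.

```rust
// revisions: aaa bbb
// [aaa] compile-flags: -C opt-level=0
// [bbb] compile-flags: -C opt-level=3

// Hypothetical function under test. The label pattern uses a FileCheck
// regex so that it matches the symbol whether or not it is mangled.
// CHECK-LABEL: define{{.*}}double_it
pub fn double_it(x: u32) -> u32 {
    // Every revision is expected to end the function with a return.
    // CHECK: ret
    x.wrapping_mul(2)
}
```

Each revision compiles the same file once with its own flags, so the common `CHECK` lines here have to hold for both `aaa` and `bbb`.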

After specifying those variations, you can write different expected output, or explicitly unexpected output, for each revision by using `<prefix>-SAME:` and `<prefix>-NOT:`, like so:

// CHECK: expected code
// aaa-SAME: emitted-only-for-aaa
// aaa-NOT:                        emitted-only-for-bbb
// bbb-NOT:  emitted-only-for-aaa
// bbb-SAME:                       emitted-only-for-bbb
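
To make the prefix mechanics concrete, here is a toy model in Rust of which directives apply under a given revision. It assumes that each revision is checked with the common `CHECK` prefix plus the revision name as an extra prefix. `active_prefixes` and `active_directives` are hypothetical helpers written for this illustration; they are not part of compiletest or FileCheck, and they do not handle revision names that contain a hyphen.

```rust
/// Prefixes assumed to be active for a revision: the common `CHECK`
/// prefix, plus the revision name when one is selected.
fn active_prefixes(revision: Option<&str>) -> Vec<&str> {
    let mut prefixes = vec!["CHECK"];
    prefixes.extend(revision);
    prefixes
}

/// Returns the comment directives in `source` whose prefix is active
/// for `revision`, with the leading `//` stripped.
fn active_directives<'a>(source: &'a str, revision: Option<&str>) -> Vec<&'a str> {
    let prefixes = active_prefixes(revision);
    source
        .lines()
        .filter_map(|line| {
            // Only `//` comments can carry directives in this model.
            let body = line.trim_start().strip_prefix("//")?.trim_start();
            // `aaa-SAME: pattern` splits into `aaa-SAME` and the pattern.
            let (directive, _pattern) = body.split_once(':')?;
            // The prefix is the part before any `-SAME` / `-NOT` suffix.
            let prefix = directive.split('-').next()?;
            prefixes.iter().any(|p| *p == prefix).then_some(body)
        })
        .collect()
}
```

Applied to the five directive lines above, revision `aaa` keeps the `CHECK`, `aaa-SAME` and `aaa-NOT` lines and ignores both `bbb-` lines, while revision `bbb` does the reverse.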