0d410be23c
optimize zipping over array iterators

Fixes #115339 (somewhat)

The new assembly:

```asm
zip_arrays:
        .cfi_startproc
        vmovups (%rdx), %ymm0
        leaq    32(%rsi), %rcx
        vxorps  %xmm1, %xmm1, %xmm1
        vmovups %xmm1, -24(%rsp)
        movq    $0, -8(%rsp)
        movq    %rsi, -88(%rsp)
        movq    %rdi, %rax
        movq    %rcx, -80(%rsp)
        vmovups %ymm0, -72(%rsp)
        movq    $0, -40(%rsp)
        movq    $32, -32(%rsp)
        movq    -24(%rsp), %rcx
        vmovups (%rsi,%rcx), %ymm0
        vorps   -72(%rsp,%rcx), %ymm0, %ymm0
        vmovups %ymm0, (%rsi,%rcx)
        vmovups (%rsi), %ymm0
        vmovups %ymm0, (%rdi)
        vzeroupper
        retq
```

This is still longer than the slice version given in the issue, but at least it eliminates the terrible `vpextrb`/`orb` chain. I suspect the remaining overhead comes from excessive memcpys again (I haven't looked at the LLVM IR yet).

The `TrustedLen` specialization is a drive-by change: I had to touch the default impl anyway to be able to specialize the `TrustedRandomAccessNoCoerce` impl.
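The exact code from issue #115339 is not quoted here, so the following is only a sketch of the pattern the assembly above appears to come from: a mutable slice iterator zipped with a `[u8; 32]` taken by value (which yields a `core::array::IntoIter`). The names `zip_arrays` and `zip_slices` are illustrative, and `zip_slices` is an assumed shape for the "slice version" the message compares against. The last assertions show the property `TrustedLen` encodes: both halves of the zip know their exact length, so the zipped iterator does too.

```rust
// Hypothetical reconstruction; not the code from the issue or the std change.

/// Zips a mutable slice iterator with an array consumed *by value*
/// (the `array::IntoIter` side is what this change optimizes).
pub fn zip_arrays(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    for (x, y) in a.iter_mut().zip(b) {
        *x |= y;
    }
    a
}

/// Assumed shape of the "slice version": both sides are slice iterators.
pub fn zip_slices(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    for (x, y) in a.iter_mut().zip(b.iter()) {
        *x |= *y;
    }
    a
}

fn main() {
    let a: [u8; 32] = core::array::from_fn(|i| i as u8);
    let b: [u8; 32] = core::array::from_fn(|i| (i as u8).wrapping_mul(7));

    // Both formulations compute the same element-wise OR.
    assert_eq!(zip_arrays(a, b), zip_slices(a, b));

    // Each side reports an exact length, so the zip's size hint is exact;
    // `TrustedLen` (unstable) is the marker that lets std rely on this.
    let it = a.iter().zip(b);
    assert_eq!(it.size_hint(), (32, Some(32)));
    assert_eq!(it.len(), 32);
    println!("ok");
}
```

Only the generated code should differ between the two functions; inspecting each with `--emit=asm` is one way to compare them.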
- asm
- auxiliary
- libs
- nvptx-kernel-abi
- stack-protector
- aarch64-naked-fn-no-bti-prolog.rs
- aarch64-pointer-auth.rs
- align_offset.rs
- closure-inherit-target-feature.rs
- dwarf5.rs
- is_aligned.rs
- niche-prefer-zero.rs
- nvptx-arch-default.rs
- nvptx-arch-emit-asm.rs
- nvptx-arch-link-arg.rs
- nvptx-arch-target-cpu.rs
- nvptx-atomics.rs
- nvptx-internalizing.rs
- nvptx-linking-binary.rs
- nvptx-linking-cdylib.rs
- nvptx-safe-naming.rs
- option-nonzero-eq.rs
- panic-no-unwind-no-uwtable.rs
- panic-unwind-no-uwtable.rs
- pic-relocation-model.rs
- pie-relocation-model.rs
- slice-is_ascii.rs
- sparc-struct-abi.rs
- static-relocation-model.rs
- strict_provenance.rs
- target-feature-multiple.rs
- wasm_exceptions.rs
- x86_64-array-pair-load-store-merge.rs
- x86_64-floating-point-clamp.rs
- x86_64-fortanix-unknown-sgx-lvi-generic-load.rs
- x86_64-fortanix-unknown-sgx-lvi-generic-ret.rs
- x86_64-fortanix-unknown-sgx-lvi-inline-assembly.rs
- x86_64-naked-fn-no-cet-prolog.rs
- x86_64-no-jump-tables.rs
- x86_64-sse_crc.rs
- x86-stack-probes.rs