0d410be23c
optimize zipping over array iterators Fixes #115339 (somewhat) the new assembly: ```asm zip_arrays: .cfi_startproc vmovups (%rdx), %ymm0 leaq 32(%rsi), %rcx vxorps %xmm1, %xmm1, %xmm1 vmovups %xmm1, -24(%rsp) movq $0, -8(%rsp) movq %rsi, -88(%rsp) movq %rdi, %rax movq %rcx, -80(%rsp) vmovups %ymm0, -72(%rsp) movq $0, -40(%rsp) movq $32, -32(%rsp) movq -24(%rsp), %rcx vmovups (%rsi,%rcx), %ymm0 vorps -72(%rsp,%rcx), %ymm0, %ymm0 vmovups %ymm0, (%rsi,%rcx) vmovups (%rsi), %ymm0 vmovups %ymm0, (%rdi) vzeroupper retq ``` This is still longer than the slice version given in the issue but at least it eliminates the terrible `vpextrb`/`orb` chain. I guess this is due to excessive memcpys again (haven't looked at the llvmir)? The `TrustedLen` specialization is a drive-by change since I had to do something for the default impl anyway to be able to specialize the `TrustedRandomAccessNoCoerce` impl. |
||
---|---|---|
.. | ||
fmt | ||
hash | ||
iter | ||
net | ||
num | ||
ops | ||
panic | ||
alloc.rs | ||
any.rs | ||
array.rs | ||
ascii.rs | ||
asserting.rs | ||
atomic.rs | ||
bool.rs | ||
cell.rs | ||
char.rs | ||
clone.rs | ||
cmp.rs | ||
const_ptr.rs | ||
convert.rs | ||
error.rs | ||
future.rs | ||
intrinsics.rs | ||
lazy.rs | ||
lib.rs | ||
macros.rs | ||
manually_drop.rs | ||
mem.rs | ||
nonzero.rs | ||
ops.rs | ||
option.rs | ||
panic.rs | ||
pattern.rs | ||
pin_macro.rs | ||
pin.rs | ||
ptr.rs | ||
result.rs | ||
simd.rs | ||
slice.rs | ||
str_lossy.rs | ||
str.rs | ||
task.rs | ||
time.rs | ||
tuple.rs | ||
unicode.rs | ||
waker.rs |