optimize zipping over array iterators
Fixes #115339 (somewhat).
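For context, a minimal sketch of the kind of function involved, assuming a body inferred from the symbol name in the assembly below and the issue title (the issue's exact reproducer may differ):

```rust
// Assumed reproducer: element-wise OR over two 32-byte arrays, written with
// iterators so that `b` is consumed through `array::IntoIter`.
pub fn zip_arrays(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    a.iter_mut().zip(b).for_each(|(a, b)| *a |= b);
    a
}
```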
The new assembly:
```asm
zip_arrays:
.cfi_startproc
vmovups (%rdx), %ymm0
leaq 32(%rsi), %rcx
vxorps %xmm1, %xmm1, %xmm1
vmovups %xmm1, -24(%rsp)
movq $0, -8(%rsp)
movq %rsi, -88(%rsp)
movq %rdi, %rax
movq %rcx, -80(%rsp)
vmovups %ymm0, -72(%rsp)
movq $0, -40(%rsp)
movq $32, -32(%rsp)
movq -24(%rsp), %rcx
vmovups (%rsi,%rcx), %ymm0
vorps -72(%rsp,%rcx), %ymm0, %ymm0
vmovups %ymm0, (%rsi,%rcx)
vmovups (%rsi), %ymm0
vmovups %ymm0, (%rdi)
vzeroupper
retq
```
This is still longer than the slice version given in the issue, but at least it eliminates the terrible `vpextrb`/`orb` chain. My guess is that the remaining overhead is due to excessive memcpys again (I haven't looked at the LLVM IR).
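For reference, a slice-style formulation of the same operation might look like the following (illustrative only; not necessarily the exact code from the issue):

```rust
// Illustrative slice/index-based equivalent: a bounded loop over the two
// arrays, which LLVM typically auto-vectorizes into a short vmovups/vorps
// sequence without spilling iterator state to the stack.
pub fn zip_slices(mut a: [u8; 32], b: [u8; 32]) -> [u8; 32] {
    for i in 0..a.len() {
        a[i] |= b[i];
    }
    a
}
```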
The `TrustedLen` specialization is a drive-by change since I had to do something for the default impl anyway to be able to specialize the `TrustedRandomAccessNoCoerce` impl.
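As an indirect, user-level illustration of what `TrustedLen` guarantees (an exact `size_hint`; the trait itself is unstable and internal to `core`, so this only shows the observable effect):

```rust
fn main() {
    // Zipping two array iterators: both sides report an exact length, so the
    // combined size hint has matching lower and upper bounds.
    let it = [1u8, 2, 3, 4].into_iter().zip([5u8, 6, 7, 8]);
    assert_eq!(it.size_hint(), (4, Some(4)));
}
```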