Optimize `slice_iter.copied().next_chunk()`
```
OLD:
test iter::bench_copied_array_chunks ... bench: 371 ns/iter (+/- 7)
NEW:
test iter::bench_copied_array_chunks ... bench: 31 ns/iter (+/- 0)
```
The default `next_chunk` implementation suffers from having to assemble the array byte by byte via `next()`, checking the `Option<&T>` and then dereferencing `&T`. The specialization copies the chunk directly from the slice.