Optimize `array::IntoIter`
`.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`.
This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better.
I added one simple demonstration of the improvement as a codegen test.
(`vec::IntoIter` uses pointers instead of indexes, so doesn't have this problem, but that only works because its elements are boxed. `array::IntoIter` can't use pointers because that would keep it from being movable.)