Commit Graph

46 Commits

Author SHA1 Message Date
bors
2d91939bb7 Auto merge of #107634 - scottmcm:array-drain, r=thomcc
Improve the `array::map` codegen

The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.

There are three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit because, to do it, I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straightforward.

(No libs-api changes; everything should be completely implementation-detail-internal.)
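
As a rough illustration of ideas 1 and 3 above, here is a minimal sketch. It is not the code from this PR: `map_copy_array` is a hypothetical helper, restricted to `T: Copy` so that dropping the input is not a concern. It indexes the input directly instead of going through `array::IntoIter`, and it writes every output slot unconditionally because the loop bound already guarantees exactly `N` items, so no `Option` handling is needed.

```rust
use std::mem::MaybeUninit;

/// Hypothetical helper (an assumption for illustration, not the PR's code).
/// Maps a `Copy` array element by element into uninitialized output storage.
fn map_copy_array<T: Copy, U, const N: usize>(
    input: [T; N],
    mut f: impl FnMut(T) -> U,
) -> [U; N] {
    // Output storage, one uninitialized slot per element.
    let mut out: [MaybeUninit<U>; N] = std::array::from_fn(|_| MaybeUninit::uninit());

    // Plain indexed loop: the bound is the const `N`, so there is no
    // iterator state and no "did we run out of items?" branch.
    for i in 0..N {
        out[i].write(f(input[i]));
    }

    // SAFETY: every one of the `N` slots was written above, and
    // `[MaybeUninit<U>; N]` has the same size and layout as `[U; N]`.
    unsafe { std::mem::transmute_copy::<[MaybeUninit<U>; N], [U; N]>(&out) }
}
```

If `f` panics partway through, the elements written so far are leaked rather than dropped; that is safe, but a production version would want a drop guard. For example, `map_copy_array([1u32, 2, 3], |x| 13 * x + 7)` evaluates to `[20, 33, 46]`.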

It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before.  And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞

r? `@thomcc`

As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)

But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```

Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov     esi, 64
        mov     edi, 0
        cmp     rdx, 64
        je      .LBB0_3
        lea     rsi, [rdx + 1]
        mov     qword ptr [rsp + 784], rsi
        mov     r8d, dword ptr [rsp + 4*rdx + 528]
        mov     edi, 1
        lea     edx, [r8 + 2*r8]
        lea     r8d, [r8 + 4*rdx]
        add     r8d, 7
.LBB0_3:
        test    edi, edi
        je      .LBB0_11
        mov     dword ptr [rsp + 4*rcx + 272], r8d
        cmp     rsi, 64
        jne     .LBB0_6
        xor     r8d, r8d
        mov     edx, 64
        test    r8d, r8d
        jne     .LBB0_8
        jmp     .LBB0_11
.LBB0_6:
        lea     rdx, [rsi + 1]
        mov     qword ptr [rsp + 784], rdx
        mov     edi, dword ptr [rsp + 4*rsi + 528]
        mov     r8d, 1
        lea     esi, [rdi + 2*rdi]
        lea     edi, [rdi + 4*rsi]
        add     edi, 7
        test    r8d, r8d
        je      .LBB0_11
.LBB0_8:
        mov     dword ptr [rsp + 4*rcx + 276], edi
        add     rcx, 2
        cmp     rcx, 64
        jne     .LBB0_1
```

whereas with this PR it's unrolled and vectorized
```nasm
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 64]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 328], ymm1
	vpmulld	ymm1, ymm0, ymmword ptr [rsp + 96]
	vpaddd	ymm1, ymm1, ymm2
	vmovdqu	ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
2023-02-13 10:18:48 +00:00
Scott McMurray
5bc328fdef Allow canonicalizing the array::map loop in trusted cases 2023-02-04 16:44:51 -08:00
Lukas Markeffsky
76e216f29b Use associated items of char instead of freestanding items in core::char 2023-01-14 11:58:41 +01:00
Scott McMurray
9d68a1a74c Tune RepeatWith::try_fold and Take::for_each and Vec::extend_trusted 2022-11-24 19:14:19 -08:00
Scott McMurray
d62b903892 VecDeque::resize should re-use the buffer in the passed-in element
Today it always clones it for *every* appended element, but one of those clones is avoidable.
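
The avoidable clone can be illustrated with a small sketch. `push_back_n` is a hypothetical free function, not the actual `VecDeque::resize` implementation: it clones the value for all but the last appended slot and moves the passed-in value itself into the final one.

```rust
use std::collections::VecDeque;

/// Hypothetical helper (an assumption for illustration): appends `n` copies
/// of `value`, performing only `n - 1` clones.
fn push_back_n<T: Clone>(deque: &mut VecDeque<T>, value: T, n: usize) {
    if n == 0 {
        // Nothing to append; `value` is simply dropped.
        return;
    }
    // Clone for every slot except the last one.
    for _ in 1..n {
        deque.push_back(value.clone());
    }
    // Move the original into the last slot, saving one clone.
    deque.push_back(value);
}
```

For element types that own a heap buffer, such as `String` or `Vec<T>`, the final move reuses the buffer of the passed-in element instead of allocating a new one.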
2022-11-15 00:53:26 -08:00
The 8472
43c353fff7 simplification: do not process the ArrayChunks remainder in fold() 2022-11-07 21:44:25 +01:00
Matthias Krüger
6deca5f067
Rollup merge of #100220 - scottmcm:fix-by-ref-sized, r=joshtriplett
Properly forward `ByRefSized::fold` to the inner iterator

cc `@timvermeulen`, who noticed this mistake in https://github.com/rust-lang/rust/pull/100214#issuecomment-1207317625
2022-08-24 18:20:08 +02:00
bors
6c943bad02 Auto merge of #99541 - timvermeulen:flatten_cleanup, r=the8472
Refactor iteration logic in the `Flatten` and `FlatMap` iterators

The `Flatten` and `FlatMap` iterators both delegate to `FlattenCompat`:
```rust
struct FlattenCompat<I, U> {
    iter: Fuse<I>,
    frontiter: Option<U>,
    backiter: Option<U>,
}
```
Every individual iterator method that `FlattenCompat` implements needs to carefully manage this state, checking whether the `frontiter` and `backiter` are present, and storing the current iterator appropriately if iteration is aborted. This has led to methods such as `next`, `advance_by`, and `try_fold` all having similar code for managing the iterator's state.

I have extracted this common logic of iterating the inner iterators with the option to exit early into a `iter_try_fold` method:
```rust
impl<I, U> FlattenCompat<I, U>
where
    I: Iterator<Item: IntoIterator<IntoIter = U>>,
{
    fn iter_try_fold<Acc, Fold, R>(&mut self, acc: Acc, fold: Fold) -> R
    where
        Fold: FnMut(Acc, &mut U) -> R,
        R: Try<Output = Acc>,
    { ... }
}
```
It passes each of the inner iterators to the given function as long as it keeps succeeding. It takes care of managing `FlattenCompat`'s state, so that the actual `Iterator` methods don't need to. The resulting code that makes use of this abstraction is much more straightforward:
```rust
fn next(&mut self) -> Option<U::Item> {
    #[inline]
    fn next<U: Iterator>((): (), iter: &mut U) -> ControlFlow<U::Item> {
        match iter.next() {
            None => ControlFlow::CONTINUE,
            Some(x) => ControlFlow::Break(x),
        }
    }

    self.iter_try_fold((), next).break_value()
}
```
Note that despite being implemented in terms of `iter_try_fold`, `next` is still able to benefit from `U`'s `next` method. It therefore does not take the performance hit that implementing `next` directly in terms of `Self::try_fold` causes (in some benchmarks).
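
To make the elided `{ ... }` body more concrete, here is a simplified sketch that compiles on stable Rust. It is an assumption about the general shape, not the code merged in this PR: `FlattenSketch` is a hypothetical stand-in for `FlattenCompat`, and it uses `ControlFlow` directly because the `Try` trait is unstable.

```rust
use std::iter::Fuse;
use std::ops::ControlFlow;

/// Hypothetical stand-in for `FlattenCompat` (illustration only).
struct FlattenSketch<I, U> {
    iter: Fuse<I>,
    frontiter: Option<U>,
    backiter: Option<U>,
}

impl<I, U> FlattenSketch<I, U>
where
    I: Iterator,
    I::Item: IntoIterator<IntoIter = U>,
    U: Iterator,
{
    fn new(iter: I) -> Self {
        FlattenSketch { iter: iter.fuse(), frontiter: None, backiter: None }
    }

    /// Hands each inner iterator to `fold` until `fold` breaks.
    /// On an early exit the current inner iterator stays stored in
    /// `self`, so a later call resumes where this one stopped.
    fn iter_try_fold<Acc, B>(
        &mut self,
        mut acc: Acc,
        mut fold: impl FnMut(Acc, &mut U) -> ControlFlow<B, Acc>,
    ) -> ControlFlow<B, Acc> {
        // 1. Finish a partially consumed front iterator, if any.
        if let Some(front) = self.frontiter.as_mut() {
            acc = fold(acc, front)?;
        }
        self.frontiter = None;

        // 2. Pull new inner iterators from the outer iterator.
        while let Some(item) = self.iter.next() {
            let inner = self.frontiter.insert(item.into_iter());
            acc = fold(acc, inner)?;
        }
        self.frontiter = None;

        // 3. Finish a back iterator left over from reverse iteration.
        if let Some(back) = self.backiter.as_mut() {
            acc = fold(acc, back)?;
        }
        self.backiter = None;

        ControlFlow::Continue(acc)
    }

    /// `next` expressed through `iter_try_fold`, mirroring the snippet above.
    fn next(&mut self) -> Option<U::Item> {
        let flow = self.iter_try_fold((), |(), inner| match inner.next() {
            None => ControlFlow::Continue(()),
            Some(x) => ControlFlow::Break(x),
        });
        match flow {
            ControlFlow::Break(x) => Some(x),
            ControlFlow::Continue(()) => None,
        }
    }
}
```

The `?` operator is what leaves the state intact on an early exit: the function returns before the corresponding `self.frontiter = None` or `self.backiter = None` line runs.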

This PR also adds `iter_try_rfold` which captures the shared logic of `try_rfold` and `advance_back_by`, as well as `iter_fold` and `iter_rfold` for folding without early exits (used by `fold`, `rfold`, `count`, and `last`).

Benchmark results:
```
                                             before                after
bench_flat_map_sum                       423,255 ns/iter      414,338 ns/iter
bench_flat_map_ref_sum                 1,942,139 ns/iter    2,216,643 ns/iter
bench_flat_map_chain_sum               1,616,840 ns/iter    1,246,445 ns/iter
bench_flat_map_chain_ref_sum           4,348,110 ns/iter    3,574,775 ns/iter
bench_flat_map_chain_option_sum          780,037 ns/iter      780,679 ns/iter
bench_flat_map_chain_option_ref_sum    2,056,458 ns/iter      834,932 ns/iter
```

I added the last two benchmarks specifically to demonstrate an extreme case where `FlatMap::next` can benefit from custom internal iteration of the outer iterator, so take it with a grain of salt. We should probably do a perf run to see if the changes to `next` are worth it in practice.
2022-08-19 02:34:30 +00:00
Scott McMurray
7680c8b690 Properly forward ByRefSized::fold to the inner iterator 2022-08-14 22:55:30 -07:00
austinabell
00bc9e8ac4
fix(iter::skip): Optimize next and nth implementations of Skip 2022-08-14 13:25:13 -04:00
Tim Vermeulen
3f7004920c Move fold logic to iter_fold method and reuse it in count and last 2022-08-05 03:43:39 +02:00
Maybe Waffle
4db628a801 Remove incorrect impl TrustedLen for ArrayChunks
As explained in the review of the previous attempt to add `ArrayChunks`,
adapters that shrink the length can't implement `TrustedLen`.
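
A small sketch of why shrinking adapters conflict with `TrustedLen`. `chunked_size_hint` is a hypothetical function, not the real `ArrayChunks::size_hint`; it only shows how a chunking adapter would derive its hint from the inner iterator's hint.

```rust
/// Hypothetical helper (illustration only): the size hint a chunking
/// adapter with chunk size `n` could derive from its inner iterator.
fn chunked_size_hint(inner: (usize, Option<usize>), n: usize) -> (usize, Option<usize>) {
    assert!(n != 0, "chunk size must be non-zero");
    let (lower, upper) = inner;
    // Only complete chunks are yielded, so both bounds are divided by `n`.
    // A `None` upper bound stays `None`: the adapter cannot tell how far
    // beyond `usize::MAX` the inner length really is.
    (lower / n, upper.map(|u| u / n))
}
```

`TrustedLen` requires that an upper bound of `None` is reported only when the real length exceeds `usize::MAX`. An inner iterator with more than `usize::MAX` items reports `None`, but after division by `n` the real number of chunks may fit in a `usize`, and the adapter has no way to report that exact value.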
2022-08-01 19:16:24 +04:00
Ross MacArthur
f5485181ca Use array::IntoIter for the ArrayChunks remainder 2022-08-01 16:39:30 +04:00
Ross MacArthur
ca3d1010bb Add Iterator::array_chunks() 2022-08-01 16:39:27 +04:00
Tim Vermeulen
e52837c362 Add note to test about Unfuse 2022-07-18 21:53:35 +02:00
Tim Vermeulen
50c612faef Fix Skip::next for non-fused inner iterators 2022-07-18 21:10:47 +02:00
Ross MacArthur
bbdff1fff4
Add Iterator::next_chunk 2022-06-21 08:57:02 +02:00
est31
cdb8e64bc7 Use Box::new() instead of box syntax in core tests 2022-05-29 01:44:11 +02:00
Matthias Krüger
c183d4a510
Rollup merge of #94115 - scottmcm:iter-process-by-ref, r=yaahc
Let `try_collect` take advantage of `try_fold` overrides

No public API changes.

With this change, `try_collect` (#94047) is no longer going through the `impl Iterator for &mut impl Iterator`, and thus will be able to use `try_fold` overrides instead of being forced through `next` for every element.

Here's the test added, to see that it fails before this PR (once a new enough nightly is out): https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=462f2896f2fed2c238ee63ca1a7e7c56

This might as well go to the same person as my last `try_process` PR  (#93572), so
r? `@yaahc`
2022-03-18 21:50:44 +01:00
bors
21b0325c68 Auto merge of #94738 - Urgau:rustbuild-check-cfg-values, r=Mark-Simulacrum
Enable conditional checking of values in the Rust codebase

This pull request enables conditional checking of (well known) values in the Rust codebase.

Well known values were added in https://github.com/rust-lang/rust/pull/94362. All the `target_*` values are taken from all the built-in targets, which is why some extra values needed to be added, as they are not (yet?) defined in any built-in targets.

r? `@Mark-Simulacrum`
2022-03-13 18:34:00 +00:00
Scott McMurray
7ef74bc8b9 Let try_collect take advantage of try_fold overrides
Without this change it's going through `&mut impl Iterator`, which handles `?Sized` and thus currently can't forward generics.

Here's the test added, to see that it fails before this PR (once a new enough nightly is out): https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=462f2896f2fed2c238ee63ca1a7e7c56
2022-03-10 00:16:06 -08:00
Loïc BRANSTETT
e3ea59ada5 Remove unexpected #[cfg(target_pointer_width = "8")] in tests 2022-03-09 00:30:17 +01:00
fren_gor
04b3162764
Add collect_into 2022-02-20 01:57:32 +01:00
Arthur Lafrance
47d5196a00 Add a try_collect() helper method to Iterator
Tweaked `try_collect()` to accept more `Try` types

Updated feature attribute for tracking issue
2022-02-16 14:26:39 -08:00
tamaron
83242897fb add tests 2022-02-02 23:07:02 +09:00
Lucas Kent
08829853d3 Replace usages of vec![].into_iter with [].into_iter 2022-01-09 14:09:25 +11:00
Mara Bos
1acb44f03c Use IntoIterator for array impl everywhere. 2021-12-04 19:40:33 +01:00
kit
aef59e4fb8 Add a try_reduce method to the Iterator trait 2021-12-04 15:17:14 +11:00
The8472
3f9b26dc64 Fix Iterator::advance_by contract inconsistency
The `advance_by(n)` docs state that in the error case `Err(k)`, `k` is always less than `n`.
They also state that `advance_by(0)` may return `Err(0)` to indicate an exhausted iterator.
These statements are inconsistent.
Since only one implementation (`Skip`) actually made use of that, I changed it to return `Ok(())` in that case too.

While adding some tests I also found a bug in `Take::advance_back_by`.
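
The contract as described in this commit message can be sketched on stable Rust. `advance_by_sketch` is a hypothetical free function modelled on that description, since `Iterator::advance_by` itself is unstable and its signature may differ from what is shown here.

```rust
/// Hypothetical model of the contract described in this commit
/// (illustration only): `Err(k)` means the iterator ran out after
/// `k` steps, and `k` is always strictly less than `n`.
fn advance_by_sketch<I: Iterator>(iter: &mut I, n: usize) -> Result<(), usize> {
    for advanced in 0..n {
        if iter.next().is_none() {
            // `advanced` ranges over 0..n, so it is always < n.
            return Err(advanced);
        }
    }
    // Reached for every successful advance, including n == 0, where the
    // loop body never runs and an exhausted iterator still yields Ok(()).
    Ok(())
}
```

With this shape the two statements no longer conflict: `Err(0)` can only be produced when `n > 0`, and a call with `n == 0` always returns `Ok(())`.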
2021-11-19 13:00:23 +01:00
The8472
2c6e67105e implement advance_(back_)by on more iterators 2021-09-30 21:23:28 +02:00
Frank Steffahn
8d2bb9389a Consistent spelling of "adapter" in the standard library
Change all occurrences of "(A|a)daptor" to "(A|a)dapter".
2021-07-30 17:23:07 +02:00
The8472
8dd903cc77 implement ConstSizeIntoIterator for &[T;N] in addition to [T;N]
Due to #20400 the corresponding TrustedLen impls need a helper trait
instead of directly adding `Item = &[T;N]` bounds.
Since TrustedLen is a public trait this in turn means
the helper trait needs to be public. Since it's just a workaround
for a compiler deficit it's marked hidden, unstable and unsafe.
2021-07-16 20:38:42 +02:00
The8472
bd1c39dc6c implement TrustedLen for Flatten/FlatMap if the U: IntoIterator == [T; N]
This only works if arrays are passed directly instead of array iterators
because we need to be sure that they have not been advanced before
Flatten does its size calculation.
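
A brief sketch of the reasoning, using hypothetical helpers rather than the real `Flatten` internals: when every inner item is a fresh `[T; N]`, the flattened length is the outer length times `N`, whereas an array iterator that has already been stepped no longer guarantees `N` remaining items.

```rust
/// Hypothetical helper (illustration only): exact flattened length when
/// each of the `outer_len` items is a whole `[T; N]`, or `None` if the
/// product overflows `usize`.
fn flattened_array_len<const N: usize>(outer_len: usize) -> Option<usize> {
    outer_len.checked_mul(N)
}

/// Shows why array *iterators* give no such guarantee: after one step,
/// an iterator over a 3-element array has fewer than 3 items left.
fn remaining_after_one_step() -> usize {
    let mut it = IntoIterator::into_iter([10, 20, 30]);
    it.next();
    it.len()
}
```

The first function depends only on the type-level constant `N`, which is what makes the length trustworthy; the second depends on runtime state that `Flatten` cannot see when it computes its size.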
2021-07-15 22:59:30 +02:00
The8472
b4734b7c38 disable test on platforms that don't support unwinding 2021-06-20 12:20:05 +02:00
The8472
8b518542d0 fix panic-safety in specialized Zip::next_back
This was unsound since a panic in a.next_back() would result in the
length not being updated which would then lead to the same element
being revisited in the side-effect preserving code.
2021-06-19 02:20:51 +02:00
Muhammad Mominul Huque
507d97b26e Update expressions where we can use array's IntoIterator implementation 2021-06-02 16:09:04 +06:00
Mara Bos
8dc0ae24bc Remove Option::{unwrap_none, expect_none}. 2021-03-14 12:54:34 +01:00
Giacomo Stevanato
c1bfb9a78d Add relevant test 2021-03-05 19:09:23 +01:00
Mara
ee796c6523
Rollup merge of #82289 - SkiFire13:fix-issue-82282, r=m-ou-se
Fix underflow in specialized ZipImpl::size_hint

Fixes #82282
2021-03-05 10:57:19 +01:00
Giacomo Stevanato
8b9ac4d415 Add test for underflow in specialized Zip's size_hint 2021-03-03 21:16:08 +01:00
Ryan Levick
ee65416f0d Fix core tests 2021-03-03 11:22:49 +01:00
Giacomo Stevanato
f241c10223 Improve flatten-fuse tests 2021-01-23 21:33:38 +01:00
Daniel Conley
0c78500426 library/core/tests/iter documentation and cleanup 2021-01-22 17:57:08 -05:00
Daniel Conley
bc830a274b library/core/tests/iter rearrange & add back missed doc comments 2021-01-22 17:57:07 -05:00
Daniel Conley
1e3a2def67 library/core/test/iter add newlines between tests 2021-01-22 16:58:21 -05:00
Daniel Conley
3ce97000e1 library/core/test/iter.rs split attempt 2 2021-01-21 19:36:32 -05:00