// rust/tests/codegen/simd/simd-wide-sum.rs

//@ revisions: llvm mir-opt3
//@ compile-flags: -C opt-level=3 -Z merge-functions=disabled --edition=2021
//@ only-x86_64
//@ [mir-opt3]compile-flags: -Zmir-opt-level=3
//@ [mir-opt3]build-pass

// mir-opt3 is a regression test for https://github.com/rust-lang/rust/issues/98016

#![crate_type = "lib"]
#![feature(portable_simd)]

use std::simd::prelude::*;

// Lane count. A larger SIMD size improves the vectorization possibilities.
const N: usize = 16;

#[no_mangle]
// CHECK-LABEL: @wider_reduce_simd
pub fn wider_reduce_simd(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <16 x i8>
    // CHECK-SAME: to <16 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v16i16(<16 x i16>
    let x: Simd<u16, N> = x.cast();
    x.reduce_sum()
}

#[no_mangle]
// CHECK-LABEL: @wider_reduce_loop
pub fn wider_reduce_loop(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <16 x i8>
    // CHECK-SAME: to <16 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v16i16(<16 x i16>
    let mut sum = 0_u16;
    for i in 0..N {
        sum += u16::from(x[i]);
    }
    sum
}

#[no_mangle]
// CHECK-LABEL: @wider_reduce_iter
pub fn wider_reduce_iter(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <16 x i8>
    // CHECK-SAME: to <16 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v16i16(<16 x i16>
    x.as_array().iter().copied().map(u16::from).sum()
}

// The by-value iterator case is the most interesting: it used to not
// auto-vectorize because `<array::IntoIter as Iterator>::fold` was
// implemented via `Iterator::by_ref` instead of the optimized `Range::fold`.

#[no_mangle]
// CHECK-LABEL: @wider_reduce_into_iter
pub fn wider_reduce_into_iter(x: Simd<u8, N>) -> u16 {
    // CHECK: zext <16 x i8>
    // CHECK-SAME: to <16 x i16>
    // CHECK: call i16 @llvm.vector.reduce.add.v16i16(<16 x i16>
    x.to_array().into_iter().map(u16::from).sum()
}
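
// The following is a minimal sketch, not part of the test itself, of the same
// widening sums written against a plain `[u8; 16]` on stable Rust (no
// `portable_simd`). The function names are hypothetical, and the sketch only
// illustrates that the three scalar formulations compute the same value; it
// makes no claim about what any particular compiler version emits for them.

```rust
// Illustrative only: these names are assumptions, not items from the test file.
pub const N: usize = 16;

/// Indexed loop that widens each `u8` to `u16` before accumulating.
pub fn widening_sum_loop(x: [u8; N]) -> u16 {
    let mut sum = 0_u16;
    for i in 0..N {
        sum += u16::from(x[i]);
    }
    sum
}

/// Borrowing iterator over the array elements.
pub fn widening_sum_iter(x: &[u8; N]) -> u16 {
    x.iter().copied().map(u16::from).sum()
}

/// By-value `array::IntoIter`. The explicit `IntoIterator::into_iter(x)` call
/// is equivalent to `x.into_iter()` in edition 2021 and also works in older
/// editions.
pub fn widening_sum_into_iter(x: [u8; N]) -> u16 {
    IntoIterator::into_iter(x).map(u16::from).sum()
}
```

// For an all-255 input, each function returns 16 * 255 = 4080, a value that
// overflows `u8` but fits in `u16`, which is why the sum is widened first.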

// History (commit messages from the blame view, oldest first):
//
// 2022-04-02 16:29:41 -05:00
//   Fix `array::IntoIter::fold` to use the optimized `Range::fold`.
//   It was using `Iterator::by_ref` in the implementation, which ended up
//   pessimizing it enough that, for example, it didn't vectorize when we
//   tried it in the
//   <https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Reducing.20sum.20into.20wider.20types>
//   conversation. Demonstration that the codegen test doesn't pass on the
//   current nightly: <https://rust.godbolt.org/z/Taxev5eMn>
//
// 2023-07-11 19:43:51 -05:00
//   Add mir-opt3 rev to simd-wide-sum test.
//
// 2023-09-20 23:29:20 -05:00
//   Increasing the SIMD size improves the vectorization possibilities.
//   Change the simd-wide-sum.rs to pass the LLVM main branching test.
//
// 2023-11-19 18:04:06 -06:00
//   Update std::simd usage and test outputs.
//
// 2024-02-22 06:10:29 -06:00
//   Migrate compiletest to use `ui_test`-style `//@` directives.
//
// 2024-03-28 08:28:32 -05:00
//   Restore the test checks for `wider_reduce_into_iter`.
//   The current minimum support is for LLVM 17.