There are a few tests that depend on some target features **not** being
enabled by default, and they normally pass with the default x86-64
target CPU. However, in downstream builds we have modified the default
to fit our distros -- `x86-64-v2` in RHEL 9 and `x86-64-v3` in RHEL 10
-- and the latter especially trips tests that expect AVX to be absent.
These cases are few enough that we can simply set the target CPU back
explicitly in the affected tests.
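
For illustration, such a test can pin the upstream baseline in its
compiletest header; the directives below are the standard ones, but the
snippet is a sketch rather than a quote from any particular test:

```rust
// only-x86_64
// compile-flags: --crate-type=lib -C target-cpu=x86-64
//
// Pinning the CPU keeps the test on the upstream x86-64 baseline even
// when the compiler's built-in default was changed downstream
// (e.g. to x86-64-v3, which enables AVX).
```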
The tests show that the code generation currently uses the least
significant bit of each lane of `<N x iX>` vector masks when converting
them to `<N x i1>`. This leads to an additional left-shift instruction
in the x86 assembly, since mask operations on x86 key off the most
significant bit. On aarch64 the left shift is followed by a comparison
against zero, which broadcasts the sign bit across the whole lane.
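
As a rough sketch of the pattern under test (the type and function
names are illustrative and the intrinsic is declared via the
`platform-intrinsic` interface; this is not the verbatim test code):

```rust
#![feature(repr_simd, platform_intrinsics)]
#![allow(non_camel_case_types, improper_ctypes_definitions)]

#[repr(simd)]
pub struct mask32x4([i32; 4]);
#[repr(simd)]
pub struct f32x4([f32; 4]);

extern "platform-intrinsic" {
    fn simd_select<M, T>(mask: M, a: T, b: T) -> T;
}

// With LSB-based masks the IR truncates `<4 x i32>` to `<4 x i1>`,
// keeping the least significant bit of each lane. Because x86 mask
// operations key off the sign bit, the assembly gains an extra
// `pslld $31` to move the LSB into the MSB before the lanes are
// sign-extended or blended; aarch64 instead emits `shl #31` followed
// by `cmlt #0`, which broadcasts the sign bit across the whole lane.
#[no_mangle]
pub unsafe extern "C" fn select(m: mask32x4, a: f32x4, b: f32x4) -> f32x4 {
    simd_select(m, a, b)
}
```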
The exception is `simd_bitmask`, which does not introduce an unneeded
shift because its code generation already shifts the mask before
truncating.
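
For contrast, extending the sketch above (again illustrative, and
assuming the lowering shifts each lane down by the lane width minus one
before the truncation, so the surviving bit is the most significant
one):

```rust
extern "platform-intrinsic" {
    fn simd_bitmask<T, U>(mask: T) -> U;
}

// Lowered as a per-lane right shift, a truncation to <4 x i1>, and a
// bitcast to an integer, so the most significant bit is the one that
// gets extracted and no extra shift appears in the assembly.
#[no_mangle]
pub unsafe extern "C" fn bitmask(m: mask32x4) -> u8 {
    simd_bitmask(m)
}
```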
By using the `extern "C"` calling convention, the tests should be
stable against changes in register allocation, though it is possible
that future LLVM updates will require updating some of the checks.
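
For instance, because the C ABI fixes which registers the arguments
arrive in, the assembly checks can name concrete registers. The lines
below are an illustrative FileCheck pattern for the x86-64 `select`
sketch above (AT&T syntax), not copied from the actual tests:

```rust
// CHECK-LABEL: select
// Under the x86-64 C ABI the mask argument arrives in %xmm0, so the
// shift that moves the LSB into the sign bit has a stable operand:
// CHECK: pslld $31, %xmm0
```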
The additional left shift would be removed by the fix in #104693, which
uses the most significant bit for all mask operations.