Work around an issue where usize and isize can sometimes (but not
always) get canonicalized to their corresponding integer type. This
causes shuffle_vector to panic, since the vectors passed to it no longer
have the same type.
Also insert a cast on the mask elements, since we might be passed a
signed integer of any size, not just i32. For now, we always cast to
i32.
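As a rough model of the semantics involved (not the backend's actual gccjit code), the sketch below performs a two-input shuffle whose mask elements may be any signed integer type, converting each index to i32 first as described above. The function name `shuffle_with_mask` and the slice-based representation are hypothetical, chosen only for illustration.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical model of a two-input vector shuffle: mask index `i` selects
/// `a[i]` when `i < a.len()`, otherwise `b[i - a.len()]`.
/// Mask elements may be any signed integer type; each one is converted to
/// i32 first, mirroring the "always cast to i32" rule.
fn shuffle_with_mask<T: Copy, M: Copy + TryInto<i32>>(a: &[T], b: &[T], mask: &[M]) -> Vec<T> {
    // Both inputs must agree in type and length; the usize/isize
    // canonicalization issue could break this invariant.
    assert_eq!(a.len(), b.len(), "shuffle inputs must have the same type");
    mask.iter()
        .map(|&m| {
            // Normalize the mask element to i32, whatever its original width.
            let idx: i32 = m.try_into().ok().expect("mask index must fit in i32");
            let idx = usize::try_from(idx).expect("mask index must be non-negative");
            if idx < a.len() { a[idx] } else { b[idx - a.len()] }
        })
        .collect()
}
```

Here an i64 mask such as `[0, 4, 1, 5]` and an i8 mask such as `[7, 3]` go through the same i32 conversion before indexing.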
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
If we're running against a patched libgccjit, use an algorithm similar
to what LLVM uses for this intrinsic. Otherwise, fall back to a
per-element bitreverse.
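We assume "an algorithm similar to what LLVM uses" refers to a shift-and-mask expansion: byte-swap each lane, then swap nibbles, bit pairs, and single bits using constant masks. The sketch below shows that technique on plain u32 lanes next to the per-element fallback; the function names are hypothetical, and on a real vector type each step would be a single vector shift/and/or rather than a per-lane call.

```rust
/// Shift-and-mask bit reversal for one u32 lane:
/// reverse the byte order, then reverse the bits inside each byte.
fn bitreverse_swar(x: u32) -> u32 {
    let x = x.swap_bytes();
    let x = ((x >> 4) & 0x0F0F_0F0F) | ((x & 0x0F0F_0F0F) << 4); // swap nibbles
    let x = ((x >> 2) & 0x3333_3333) | ((x & 0x3333_3333) << 2); // swap bit pairs
    ((x >> 1) & 0x5555_5555) | ((x & 0x5555_5555) << 1) // swap adjacent bits
}

/// Fallback: reverse each element independently.
fn bitreverse_per_element(v: [u32; 4]) -> [u32; 4] {
    v.map(u32::reverse_bits)
}

/// Mask-based version applied to every lane.
fn bitreverse_vector(v: [u32; 4]) -> [u32; 4] {
    v.map(bitreverse_swar)
}
```

Both paths must agree lane for lane; the mask-based one only pays off when the shifts and masks operate on whole vectors at once.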
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
The SIMD intrinsic handler was delegating the implementation of `simd_frem`
to `Builder::frem`, which couldn't handle vector-typed inputs. To fix
this, teach `Builder::frem` to handle vector inputs.
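The operation a vector `frem` has to reproduce is the scalar floating-point remainder applied lane by lane. The sketch below models only that semantics with a hypothetical `frem_lanes` helper; whether the backend scalarizes or emits a single vector operation is not stated above. Rust's `%` on floats behaves like C `fmod`: the result takes the sign of the dividend.

```rust
/// Hypothetical lane-wise model of `simd_frem`: apply the scalar
/// floating-point remainder to each pair of corresponding lanes.
fn frem_lanes<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| a[i] % b[i])
}
```

For example, `[5.5, -5.5]` and `[2.0, 2.0]` give `[1.5, -1.5]`, and a zero divisor yields NaN in that lane.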
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
Implements lane-local byte swapping through vector shuffles. Although this
requires more setup than a non-vector implementation, it can swap the bytes
of multiple integers concurrently.
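The key piece of setup is a byte-level shuffle mask that reverses the bytes inside each lane while leaving lanes in place. The sketch below builds such a mask and applies it to a byte buffer; `bswap_shuffle_mask` and `shuffle_bytes` are hypothetical names, and the backend would feed an equivalent mask to a vector shuffle instead.

```rust
/// Build a shuffle mask that reverses the bytes inside each lane of a
/// vector with `lanes` elements of `lane_bytes` bytes each.
fn bswap_shuffle_mask(lanes: usize, lane_bytes: usize) -> Vec<usize> {
    (0..lanes * lane_bytes)
        .map(|i| {
            let lane_start = (i / lane_bytes) * lane_bytes;
            // Mirror the byte's position within its own lane.
            lane_start + (lane_bytes - 1 - i % lane_bytes)
        })
        .collect()
}

/// Single-input shuffle: output byte `i` is input byte `mask[i]`.
fn shuffle_bytes(bytes: &[u8], mask: &[usize]) -> Vec<u8> {
    mask.iter().map(|&m| bytes[m]).collect()
}
```

For two 4-byte lanes the mask is `[3, 2, 1, 0, 7, 6, 5, 4]`, so one shuffle byte-swaps both integers at once.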
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
Don't fall back on breaking apart the popcount operation if 128-bit
integers are natively supported.
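"Breaking apart" the operation means counting each 64-bit half separately and adding the results; with native 128-bit integers a single popcount on the full value is enough. A minimal sketch of both shapes, with hypothetical function names:

```rust
/// Fallback when 128-bit integers are not native: popcount each 64-bit half.
fn popcount_u128_split(x: u128) -> u32 {
    let lo = x as u64; // low 64 bits (truncating cast)
    let hi = (x >> 64) as u64; // high 64 bits
    lo.count_ones() + hi.count_ones()
}

/// With native 128-bit support: one popcount on the whole value.
fn popcount_u128_native(x: u128) -> u32 {
    x.count_ones()
}
```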
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
In the current implementation, the gcc backend of rustc emits the
following for a function that implements popcount for a u32 (x86_64 targeting
AVX2, using the standard Unix calling convention):
    popcount:
        mov eax, edi
        and edi, 1431655765
        shr eax
        and eax, 1431655765
        add edi, eax
        mov edx, edi
        and edi, 858993459
        shr edx, 2
        and edx, 858993459
        add edx, edi
        mov eax, edx
        and edx, 252645135
        shr eax, 4
        and eax, 252645135
        add eax, edx
        mov edx, eax
        and eax, 16711935
        shr edx, 8
        and edx, 16711935
        add edx, eax
        movzx eax, dx
        shr edx, 16
        add eax, edx
        ret
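For reference, the listing is a SWAR ("SIMD within a register") popcount. The Rust transcription below is reconstructed from the instructions and decimal constants in the listing; it is not taken from the backend's source, and `popcount_swar` is a hypothetical name.

```rust
/// SWAR popcount matching the listing step by step. The masks are the
/// listing's decimal constants: 1431655765 = 0x5555_5555,
/// 858993459 = 0x3333_3333, 252645135 = 0x0F0F_0F0F, 16711935 = 0x00FF_00FF.
fn popcount_swar(x: u32) -> u32 {
    let x = (x & 0x5555_5555) + ((x >> 1) & 0x5555_5555); // 2-bit sums
    let x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // 4-bit sums
    let x = (x & 0x0F0F_0F0F) + ((x >> 4) & 0x0F0F_0F0F); // 8-bit sums
    let x = (x & 0x00FF_00FF) + ((x >> 8) & 0x00FF_00FF); // 16-bit sums
    (x & 0xFFFF) + (x >> 16) // movzx/shr/add: fold the two 16-bit halves
}
```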
Rather than using this implementation, gcc could be told to use Wegner's
algorithm. This would give the same function the following implementation:
    popcount:
        xor eax, eax
        xor edx, edx
        popcnt eax, edi
        test edi, edi
        cmove eax, edx
        ret
This patch implements the popcount operation in terms of Wegner's algorithm in
all cases.
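Wegner's algorithm clears the lowest set bit on each iteration, so the loop runs once per set bit. A minimal Rust sketch follows (the patch builds the equivalent loop through libgccjit, not Rust source, and `popcount_wegner` is a hypothetical name). The zero test plus `cmove` in the second listing is consistent with gcc recognizing this loop shape as a popcount idiom and lowering it to `popcnt`.

```rust
/// Wegner's algorithm: `x & (x - 1)` clears the lowest set bit,
/// so the loop body executes exactly once per set bit.
fn popcount_wegner(mut x: u32) -> u32 {
    let mut count = 0;
    while x != 0 {
        x &= x - 1; // clear the lowest set bit
        count += 1;
    }
    count
}
```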
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
c6e6ecb1afea9695a42d0f148ce153536b279eb5 added it to some of the
compiler's crates, but avoided adding it to all of them to reduce
bit-rot. This commit adds it to more of them.