If we're running against a patched libgccjit, use an algorithm similar
to what LLVM uses for this intrinsic. Otherwise, fall back to a
per-element bitreverse.
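For illustration only, a scalar sketch of the swap-based style of bit reversal: reverse the byte order, then swap nibbles, bit pairs, and single bits. This is an assumption about the general shape of such an algorithm, not the code added by this patch, and `bitreverse_u32` is a hypothetical name.

```rust
/// Hypothetical helper (not part of the patch): reverses the bits of a u32
/// with a fixed sequence of mask-and-shift steps instead of a per-bit loop.
pub fn bitreverse_u32(x: u32) -> u32 {
    // Reverse the byte order first.
    let x = x.swap_bytes();
    // Swap the two nibbles inside every byte.
    let x = ((x & 0x0F0F_0F0F) << 4) | ((x >> 4) & 0x0F0F_0F0F);
    // Swap adjacent bit pairs inside every nibble.
    let x = ((x & 0x3333_3333) << 2) | ((x >> 2) & 0x3333_3333);
    // Swap adjacent single bits.
    ((x & 0x5555_5555) << 1) | ((x >> 1) & 0x5555_5555)
}
```

Every step is a plain bitwise operation, so the same sequence can in principle be applied to all lanes of a vector at once. The per-element fallback would instead reverse each lane separately.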
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
The simd intrinsic handler was delegating implementation of `simd_frem`
to `Builder::frem`, which wasn't able to handle vector-typed inputs. To
fix this, teach this method how to handle vector inputs.
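As a reference for the expected semantics only (not how the builder lowers it), a vector frem can be modeled as a lane-wise remainder. `frem_lanes` is a hypothetical name, and fixed-size arrays stand in for vector types.

```rust
/// Hypothetical model of a lane-wise floating-point remainder.
/// Arrays stand in for SIMD vectors; this is not the backend's code.
pub fn frem_lanes<const N: usize>(a: [f32; N], b: [f32; N]) -> [f32; N] {
    let mut out = [0.0f32; N];
    for i in 0..N {
        // Rust's `%` on floats keeps the sign of the dividend, like C's fmod.
        out[i] = a[i] % b[i];
    }
    out
}
```

For example, `frem_lanes([5.5, -7.0], [2.0, 4.0])` computes each lane independently of the other.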
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
Implements lane-local byte swapping through vector shuffles. While this
requires more setup than a scalar byte swap, it can swap the bytes of
multiple integers at once.
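A minimal sketch of the idea, assuming the vector is viewed as a flat sequence of bytes: the shuffle mask maps each output byte to the mirrored byte within its own lane. Both function names are hypothetical and the code only models the index arithmetic, not the libgccjit calls.

```rust
/// Hypothetical helper: builds a byte-shuffle mask that reverses the bytes
/// inside each `elem_bytes`-wide lane of a vector with `lanes` elements.
pub fn bswap_shuffle_mask(lanes: usize, elem_bytes: usize) -> Vec<usize> {
    (0..lanes * elem_bytes)
        .map(|i| {
            let lane_start = (i / elem_bytes) * elem_bytes; // first byte of this lane
            let offset = i % elem_bytes; // position inside the lane
            lane_start + (elem_bytes - 1 - offset) // mirrored position
        })
        .collect()
}

/// Applies a shuffle mask to a byte vector: out[i] = bytes[mask[i]].
pub fn shuffle_bytes(bytes: &[u8], mask: &[usize]) -> Vec<u8> {
    mask.iter().map(|&src| bytes[src]).collect()
}
```

For two 32-bit lanes the mask is `[3, 2, 1, 0, 7, 6, 5, 4]`, so no byte ever crosses a lane boundary.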
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
Don't fall back on breaking apart the popcount operation if 128-bit
integers are natively supported.
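A rough model of the fallback being avoided, assuming "breaking apart" means counting the two 64-bit halves separately and adding the results. `popcount_u128_split` is a hypothetical name.

```rust
/// Hypothetical model of the split fallback: count the set bits of each
/// 64-bit half of a u128 and add the two counts.
pub fn popcount_u128_split(x: u128) -> u32 {
    let lo = x as u64; // low 64 bits (truncating cast)
    let hi = (x >> 64) as u64; // high 64 bits
    lo.count_ones() + hi.count_ones()
}
```

When 128-bit integers are natively supported, the operation can be applied to the full value directly and this split is unnecessary.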
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>
The gcc backend of rustc currently emits the following for a function that
implements popcount for a u32 (x86_64 targeting AVX2, using the standard Unix
calling convention):
    popcount:
        mov     eax, edi
        and     edi, 1431655765
        shr     eax
        and     eax, 1431655765
        add     edi, eax
        mov     edx, edi
        and     edi, 858993459
        shr     edx, 2
        and     edx, 858993459
        add     edx, edi
        mov     eax, edx
        and     edx, 252645135
        shr     eax, 4
        and     eax, 252645135
        add     eax, edx
        mov     edx, eax
        and     eax, 16711935
        shr     edx, 8
        and     edx, 16711935
        add     edx, eax
        movzx   eax, dx
        shr     edx, 16
        add     eax, edx
        ret
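For reference, the decimal constants in this listing are the masks 0x55555555, 0x33333333, 0x0F0F0F0F and 0x00FF00FF. A Rust sketch of the same mask-and-add reduction, reconstructed from the assembly rather than taken from the backend source (`popcount_swar` is a hypothetical name):

```rust
/// Hypothetical reconstruction of the listing above: sums set bits in
/// progressively wider fields (1, 2, 4, 8, then 16 bits).
pub fn popcount_swar(x: u32) -> u32 {
    let x = (x & 0x5555_5555) + ((x >> 1) & 0x5555_5555); // 1431655765
    let x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // 858993459
    let x = (x & 0x0F0F_0F0F) + ((x >> 4) & 0x0F0F_0F0F); // 252645135
    let x = (x & 0x00FF_00FF) + ((x >> 8) & 0x00FF_00FF); // 16711935
    // Add the low and high 16-bit halves (the movzx / shr 16 / add tail).
    (x & 0xFFFF) + (x >> 16)
}
```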
Rather than using this implementation, gcc could be given Wegner's
algorithm, which clears the lowest set bit on each iteration. This would
give the same function the following implementation:
    popcount:
        xor     eax, eax
        xor     edx, edx
        popcnt  eax, edi
        test    edi, edi
        cmove   eax, edx
        ret
This patch implements the popcount operation in terms of Wegner's algorithm in
all cases.
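As a plain Rust sketch of the algorithm's shape (the backend builds the equivalent loop through libgccjit, and that code is not shown here; `popcount_clear_lowest` is a hypothetical name):

```rust
/// Hypothetical sketch: counts set bits by clearing the lowest set bit
/// until the value becomes zero.
pub fn popcount_clear_lowest(mut x: u32) -> u32 {
    let mut count = 0;
    while x != 0 {
        // `x - 1` flips the lowest set bit and every bit below it, so the
        // AND removes exactly one set bit. No underflow: x is nonzero here.
        x &= x - 1;
        count += 1;
    }
    count
}
```

The loop runs once per set bit. The point of emitting it in this form is that gcc can replace the whole loop with a single popcnt, as in the second listing above.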
Signed-off-by: Andy Sadler <andrewsadler122@gmail.com>