Add intrinsics & target features for rd{rand,seed}
One question is whether we want to map the feature name `rdrnd` to `rdrand` instead.
EDIT: As for the use case, I would like to port my rdrand crate from inline assembly to these intrinsics.
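As a sketch of the kind of port described above, here is a bounded retry loop around a hardware random-number step function. It uses `core::arch::x86_64::_rdrand64_step`, which is how these intrinsics are exposed in today's stable Rust and which returns 1 on success and 0 when RDRAND transiently fails. The names `retry` and `hw_random_u64` are hypothetical, and the retry count of 10 is an assumption rather than a requirement. The retry logic is kept separate so it can be exercised on any architecture.

```rust
/// Call `step` up to `attempts` times; it follows the `_rdrand*_step` convention
/// of writing the value through the pointer and returning 1 on success.
fn retry<F: FnMut(&mut u64) -> i32>(mut step: F, attempts: u32) -> Option<u64> {
    let mut val = 0u64;
    for _ in 0..attempts {
        if step(&mut val) == 1 {
            return Some(val);
        }
    }
    None
}

/// Hypothetical helper: a 64-bit hardware random value, if RDRAND is available.
#[cfg(target_arch = "x86_64")]
fn hw_random_u64() -> Option<u64> {
    if !is_x86_feature_detected!("rdrand") {
        return None;
    }
    // SAFETY: we checked at runtime that the CPU supports RDRAND.
    retry(|v| unsafe { std::arch::x86_64::_rdrand64_step(v) }, 10)
}

/// On other architectures there is no RDRAND, so report "unavailable".
#[cfg(not(target_arch = "x86_64"))]
fn hw_random_u64() -> Option<u64> {
    None
}
```

Keeping the step function as a closure parameter means the same loop could wrap `_rdseed64_step`, which follows the same return convention.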
- `--emit=asm --target=nvptx64-nvidia-cuda` can be used to turn a crate
into a PTX module (a `.s` file).
- intrinsics like `__syncthreads` and `blockIdx.x` are exposed as
`"platform-intrinsics"`.
- "cabi" has been implemented for the nvptx and nvptx64 architectures,
i.e. `extern "C"` works.
- a new ABI, `"ptx-kernel"`, which can be used to generate "global"
functions. Example: `extern "ptx-kernel" fn kernel() { .. }`. All
other functions are "device" functions.
This commit improves the compile time of `rustc_platform_intrinsics` from 23s to
3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The
compiled rlib size also drops from 3.1M to 1.2M.
The wins here come from removing the destructors associated with `Type`, which
was done by removing its internal `Box` and `Vec` indirections. Those destructors
meant that a lot of landing pads and extra code were generated to manage the
runtime representations. Instead, everything can basically be computed statically
and shoved into rodata, so all we need is a giant string comparison to look up
what's what.
Closes #28273
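The destructor-free representation described above can be sketched as follows. This is a simplified, hypothetical `Type` and `Intrinsic` (not the actual definitions in `rustc_platform_intrinsics`): nested types are `&'static` references instead of `Box`es, argument lists are `&'static` slices instead of `Vec`s, and lookup is a `match` on the name string. Because nothing owns heap data, the types need no drop glue and all of the data can live in static storage.

```rust
/// Hypothetical, simplified type description for an intrinsic signature.
/// Nested types are `&'static` references, so no destructor is required.
#[derive(Debug, PartialEq)]
pub enum Type {
    Float(u8),                  // bit width
    Vector(&'static Type, u16), // element type, lane count
}

/// Hypothetical intrinsic record: inputs are a `&'static` slice, not a `Vec`.
pub struct Intrinsic {
    pub inputs: &'static [&'static Type],
    pub output: &'static Type,
    pub definition: &'static str, // name of the underlying LLVM intrinsic
}

static F64: Type = Type::Float(64);
static F64X4: Type = Type::Vector(&F64, 4);

/// Lookup is a plain string `match`; every returned reference points into static data.
pub fn find(name: &str) -> Option<Intrinsic> {
    match name {
        "256_blendv_pd" => Some(Intrinsic {
            inputs: {
                static INPUTS: [&Type; 3] = [&F64X4, &F64X4, &F64X4];
                &INPUTS
            },
            output: &F64X4,
            definition: "llvm.x86.avx.blendv.pd.256",
        }),
        _ => None,
    }
}
```

`std::mem::needs_drop::<Type>()` and `std::mem::needs_drop::<Intrinsic>()` both return `false` here, which is the property that removes the landing pads.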
This defines the `_mm256_blendv_pd` and `_mm256_blendv_ps` intrinsics.
The `_mm256_blend_pd` and `_mm256_blend_ps` intrinsics are not available
as LLVM intrinsics. In Clang they are implemented using the
shufflevector builtin.
Intel reference: https://software.intel.com/en-us/node/524070.
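To make the difference between the two blend families concrete, here is a scalar model on plain arrays (the function names are hypothetical; real code would use the vector intrinsics). `blendv` selects per lane on the sign bit of a runtime mask, while `blend` with a constant mask is just a fixed two-input shuffle over the concatenation of `a` and `b`, where indices 4..8 refer to `b`. That fixed shuffle is why Clang can lower it with `shufflevector`.

```rust
/// Scalar model of `_mm256_blendv_pd`: lane `i` comes from `b` when the sign bit
/// of `mask[i]` is set, otherwise from `a`.
fn blendv_pd(a: [f64; 4], b: [f64; 4], mask: [f64; 4]) -> [f64; 4] {
    std::array::from_fn(|i| if mask[i].is_sign_negative() { b[i] } else { a[i] })
}

/// Shuffle indices equivalent to `_mm256_blend_pd` with constant `imm`:
/// bit `i` set selects lane `i` of `b`, i.e. index `i + 4` of `a ++ b`.
fn blend_indices(imm: u8) -> [usize; 4] {
    std::array::from_fn(|i| if (imm >> i) & 1 == 1 { i + 4 } else { i })
}

/// `_mm256_blend_pd` modelled as a two-input shuffle over `a ++ b`.
fn blend_pd(a: [f64; 4], b: [f64; 4], imm: u8) -> [f64; 4] {
    let concat = [a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]];
    blend_indices(imm).map(|j| concat[j])
}
```

For example, `blend_indices(0b0101)` is `[4, 1, 6, 3]`, so lanes 0 and 2 come from `b`.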
This defines `_mm256_broadcast_ps` and `_mm256_broadcast_pd`. The `_ss`
and `_sd` variants are not supported by LLVM. In Clang, these intrinsics
are implemented as inline functions in its C headers.
Intel reference: https://software.intel.com/en-us/node/514144.
Note: the argument type should really be "0hPc" (a pointer to a vector
of half the width), but internally the LLVM intrinsic takes a pointer to
a signed integer, and for any other type LLVM will complain. This means
that a transmute is required to call these intrinsics.
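The broadcast semantics and the pointer reinterpretation mentioned in the note can be sketched with a scalar model. Both function names are hypothetical, and the `*const i8` parameter only mirrors the "pointer to a signed integer" that the LLVM intrinsic expects. In Rust, an `as` cast between raw pointer types performs the same reinterpretation as the transmute.

```rust
/// Hypothetical scalar model of `_mm256_broadcast_ps`: load four f32s and
/// repeat them in both 128-bit halves of the 256-bit result.
fn broadcast_ps(src: &[f32; 4]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    out[..4].copy_from_slice(src);
    out[4..].copy_from_slice(src);
    out
}

/// Hypothetical wrapper taking a byte pointer, like the LLVM intrinsic does.
///
/// # Safety
/// `p` must point to 16 readable bytes holding four `f32` values.
unsafe fn broadcast_ps_from_i8_ptr(p: *const i8) -> [f32; 8] {
    // Reinterpret the byte pointer as a pointer to four floats; `read_unaligned`
    // avoids assuming any particular alignment of `p`.
    let v = unsafe { std::ptr::read_unaligned(p as *const [f32; 4]) };
    broadcast_ps(&v)
}
```

A caller holding `data: [f32; 4]` would pass `data.as_ptr() as *const i8`.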
The AVX2 broadcast intrinsics `_mm256_broadcastss_ps` and
`_mm256_broadcastsd_pd` are not available as LLVM intrinsics. In Clang
they are implemented using the shufflevector builtin.
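A broadcast from lane 0 is the shuffle whose indices are all zero, which is why Clang can express these AVX2 intrinsics with `shufflevector`. The scalar model below is a single-input simplification of that operation (LLVM's `shufflevector` takes two vectors); the function names are hypothetical.

```rust
/// Scalar model of a single-input `shufflevector`: output lane `i` is `v[idx[i]]`.
fn shuffle<T: Copy, const N: usize, const M: usize>(v: [T; N], idx: [usize; M]) -> [T; M] {
    idx.map(|i| v[i])
}

/// `_mm256_broadcastss_ps`: replicate lane 0 of a 4 x f32 vector into all 8 lanes.
fn broadcastss_ps(a: [f32; 4]) -> [f32; 8] {
    shuffle(a, [0; 8])
}

/// `_mm256_broadcastsd_pd`: replicate lane 0 of a 2 x f64 vector into all 4 lanes.
fn broadcastsd_pd(a: [f64; 2]) -> [f64; 4] {
    shuffle(a, [0; 4])
}
```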
The generated file had been modified directly, but the generator should
have been modified instead and the file regenerated. This merges those
modifications into the generator's template.
This defines the following intrinsics for 128- and 256-bit vectors of f32
and f64:
* `fmadd`
* `fmaddsub`
* `fmsub`
* `fmsubadd`
* `fnmadd`
* `fnmsub`
The `_sd` and `_ss` variants are not included yet.
Intel intrinsic reference: https://software.intel.com/en-us/node/523929
The intrinsics there are listed under AVX2, but in the Intel Intrinsic
Guide they are part of the "FMA" technology, and LLVM puts them under
FMA, not AVX2.
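The six operations differ only in the signs applied to the product and the addend, and in whether the sign of the addend alternates between lanes. A scalar reference model for f64 lanes is sketched below (the `_ps` versions are analogous; N = 2 or 4 corresponds to 128- or 256-bit vectors). `f64::mul_add` computes `a * b + c` with a single rounding, as the fused hardware instruction does; the lane pattern for `fmaddsub`/`fmsubadd` follows Intel's description (even lanes subtract for `fmaddsub`).

```rust
use std::array::from_fn;

// a * b + c
fn fmadd<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    from_fn(|i| a[i].mul_add(b[i], c[i]))
}
// a * b - c
fn fmsub<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    from_fn(|i| a[i].mul_add(b[i], -c[i]))
}
// -(a * b) + c
fn fnmadd<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    from_fn(|i| (-a[i]).mul_add(b[i], c[i]))
}
// -(a * b) - c
fn fnmsub<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    from_fn(|i| (-a[i]).mul_add(b[i], -c[i]))
}
// Even lanes: a * b - c; odd lanes: a * b + c
fn fmaddsub<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    from_fn(|i| if i % 2 == 0 { a[i].mul_add(b[i], -c[i]) } else { a[i].mul_add(b[i], c[i]) })
}
// Even lanes: a * b + c; odd lanes: a * b - c
fn fmsubadd<const N: usize>(a: [f64; N], b: [f64; N], c: [f64; N]) -> [f64; N] {
    from_fn(|i| if i % 2 == 0 { a[i].mul_add(b[i], c[i]) } else { a[i].mul_add(b[i], -c[i]) })
}
```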
Commit 9104a902c052c1ad7fd5c1245cb1e03f88aa2f70 fixed the generated
files, but that change would be lost (or require additional manual
intervention) if they were regenerated or if new architectures were
added.
cc #28273
This also involved adding `[TYPE;N]` syntax and aggregate indexing
support to the generator script. It's the only way to have a
parameterised intrinsic that returns an aggregate, since one can't refer
to previous elements of the current aggregate (and supporting that would
have been harder to implement).
I believe everything up to SSE4.2 that doesn't take a constant integer
argument should now be correct. (I don't have any reason to believe that
the intrinsics that do take constant integers are wrong; they're just more
complicated and I haven't tested them in detail.)
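The `[TYPE;N]` form can be read as "an aggregate of N copies of TYPE". Below is a minimal Rust sketch of parsing and expanding it, with TYPE treated as an opaque string. The actual generator is a Python script with its own type codes, and it also handles aggregate indexing, which is not modelled here; `parse_repeat` and `expand` are hypothetical names.

```rust
/// Parse a `[TYPE;N]` spec into (TYPE, N); returns `None` for anything else.
fn parse_repeat(spec: &str) -> Option<(&str, usize)> {
    let inner = spec.strip_prefix('[')?.strip_suffix(']')?;
    let (ty, n) = inner.split_once(';')?;
    let n = n.trim().parse::<usize>().ok()?;
    Some((ty.trim(), n))
}

/// Expand a `[TYPE;N]` spec into the list of aggregate element types.
fn expand(spec: &str) -> Option<Vec<String>> {
    let (ty, n) = parse_repeat(spec)?;
    Some(vec![ty.to_string(); n])
}
```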
This Python script will consume an appropriately formatted JSON file and
output either a Rust file for use in librustc_platform_intrinsics, or an
extern block for importing the intrinsics in an external library.
The `--help` flag has details.
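The script's second output mode can be pictured with a small Rust sketch that renders an extern block from an in-memory list of signatures. JSON parsing is omitted, and the `Sig` struct, the `a0`, `a1`, ... parameter names, and the exact layout are hypothetical rather than the script's actual output; the ABI string is passed in instead of hard-coded.

```rust
/// Hypothetical description of one intrinsic signature.
struct Sig<'a> {
    name: &'a str,
    args: &'a [&'a str],
    ret: &'a str,
}

/// Render an `extern "<abi>" { ... }` block declaring each signature.
fn render_extern_block(abi: &str, sigs: &[Sig<'_>]) -> String {
    let mut out = format!("extern \"{}\" {{\n", abi);
    for s in sigs {
        // Name parameters positionally: a0, a1, ...
        let params: Vec<String> = s
            .args
            .iter()
            .enumerate()
            .map(|(i, t)| format!("a{}: {}", i, t))
            .collect();
        out.push_str(&format!("    fn {}({}) -> {};\n", s.name, params.join(", "), s.ret));
    }
    out.push_str("}\n");
    out
}
```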