38 Commits

Author SHA1 Message Date
Michael Wu
cc4efd1370 Add support for Hexagon v60 HVX intrinsics 2017-05-07 15:07:36 -04:00
bors
61b93bd811 Auto merge of #38561 - nagisa:rdrandseed, r=alexcrichton
Add intrinsics & target features for rd{rand,seed}

One question is whether or not we want to map feature name `rdrnd` to `rdrand` instead.

EDIT: as for use case, I would like to port my rdrand crate from inline assembly to these intrinsics.
2017-02-14 01:26:10 +00:00
Jorge Aparicio
18d49288d5 PTX support
- `--emit=asm --target=nvptx64-nvidia-cuda` can be used to turn a crate
  into a PTX module (a `.s` file).

- intrinsics like `__syncthreads` and `blockIdx.x` are exposed as
  `"platform-intrinsics"`.

- "cabi" has been implemented for the nvptx and nvptx64 architectures,
  i.e. `extern "C"` works.

- a new ABI, `"ptx-kernel"`, that can be used to generate "global"
  functions. Example: `extern "ptx-kernel" fn kernel() { .. }`. All
  other functions are "device" functions.
2016-12-26 21:06:23 -05:00
Simonas Kazlauskas
b2cf6df875 Add intrinsics & target features for rd{rand,seed} 2016-12-22 23:53:30 +02:00
Eitan Adler
1a4a723dda remove useless semicolon from python 2016-09-17 23:10:12 -07:00
Eitan Adler
733fe1d25c make functions static where possible 2016-09-17 23:08:31 -07:00
Eitan Adler
8de97dddfd simplify python code 2016-09-17 22:52:00 -07:00
gnzlbg
483bec790b Add target_features for the bit manipulation instruction sets: BMI 1.0, BMI 2.0, and TBM. 2016-06-22 17:11:17 +02:00
gnzlbg
10cbc37cdd Add intrinsics for x86 bit manipulation instruction sets: BMI 1.0, BMI 2.0, and TBM. 2016-06-22 16:34:10 +02:00
gnzlbg
152055451c Allow different instruction set prefixes within the same architecture 2016-06-22 16:30:55 +02:00
gnzlbg
cb4f54dc04 Add usage examples to the documentation of etc/platform-intrinsics/generator.py 2016-06-22 14:20:37 +02:00
Eduard Burtescu
0abd3139db rustc_platform_intrinsics: remove unused rustc dependency. 2016-03-29 19:36:01 +03:00
Alex Crichton
87ede2da54 rustc: Improve compile time of platform intrinsics
This commit improves the compile time of `rustc_platform_intrinsics` from 23s to
3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The
compiled rlib size also drops from 3.1M to 1.2M.

The wins here were gained by removing the destructors associated with `Type` by
removing the internal `Box` and `Vec` indirections. These destructors meant that
a lot of landing pads and extra code were generated to manage the runtime
representations. Instead everything can basically be statically computed and
shoved into rodata, so all we need is a giant string compare to lookup what's
what.

Closes #28273
2016-03-15 17:32:34 -07:00
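The idea described in the commit above (no `Box`/`Vec` inside `Type`, everything statically computed, lookup by string comparison) can be sketched roughly as follows. This is only an illustration under assumptions: the `Type` and `Intrinsic` definitions, the `find` function, and the single table entry are simplified, hypothetical stand-ins, not the actual contents of `rustc_platform_intrinsics`.

```rust
// Hypothetical, simplified stand-in for the crate's type description.
// It owns no heap data (no Box, no Vec), so it has no destructor and
// values of it can be placed in read-only static data.
#[derive(Debug, PartialEq)]
pub enum Type {
    Float(u8),                 // bit width
    Vector(&'static Type, u8), // element type, lane count
}

// Hypothetical description of one intrinsic, built only from 'static data.
#[derive(Debug)]
pub struct Intrinsic {
    pub inputs: &'static [&'static Type],
    pub output: &'static Type,
    pub llvm_name: &'static str,
}

// Statically allocated building blocks shared by all table entries.
static F32: Type = Type::Float(32);
static F32X8: Type = Type::Vector(&F32, 8);
static BLENDV_PS_INPUTS: [&Type; 3] = [&F32X8, &F32X8, &F32X8];

// Lookup is a plain string match; nothing is allocated or dropped.
// The intrinsic name used here is illustrative only.
pub fn find(name: &str) -> Option<Intrinsic> {
    match name {
        "x86_mm256_blendv_ps" => Some(Intrinsic {
            inputs: &BLENDV_PS_INPUTS,
            output: &F32X8,
            llvm_name: "llvm.x86.avx.blendv.ps.256",
        }),
        _ => None,
    }
}
```

Because none of the types involved implement `Drop`, returning an `Intrinsic` by value needs no cleanup code or landing pads, which is the kind of saving the commit message attributes the compile-time win to.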
Ruud van Asseldonk
e1489caf0b Define AVX blend intrinsics
This defines the `_mm256_blendv_pd` and `_mm256_blendv_ps` intrinsics.
The `_mm256_blend_pd` and `_mm256_blend_ps` intrinsics are not available
as LLVM intrinsics. In Clang they are implemented using the
shufflevector builtin.

Intel reference: https://software.intel.com/en-us/node/524070.
2016-03-13 15:04:14 +01:00
Ruud van Asseldonk
ddfe9b6d7d Define AVX comparison intrinsics
This defines `_mm256_cmp_pd` and `_mm256_cmp_ps`.

Intel reference: https://software.intel.com/en-us/node/524075.
2016-03-13 15:04:14 +01:00
Ruud van Asseldonk
51b5300b3f Define AVX conversion intrinsics
This defines the following intrinsics:

 * `_mm256_cvtepi32_pd`
 * `_mm256_cvtepi32_ps`
 * `_mm256_cvtpd_epi32`
 * `_mm256_cvtpd_ps`
 * `_mm256_cvtps_epi32`
 * `_mm256_cvtps_pd`
 * `_mm256_cvttpd_epi32`
 * `_mm256_cvttps_epi32`

Intel reference: https://software.intel.com/en-us/node/514130.
2016-03-09 01:18:46 +01:00
Ruud van Asseldonk
37efeae886 Define AVX broadcast intrinsics
This defines `_mm256_broadcast_ps` and `_mm256_broadcast_pd`. The `_ss`
and `_sd` variants are not supported by LLVM. In Clang these intrinsics
are implemented as inline functions in C++.

Intel reference: https://software.intel.com/en-us/node/514144.

Note: the argument type should really be "0hPc" (a pointer to a vector
of half the width), but internally the LLVM intrinsic takes a pointer to
a signed integer, and for any other type LLVM will complain. This means
that a transmute is required to call these intrinsics.

The AVX2 broadcast intrinsics `_mm256_broadcastss_ps` and
`_mm256_broadcastsd_pd` are not available as LLVM intrinsics. In Clang
they are implemented using the shufflevector builtin.
2016-03-09 01:18:46 +01:00
Ruud van Asseldonk
0ce0cf1c87 Update platform intrinsic generator script
The file it generates had been modified directly, when instead the
generator should have been modified and the file regenerated. This
merges the modifications into the template in the generator.
2016-03-05 16:35:57 +01:00
Ruud van Asseldonk
8872163b32 Define x86 fused multiply-add intrinsics
This defines the following intrinsics for 128 and 256 bit vectors of f32
and f64:

 * `fmadd`
 * `fmaddsub`
 * `fmsub`
 * `fmsubadd`
 * `fnmadd`
 * `fnmsub`

The `_sd` and `_ss` variants are not included yet.

Intel intrinsic reference: https://software.intel.com/en-us/node/523929

The intrinsics there are listed under AVX2, but in the Intel Intrinsic
Guide they are part of the "FMA" technology, and LLVM puts them under
FMA, not AVX2.
2016-03-05 16:33:11 +01:00
Andrea Canciani
9aa1289a67 Add a comment to explain the #[inline(never)] annotation
and regenerate the platform intrinsics source files.
2015-09-12 17:05:29 +02:00
Andrea Canciani
9ef62a4490 Fix generator.py to avoid pathological inlining
Commit 9104a902c052c1ad7fd5c1245cb1e03f88aa2f70 fixed the generated
files, but that change would be lost (or require additional manual
intervention) if they are re-generated or if new architectures are
added.

cc #28273
2015-09-12 09:28:53 +02:00
Huon Wilson
67aa4c775a Add some fancier AArch64 load/store instructions. 2015-09-04 09:14:13 -07:00
Huon Wilson
7241ae9112 Support return aggregates in platform intrinsics.
This also involved adding `[TYPE;N]` syntax and aggregate indexing
support to the generator script: it's the only way to be able to have a
parameterised intrinsic that returns an aggregate, since one can't refer
to previous elements of the current aggregate (and that was harder to
implement).
2015-09-04 09:14:13 -07:00
Huon Wilson
c19e7b629b Add various pointer & void-using x86 intrinsics. 2015-09-04 09:14:13 -07:00
Huon Wilson
2b45a9ab54 Support bitcasts in platform intrinsic generator. 2015-09-04 09:14:13 -07:00
Huon Wilson
62e346af4b Support void in platform intrinsic generator. 2015-09-04 09:14:13 -07:00
Huon Wilson
add04307f9 Support non-return value references in platform intrinsic generator. 2015-09-04 09:14:13 -07:00
Huon Wilson
d12135a70d Add support for pointers to generator.py. 2015-09-04 09:14:12 -07:00
Huon Wilson
787a21fe7c Fix some typos in SSE-AVX intrinsics.
I believe everything that doesn't take a constant integer up to SSE4.2
should now be correct (I don't have any reason to believe that those
that do take constant integers are wrong; they're just more complicated
and I just haven't tested them in detail).
2015-08-31 18:33:55 -07:00
Huon Wilson
29dcff3aa2 Support different scalar integer widths in Rust v. LLVM.
Some x86 C intrinsics are declared to take `int ...` (i.e. exposed in
Rust as `i32`), but LLVM implements them by taking `i8` instead.
2015-08-29 20:11:23 -07:00
Huon Wilson
daf8bdca57 Fix typos in some x86 and arm intrinsics. 2015-08-29 20:11:23 -07:00
Huon Wilson
3e9b726576 Style the generator script more PEP8y. 2015-08-29 19:26:48 -07:00
Huon Wilson
24416a2151 Autogenerate most x86 platform intrinsics. 2015-08-29 15:36:17 -07:00
Huon Wilson
5a167bdb4c Allow unused imports in the generator. 2015-08-29 15:36:17 -07:00
Huon Wilson
bea3f096ee Add support for arbitrary metadata for numbers and widths.
This means that each platform has total control over the formatting info
it needs.
2015-08-29 15:36:16 -07:00
Huon Wilson
083f613044 Autogenerate most ARM platform intrinsics. 2015-08-29 15:36:16 -07:00
Huon Wilson
3ef610b627 Autogenerate most AArch64 platform intrinsics. 2015-08-29 15:36:16 -07:00
Huon Wilson
73811917f4 Add the platform intrinsic generator script.
This python script will consume an appropriately formatted JSON file and
output either a Rust file for use in librustc_platform_intrinsics, or an
extern block for importing the intrinsics in an external library.

The --help flag has details.
2015-08-29 15:36:16 -07:00
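As a rough illustration of the second output mode mentioned in the commit above (an extern block for importing intrinsics in an external library), the following sketch renders such a block from a list of descriptions. It is written in Rust purely for illustration; the real generator is a Python script that reads JSON, and the `IntrinsicSpec` struct, the `render_extern_block` function, and the argument naming scheme are all hypothetical.

```rust
/// Hypothetical description of one intrinsic. The real script derives
/// this information from a JSON file rather than from a Rust struct.
pub struct IntrinsicSpec {
    pub name: &'static str,
    pub args: &'static [&'static str],
    pub ret: &'static str,
}

/// Render an `extern "platform-intrinsic"` block declaring every
/// intrinsic in `specs`. Arguments are named x0, x1, ... (an assumption
/// made for this sketch).
pub fn render_extern_block(specs: &[IntrinsicSpec]) -> String {
    let mut out = String::from("extern \"platform-intrinsic\" {\n");
    for spec in specs {
        let args: Vec<String> = spec
            .args
            .iter()
            .enumerate()
            .map(|(i, ty)| format!("x{}: {}", i, ty))
            .collect();
        out.push_str(&format!(
            "    fn {}({}) -> {};\n",
            spec.name,
            args.join(", "),
            spec.ret
        ));
    }
    out.push_str("}\n");
    out
}
```

Passing a single spec for a three-argument `f32x8` intrinsic yields a block with one `fn` declaration; the vector type names in the output are just strings here and would have to be defined by the importing library.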