This adds fused multiply-add and multiply-subtract vector intrinsics for 128-bit and 256-bit vectors of `f32` and `f64`. These correspond to the intrinsics [listed here](https://software.intel.com/en-us/node/523929), except for the scalar `_ss` and `_sd` variants. The intrinsics added are:
* `fmadd`
* `fmaddsub`
* `fmsub`
* `fmsubadd`
* `fnmadd`
* `fnmsub`
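
As a rough guide to what each name in the list computes, here is a scalar model in plain Rust. It is only a sketch: the function names and signatures below are hypothetical helpers written for illustration, not the intrinsics added by this PR, and `f32::mul_add` stands in for the single-rounding fused operation. The lane order for `fmaddsub` and `fmsubadd` follows Intel's description (even-indexed lanes subtract and odd-indexed lanes add for `fmaddsub`, and the reverse for `fmsubadd`).

```rust
// Scalar reference model of the six operations (illustrative helpers only).

/// a * b + c, rounded once.
fn fmadd(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, c)
}

/// a * b - c
fn fmsub(a: f32, b: f32, c: f32) -> f32 {
    a.mul_add(b, -c)
}

/// -(a * b) + c
fn fnmadd(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, c)
}

/// -(a * b) - c
fn fnmsub(a: f32, b: f32, c: f32) -> f32 {
    (-a).mul_add(b, -c)
}

/// Even-indexed lanes compute a * b - c, odd-indexed lanes compute a * b + c.
fn fmaddsub<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| {
        if i % 2 == 0 {
            fmsub(a[i], b[i], c[i])
        } else {
            fmadd(a[i], b[i], c[i])
        }
    })
}

/// Even-indexed lanes compute a * b + c, odd-indexed lanes compute a * b - c.
fn fmsubadd<const N: usize>(a: [f32; N], b: [f32; N], c: [f32; N]) -> [f32; N] {
    std::array::from_fn(|i| {
        if i % 2 == 0 {
            fmadd(a[i], b[i], c[i])
        } else {
            fmsub(a[i], b[i], c[i])
        }
    })
}
```

The vector intrinsics apply the first four operations to every lane independently; only `fmaddsub` and `fmsubadd` depend on the lane index.
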
The `fma` target feature must be enabled by passing `-C target-feature=+fma` to rustc when using these intrinsics; otherwise LLVM will complain.
I verified locally that `x86_mm256_fmadd_ps` and `x86_mm256_fmsub_ps` work.
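
For comparison, the same target-feature requirement can be handled per function rather than per crate. The sketch below is an assumption-laden illustration, not the API added here: it uses the `std::arch` names available in current stable Rust (`_mm256_fmadd_ps` rather than `x86_mm256_fmadd_ps`), and `fmadd8` and `fmadd8_fma` are hypothetical helper names. It checks for the feature at runtime and falls back to scalar `mul_add` when FMA is unavailable.

```rust
// Sketch only: std::arch names from current stable Rust, not this PR's API.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{_mm256_fmadd_ps, _mm256_loadu_ps, _mm256_storeu_ps};

/// Computes a[i] * b[i] + c[i] for 8 lanes with one FMA instruction.
/// Safety: the caller must ensure the CPU supports AVX and FMA.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx,fma")]
unsafe fn fmadd8_fma(a: &[f32; 8], b: &[f32; 8], c: &[f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    unsafe {
        // Unaligned loads and store, so plain arrays are fine as inputs.
        let r = _mm256_fmadd_ps(
            _mm256_loadu_ps(a.as_ptr()),
            _mm256_loadu_ps(b.as_ptr()),
            _mm256_loadu_ps(c.as_ptr()),
        );
        _mm256_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Hypothetical safe wrapper: uses the FMA path when the CPU supports it,
/// otherwise falls back to scalar fused multiply-add.
fn fmadd8(a: &[f32; 8], b: &[f32; 8], c: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") && is_x86_feature_detected!("fma") {
            // Sound because both features were just detected at runtime.
            return unsafe { fmadd8_fma(a, b, c) };
        }
    }
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i].mul_add(b[i], c[i]);
    }
    out
}
```

With this shape the crate builds without `-C target-feature=+fma`, at the cost of a runtime check on each call to the wrapper.
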