`fast-math` implies, among other things, that these functions cannot
accept `inf` as an argument or return it as a result, which left them
either confusingly named or incorrectly behaved, depending on how you
interpret it. Since these intrinsics were first implemented, the
approach of their primary user (stdsimd) has changed significantly, so
these intrinsics are now required to operate normally rather than in
"whatever" fashion.
Fixes #84268
LLVM supports many functions from math.h in its IR. Many of these have
single-instruction variants on various platforms. So, let's add them so
std::arch can use them.
Yes, exact comparison is intentional: rounding must always return a
valid integer-equal value, except for inf/NaN.
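For illustration, a minimal sketch of how one of these intrinsics might
be called (assuming the `core_intrinsics` gate; `floorf32` typically
lowers to a single instruction such as `roundss` on x86_64 with SSE4.1):

```rust
#![feature(core_intrinsics)]

// Hedged sketch: call the math.h-style floor intrinsic directly.
// On most targets this lowers to a single rounding instruction.
fn round_down(x: f32) -> f32 {
    unsafe { core::intrinsics::floorf32(x) }
}
```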
Categorize and explain target feature support
There are 3 different uses of the `-C target-feature` args passed to rustc:
1. All of the features are passed to LLVM, which uses them to configure code-generation. This is sort-of stabilized since 1.0 though LLVM does change/add/remove target features regularly.
2. Target features which are in [the compiler's allowlist](69e1d22ddb/compiler/rustc_codegen_ssa/src/target_features.rs (L12-L34)) can be used in `cfg!(target_feature)` etc. These may have different names than in LLVM and are renamed before passing them to LLVM.
3. Target features which are in the allowlist and which are stabilized or feature-gate-enabled can be used in `#[target_feature]`.
It can be confusing that `rustc --print target-features` just prints out the LLVM features without separating out the rustc features or even mentioning that the dichotomy exists.
This improves the situation by separating out the rustc and LLVM target features and adding a brief explanation about the difference.
Abbreviated Example Output:
```
$ rustc --print target-features
Features supported by rustc for this target:
adx - Support ADX instructions.
aes - Enable AES instructions.
...
xsaves - Support xsaves instructions.
crt-static - Enables libraries with C Run-time Libraries(CRT) to be statically linked.
Code-generation features supported by LLVM for this target:
16bit-mode - 16-bit mode (i8086).
32bit-mode - 32-bit mode (80386).
...
x87 - Enable X87 float instructions.
xop - Enable XOP instructions.
Use +feature to enable a feature, or -feature to disable it.
For example, rustc -C target-cpu=mycpu -C target-feature=+feature1,-feature2
Code-generation features cannot be used in cfg or #[target_feature],
and may be renamed or removed in a future version of LLVM or rustc.
```
Motivated by #83975.
CC https://github.com/rust-lang/rust/issues/49653
This commit implements the idea of a new ABI for the WebAssembly target,
one called `"wasm"`. This ABI is entirely of my own invention
and has no current precedent, but I think that the addition of this ABI
might help solve a number of issues with the WebAssembly targets.
When `wasm32-unknown-unknown` was first added to Rust I naively
"implemented an ABI" for the target. I then went on to write
`wasm-bindgen`, which accidentally relied on details of this ABI. It
turns out the ABI definition didn't match C, which has been causing
issues for C/Rust interop. Currently the compiler has a "wasm32 bindgen
compat" ABI which is the original implementation I added, and it's
there purely for, well, `wasm-bindgen`.
Another issue with the WebAssembly target is that it's not clear to me
when and if the default C ABI will change to account for WebAssembly's
multi-value feature (a feature that allows functions to return multiple
values). Even if this does happen, though, it seems like the C ABI will
be guided based on the performance of WebAssembly code and will likely
not match even what the current wasm-bindgen-compat ABI is today. This
leaves a hole in Rust's expressiveness when binding WebAssembly: given
a particular import type, Rust may not be able to express that
signature once the C ABI is updated for multi-value.
To fix these issues I had the idea of a new ABI for WebAssembly, one
called `wasm`. The definition of this ABI is "what you write
maps straight to wasm". The goal here is that whatever you write down in
the parameter list or in the return values goes straight into the
function's signature in the WebAssembly file. This special ABI is
intended for precisely matching the signature of a function imported
from the environment, or for exporting a function with an exact
signature.
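As a hedged sketch (assuming the feature gate is named `wasm_abi`; the
module and function names are purely illustrative), using the ABI might
look like this:

```rust
#![feature(wasm_abi)]

#[link(wasm_import_module = "env")]
extern "wasm" {
    // The written signature is meant to map directly onto the wasm
    // import's signature, e.g. (param i32 i64) (result f32).
    fn host_hook(a: i32, b: i64) -> f32;
}

// Likewise exported with exactly (param i32) (result i32).
#[no_mangle]
pub extern "wasm" fn exported(x: i32) -> i32 {
    x + 1
}
```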
With the addition of a new ABI, this enables rustc to:
* Eventually remove the "wasm-bindgen compat hack". Once this
ABI is stable wasm-bindgen can switch to using it everywhere.
Afterwards the wasm32-unknown-unknown target can have its default ABI
updated to match C.
* Expose the ability to precisely match an ABI signature for a
WebAssembly function, regardless of what the C ABI that clang chooses
turns out to be.
* Continue to evolve the definition of the default C ABI to match what
clang does on all targets, since the purpose of that ABI will be
explicitly matching C rather than generating particular function
imports/exports.
Naturally this is implemented as an unstable feature initially, but it
would be nice for this to get stabilized (if it works) in the near-ish
future to remove the wasm32-unknown-unknown incompatibility with the C
ABI. Doing this, however, requires the feature to be on stable because
wasm-bindgen works with stable Rust.
Allow specifying alignment for functions
Fixes #75072
This allows the user to specify alignment for functions, which can be useful for low-level work where functions need to be aligned to a specific value.
Based on my testing, the error cases not covered in the match are caught earlier, so I had them simply return `None`.
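A hedged sketch of the intended usage (assuming the unstable gate is
named `fn_align` and the attribute form is `#[repr(align(N))]` on
functions; the function name is illustrative):

```rust
#![feature(fn_align)]

// Request that this function's entry point be placed at a 64-byte-aligned address.
#[repr(align(64))]
fn timer_interrupt_handler() {
    // low-level work that requires a specific code alignment
}
```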
Set dso_local for hidden, private and local items
This should probably have no real effect in most cases, as e.g. `hidden`
visibility already implies `dso_local` (or at least LLVM IR does not
preserve the `dso_local` setting if the item is already `hidden`), but
it should fix `-Crelocation-model=static` and improve codegen in
executables.
Note that this PR does not exhaustively port the logic in [clang], only the
portion that is necessary to fix a regression from LLVM 12 that relates to
`-Crelocation-model=static`.
Fixes #83335
[clang]: 3001d080c8/clang/lib/CodeGen/CodeGenModule.cpp (L945-L1039)
Use FromStr trait for number option parsing
Replace `parse_uint` with generic `parse_number` based on `FromStr`.
Use it for parsing inlining threshold to avoid casting later.
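A minimal sketch of the idea (the real helper in rustc's option parsing
may differ in signature and error reporting):

```rust
use std::str::FromStr;

// Parse an option value directly into the target numeric type via FromStr,
// so no `as` cast is needed at the use site afterwards.
fn parse_number<T: Copy + FromStr>(slot: &mut T, v: Option<&str>) -> bool {
    match v.and_then(|s| s.parse::<T>().ok()) {
        Some(n) => {
            *slot = n;
            true
        }
        None => false,
    }
}

fn example() {
    let mut inlining_threshold: u32 = 0;
    // e.g. parsing the value of `-Cinline-threshold=275` without a later cast
    parse_number(&mut inlining_threshold, Some("275"));
}
```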
Allow clobbering unsupported registers in asm!
Previously registers could only be marked as clobbered if the target feature for that register was enabled. This restriction is now removed.
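For illustration, a hedged x86_64 sketch using the pre-stabilization
`asm!` syntax; the asm body itself is irrelevant, the point is the
clobber operand:

```rust
#![feature(asm)]

// `ymm0` belongs to the AVX register class; previously listing it as clobbered
// required the `avx` target feature to be enabled, now it is accepted regardless.
unsafe fn clobbers_avx_reg() {
    asm!("nop", out("ymm0") _);
}
```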
cc #81092
r? ``@nagisa``
Translate counters from Rust 1-based to LLVM 0-based counter ids
A colleague contacted me and asked why Rust's counters start at 1, when
Clang's appear to start at 0. There is a reason why Rust's internal
counters start at 1 (see the docs), and I tried to keep them consistent
when codegenned to LLVM's coverage mapping format. LLVM should be
tolerant of missing counters, but as my colleague pointed out,
`llvm-cov` will silently fail to generate a coverage report for a
function based on LLVM's assumption that the counters are 0-based.
See:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/ProfileData/Coverage/CoverageMapping.cpp#L170
Apparently, if, for example, a function has no branches, it would have
exactly 1 counter. `CounterValues.size()` would be 1, and (with the
1-based index), the counter ID would be 1. This would fail the check
and abort reporting coverage for the function.
It turns out that by correcting for this during coverage map generation,
by subtracting 1 from the Rust Counter ID (both when generating the
counter increment intrinsic call, and when adding counters to the map),
some uncovered functions (including in tests) now appear covered! This
corrects the coverage for a few tests!
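Conceptually the translation amounts to the following (a hedged sketch,
not the actual coverage-map code):

```rust
// Rust's internal counter ids are 1-based; LLVM's coverage mapping expects
// 0-based ids, so subtract 1 when emitting counters and map entries.
fn to_llvm_counter_id(rust_counter_id: u32) -> u32 {
    debug_assert!(rust_counter_id >= 1, "Rust counter ids start at 1");
    rust_counter_id - 1
}
```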
r? `@tmandry`
FYI: `@wesleywiser`
Run LLVM coverage instrumentation passes before optimization passes
This matches the behavior of Clang and allows us to remove several
hacks which were needed to ensure functions weren't optimized away
before reaching the instrumentation pass.
Fixes #83429
cc `@richkadel`
r? `@tmandry`
- Add back `HirIdVec`, with a comment that it will soon be used.
- Add back `*_region` functions, with a comment they may soon be used.
- Remove `-Z borrowck_stats` completely. It didn't do anything.
- Remove `make_nop` completely.
- Add back `current_loc`, which is used by an out-of-tree tool.
- Fix style nits
- Remove `AtomicCell` with `cfg(parallel_compiler)` for consistency.
Found with https://github.com/est31/warnalyzer.
Dubious changes:
- Is anyone else using rustc_apfloat? I feel weird completely deleting
x87 support.
- Maybe some of the dead code in rustc_data_structures, in case someone
wants to use it in the future?
- Don't change rustc_serialize
I plan to scrap most of the json module in the near future (see
https://github.com/rust-lang/compiler-team/issues/418) and fixing the
tests needed more work than I expected.
TODO: check if any of the comments on the deleted code should be kept.
Import small cold functions
The Rust code is often written under the assumption that an inline
attribute is mostly unnecessary for generic methods, since for
optimized builds using ThinLTO a method will be code generated in at
least one CGU and be available for import.
For example, the deref implementations for Box, Vec, and MutexGuard are
not currently marked as inline, and neither is the identity
implementation of the From trait.
In PGO builds, when functions are determined to be cold, the default
multiplier of zero will stop the import, no matter how trivial the
implementation.
Slightly increase the default multiplier from 0 to 0.1.
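A hedged example of the kind of trivial, un-annotated method this
affects (type and method chosen purely for illustration):

```rust
pub struct Wrapper<T>(T);

impl<T> std::ops::Deref for Wrapper<T> {
    type Target = T;
    // Not marked #[inline]: ThinLTO is expected to import this body across CGUs,
    // but a cold-function multiplier of 0 previously blocked that import entirely.
    fn deref(&self) -> &T {
        &self.0
    }
}
```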
r? `@ghost`
coverage bug fixes and optimization support
Adjusted LLVM codegen for code compiled with `-Zinstrument-coverage` to
address multiple, somewhat related issues.
Fixed a significant flaw in the prior coverage solution: every counter
generated a new counter variable, but there should have been only one
counter variable per function. This appears to have bloated .profraw
files significantly. (For a small program, it increased the size by
about 40%.) I have not tested large programs, but there is anecdotal
evidence that profraw files were way too large. This is a good fix
regardless, but hopefully it also addresses related issues.
Fixes: #82144
Invalid LLVM coverage data produced when compiled with -C opt-level=1
Existing tests now work up to at least `opt-level=3`. This required a
detailed analysis of the LLVM IR, comparisons with Clang C++ LLVM IR
when compiled with coverage, and a lot of trial and error with codegen
adjustments.
The biggest hurdle was figuring out how to continue to support coverage
results for unused functions and generics. Rust's coverage results have
three advantages over Clang's coverage results:
1. Rust's coverage map does not include any overlapping code regions,
making coverage counting unambiguous.
2. Rust generates coverage results (showing zero counts) for all unused
functions, including generics. (Clang does not generate coverage for
uninstantiated template functions.)
3. Rust's unused functions produce minimal stubbed functions in LLVM IR,
sufficient for including in the coverage results; while Clang must
generate the complete LLVM IR for each unused function, even though
it will never be called.
This PR removes the previous hack of attempting to inject coverage into
some other existing function instance, and generates dedicated instances
for each unused function. This change, and a few other adjustments
(similar to what is required for `-C link-dead-code`, but with lower
impact), makes it possible to support LLVM optimizations.
Fixes: #79651
Coverage report: "Unexecuted instantiation:..." for a generic function
from multiple crates
Fixed by removing the aforementioned hack. Some "Unexecuted
instantiation" notices are unavoidable, as explained in the
`used_crate.rs` test, but `-Zinstrument-coverage` has new options to
back off support for either unused generics, or all unused functions,
which avoids the notice, at the cost of less coverage of unused
functions.
Fixes: #82875
Invalid LLVM coverage data produced with crate brotli_decompressor
Fixed by disabling the LLVM function attribute that forces inlining, if
`-Z instrument-coverage` is enabled. This attribute is applied to
Rust functions with `#[inline(always)]`, and in some cases, the forced
inlining breaks coverage instrumentation and reports.
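A hedged illustration of the affected pattern (function name and body
are illustrative):

```rust
// Under -Z instrument-coverage, the forced-inlining request below is no longer
// propagated to LLVM, so the counter increments stay attributable to this function.
#[inline(always)]
fn hot_helper(x: u32) -> u32 {
    x.wrapping_mul(2654435761)
}
```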
FYI: `@wesleywiser`
r? `@tmandry`
The frontend shouldn't be deciding whether or not to use mutable
noalias attributes, as this is a pure LLVM concern. Only provide
the necessary information and make the actual decision in
codegen_llvm.
* Use Markdown list syntax and unindent a bit to prevent Markdown
interpreting the nested lists as code blocks
* A few more small typographical cleanups
Make source-based code coverage compatible with MIR inlining
When codegenning code coverage, use the instance that the coverage data
was originally generated for, to ensure a basic level of compatibility
with MIR inlining.
Fixes #83061
Adjust `-Ctarget-cpu=native` handling in cg_llvm
When cg_llvm encounters `-Ctarget-cpu=native` it computes an explicit
set of features that applies to the target in order to correctly
compile code for the host CPU (because e.g. `skylake` alone is not
sufficient to tell whether some of the instructions are available or
not).
However, there were a couple of issues with how we did this. Firstly,
the order in which features were overridden wasn't quite right –
conceptually you'd expect the `-Ctarget-cpu=native` option to override
the features that are implicitly set by the target definition. However,
due to how other `-Ctarget-cpu` values are handled, we must adopt the
following order of priority:
* Features from -Ctarget-cpu=*; are overridden by
* Features implied by --target; are overridden by
* Features from -Ctarget-feature; are overridden by
* function-specific features.
Another problem was that the function-level `target-features`
attribute would overwrite the entire set of globally enabled
features, rather than just the features that
`#[target_feature(enable/disable)]` specified. With something like
`-Ctarget-cpu=native` we'd end up in a situation wherein a function
without a `#[target_feature(enable)]` annotation would have a broader
set of features than a function with one such attribute. This turned
out to be a cause of heavy run-time regressions in some code using
these function-level attributes in conjunction with
`-Ctarget-cpu=native`, for example.
With this PR rustc is more careful about specifying the entire set of
features for functions that use `#[target_feature(enable/disable)]` or
`#[instruction_set]` attributes.
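A hedged illustration of the interaction being fixed (function and
feature chosen for illustration):

```rust
// With -Ctarget-cpu=native, rustc enables an explicit host-derived feature set.
// This function should keep that whole set *plus* avx2, rather than having its
// per-function feature list reduced to only avx2 as before.
#[target_feature(enable = "avx2")]
unsafe fn sum_bytes(data: &[u8]) -> u32 {
    data.iter().map(|&b| b as u32).sum()
}
```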
Sadly testing the original reproducer for this behaviour is quite
impossible – we cannot rely on `-Ctarget-cpu=native` to be anything in
particular on developer or CI machines.
cc https://github.com/rust-lang/rust/issues/83027 `@BurntSushi`
Allow rustdoc to handle asm! of foreign architectures
This allows rustdoc to process code containing `asm!` for architectures other than the current one. Since this never reaches codegen, we just replace target-specific registers and register classes with a dummy one.
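A hedged sketch (pre-stabilization `asm!` syntax; the registers are
x86-specific): when rustdoc documents a crate like this for a different
architecture, the target-specific register operands below are replaced
with a dummy instead of causing an error.

```rust
#![feature(asm)]

// Reads the x86 time-stamp counter; the explicit eax/edx registers only exist on x86.
pub unsafe fn read_tsc() -> u64 {
    let lo: u32;
    let hi: u32;
    asm!("rdtsc", out("eax") lo, out("edx") hi);
    ((hi as u64) << 32) | (lo as u64)
}
```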
Fixes #82869
Consider functions to be reachable for code coverage purposes, either
when they reach code generation directly, or indirectly as an inlined
part of another function.