rustc: Improving safe wasm float->int casts
This commit improves code generation for WebAssembly targets when
translating floating-point to integer casts. The improvement is only
relevant when the `nontrapping-fptoint` feature is not enabled, which is
currently the default. Additionally, it only affects safe casts, since
unchecked casts were improved in #74659.
More background is available in #73591, but the gist is that in LLVM the
`fptosi` and `fptoui` instructions are defined to return an `undef`
value if they execute on out-of-bounds values; notably, they do not
trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing `i32.trunc_f32_s` (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the `fptosi` instruction
doesn't trap. Overall this means that a function like this:
    #[no_mangle]
    pub unsafe extern "C" fn cast(x: f64) -> u32 {
        x as u32
    }
will generate this WebAssembly today:
    (func $cast (type 0) (param f64) (result i32)
      (local i32 i32)
      local.get 0
      f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
      f64.gt
      local.set 1
      block ;; label = @1
        block ;; label = @2
          local.get 0
          f64.const 0x0p+0 (;=0;)
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.gt
          select
          local.tee 0
          f64.const 0x1p+32 (;=4.29497e+09;)
          f64.lt
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.ge
          i32.and
          i32.eqz
          br_if 0 (;@2;)
          local.get 0
          i32.trunc_f64_u
          local.set 2
          br 1 (;@1;)
        end
        i32.const 0
        local.set 2
      end
      i32.const -1
      local.get 2
      local.get 1
      select)
This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only in some situations (float-to-u8 codegen is still not
great). The fix is to use basic blocks and control flow to avoid
speculatively executing `fptosi`, and to use LLVM's raw intrinsic for
the WebAssembly instruction instead. This effectively extends
the support added in #74659 to checked casts. After this commit the
codegen for the above Rust function looks like:
    (func $cast (type 0) (param f64) (result i32)
      (local i32)
      block ;; label = @1
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        local.tee 1
        i32.const 1
        i32.xor
        br_if 0 (;@1;)
        local.get 0
        f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
        f64.le
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const -1
      i32.const 0
      local.get 1
      select)
For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:
    (func $cast (type 0) (param f64) (result i32)
      block ;; label = @1
        local.get 0
        f64.const 0x1p+32 (;=4.29497e+09;)
        f64.lt
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        i32.and
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const 0)
So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed: we're
now emulating the `i32.trunc_sat_f64_u` instruction rather than always
replacing out-of-bounds values with zero.
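As a rough illustration of the saturating semantics being emulated, the sketch below spells out the `f64` to `u32` cast by hand. The function name `saturating_f64_to_u32` is made up for this example, and the code is only a model of the behaviour, not how rustc actually lowers the cast. Its branch structure loosely follows the post-commit WebAssembly above: check the lower bound, check the upper bound, truncate only when both hold.

```rust
/// Hand-written model of Rust's saturating `f64 as u32` cast.
/// Hypothetical helper for illustration; not rustc's implementation.
fn saturating_f64_to_u32(x: f64) -> u32 {
    if x >= 0.0 {
        if x <= u32::MAX as f64 {
            // In range: truncation toward zero is well defined here,
            // so a trapping truncation instruction would be safe.
            x as u32
        } else {
            // Too large (including +infinity): saturate to the maximum.
            u32::MAX
        }
    } else {
        // Negative values and NaN (every comparison with NaN is false)
        // both end up here and produce zero.
        0
    }
}
```

For every input, this should agree with the built-in `x as u32` on compilers that have saturating casts.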
There is still work that could be done to improve casts such as `f32` to
`u8`. That form of cast still uses the `fptosi` instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...
Closes #73591
Move bulk of BTreeMap::insert method down to new method on handle
Adjust the boundary between the map and node layers for insertion: do more in the node layer, keep root manipulation and pointer dereferencing separate. No change in undefined behaviour or performance.
r? @Mark-Simulacrum
Fix change detection in CfgSimplifier::collapse_goto_chain
Check that the old target is different from the new collapsed one, before concluding that anything changed.
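The idea behind the fix can be sketched on a toy control-flow graph. Everything here is an assumption made for illustration: the `next` slice stands in for "this block is empty and only jumps to `t`", and the function signature does not match the real `CfgSimplifier` method. The point is the last two lines, where a change is reported only if the collapsed target differs from the old one.

```rust
/// Toy model: `next[b] == Some(t)` means block `b` is empty and only jumps to `t`.
/// Hypothetical signature; the real pass works on MIR basic blocks.
fn collapse_goto_chain(next: &[Option<usize>], start: &mut usize, changed: &mut bool) {
    let mut target = *start;
    // Follow at most `next.len()` hops so a cycle of empty blocks cannot loop forever.
    for _ in 0..next.len() {
        match next[target] {
            Some(t) => target = t,
            None => break,
        }
    }
    // Report a change only if the collapsed target really differs from the old one.
    *changed |= *start != target;
    *start = target;
}
```

With this check, a block that only jumps to itself collapses to itself and is not counted as a change, so a fixed-point loop driven by `changed` can terminate.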
Fixes #75074
Fixes #75051
tests: Ignore src/test/debuginfo/rc_arc.rs on Windows
It requires loading pretty-printers (`src\etc\gdb_load_rust_pretty_printers.py`), but GDB doesn't load them on Windows.
Not sure how this passes through CI, due to an old GDB version perhaps?
Introduce an abstraction for EvaluationCache and SelectionCache
The small duplicated code has been moved to librustc_query_system.
The remaining changes are some cleanups of structural impls.
Remove `GCX_PTR`.
We store an `ImplicitCtxt` pointer in a thread-local value (TLV). This allows
implicit access to a `GlobalCtxt` and some other things.
We also store a `GlobalCtxt` pointer in `GCX_PTR`. This is always the same
`GlobalCtxt` as the one within the `ImplicitCtxt` pointer in TLV. `GCX_PTR`
is only used in the parallel compiler's `handle_deadlock()` function.
This commit does the following:
- It removes `GCX_PTR`.
- It adds `ImplicitCtxt::new()`, which constructs an `ImplicitCtxt` from a
  `GlobalCtxt`. `ImplicitCtxt::new()` + `tls::enter_context()` is now
  equivalent to the old `tls::enter_global()`.
- It makes `tls::get_tlv()` public for the parallel compiler, because it's
  now used in `handle_deadlock()`.
r? @petrochenkov
Stabilize `Result::as_deref` and `as_deref_mut`
FCP completed in https://github.com/rust-lang/rust/issues/50264#issuecomment-645681400.
This PR stabilizes two new APIs for `std::result::Result`:
```rust
fn as_deref(&self) -> Result<&T::Target, &E> where T: Deref;
fn as_deref_mut(&mut self) -> Result<&mut T::Target, &mut E> where T: DerefMut;
```
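A short usage sketch of the two stabilized methods, assuming a `Result<String, String>` as the input type. The helper names `ok_len` and `shout` are invented for this example.

```rust
/// `as_deref` turns `&Result<String, E>` into `Result<&str, &E>`,
/// so `&str` methods can be used without cloning the `String`.
fn ok_len(input: &Result<String, String>) -> Result<usize, &String> {
    input.as_deref().map(|s: &str| s.len())
}

/// `as_deref_mut` gives `Result<&mut str, &mut E>`, allowing in-place edits.
fn shout(input: &mut Result<String, String>) {
    if let Ok(s) = input.as_deref_mut() {
        // `s` is `&mut str` here.
        s.make_ascii_uppercase();
    }
}
```

In both helpers the error side is only borrowed, matching the `&E` and `&mut E` in the signatures above.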
This PR also removes two rarely used unstable APIs from `Result`:
```rust
fn as_deref_err(&self) -> Result<&T, &E::Target> where E: Deref;
fn as_deref_mut_err(&mut self) -> Result<&mut T, &mut E::Target> where E: DerefMut;
```
Closes #50264
Rollup of 5 pull requests
Successful merges:
- #74980 (pprust: adjust mixed comment printing and add regression test for #74745)
- #75009 (Document the discrepancy in the mask type for _mm_shuffle_ps)
- #75031 (Do not trigger `unused_{braces,parens}` lints with `yield`)
- #75059 (fix typos)
- #75064 (compiletest: Support ignoring tests requiring missing LLVM components)
Failed merges:
r? @ghost
compiletest: Support ignoring tests requiring missing LLVM components
This PR implements a more principled solution to the problem described in https://github.com/rust-lang/rust/pull/66084.
Builds of LLVM backends take a lot of time and disk space.
So it usually makes sense to build rustc with
```toml
[llvm]
targets = "X86"
experimental-targets = ""
```
unless you are working on some target-specific tasks.
A few tests, however, require non-x86 backends to be built.
A new test directive, `// needs-llvm-components: component1 component2 component3`, causes such tests to be ignored automatically if any of the listed components is missing from the provided LLVM (this is determined through `llvm-config --components`).
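The check described above can be sketched roughly as follows. This is not compiletest's actual code: the helper `missing_components` is hypothetical, and the `available` string is assumed to be the whitespace-separated output of `llvm-config --components`. A test would be ignored whenever the returned list is non-empty.

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the components named in a
/// `// needs-llvm-components:` directive that are absent from `available`.
/// Lines without the directive require nothing.
fn missing_components<'a>(line: &'a str, available: &str) -> Vec<&'a str> {
    const DIRECTIVE: &str = "// needs-llvm-components:";
    let required = match line.trim().strip_prefix(DIRECTIVE) {
        Some(rest) => rest,
        None => return Vec::new(),
    };
    // Components present in this LLVM build.
    let have: HashSet<&str> = available.split_whitespace().collect();
    required
        .split_whitespace()
        .filter(|c| !have.contains(*c))
        .collect()
}
```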
As a result, the test suite now fully passes with LLVM built only with the x86 backend. The component list in this case is
```
aggressiveinstcombine all all-targets analysis asmparser asmprinter binaryformat bitreader bitstreamreader bitwriter cfguard codegen core coroutines coverage debuginfocodeview debuginfodwarf debuginfogsym debuginfomsf debuginfopdb demangle dlltooldriver dwarflinker engine executionengine frontendopenmp fuzzmutate globalisel instcombine instrumentation interpreter ipo irreader jitlink libdriver lineeditor linker lto mc mca mcdisassembler mcjit mcparser mirparser native nativecodegen objcarcopts object objectyaml option orcerror orcjit passes profiledata remarks runtimedyld scalaropts selectiondag support symbolize tablegen target textapi transformutils vectorize windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86utils xray
```
(With the default target list it's much larger.)
```
aarch64 aarch64asmparser aarch64codegen aarch64desc aarch64disassembler aarch64info aarch64utils aggressiveinstcombine all all-targets analysis arm armasmparser armcodegen armdesc armdisassembler arminfo armutils asmparser asmprinter avr avrasmparser avrcodegen avrdesc avrdisassembler avrinfo binaryformat bitreader bitstreamreader bitwriter cfguard codegen core coroutines coverage debuginfocodeview debuginfodwarf debuginfogsym debuginfomsf debuginfopdb demangle dlltooldriver dwarflinker engine executionengine frontendopenmp fuzzmutate globalisel hexagon hexagonasmparser hexagoncodegen hexagondesc hexagondisassembler hexagoninfo instcombine instrumentation interpreter ipo irreader jitlink libdriver lineeditor linker lto mc mca mcdisassembler mcjit mcparser mips mipsasmparser mipscodegen mipsdesc mipsdisassembler mipsinfo mirparser msp430 msp430asmparser msp430codegen msp430desc msp430disassembler msp430info native nativecodegen nvptx nvptxcodegen nvptxdesc nvptxinfo objcarcopts object objectyaml option orcerror orcjit passes powerpc powerpcasmparser powerpccodegen powerpcdesc powerpcdisassembler powerpcinfo profiledata remarks riscv riscvasmparser riscvcodegen riscvdesc riscvdisassembler riscvinfo riscvutils runtimedyld scalaropts selectiondag sparc sparcasmparser sparccodegen sparcdesc sparcdisassembler sparcinfo support symbolize systemz systemzasmparser systemzcodegen systemzdesc systemzdisassembler systemzinfo tablegen target textapi transformutils vectorize webassembly webassemblyasmparser webassemblycodegen webassemblydesc webassemblydisassembler webassemblyinfo windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86utils xray
```
https://github.com/rust-lang/rust/pull/66084 is also reverted now.
r? @Mark-Simulacrum
pprust: adjust mixed comment printing and add regression test for #74745
Fixes #74745.
This PR adds a regression test for #74745. While an `ignore-tidy-trailing-lines` header is required, this doesn't stop the test from reproducing, so long as there is no newline at the end of the file.
However, adding the header comments made the test fail due to a bug in pprust, so this PR also adjusts the pretty printing of mixed comments so that the initial zero-break isn't emitted at the beginning of the line. Through this, the `block-comment-wchar` test can have the `pp-exact` file removed, as it no longer diverges from pretty printing of the source.
Introduce NonterminalKind for more type-safe mbe parsing
It encapsulates (part of) the interface between the parser and the
macro-by-example (`macro_rules`) parser.
The second bit is the somewhat more general `parse_ast_fragment`, which is
the reason why we keep some `parse_xxx` functions public.
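To illustrate why an enum is more type-safe than matching on fragment names as strings, here is a minimal sketch. The type `FragmentKind` and its `from_name` method are hypothetical stand-ins and cover only a handful of fragment specifiers; they are not the definition this PR adds to rustc.

```rust
/// Hypothetical stand-in for a "kind of nonterminal" enum.
/// Only a few `macro_rules` fragment specifiers are modelled.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FragmentKind {
    Item,
    Block,
    Stmt,
    Expr,
    Ty,
    Ident,
}

impl FragmentKind {
    /// Convert the textual specifier (the `expr` in `$e:expr`) once, up front.
    /// Unknown names are rejected here instead of deep inside the parser.
    fn from_name(name: &str) -> Option<FragmentKind> {
        Some(match name {
            "item" => FragmentKind::Item,
            "block" => FragmentKind::Block,
            "stmt" => FragmentKind::Stmt,
            "expr" => FragmentKind::Expr,
            "ty" => FragmentKind::Ty,
            "ident" => FragmentKind::Ident,
            _ => return None,
        })
    }
}
```

Once the name has been converted, later code can `match` on the enum exhaustively, so a newly added kind produces a compile error wherever it is not handled.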
Fix ICEs with `@ ..` binding
This reverts #74557 and introduces an alternative fix while ensuring that #74954 is not broken.
The diagnostics are verbose, but this fixes three related issues.
cc #74954, #74539, and #74702
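For context, this is what a valid use of the `@ ..` binding looks like in a slice pattern; the ICEs concerned uses in positions where it is not allowed. The function `split_first_last` is just an example name.

```rust
/// Binds the middle of a slice with `middle @ ..`.
/// Returns `None` when the slice has fewer than two elements.
fn split_first_last(xs: &[i32]) -> Option<(i32, &[i32], i32)> {
    match xs {
        // `..` matches any number of elements; `middle @` binds them as a subslice.
        [first, middle @ .., last] => Some((*first, middle, *last)),
        _ => None,
    }
}
```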
Rollup of 10 pull requests
Successful merges:
- #74686 (BTreeMap: remove into_slices and its unsafe block)
- #74762 (BTreeMap::drain_filter should not touch the root during iteration)
- #74781 (Clean up E0733 explanation)
- #74874 (BTreeMap: define forget_type only when relevant)
- #74974 (Make tests faster in Miri)
- #75010 (Update elasticlunr-rs and ammonia transitive deps)
- #75041 (Replaced log with tracing crate)
- #75044 (Clean up E0744 explanation)
- #75054 (Rename rustc_middle::cstore::DepKind to CrateDepKind)
- #75057 (Avoid dumping rustc invocations to stdout)
Failed merges:
- #74827 (Move bulk of BTreeMap::insert method down to new method on handle)
r? @ghost
Avoid dumping rustc invocations to stdout
These are quite long, usually, and in most cases not interesting. On smaller
terminals they can take up more than a full page of output, hiding the error
diagnostics emitted.
BTreeMap: define forget_type only when relevant
Similar to `forget_node_type` for handles.
No effect on generated code, apart perhaps from superfluous calls that may not have been optimized away.
r? @Mark-Simulacrum
BTreeMap::drain_filter should not touch the root during iteration
Although Miri doesn't point it out, I believe there is undefined behaviour using `drain_filter` when draining the 11th-last element from a tree that was larger. When this happens, the last remaining child nodes are merged, the root becomes empty and is popped from the tree. That last step establishes a mutable reference to the node elected root and writes a pointer in `node::Root`, while iteration continues to visit the same node.
This is mostly code from #74437, slightly adapted.
This commit modifies compiletest so that a diff of actual and expected
output is shown for pretty tests. This makes it far easier to work out
what has changed.
Signed-off-by: David Wood <david@davidtw.co>
This commit adds a regression test for #74745. While an
`ignore-tidy-trailing-lines` header is required, this doesn't stop the
test from reproducing, so long as there is no newline at the end of the
file.
However, adding the header comments made the test fail due to a bug in
pprust, fixed in the previous commit.
Signed-off-by: David Wood <david@davidtw.co>
This commit adjusts the pretty printing of mixed comments so that the
initial zero-break isn't emitted at the beginning of the line. Through
this, the `block-comment-wchar` test can have the `pp-exact` file
removed, as it no longer diverges from pretty printing of the source.
Signed-off-by: David Wood <david@davidtw.co>
Add fallible AArch64 CI builder
This adds the `aarch64-gnu` CI builder to the `auto-fallible` job, as a first step in the process of actually gating on it.
r? @Mark-Simulacrum
Deduplicate `::` -> `:` typo errors
Deduplicate errors caused by the same type ascription typo, including
ones suggested during parsing that would get reported again during
resolve. Fix #70382.