Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
This saves some debug and scope metadata in every single function that calls it.
Normally wouldn't be worth it, but with the derives there's *so* many of these.
refactor check_{lang,library}_ub: use a single intrinsic
This enacts the plan I laid out [here](https://github.com/rust-lang/rust/pull/122282#issuecomment-1996917998): use a single intrinsic, called `ub_checks` (in aniticpation of https://github.com/rust-lang/compiler-team/issues/725), that just exposes the value of `debug_assertions` (consistently implemented in both codegen and the interpreter). Put the language vs library UB logic into the library.
This makes it easier to do something like https://github.com/rust-lang/rust/pull/122282 in the future: that just slightly alters the semantics of `ub_checks` (making it more approximating when crates built with different flags are mixed), but it no longer affects whether these checks can happen in Miri or compile-time.
The first commit just moves things around; I don't think these macros and functions belong into `intrinsics.rs` as they are not intrinsics.
r? `@saethlin`
Fold pointer operations in GVN
This PR proposes 2 combinations of cast operations in MIR GVN:
- a chain of `PtrToPtr` or `MutToConstPointer` casts can be folded together into a single `PtrToPtr` cast;
- we attempt to evaluate more ptr ops when there is no provenance.
In particular, this allows to read from static slices.
This is not yet sufficient to see through slice operations that use `PtrComponents` (because that's a union), but still a step forward.
r? `@ghost`
Sandwich MIR optimizations between DSE.
This PR reorders MIR optimization passes in an attempt to increase their efficiency.
- Stop running CopyProp before GVN, it's useless as GVN will do the same thing anyway. Instead, we perform CopyProp at the end of the pipeline, to ensure we do not emit copy/move chains.
- Run DSE before GVN, as it increases the probability to have single-assignment locals.
- Run DSE after the final CopyProp to turn copies into moves.
r? `@ghost`
Avoid some redundant work in GVN
The first 2 commits are about reducing the perf effect.
Third commit avoids doing redundant work: is a local is SSA, it already has been simplified, and the resulting value is in `self.locals`. No need to call any code on it.
The last commit avoids removing some storage statements.
r? wg-mir-opt
Reorder early post-inlining passes.
`RemoveZsts`, `RemoveUnneededDrops` and `UninhabitedEnumBranching` only depend on types, so they should be executed together early after MIR inlining introduces those types.
This does not change the end-result, but this makes the pipeline a bit more consistent.
adjust how closure/generator types are printed
I saw `&[closure@$DIR/issue-20862.rs:2:5]` and I thought it is a slice type, because that's usually what `&[_]` is... it took me a while to realize that this is just a confusing printer and actually there's no slice. Let's use something that cannot be mistaken for a regular type.