Save/restore more items in cache with incremental compilation
Right now they don't play very well together, consider a simple example:
```
$ export RUSTFLAGS="--emit asm"
$ cargo new --lib foo
Created library `foo` package
$ cargo build -q
$ touch src/lib.rs
$ cargo build
error: could not copy
"/path/to/foo/target/debug/deps/foo-e307cc7fa7b6d64f.4qbzn9k8mosu50a5.rcgu.s"
to "/path/to/foo/target/debug/deps/foo-e307cc7fa7b6d64f.s":
No such file or directory (os error 2)
```
Touch triggers the rebuild, incremental compilation detects no changes (yay) and everything explodes while trying to copy files were they should go.
This pull request fixes it by copying and restoring more files in the incremental compilation cache
Fixes https://github.com/rust-lang/rust/issues/89149
Fixes https://github.com/rust-lang/rust/issues/88829
Related: https://internals.rust-lang.org/t/interaction-between-incremental-compilation-and-emit/20551
rename ptr::from_exposed_addr -> ptr::with_exposed_provenance
As discussed on [Zulip](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/To.20expose.20or.20not.20to.20expose/near/427757066).
The old name, `from_exposed_addr`, makes little sense as it's not the address that is exposed, it's the provenance. (`ptr.expose_addr()` stays unchanged as we haven't found a better option yet. The intended interpretation is "expose the provenance and return the address".)
The new name nicely matches `ptr::without_provenance`.
Add `Ord::cmp` for primitives as a `BinOp` in MIR
Update: most of this OP was written months ago. See https://github.com/rust-lang/rust/pull/118310#issuecomment-2016940014 below for where we got to recently that made it ready for review.
---
There are dozens of reasonable ways to implement `Ord::cmp` for integers using comparison, bit-ops, and branches. Those differences are irrelevant at the rust level, however, so we can make things better by adding `BinOp::Cmp` at the MIR level:
1. Exactly how to implement it is left up to the backends, so LLVM can use whatever pattern its optimizer best recognizes and cranelift can use whichever pattern codegens the fastest.
2. By not inlining those details for every use of `cmp`, we drastically reduce the amount of MIR generated for `derive`d `PartialOrd`, while also making it more amenable to MIR-level optimizations.
Having extremely careful `if` ordering to μoptimize resource usage on broadwell (#63767) is great, but it really feels to me like libcore is the wrong place to put that logic. Similarly, using subtraction [tricks](https://graphics.stanford.edu/~seander/bithacks.html#CopyIntegerSign) (#105840) is arguably even nicer, but depends on the optimizer understanding it (https://github.com/llvm/llvm-project/issues/73417) to be practical. Or maybe [bitor is better than add](https://discourse.llvm.org/t/representing-in-ir/67369/2?u=scottmcm)? But maybe only on a future version that [has `or disjoint` support](https://discourse.llvm.org/t/rfc-add-or-disjoint-flag/75036?u=scottmcm)? And just because one of those forms happens to be good for LLVM, there's no guarantee that it'd be the same form that GCC or Cranelift would rather see -- especially given their very different optimizers. Not to mention that if LLVM gets a spaceship intrinsic -- [which it should](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Suboptimal.20inlining.20in.20std.20function.20.60binary_search.60/near/404250586) -- we'll need at least a rustc intrinsic to be able to call it.
As for simplifying it in Rust, we now regularly inline `{integer}::partial_cmp`, but it's quite a large amount of IR. The best way to see that is with 8811efa88b (diff-d134c32d028fbe2bf835fef2df9aca9d13332dd82284ff21ee7ebf717bfa4765R113) -- I added a new pre-codegen MIR test for a simple 3-tuple struct, and this PR change it from 36 locals and 26 basic blocks down to 24 locals and 8 basic blocks. Even better, as soon as the construct-`Some`-then-match-it-in-same-BB noise is cleaned up, this'll expose the `Cmp == 0` branches clearly in MIR, so that an InstCombine (#105808) can simplify that to just a `BinOp::Eq` and thus fix some of our generated code perf issues. (Tracking that through today's `if a < b { Less } else if a == b { Equal } else { Greater }` would be *much* harder.)
---
r? `@ghost`
But first I should check that perf is ok with this
~~...and my true nemesis, tidy.~~
Simplify trim-paths feature by merging all debuginfo options together
This PR simplifies the trim-paths feature by merging all debuginfo options together, as described in https://github.com/rust-lang/rust/issues/111540#issuecomment-1994010274.
And also do some correctness fixes found during the review.
cc `@weihanglo`
r? `@michaelwoerister`
Codegen const panic messages as function calls
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting shared library (see [perf]).
A sample improvement from nightly:
```
leaq str.0(%rip), %rdi
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdx
movl $25, %esi
callq *_ZN4core9panicking5panic17h17cabb89c5bcc999E@GOTPCREL(%rip)
```
to this PR:
```
leaq .Lalloc_d6aeb8e2aa19de39a7f0e861c998af13(%rip), %rdi
callq *_RNvNtNtCsduqIKoij8JB_4core9panicking11panic_const23panic_const_div_by_zero@GOTPCREL(%rip)
```
[perf]: https://perf.rust-lang.org/compare.html?start=a7e4de13c1785819f4d61da41f6704ed69d5f203&end=64fbb4f0b2d621ff46d559d1e9f5ad89a8d7789b&stat=instructions:u
Remove `TypeAndMut` from `ty::RawPtr` variant, make it take `Ty` and `Mutability`
Pretty much mechanically converting `ty::RawPtr(ty::TypeAndMut { ty, mutbl })` to `ty::RawPtr(ty, mutbl)` and its fallout.
r? lcnr
cc rust-lang/types-team#124
"Handle" calls to upstream monomorphizations in compiler_builtins
This is pretty cooked, but I think it works.
compiler-builtins has a long-standing problem that at link time, its rlib cannot contain any calls to `core`. And yet, in codegen we _love_ inserting calls to symbols in `core`, generally from various panic entrypoints.
I intend this PR to attack that problem as completely as possible. When we generate a function call, we now check if we are generating a function call from `compiler_builtins` and whether the callee is a function which was not lowered in the current crate, meaning we will have to link to it.
If those conditions are met, actually generating the call is asking for a linker error. So we don't. If the callee diverges, we lower to an abort with the same behavior as `core::intrinsics::abort`. If the callee does not diverge, we produce an error. This means that compiler-builtins can contain panics, but they'll SIGILL instead of panicking. I made non-diverging calls a compile error because I'm guessing that they'd mostly get into compiler-builtins by someone making a mistake while working on the crate, and compile errors are better than linker errors. We could turn such calls into aborts as well if that's preferred.
This skips emitting extra arguments at every callsite (of which there
can be many). For a librustc_driver build with overflow checks enabled,
this cuts 0.7MB from the resulting binary.
add test ensuring simd codegen checks don't run when a static assertion failed
stdarch relies on this to ensure that SIMD indices are in bounds.
I would love to know why this works, but I can't figure out where codegen decides to not codegen a function if a required-const does not evaluate. `@oli-obk` `@bjorn3` do you have any idea?
Distinguish between library and lang UB in assert_unsafe_precondition
As described in https://github.com/rust-lang/rust/pull/121583#issuecomment-1963168186, `assert_unsafe_precondition` now explicitly distinguishes between language UB (conditions we explicitly optimize on) and library UB (things we document you shouldn't do, and maybe some library internals assume you don't do).
`debug_assert_nounwind` was originally added to avoid the "only at runtime" aspect of `assert_unsafe_precondition`. Since then the difference between the macros has gotten muddied. This totally revamps the situation.
Now _all_ preconditions shall be checked with `assert_unsafe_precondition`. If you have a precondition that's only checkable at runtime, do a `const_eval_select` hack, as done in this PR.
r? RalfJung
Add asm goto support to `asm!`
Tracking issue: #119364
This PR implements asm-goto support, using the syntax described in "future possibilities" section of [RFC2873](https://rust-lang.github.io/rfcs/2873-inline-asm.html#asm-goto).
Currently I have only implemented the `label` part, not the `fallthrough` part (i.e. fallthrough is implicit). This doesn't reduce the expressive though, since you can use label-break to get arbitrary control flow or simply set a value and rely on jump threading optimisation to get the desired control flow. I can add that later if deemed necessary.
r? ``@Amanieu``
cc ``@ojeda``
Remove useless lifetime of ArchiveBuilder
`trait ArchiveBuilder<'a>` has a seemingly useless lifetime a, so I remove it. If this is intentional, please reject this PR.
```rust
pub trait ArchiveBuilder<'a> {
fn add_file(&mut self, path: &Path);
fn add_archive(
&mut self,
archive: &Path,
skip: Box<dyn FnMut(&str) -> bool + 'static>,
) -> io::Result<()>;
fn build(self: Box<Self>, output: &Path) -> bool;
}
```
Add "algebraic" fast-math intrinsics, based on fast-math ops that cannot return poison
Setting all of LLVM's fast-math flags makes our fast-math intrinsics very dangerous, because some inputs are UB. This set of flags permits common algebraic transformations, but according to the [LangRef](https://llvm.org/docs/LangRef.html#fastmath), only the flags `nnan` (no nans) and `ninf` (no infs) can produce poison.
And this uses the algebraic float ops to fix https://github.com/rust-lang/rust/issues/120720
cc `@orlp`
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
Replacement of #114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two
This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See #114390 for more discussion.
While I have extended the scope of the power of two optimization from #114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.
Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).
Fixes#47234Resolves#114390
`@Centri3`
Improved support of collapse_debuginfo attribute for macros.
Added walk_chain_collapsed function to consider collapse_debuginfo attribute in parent macros in call chain.
Fixed collapse_debuginfo attribute processing for cranelift (there was if/else branches error swap).
cc https://github.com/rust-lang/rust/issues/100758
Remove `DiagCtxt` API duplication
`DiagCtxt` defines the internal API for creating and emitting diagnostics: methods like `struct_err`, `struct_span_warn`, `note`, `create_fatal`, `emit_bug`. There are over 50 methods.
Some of these methods are then duplicated across several other types: `Session`, `ParseSess`, `Parser`, `ExtCtxt`, and `MirBorrowckCtxt`. `Session` duplicates the most, though half the ones it does are unused. Each duplicated method just calls forward to the corresponding method in `DiagCtxt`. So this duplication exists to (in the best case) shorten chains like `ecx.tcx.sess.parse_sess.dcx.emit_err()` to `ecx.emit_err()`.
This API duplication is ugly and has been bugging me for a while. And it's inconsistent: there's no real logic about which methods are duplicated, and the use of `#[rustc_lint_diagnostic]` and `#[track_caller]` attributes vary across the duplicates.
This PR removes the duplicated API methods and makes all diagnostic creation and emission go through `DiagCtxt`. It also adds `dcx` getter methods to several types to shorten chains. This approach scales *much* better than API duplication; indeed, the PR adds `dcx()` to numerous types that didn't have API duplication: `TyCtxt`, `LoweringCtxt`, `ConstCx`, `FnCtxt`, `TypeErrCtxt`, `InferCtxt`, `CrateLoader`, `CheckAttrVisitor`, and `Resolver`. These result in a lot of changes from `foo.tcx.sess.emit_err()` to `foo.dcx().emit_err()`. (You could do this with more types, but it gets into diminishing returns territory for types that don't emit many diagnostics.)
After all these changes, some call sites are more verbose, some are less verbose, and many are the same. The total number of lines is reduced, mostly because of the removed API duplication. And consistency is increased, because calls to `emit_err` and friends are always preceded with `.dcx()` or `.dcx`.
r? `@compiler-errors`
detects redundant imports that can be eliminated.
for #117772 :
In order to facilitate review and modification, split the checking code and
removing redundant imports code into two PR.
compile-time evaluation: detect writes through immutable pointers
This has two motivations:
- it unblocks https://github.com/rust-lang/rust/pull/116745 (and therefore takes a big step towards `const_mut_refs` stabilization), because we can now detect if the memory that we find in `const` can be interned as "immutable"
- it would detect the UB that was uncovered in https://github.com/rust-lang/rust/pull/117905, which was caused by accidental stabilization of `copy` functions in `const` that can only be called with UB
When UB is detected, we emit a future-compat warn-by-default lint. This is not a breaking change, so completely in line with [the const-UB RFC](https://rust-lang.github.io/rfcs/3016-const-ub.html), meaning we don't need t-lang FCP here. I made the lint immediately show up for dependencies since it is nearly impossible to even trigger this lint without `const_mut_refs` -- the accidentally stabilized `copy` functions are the only way this can happen, so the crates that popped up in #117905 are the only causes of such UB (in the code that crater covers), and the three cases of UB that we know about have all been fixed in their respective crates already.
The way this is implemented is by making use of the fact that our interpreter is already generic over the notion of provenance. For CTFE we now use the new `CtfeProvenance` type which is conceptually an `AllocId` plus a boolean `immutable` flag (but packed for a more efficient representation). This means we can mark a pointer as immutable when it is created as a shared reference. The flag will be propagated to all pointers derived from this one. We can then check the immutable flag on each write to reject writes through immutable pointers.
I just hope perf works out.
Currently, `Handler::fatal` returns `FatalError`. But `Session::fatal`
returns `!`, because it calls `Handler::fatal` and then calls `raise` on
the result. This inconsistency is unfortunate.
This commit changes `Handler::fatal` to do the `raise` itself, changing
its return type to `!`. This is safe because there are only two calls to
`Handler::fatal`, one in `rustc_session` and one in
`rustc_codegen_cranelift`, and they both call `raise` on the result.
`HandlerInner::fatal` still returns `FatalError`, so I renamed it
`fatal_no_raise` to emphasise the return type difference.
share some track_caller logic between interpret and codegen
Also move the code that implements the track_caller intrinsics out of the core interpreter engine -- it's just a helper creating a const-allocation, doesn't need to be part of the interpreter core.
Bump stdarch submodule and remove special handling for LLVM intrinsics that are no longer needed
Bumps stdarch to pull https://github.com/rust-lang/stdarch/pull/1477, which reimplemented some functions with portable SIMD intrinsics instead of arch specific LLVM intrinsics.
Handling of those LLVM intrinsics is removed from cranelift codegen and miri.
cc `@RalfJung` `@bjorn3`