Improve the `array::map` codegen
The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.
There's three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.
(No libs-api changes; everything should be completely implementation-detail-internal.)
It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞
r? `@thomcc`
As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
%array.i.i.i.i = alloca [64 x i32], align 4
%_3.sroa.5.i.i.i = alloca [65 x i32], align 4
%_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)
But with this PR it's only `sub rsp, 520`
```llvm
start:
%array.i.i.i.i.i.i = alloca [64 x i32], align 4
%array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```
Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
mov esi, 64
mov edi, 0
cmp rdx, 64
je .LBB0_3
lea rsi, [rdx + 1]
mov qword ptr [rsp + 784], rsi
mov r8d, dword ptr [rsp + 4*rdx + 528]
mov edi, 1
lea edx, [r8 + 2*r8]
lea r8d, [r8 + 4*rdx]
add r8d, 7
.LBB0_3:
test edi, edi
je .LBB0_11
mov dword ptr [rsp + 4*rcx + 272], r8d
cmp rsi, 64
jne .LBB0_6
xor r8d, r8d
mov edx, 64
test r8d, r8d
jne .LBB0_8
jmp .LBB0_11
.LBB0_6:
lea rdx, [rsi + 1]
mov qword ptr [rsp + 784], rdx
mov edi, dword ptr [rsp + 4*rsi + 528]
mov r8d, 1
lea esi, [rdi + 2*rdi]
lea edi, [rdi + 4*rsi]
add edi, 7
test r8d, r8d
je .LBB0_11
.LBB0_8:
mov dword ptr [rsp + 4*rcx + 276], edi
add rcx, 2
cmp rcx, 64
jne .LBB0_1
```
whereas with this PR it's unrolled and vectorized
```nasm
vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 328], ymm1
vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
Rollup of 7 pull requests
Successful merges:
- #107654 (reword descriptions of the deprecated int modules)
- #107915 (Add `array::map` benchmarks)
- #107961 (Avoid copy-pasting the `ilog` panic string in a bunch of places)
- #107962 (Add a doc note about why `Chain` is not `ExactSizeIterator`)
- #107966 (Update browser-ui-test version to 0.14.3)
- #107970 (Hermit: Remove floor symbol)
- #107973 (Fix unintentional UB in SIMD tests)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid copy-pasting the `ilog` panic string in a bunch of places
I also ended up changing the implementations to `if let` because it doesn't work to
```rust
self.checked_ilog2().unwrap_or_else(panic_for_nonpositive_argument)
```
due to the `!`. But as a bonus that meant I could remove the `rustc_allow_const_fn_unstable` too.
Add `array::map` benchmarks
Since there were no previous benchmarks for `array::map`, and it is known to have mediocre/poor performance, add some simple benchmarks. These benchmarks vary the length of the array and size of each item.
avoid mixing accesses of ptrs derived from a mutable ref and parent ptrs
``@Vanille-N`` is working on a successor for Stacked Borrows. It will mostly accept strictly more code than Stacked Borrows did, with one exception: the following pattern no longer works.
```rust
let mut root = 6u8;
let mref = &mut root;
let ptr = mref as *mut u8;
*ptr = 0; // Write
assert_eq!(root, 0); // Parent Read
*ptr = 0; // Attempted Write
```
This worked in Stacked Borrows kind of by accident: when doing the "parent read", under SB we Disable `mref`, but the raw ptrs derived from it remain usable. The fact that we can still use the "children" of a reference that is no longer usable is quite nasty and leads to some undesirable effects (in particular it is the major blocker for resolving https://github.com/rust-lang/unsafe-code-guidelines/issues/257). So in Tree Borrows we no longer do that; instead, reading from `root` makes `mref` and all its children read-only.
Due to other improvements in Tree Borrows, the entire Miri test suite still passes with this new behavior, and even the entire libcore and liballoc test suite, except for these 2 cases this PR fixes. Both of these involve code where the programmer wrote `&mut` but then used pointers derived from that reference in ways that alias with the parent pointer, which arguably is violating uniqueness. They are fixed by properly using raw pointers throughout.
Document `PointerLike`
I forgot to document this, and even though it's currently more of an implementation detail, the old doc was kinda embarrassing 😅
Use associated items of `char` instead of freestanding items in `core::char`
The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
Speedup heapsort by 1.5x by making it branchless
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find a good pivot. By making the core child update code branchless it is much faster. On Zen3 sorting 10k `u64` and forcing the sort to pick heapsort, results in:
455us -> 278us
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find
a good pivot. By making the core child update code branchless it is much faster.
On Zen3 sorting 10k `u64` and forcing the sort to pick heapsort, results in:
455us -> 278us
Stabilize feature `cstr_from_bytes_until_nul`
This PR seeks to stabilize `cstr_from_bytes_until_nul`.
Partially addresses #95027
This function has only been on nightly for about 10 months, but I think it is simple enough that there isn't harm discussing stabilization. It has also had at least a handful of mentions on both the user forum and the discord, so it seems like it's already in use or at least known.
This needs FCP still.
Comment on potential discussion points:
- eventual conversion of `CStr` to be a single thin pointer: this function will still be useful to provide a safe way to create a `CStr` after this change.
- should this return a length too, to address concerns about the `CStr` change? I don't see it as being particularly useful, and it seems less ergonomic (i.e. returning `Result<(&CStr, usize), FromBytesUntilNulError>`). I think users that also need this length without the additional `strlen` call are likely better off using a combination of other methods, but this is up for discussion
- `CString::from_vec_until_nul`: this is also useful, but it doesn't even have a nightly implementation merged yet. I propose feature gating that separately, as opposed to blocking this `CStr` implementation on that
Possible alternatives:
A user can use `from_bytes_with_nul` on a slice up to `my_slice[..my_slice.iter().find(|c| c == 0).unwrap()]`. However; that is significantly less ergonomic, and is a bit more work for the compiler to optimize compared the direct `memchr` call that this wraps.
## New stable API
```rs
// both in core::ffi
pub struct FromBytesUntilNulError(());
impl CStr {
pub const fn from_bytes_until_nul(
bytes: &[u8]
) -> Result<&CStr, FromBytesUntilNulError>
}
```
cc ```@ericseppanen``` original author, ```@Mark-Simulacrum``` original reviewer, ```@m-ou-se``` brought up some issues on the thin pointer CStr
```@rustbot``` modify labels: +T-libs-api +needs-fcp
Rename `PointerSized` to `PointerLike`
The old name was unnecessarily vague. This PR renames a nightly language feature that I added, so I don't think it needs any additional approval, though anyone can feel free to speak up if you dislike the rename.
It's still unsatisfying that we don't the user which of {size, alignment} is wrong, but this trait really is just a stepping stone for a more generalized mechanism to create `dyn*`, just meant for nightly testing, so I don't think it really deserves additional diagnostic machinery for now.
Fixes#107696, cc ``@RalfJung``
r? ``@eholk``
Mark 'atomic_mut_ptr' methods const
There's nothing that would block these methods from being const (just an UnsafeCell get), and it would be helpful for FFI interfaces in static contexts
Related tracking issue: #66893
improve panic message for slice windows and chunks
before:
```text
thread 'main' panicked at 'size is zero', /rustc/1e225413a21fa69570bd3fefea9eb05e33f8b917/library/core/src/slice/mod.rs:809:44
```
```text
thread 'main' panicked at 'assertion failed: `(left != right)`
left: `0`,
right: `0`: chunks cannot have a size of zero', /rustc/1e225413a21fa69570bd3fefea9eb05e33f8b917/library/core/src/slice/mod.rs:843:9
```
after:
```text
thread 'main' panicked at 'chunk size must be non-zero', src/main.rs:4:22
```
fixes https://github.com/rust-lang/rust/issues/107437
Fixing confusion between mod and remainder
Like many programming languages, rust too confuses remainder and modulus. The `%` operator and the associated `Rem` trait is (as the trait name suggests) the remainder, but since most people are linguistically more familiar with the modulus the documentation sometimes claims otherwise. This PR tries to fix this problem in rustc.
Bump bootstrap compiler to 1.68
This also changes our stage0.json to include the rustc component for the rustfmt pinned nightly toolchain, which is currently necessary due to rustfmt dynamically linking to that toolchain's librustc_driver and libstd.
r? `@pietroalbini`
Remove `GenFuture` from core
The handling of async constructs in the compiler does not rely on `GenFuture` anymore since `1.67`, so this code can now be removed from `core`.
Implement `signum` with `Ord`
Rather than needing to do things like #105840 for `signum` too, might as well just implement that method using `Ord`, since it's doing the same "I need `-1`/`0`/`+1`" behaviour that `cmp` is already doing.
This also seems to slightly improve the assembly: <https://rust.godbolt.org/z/5oEEqbxK1>
Skip possible where_clause_object_safety lints when checking `multiple_supertrait_upcastable`
Fix#106247
To achieve this, I lifted the `WhereClauseReferencesSelf` out from `object_safety_violations` and move it into `is_object_safe` (which is changed to a new query).
cc `@dtolnay`
r? `@compiler-errors`
Remove `ControlFlow::{BREAK, CONTINUE}`
Libs-API decided to remove these in #102697.
Follow-up to #107023, which removed them from `compiler/`, but a couple new ones showed up since that was merged.
r? libs
Libs-API decided to remove these in #102697.
Follow-up to #107023, which removed them from `compiler/`, but a couple new ones showed up since that was merged.
core: Support variety of atomic widths in width-agnostic functions
Before this change, the following functions and macros were annotated with `#[cfg(target_has_atomic = "8")]` or
`#[cfg(target_has_atomic_load_store = "8")]`:
* `atomic_int`
* `strongest_failure_ordering`
* `atomic_swap`
* `atomic_add`
* `atomic_sub`
* `atomic_compare_exchange`
* `atomic_compare_exchange_weak`
* `atomic_and`
* `atomic_nand`
* `atomic_or`
* `atomic_xor`
* `atomic_max`
* `atomic_min`
* `atomic_umax`
* `atomic_umin`
However, none of those functions and macros actually depend on 8-bit width and they are needed for all atomic widths (16-bit, 32-bit, 64-bit etc.). Some targets might not support 8-bit atomics (i.e. BPF, if we would enable atomic CAS for it).
This change fixes that by removing the `"8"` argument from annotations, which results in accepting the whole variety of widths.
Fixes#106845Fixes#106795
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
Rollup of 8 pull requests
Successful merges:
- #106904 (Preserve split DWARF files when building archives.)
- #106971 (Handle diagnostics customization on the fluent side (for one specific diagnostic))
- #106978 (Migrate mir_build's borrow conflicts)
- #107150 (`ty::tls` cleanups)
- #107168 (Use a type-alias-impl-trait in `ObligationForest`)
- #107189 (Encode info for Adt in a single place.)
- #107322 (Custom mir: Add support for some remaining, easy to support constructs)
- #107323 (Disable ConstGoto opt in cleanup blocks)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Custom mir: Add support for some remaining, easy to support constructs
Some documentation for previous changes and support for `Deinit`, checked binops, len, and array repetition
r? ```@oli-obk``` or ```@tmiasko```
Move format_args!() into AST (and expand it during AST lowering)
Implements https://github.com/rust-lang/compiler-team/issues/541
This moves FormatArgs from rustc_builtin_macros to rustc_ast_lowering. For now, the end result is the same. But this allows for future changes to do smarter things with format_args!(). It also allows Clippy to directly access the ast::FormatArgs, making things a lot easier.
This change turns the format args types into lang items. The builtin macro used to refer to them by their path. After this change, the path is no longer relevant, making it easier to make changes in `core`.
This updates clippy to use the new language items, but this doesn't yet make clippy use the ast::FormatArgs structure that's now available. That should be done after this is merged.
impl DispatchFromDyn for Cell and UnsafeCell
After some fruitful discussion on [Internals](https://internals.rust-lang.org/t/impl-dispatchfromdyn-for-cell-2/16520) here's my first PR to rust-lang/rust 🎉
Please let me know if there's something I missed.
This adds `DispatchFromDyn` impls for `Cell`, `UnsafeCell` and `SyncUnsafeCell`.
An existing test is also expanded to test the `Cell` impl (which requires the `UnsafeCell` impl)
The different `RefCell` types can not implement `DispatchFromDyn` since they have more than one (non ZST) field.
**Edit:**
### What:
These changes allow one to make types like `MyRc`(code below), to be object safe method receivers after implementing `DispatchFromDyn` and `Deref` for them.
This allows for code like this:
```rust
struct MyRc<T: ?Sized>(Cell<NonNull<RcBox<T>>>);
/* impls for DispatchFromDyn, CoerceUnsized and Deref for MyRc*/
trait Trait {
fn foo(self: MyRc<Self>);
}
let impls_trait = ...;
let rc = MyRc::new(impls_trait) as MyRc<dyn Trait>;
rc.foo();
```
Note: `Cell` and `UnsafeCell` won't directly become valid method receivers since they don't implement `Deref`. Making use of these changes requires a wrapper type and nightly features.
### Why:
A custom pointer type with interior mutability allows one to store extra information in the pointer itself.
These changes allow for such a type to be a method receiver.
### Examples:
My use case is a cycle aware custom `Rc` implementation that when dropping a cycle marks some references dangling.
On the [forum](https://internals.rust-lang.org/t/impl-dispatchfromdyn-for-cell/14762/8) andersk mentioned that they track if a `Gc` reference is rooted with an extra bit in the reference itself.
Suggest using a lock for `*Cell: Sync` bounds
I mostly did this for `OnceCell<T>` at first because users will be confused to see that the `OnceCell<T>` in `std` isn't `Sync` but then extended it to `Cell<T>` and `RefCell<T>` as well.
`sub_ptr()` is equivalent to `usize::try_from().unwrap_unchecked()`, not `usize::from().unwrap_unchecked()`
`usize::from()` gives a `usize`, not `Result<usize>`, and `usize: From<isize>` is not implemented.
Allow fmt::Arguments::as_str() to return more Some(_).
This adjusts the documentation to allow optimization of format_args!() to be visible through fmt::Arguments::as_str().
This allows for future changes like https://github.com/rust-lang/rust/pull/106824.
Before this change, the following functions and macros were annotated
with `#[cfg(target_has_atomic = "8")]` or
`#[cfg(target_has_atomic_load_store = "8")]`:
* `atomic_int`
* `strongest_failure_ordering`
* `atomic_swap`
* `atomic_add`
* `atomic_sub`
* `atomic_compare_exchange`
* `atomic_compare_exchange_weak`
* `atomic_and`
* `atomic_nand`
* `atomic_or`
* `atomic_xor`
* `atomic_max`
* `atomic_min`
* `atomic_umax`
* `atomic_umin`
However, none of those functions and macros actually depend on 8-bit
width and they are needed for all atomic widths (16-bit, 32-bit, 64-bit
etc.). Some targets might not support 8-bit atomics (i.e. BPF, if we
would enable atomic CAS for it).
This change fixes that by removing the `"8"` argument from annotations,
which results in accepting the whole variety of widths.
Fixes#106845Fixes#106795
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
Memory pre-fetching prefers forward scanning vs backwards scanning, and the
code-gen is usually better. For the most sensitive types such as integers, these
are planned to be merged bidirectionally at once. So there is no benefit in
scanning backwards.
The largest perf gains are seen for full ascending and descending inputs, which
see 1.5x speedups. Random inputs benefit too, and some patterns can loose out,
but these losses are minimal.
Improve the documentation of `black_box`
There don't seem to be many great resources on how `black_box` should be used, so I added some information here