Optimize break patterns
Use `wyrand` instead of calling `XORSHIFT` 2 times in break patterns for the 64-bit platform. The new PRNG is 2x faster than the previous one.
Bench result(via https://gist.github.com/zhangyunhao116/11ef41a150f5c23bb47d86255fbeba89):
```
old time: [1.3258 ns 1.3262 ns 1.3266 ns]
change: [+0.5901% +0.6731% +0.7791%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
new time: [657.65 ps 657.89 ps 658.18 ps]
change: [-1.6910% -1.6110% -1.5256%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
```
implement const iterator using `rustc_do_not_const_check`
Previous experiment: #102225.
Explanation: rather than making all default methods work under `const` all at once, this uses `rustc_do_not_const_check` as a workaround to "trick" the compiler to not run any checks on those other default methods. Any const implementations are only required to implement the `next` method. Any actual calls to the trait methods other than `next` will either error in compile time (at CTFE runs), or run the methods correctly if they do not have any non-const operations. This is extremely easy to maintain, remove, or improve.
Rename atomic 'as_mut_ptr' to 'as_ptr' to match Cell (ref #66893)
Originally discussed in https://github.com/rust-lang/rust/issues/66893#issuecomment-1419198623
~~This uses #107706 as a base to avoid a merge conflict once that gets rolled up (so disregard const changes in the diff until it does)~~ all merged & rebased
`@rustbot` label +T-libs-api
r? m-ou-se
Document that CStr::as_ptr returns a type alias
Rustdoc resolves type aliases too eagerly #15823 which makes the [std re-export](https://doc.rust-lang.org/stable/std/ffi/struct.CStr.html#method.as_ptr) of `CStr::as_ptr` show `i8` instead of `c_char`. To work around this I've added info about `c_char` in the method's description.
BTW, I've also added a comment to what-not-to-do example in case someone copypasted it without reading the surrounding text.
Improve the `array::map` codegen
The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.
There's three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.
(No libs-api changes; everything should be completely implementation-detail-internal.)
It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞
r? `@thomcc`
As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
%array.i.i.i.i = alloca [64 x i32], align 4
%_3.sroa.5.i.i.i = alloca [65 x i32], align 4
%_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)
But with this PR it's only `sub rsp, 520`
```llvm
start:
%array.i.i.i.i.i.i = alloca [64 x i32], align 4
%array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```
Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
mov esi, 64
mov edi, 0
cmp rdx, 64
je .LBB0_3
lea rsi, [rdx + 1]
mov qword ptr [rsp + 784], rsi
mov r8d, dword ptr [rsp + 4*rdx + 528]
mov edi, 1
lea edx, [r8 + 2*r8]
lea r8d, [r8 + 4*rdx]
add r8d, 7
.LBB0_3:
test edi, edi
je .LBB0_11
mov dword ptr [rsp + 4*rcx + 272], r8d
cmp rsi, 64
jne .LBB0_6
xor r8d, r8d
mov edx, 64
test r8d, r8d
jne .LBB0_8
jmp .LBB0_11
.LBB0_6:
lea rdx, [rsi + 1]
mov qword ptr [rsp + 784], rdx
mov edi, dword ptr [rsp + 4*rsi + 528]
mov r8d, 1
lea esi, [rdi + 2*rdi]
lea edi, [rdi + 4*rsi]
add edi, 7
test r8d, r8d
je .LBB0_11
.LBB0_8:
mov dword ptr [rsp + 4*rcx + 276], edi
add rcx, 2
cmp rcx, 64
jne .LBB0_1
```
whereas with this PR it's unrolled and vectorized
```nasm
vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 328], ymm1
vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
Rollup of 7 pull requests
Successful merges:
- #107654 (reword descriptions of the deprecated int modules)
- #107915 (Add `array::map` benchmarks)
- #107961 (Avoid copy-pasting the `ilog` panic string in a bunch of places)
- #107962 (Add a doc note about why `Chain` is not `ExactSizeIterator`)
- #107966 (Update browser-ui-test version to 0.14.3)
- #107970 (Hermit: Remove floor symbol)
- #107973 (Fix unintentional UB in SIMD tests)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid copy-pasting the `ilog` panic string in a bunch of places
I also ended up changing the implementations to `if let` because it doesn't work to
```rust
self.checked_ilog2().unwrap_or_else(panic_for_nonpositive_argument)
```
due to the `!`. But as a bonus that meant I could remove the `rustc_allow_const_fn_unstable` too.
Document `PointerLike`
I forgot to document this, and even though it's currently more of an implementation detail, the old doc was kinda embarrassing 😅
Use associated items of `char` instead of freestanding items in `core::char`
The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
Speedup heapsort by 1.5x by making it branchless
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find a good pivot. By making the core child update code branchless it is much faster. On Zen3 sorting 10k `u64` and forcing the sort to pick heapsort, results in:
455us -> 278us
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find
a good pivot. By making the core child update code branchless it is much faster.
On Zen3 sorting 10k `u64` and forcing the sort to pick heapsort, results in:
455us -> 278us
Stabilize feature `cstr_from_bytes_until_nul`
This PR seeks to stabilize `cstr_from_bytes_until_nul`.
Partially addresses #95027
This function has only been on nightly for about 10 months, but I think it is simple enough that there isn't harm discussing stabilization. It has also had at least a handful of mentions on both the user forum and the discord, so it seems like it's already in use or at least known.
This needs FCP still.
Comment on potential discussion points:
- eventual conversion of `CStr` to be a single thin pointer: this function will still be useful to provide a safe way to create a `CStr` after this change.
- should this return a length too, to address concerns about the `CStr` change? I don't see it as being particularly useful, and it seems less ergonomic (i.e. returning `Result<(&CStr, usize), FromBytesUntilNulError>`). I think users that also need this length without the additional `strlen` call are likely better off using a combination of other methods, but this is up for discussion
- `CString::from_vec_until_nul`: this is also useful, but it doesn't even have a nightly implementation merged yet. I propose feature gating that separately, as opposed to blocking this `CStr` implementation on that
Possible alternatives:
A user can use `from_bytes_with_nul` on a slice up to `my_slice[..my_slice.iter().find(|c| c == 0).unwrap()]`. However; that is significantly less ergonomic, and is a bit more work for the compiler to optimize compared the direct `memchr` call that this wraps.
## New stable API
```rs
// both in core::ffi
pub struct FromBytesUntilNulError(());
impl CStr {
pub const fn from_bytes_until_nul(
bytes: &[u8]
) -> Result<&CStr, FromBytesUntilNulError>
}
```
cc ```@ericseppanen``` original author, ```@Mark-Simulacrum``` original reviewer, ```@m-ou-se``` brought up some issues on the thin pointer CStr
```@rustbot``` modify labels: +T-libs-api +needs-fcp
Rename `PointerSized` to `PointerLike`
The old name was unnecessarily vague. This PR renames a nightly language feature that I added, so I don't think it needs any additional approval, though anyone can feel free to speak up if you dislike the rename.
It's still unsatisfying that we don't the user which of {size, alignment} is wrong, but this trait really is just a stepping stone for a more generalized mechanism to create `dyn*`, just meant for nightly testing, so I don't think it really deserves additional diagnostic machinery for now.
Fixes#107696, cc ``@RalfJung``
r? ``@eholk``
Mark 'atomic_mut_ptr' methods const
There's nothing that would block these methods from being const (just an UnsafeCell get), and it would be helpful for FFI interfaces in static contexts
Related tracking issue: #66893