Stabilize `#![feature(target_feature_11)]`
## Stabilization report
### Summary
Allows safe functions to be marked with `#[target_feature]` attributes.
Functions marked with `#[target_feature]` are generally considered unsafe functions: they are unsafe to call, cannot be assigned to safe function pointers, and don't implement the `Fn*` traits.
However, calling them from other `#[target_feature]` functions with a superset of features is safe.
```rust
// Demonstration function
#[target_feature(enable = "avx2")]
fn avx2() {}
fn foo() {
    // Calling `avx2` here is unsafe, as we must ensure
    // that AVX2 is available first.
    unsafe {
        avx2();
    }
}
#[target_feature(enable = "avx2")]
fn bar() {
    // Calling `avx2` here is safe.
    avx2();
}
```
### Test cases
Tests for this feature can be found in [`src/test/ui/rfcs/rfc-2396-target_feature-11/`](b67ba9ba20/src/test/ui/rfcs/rfc-2396-target_feature-11/).
### Edge cases
- https://github.com/rust-lang/rust/issues/73631
Closures defined inside functions marked with `#[target_feature]` inherit the target features of their parent function. They can still be assigned to safe function pointers and implement the appropriate `Fn*` traits.
```rust
#[target_feature(enable = "avx2")]
fn qux() {
    let my_closure = || avx2(); // this call to `avx2` is safe
    let f: fn() = my_closure;
}
```
This means that in order to call a function with `#[target_feature]`, you must show that the target-feature is available while the function executes *and* for as long as whatever may escape from that function lives.
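For example, safe code without the attribute would typically discharge that obligation with runtime feature detection before making the (still unsafe) call. A minimal sketch, assuming an x86-64 target and reusing the `avx2` function from above:
```rust
#[target_feature(enable = "avx2")]
fn avx2() {}

fn maybe_use_avx2() {
    if is_x86_feature_detected!("avx2") {
        // SAFETY: we just verified at runtime that AVX2 is available,
        // and nothing that requires it escapes the call.
        unsafe { avx2() };
    }
}
```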
### Documentation
- Reference: https://github.com/rust-lang/reference/pull/1181
---
cc tracking issue #69098
r? `@ghost`
Move IpAddr, SocketAddr and V4+V6 related types to `core`
Implements RFC https://github.com/rust-lang/rfcs/pull/2832. The RFC has completed FCP with disposition merge, but is not yet merged.
Moves IP types to `core` as specified in the RFC.
The full list of moved types is: `IpAddr`, `Ipv4Addr`, `Ipv6Addr`, `SocketAddr`, `SocketAddrV4`, `SocketAddrV6`, `Ipv6MulticastScope` and `AddrParseError`.
Doing this move was one of the main driving arguments behind #78802.
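Illustrative use from a `no_std` crate once this lands; the module path `core::net` is an assumption based on the RFC, not something specified here:
```rust
#![no_std]

// Hypothetical `no_std` consumer of the moved types.
use core::net::{IpAddr, Ipv4Addr, SocketAddr, SocketAddrV4};

fn localhost_endpoint() -> (IpAddr, SocketAddr) {
    let ip = Ipv4Addr::new(127, 0, 0, 1);
    (IpAddr::V4(ip), SocketAddr::V4(SocketAddrV4::new(ip, 8080)))
}
```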
Remove `from` lang item
It was probably a leftover from the old `?` desugaring, but in any case it's unused now except by Clippy, which can just use a diagnostic item instead.
Require `literal`s for some `(u)int_impl!` parameters
The point of these is to be seen *lexically* in the docs, so they should always be passed as the correct literal, not as an expression.
(Otherwise we could just compute `Min`/`Max` from `BITS`, for example.)
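A minimal sketch of the idea; the macro and parameter names here are illustrative, not the actual `uint_impl!` signature:
```rust
trait MaxValue: Sized {
    const MAX: Self;
}

// `$Max:literal` forces callers to pass the number itself, so the rendered
// docs show `255` rather than an expression computed from `BITS`.
macro_rules! impl_max {
    ($T:ty, $Max:literal) => {
        impl MaxValue for $T {
            #[doc = concat!("The largest value, `", stringify!($Max), "`.")]
            const MAX: $T = $Max;
        }
    };
}

impl_max!(u8, 255);
impl_max!(u16, 65535);

fn main() {
    assert_eq!(<u8 as MaxValue>::MAX, u8::MAX);
}
```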
r? Nilstrieb
Optimize break patterns
Use `wyrand` instead of calling `XORSHIFT` twice in break patterns on 64-bit platforms. The new PRNG is 2x faster than the previous one.
Bench result (via https://gist.github.com/zhangyunhao116/11ef41a150f5c23bb47d86255fbeba89):
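For reference, a minimal sketch of a wyrand step (constants taken from the public wyrand/wyhash reference; not necessarily the exact code added here):
```rust
/// One wyrand step: a wrapping add plus one widening multiply, compared
/// with two full XORSHIFT rounds in the previous implementation.
fn wyrand(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0xa076_1d64_78bd_642f);
    let t = (*state as u128) * ((*state ^ 0xe703_7ed1_a0b4_28db) as u128);
    (t >> 64) as u64 ^ t as u64
}
```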
```
old time: [1.3258 ns 1.3262 ns 1.3266 ns]
change: [+0.5901% +0.6731% +0.7791%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
new time: [657.65 ps 657.89 ps 658.18 ps]
change: [-1.6910% -1.6110% -1.5256%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
```
implement const iterator using `rustc_do_not_const_check`
Previous experiment: #102225.
Explanation: rather than making all default methods work under `const` all at once, this uses `rustc_do_not_const_check` as a workaround to "trick" the compiler into not running any checks on those other default methods. Const implementations are only required to implement the `next` method. Any actual calls to trait methods other than `next` will either error at compile time (when CTFE runs), or run correctly if they do not contain any non-const operations. This is extremely easy to maintain, remove, or improve.
Rename atomic 'as_mut_ptr' to 'as_ptr' to match Cell (ref #66893)
Originally discussed in https://github.com/rust-lang/rust/issues/66893#issuecomment-1419198623
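A quick sketch of the renamed accessor as described (assuming the signature stays the same apart from the name, on a toolchain where the method is available):
```rust
use std::sync::atomic::AtomicU32;

fn get_raw(a: &AtomicU32) -> *mut u32 {
    // Previously `a.as_mut_ptr()`; after the rename this matches
    // `Cell::as_ptr`: it takes `&self` but still returns `*mut u32`.
    a.as_ptr()
}
```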
~~This uses #107706 as a base to avoid a merge conflict once that gets rolled up (so disregard const changes in the diff until it does)~~ all merged & rebased
`@rustbot` label +T-libs-api
r? m-ou-se
Document that CStr::as_ptr returns a type alias
Rustdoc resolves type aliases too eagerly (#15823), which makes the [std re-export](https://doc.rust-lang.org/stable/std/ffi/struct.CStr.html#method.as_ptr) of `CStr::as_ptr` show `i8` instead of `c_char`. To work around this, I've added info about `c_char` to the method's description.
BTW, I've also added a comment to the what-not-to-do example in case someone copy-pastes it without reading the surrounding text.
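For context, the what-not-to-do example in question is the classic dangling-pointer pitfall (paraphrased here, not the exact doc snippet):
```rust
use std::ffi::CString;

fn main() {
    // WRONG: the `CString` is a temporary that is dropped at the end of
    // this statement, so `ptr` dangles immediately. Bind the `CString`
    // to a variable that outlives every use of the pointer instead.
    let ptr = CString::new("Hello").expect("CString::new failed").as_ptr();
    let _ = ptr;
}
```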
Improve the `array::map` codegen
The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.
There are three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit, as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straightforward (see the sketch after this list).
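To give idea 3 some shape, here's a hypothetical version of such a crate-internal trait; the name, bounds, and default body are assumptions, not the actual trait added in this PR:
```rust
/// Hypothetical helper: when the caller already knows (e.g. from the array
/// length) that another item exists, it can skip the `Option` discriminant.
pub(crate) unsafe trait NextUnchecked: Iterator {
    /// # Safety
    /// The iterator must be known to have at least one more element.
    unsafe fn next_unchecked(&mut self) -> Self::Item {
        match self.next() {
            Some(item) => item,
            // SAFETY: the caller guarantees another element exists.
            None => core::hint::unreachable_unchecked(),
        }
    }
}
```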
(No libs-api changes; everything should be completely implementation-detail-internal.)
It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞
r? `@thomcc`
As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
    x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
  %array.i.i.i.i = alloca [64 x i32], align 4
  %_3.sroa.5.i.i.i = alloca [65 x i32], align 4
  %_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)
But with this PR it's only `sub rsp, 520`
```llvm
start:
  %array.i.i.i.i.i.i = alloca [64 x i32], align 4
  %array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```
Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
        mov esi, 64
        mov edi, 0
        cmp rdx, 64
        je .LBB0_3
        lea rsi, [rdx + 1]
        mov qword ptr [rsp + 784], rsi
        mov r8d, dword ptr [rsp + 4*rdx + 528]
        mov edi, 1
        lea edx, [r8 + 2*r8]
        lea r8d, [r8 + 4*rdx]
        add r8d, 7
.LBB0_3:
        test edi, edi
        je .LBB0_11
        mov dword ptr [rsp + 4*rcx + 272], r8d
        cmp rsi, 64
        jne .LBB0_6
        xor r8d, r8d
        mov edx, 64
        test r8d, r8d
        jne .LBB0_8
        jmp .LBB0_11
.LBB0_6:
        lea rdx, [rsi + 1]
        mov qword ptr [rsp + 784], rdx
        mov edi, dword ptr [rsp + 4*rsi + 528]
        mov r8d, 1
        lea esi, [rdi + 2*rdi]
        lea edi, [rdi + 4*rsi]
        add edi, 7
        test r8d, r8d
        je .LBB0_11
.LBB0_8:
        mov dword ptr [rsp + 4*rcx + 276], edi
        add rcx, 2
        cmp rcx, 64
        jne .LBB0_1
```
whereas with this PR it's unrolled and vectorized
```nasm
        vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
        vpaddd ymm1, ymm1, ymm2
        vmovdqu ymmword ptr [rsp + 328], ymm1
        vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
        vpaddd ymm1, ymm1, ymm2
        vmovdqu ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
Rollup of 7 pull requests
Successful merges:
- #107654 (reword descriptions of the deprecated int modules)
- #107915 (Add `array::map` benchmarks)
- #107961 (Avoid copy-pasting the `ilog` panic string in a bunch of places)
- #107962 (Add a doc note about why `Chain` is not `ExactSizeIterator`)
- #107966 (Update browser-ui-test version to 0.14.3)
- #107970 (Hermit: Remove floor symbol)
- #107973 (Fix unintentional UB in SIMD tests)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Avoid copy-pasting the `ilog` panic string in a bunch of places
I also ended up changing the implementations to `if let` because the following doesn't compile
```rust
self.checked_ilog2().unwrap_or_else(panic_for_nonpositive_argument)
```
due to the `!` return type of `panic_for_nonpositive_argument`. But as a bonus, that meant I could remove the `rustc_allow_const_fn_unstable` too.
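The shape that does work is roughly the following; this is a standalone sketch, with the helper name taken from the description rather than the actual library code:
```rust
// `!` coerces fine from an explicit call in the `else` branch, unlike when
// the `!`-returning function is passed to `unwrap_or_else`.
const fn panic_for_nonpositive_argument() -> ! {
    panic!("argument of integer logarithm must be positive")
}

const fn ilog2(x: u32) -> u32 {
    if let Some(log) = x.checked_ilog2() {
        log
    } else {
        panic_for_nonpositive_argument()
    }
}

fn main() {
    assert_eq!(ilog2(8), 3);
}
```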
Document `PointerLike`
I forgot to document this, and even though it's currently more of an implementation detail, the old doc was kinda embarrassing 😅
Use associated items of `char` instead of freestanding items in `core::char`
The associated functions and constants on `char` have been stable since 1.52 and the freestanding items have been soft-deprecated since 1.62 (https://github.com/rust-lang/rust/pull/95566). This PR ~~marks them as "deprecated in future", similar to the integer and floating point modules (`core::{i32, f32}` etc)~~ replaces all uses of `core::char::*` with `char::*` to prepare for future deprecation of `core::char::*`.
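For illustration, the kind of mechanical replacement this PR performs (both forms compile today; the first is the soft-deprecated one):
```rust
fn main() {
    // Before: freestanding items in the soft-deprecated `core::char` module.
    let old = core::char::from_u32(0x1F980);
    let old_max = core::char::MAX;

    // After: the associated items on `char` itself.
    let new = char::from_u32(0x1F980);
    let new_max = char::MAX;

    assert_eq!(old, new);
    assert_eq!(old_max, new_max);
}
```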
Speedup heapsort by 1.5x by making it branchless
`slice::sort_unstable` will fall back to heapsort if it repeatedly fails to find a good pivot. Making the core child-update code branchless makes it much faster. On Zen3, sorting 10k `u64` and forcing the sort to pick heapsort results in:
455us -> 278us
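A minimal sketch of the branchless child selection (illustrative only, not the actual library code): the larger child is picked by adding a `bool` instead of branching, which the backend can lower to a conditional move.
```rust
// Sift-down with branchless child selection on a max-heap of `u64`.
fn sift_down(v: &mut [u64], mut node: usize) {
    loop {
        let mut child = 2 * node + 1;
        if child >= v.len() {
            break;
        }
        // Branchless: add 1 if the right child exists and is larger,
        // instead of an `if`/`else` on the comparison.
        child += (child + 1 < v.len() && v[child] < v[child + 1]) as usize;
        if v[node] >= v[child] {
            break;
        }
        v.swap(node, child);
        node = child;
    }
}
```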