Rollup of 5 pull requests
Successful merges:
- #94362 (Add well known values to `--check-cfg` implementation)
- #94577 (only disable SIMD for doctests in Miri (not for the stdlib build itself))
- #94595 (Fix invalid `unresolved imports` errors for a single-segment import)
- #94596 (Delay bug in expr adjustment when check_expr is called multiple times)
- #94618 (Don't round stack size up for created threads in Windows)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
only disable SIMD for doctests in Miri (not for the stdlib build itself)
Also we can enable library/core/tests/simd.rs now, Miri supports enough SIMD for that.
Miri/CTFE: properly treat overflow in (signed) division/rem as UB
To my surprise, it looks like LLVM treats overflow of signed div/rem as UB. From what I can tell, MIR `Div`/`Rem` directly lowers to the corresponding LLVM operation, so to make that correct we also have to consider these overflows UB in the CTFE/Miri interpreter engine.
r? `@oli-obk`
When CStr moves to core with an alias in std, this can link to
`crate::ffi::CStr`. However, linking in the reverse direction (from core
to std) requires a relative path, and that path can't work from both
core::ffi and std::os::raw (different number of `../` traversals
required).
The ability to interoperate with C code via FFI is not limited to crates
using std; this allows using these types without std.
The existing types in `std::os::raw` become type aliases for the ones in
`core::ffi`. This uses type aliases rather than re-exports, to allow the
std types to remain stable while the core types are unstable.
This also moves the currently unstable `NonZero_` variants and
`c_size_t`/`c_ssize_t`/`c_ptrdiff_t` types to `core::ffi`, while leaving
them unstable.
core can't depend on external crates the way std can. Rather than revert
usage of cfg_if, add a copy of it to core. This does not export our
copy, even unstably; such a change could occur in a later commit.
Add Atomic*::from_mut_slice
Tracking issue #76314 for `from_mut` has a question about the possibility of `from_mut_slice`, and I found a real case for it. A user in the forum had a parallelism problem that could be solved by open-indexing updates to a vector of atomics, but they didn't want to affect the other code using that vector. Using `from_mut_slice`, they could borrow that data as atomics just long enough for their parallel loop.
ref: https://users.rust-lang.org/t/sharing-vector-with-rayon-par-iter-correctly/72022
Rollup of 5 pull requests
Successful merges:
- #93603 (Populate liveness facts when calling `get_body_with_borrowck_facts` without `-Z polonius`)
- #93870 (Fix switch on discriminant detection in a presence of coverage counters)
- #94355 (Add one more case to avoid ICE)
- #94363 (Remove needless borrows from core::fmt)
- #94377 (`check_used` should only look at actual `used` attributes)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
This function was updated in a recent PR (92911) to be called without the caller
information passed in, but the function signature itself was not altered with
cfg_attr at the time.
Stop manually SIMDing in `swap_nonoverlapping`
Like I previously did for `reverse` (#90821), this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have.
A variety of codegen tests are included to confirm that the various cases are still being vectorized.
It does still need logic to type-erase in some cases, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.
As a bonus, this change also means one no longer gets the spurious `memcpy`(s?) at the end up swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>
<details>
<summary>ASM for this example</summary>
## Before (from godbolt)
note the `push`/`pop`s and `memcpy`
```x86
swap_m256_slice:
push r15
push r14
push r13
push r12
push rbx
sub rsp, 32
cmp rsi, rcx
jne .LBB0_6
mov r14, rsi
shl r14, 5
je .LBB0_6
mov r15, rdx
mov rbx, rdi
xor eax, eax
.LBB0_3:
mov rcx, rax
vmovaps ymm0, ymmword ptr [rbx + rax]
vmovaps ymm1, ymmword ptr [r15 + rax]
vmovaps ymmword ptr [rbx + rax], ymm1
vmovaps ymmword ptr [r15 + rax], ymm0
add rax, 32
add rcx, 64
cmp rcx, r14
jbe .LBB0_3
sub r14, rax
jbe .LBB0_6
add rbx, rax
add r15, rax
mov r12, rsp
mov r13, qword ptr [rip + memcpy@GOTPCREL]
mov rdi, r12
mov rsi, rbx
mov rdx, r14
vzeroupper
call r13
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call r13
mov rdi, r15
mov rsi, r12
mov rdx, r14
call r13
.LBB0_6:
add rsp, 32
pop rbx
pop r12
pop r13
pop r14
pop r15
vzeroupper
ret
```
## After (from my machine)
Note no `rsp` manipulation, sorry for different ASM syntax
```x86
swap_m256_slice:
cmpq %r9, %rdx
jne .LBB1_6
testq %rdx, %rdx
je .LBB1_6
cmpq $1, %rdx
jne .LBB1_7
xorl %r10d, %r10d
jmp .LBB1_4
.LBB1_7:
movq %rdx, %r9
andq $-2, %r9
movl $32, %eax
xorl %r10d, %r10d
.p2align 4, 0x90
.LBB1_8:
vmovaps -32(%rcx,%rax), %ymm0
vmovaps -32(%r8,%rax), %ymm1
vmovaps %ymm1, -32(%rcx,%rax)
vmovaps %ymm0, -32(%r8,%rax)
vmovaps (%rcx,%rax), %ymm0
vmovaps (%r8,%rax), %ymm1
vmovaps %ymm1, (%rcx,%rax)
vmovaps %ymm0, (%r8,%rax)
addq $2, %r10
addq $64, %rax
cmpq %r10, %r9
jne .LBB1_8
.LBB1_4:
testb $1, %dl
je .LBB1_6
shlq $5, %r10
vmovaps (%rcx,%r10), %ymm0
vmovaps (%r8,%r10), %ymm1
vmovaps %ymm1, (%rcx,%r10)
vmovaps %ymm0, (%r8,%r10)
.LBB1_6:
vzeroupper
retq
```
</details>
This does all its copying operations as either the original type or as `MaybeUninit`s, so as far as I know there should be no potential abstract machine issues with reading padding bytes as integers.
<details>
<summary>Perf is essentially unchanged</summary>
Though perhaps with more target features this would help more, if it could pick bigger chunks
## Before
```
running 10 tests
test slice::swap_with_slice_4x_usize_30 ... bench: 894 ns/iter (+/- 11)
test slice::swap_with_slice_4x_usize_3000 ... bench: 99,476 ns/iter (+/- 2,784)
test slice::swap_with_slice_5x_usize_30 ... bench: 1,257 ns/iter (+/- 7)
test slice::swap_with_slice_5x_usize_3000 ... bench: 139,922 ns/iter (+/- 959)
test slice::swap_with_slice_rgb_30 ... bench: 328 ns/iter (+/- 27)
test slice::swap_with_slice_rgb_3000 ... bench: 16,215 ns/iter (+/- 176)
test slice::swap_with_slice_u8_30 ... bench: 312 ns/iter (+/- 9)
test slice::swap_with_slice_u8_3000 ... bench: 5,401 ns/iter (+/- 123)
test slice::swap_with_slice_usize_30 ... bench: 368 ns/iter (+/- 3)
test slice::swap_with_slice_usize_3000 ... bench: 28,472 ns/iter (+/- 3,913)
```
## After
```
running 10 tests
test slice::swap_with_slice_4x_usize_30 ... bench: 868 ns/iter (+/- 36)
test slice::swap_with_slice_4x_usize_3000 ... bench: 99,642 ns/iter (+/- 1,507)
test slice::swap_with_slice_5x_usize_30 ... bench: 1,194 ns/iter (+/- 11)
test slice::swap_with_slice_5x_usize_3000 ... bench: 139,761 ns/iter (+/- 5,018)
test slice::swap_with_slice_rgb_30 ... bench: 324 ns/iter (+/- 6)
test slice::swap_with_slice_rgb_3000 ... bench: 15,962 ns/iter (+/- 287)
test slice::swap_with_slice_u8_30 ... bench: 281 ns/iter (+/- 5)
test slice::swap_with_slice_u8_3000 ... bench: 5,324 ns/iter (+/- 40)
test slice::swap_with_slice_usize_30 ... bench: 275 ns/iter (+/- 5)
test slice::swap_with_slice_usize_3000 ... bench: 28,277 ns/iter (+/- 277)
```
</detail>
remove feature gate in control_flow examples
Stabilization was done in https://github.com/rust-lang/rust/pull/91091, but the two examples weren't updated accordingly.
Probably too late to put it into stable, but it should be in the next release :)
Some improvements to the async docs
The goal here is to make the docs overall a little bit more comprehensive and add more links between the things.
One thing that's not working yet is the links to the keywords. Somehow I couldn't get them to work.
r? ````@GuillaumeGomez```` do you know how I could get the keyword links to work?
Like I previously did for `reverse`, this leaves it to LLVM to pick how to vectorize it, since it can know better the chunk size to use, compared to the "32 bytes always" approach we currently have.
It does still need logic to type-erase where appropriate, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.
As a bonus, this also means one no longer gets the spurious `memcpy`(s?) at the end up swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>
core: Implement ASCII trim functions on byte slices
Hi ````````@rust-lang/libs!```````` This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because oftentimes drivers will operate on the byte level, decoding to unicode and checking for unicode whitespace is unnecessary overhead.
This PR adds three new methods to byte slices:
- `trim_ascii_start`
- `trim_ascii_end`
- `trim_ascii`
I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's https://github.com/rust-lang/rfcs/issues/2547 ("Trim methods on slices"), but that has a different purpose.
As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun.
Tracking issue: https://github.com/rust-lang/rust/issues/94035
Guard against unwinding in cleanup code
Currently the only safe guard we have against double unwind is the panic count (which is local to Rust). When double unwinds indeed happen (e.g. C++ exception + Rust panic, or two C++ exceptions), then the second unwind actually goes through and the first unwind is leaked. This can cause UB. cc rust-lang/project-ffi-unwind#6
E.g. given the following C++ code:
```c++
extern "C" void foo() {
throw "A";
}
extern "C" void execute(void (*fn)()) {
try {
fn();
} catch(...) {
}
}
```
This program is well-defined to terminate:
```c++
struct dtor {
~dtor() noexcept(false) {
foo();
}
};
void a() {
dtor a;
dtor b;
}
int main() {
execute(a);
return 0;
}
```
But this Rust code doesn't catch the double unwind:
```rust
extern "C-unwind" {
fn foo();
fn execute(f: unsafe extern "C-unwind" fn());
}
struct Dtor;
impl Drop for Dtor {
fn drop(&mut self) {
unsafe { foo(); }
}
}
extern "C-unwind" fn a() {
let _a = Dtor;
let _b = Dtor;
}
fn main() {
unsafe { execute(a) };
}
```
To address this issue, this PR adds an unwind edge to an abort block, so that the Rust example aborts. This is similar to how clang guards against double unwind (except clang calls terminate per C++ spec and we abort).
The cost should be very small; it's an additional trap instruction (well, two for now, since we use TrapUnreachable, but that's a different issue) for each function with landing pads; if LLVM gains support to encode "abort/terminate" info directly in LSDA like GCC does, then it'll be free. It's an additional basic block though so compile time may be worse, so I'd like a perf run.
r? `@ghost`
`@rustbot` label: F-c_unwind
Add a `try_collect()` helper method to `Iterator`
Implement `Iterator::try_collect()` as a helper around `Iterator::collect()` as discussed [here](https://internals.rust-lang.org/t/idea-fallible-iterator-mapping-with-try-map/15715/5?u=a.lafrance).
First time contributor so definitely open to any feedback about my implementation! Specifically wondering if I should open a tracking issue for the unstable feature I introduced.
As the main participant in the internals discussion: r? `@scottmcm`
Add documentation to more `From::from` implementations.
For users looking at documentation through IDE popups, this gives them relevant information rather than the generic trait documentation wording “Performs the conversion”. For users reading the documentation for a specific type for any reason, this informs them when the conversion may allocate or copy significant memory versus when it is always a move or cheap copy.
Notes on specific cases:
* The new documentation for `From<T> for T` explains that it is not a conversion at all.
* Also documented `impl<T, U> Into<U> for T where U: From<T>`, the other central blanket implementation of conversion.
* The new documentation for construction of maps and sets from arrays of keys mentions the handling of duplicates. Future work could be to do this for *all* code paths that convert an iterable to a map or set.
* I did not add documentation to conversions of a specific error type to a more general error type.
* I did not add documentation to unstable code.
This change was prepared by searching for the text "From<... for" and so may have missed some cases that for whatever reason did not match. I also looked for `Into` impls but did not find any worth documenting by the above criteria.
Destabilize cfg(target_has_atomic_load_store = ...)
This was not intended to be stabilized yet.
This keeps the cfg_target_has_atomic feature gate name since compiler-builtins otherwise depends on it and I'd rather not try to manage a bump across a crates.io published repository given the time-sensitivity here (we need to land this quickly to avoid a beta backport).
Closes https://github.com/rust-lang/rust/issues/32976
r? `@Amanieu`
Make [u8]::cmp implementation branchless
The current implementation generates rather ugly assembly code, branching when the common parts are equal. By performing the comparison of the lengths upfront using a subtraction, the assembly gets much prettier: https://godbolt.org/z/4e5fnEKGd.
This will probably not impact speed too much, as the expensive part is in most cases the `memcmp`, but it sure looks better (I'm porting a sorting algorithm currently, and that branch just bothered me).
Since `decl_macro`s and/or `Span::def_site()` is deemed quite unstable,
no public-facing macro that relies on it can hope to be, itself, stabilized.
We circumvent the issue by no longer relying on field privacy for safety and,
instead, relying on an unstable feature-gate to act as the gate keeper for
non users of the macro (thanks to `allow_internal_unstable`).
This is technically not correct (since a `nightly` user could technically enable
the feature and cause unsoundness with it); or, in other words, this makes the
feature-gate used to gate the access to the field be (technically unsound, and
in practice) `unsafe`. Hence it having `unsafe` in its name.
Back to the macro, we go back to `macro_rules!` / `mixed_site()`-span rules thanks
to declaring the `decl_macro` as `semitransparent`, which is a hack to basically have
`pub macro_rules!`
Co-Authored-By: Mara Bos <m-ou.se@m-ou.se>
add diagnostic items for clippy's `trim_split_whitespace`
Adding the following diagnostic items:
* str_split_whitespace,
* str_trim,
* str_trim_start,
* str_trim_end
They are needed for https://github.com/rust-lang/rust-clippy/pull/8575
r? `@flip1995`
add module-level documentation for vec's in-place iteration
As requested in the last libs team meeting and during previous reviews.
Feel free to point out any gaps you encounter, after all non-obvious things may with hindsight seem obvious to me.
r? `@yaahc`
CC `@steffahn`
Rename `~const Drop` to `~const Destruct`
r? `@oli-obk`
Completely switching to `~const Destructible` would be rather complicated, so it seems best to add it for now and wait for it to be backported to beta in the next release.
The rationale is to prevent complications such as #92149 and #94803 by introducing an entirely new trait. And `~const Destructible` reads a bit better than `~const Drop`. Name Bikesheddable.
Add u16::is_utf16_surrogate
Right now, there are methods in the standard library for encoding and decoding UTF-16, but at least for the moment, there aren't any methods specifically for `u16` to help work with UTF-16 data. Since the full logic already exists, this wouldn't really add any code, just expose what's already there.
This method in particular is useful for working with the data returned by Windows `OsStrExt::encode_wide`. Initially, I was planning to also offer a `TryFrom<u16> for char`, but decided against it for now. There is plenty of code in rustc that could be rewritten to use this method, but I only checked within the standard library to replace them.
I think that offering more UTF-16-related methods to u16 would be useful, but I think this one is a good start. For example, one useful method might be `u16::is_pattern_whitespace`, which would check if something is the Unicode `Pattern_Whitespace` category. We can get away with this because all of the `Pattern_Whitespace` characters are in the basic multilingual plane, and hence we don't need to check for surrogates.
Provide more useful documentation of conversion methods
I thought that the documentation for these methods needed to be a bit more explanatory for new users. For advanced users, the comments are relatively unnecessary. I think it would be useful to explain precisely what the method does. As a new user, when you see the `into` method, where the type is inferred, if you are new you don't even know what you convert to, because it is implicit. I believe this can help new users understand.
I thought that the documentation for these methods needed to be a bit more explanatory for new users. For advanced users, the comments are relatively unnecessary. I think it would be useful to explain precisely what the method does. As a new user, when you see the `into` method, where the type is inferred, if you are new you don't even know what you convert to, because it is implicit. I believe this can help new users understand.
Document that `Option<extern "abi" fn>` discriminant elision applies for any ABI
The current phrasing was not very clear on that aspect.
r? `@RalfJung`
`@rustbot` modify labels: A-docs A-ffi
Derive Eq for std::cmp::Ordering, instead of using manual impl.
This allows consts of type Ordering to be used in patterns, and with feature(adt_const_params) allows using `Ordering` as a const generic parameter.
Currently, `std::cmp::Ordering` implements `Eq` using a manually written `impl Eq for Ordering {}`, instead of `derive(Eq)`. This means that it does not implement `StructuralEq`.
This commit removes the manually written impl, and adds `derive(Eq)` to `Ordering`, so that it will implement `StructuralEq`.