As an example:
```rust
#[test]
#[ignore = "not yet implemented"]
fn test_ignored() {
    ...
}
```
Will now render as:
```
running 2 tests
test tests::test_ignored ... ignored, not yet implemented

test result: ok. 1 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.00s
```
Stop manually SIMDing in `swap_nonoverlapping`
Like I previously did for `reverse` (#90821), this leaves it to LLVM to pick how to vectorize, since it can judge the appropriate chunk size far better than the "32 bytes always" approach we currently have.
A variety of codegen tests are included to confirm that the various cases are still being vectorized.
It does still need logic to type-erase in some cases, though, as while LLVM is now smart enough to vectorize over slices of things like `[u8; 4]`, it fails to do so over slices of `[u8; 3]`.
As a bonus, this change also means one no longer gets the spurious `memcpy`(s?) at the end of swapping a slice of `__m256`s: <https://rust.godbolt.org/z/joofr4v8Y>
<details>
<summary>ASM for this example</summary>
## Before (from godbolt)
Note the `push`/`pop`s and the `memcpy` calls.
```x86
swap_m256_slice:
push r15
push r14
push r13
push r12
push rbx
sub rsp, 32
cmp rsi, rcx
jne .LBB0_6
mov r14, rsi
shl r14, 5
je .LBB0_6
mov r15, rdx
mov rbx, rdi
xor eax, eax
.LBB0_3:
mov rcx, rax
vmovaps ymm0, ymmword ptr [rbx + rax]
vmovaps ymm1, ymmword ptr [r15 + rax]
vmovaps ymmword ptr [rbx + rax], ymm1
vmovaps ymmword ptr [r15 + rax], ymm0
add rax, 32
add rcx, 64
cmp rcx, r14
jbe .LBB0_3
sub r14, rax
jbe .LBB0_6
add rbx, rax
add r15, rax
mov r12, rsp
mov r13, qword ptr [rip + memcpy@GOTPCREL]
mov rdi, r12
mov rsi, rbx
mov rdx, r14
vzeroupper
call r13
mov rdi, rbx
mov rsi, r15
mov rdx, r14
call r13
mov rdi, r15
mov rsi, r12
mov rdx, r14
call r13
.LBB0_6:
add rsp, 32
pop rbx
pop r12
pop r13
pop r14
pop r15
vzeroupper
ret
```
## After (from my machine)
Note there is no `rsp` manipulation; apologies for the different ASM syntax (AT&T here vs. Intel above).
```x86
swap_m256_slice:
cmpq %r9, %rdx
jne .LBB1_6
testq %rdx, %rdx
je .LBB1_6
cmpq $1, %rdx
jne .LBB1_7
xorl %r10d, %r10d
jmp .LBB1_4
.LBB1_7:
movq %rdx, %r9
andq $-2, %r9
movl $32, %eax
xorl %r10d, %r10d
.p2align 4, 0x90
.LBB1_8:
vmovaps -32(%rcx,%rax), %ymm0
vmovaps -32(%r8,%rax), %ymm1
vmovaps %ymm1, -32(%rcx,%rax)
vmovaps %ymm0, -32(%r8,%rax)
vmovaps (%rcx,%rax), %ymm0
vmovaps (%r8,%rax), %ymm1
vmovaps %ymm1, (%rcx,%rax)
vmovaps %ymm0, (%r8,%rax)
addq $2, %r10
addq $64, %rax
cmpq %r10, %r9
jne .LBB1_8
.LBB1_4:
testb $1, %dl
je .LBB1_6
shlq $5, %r10
vmovaps (%rcx,%r10), %ymm0
vmovaps (%r8,%r10), %ymm1
vmovaps %ymm1, (%rcx,%r10)
vmovaps %ymm0, (%r8,%r10)
.LBB1_6:
vzeroupper
retq
```
</details>
This does all its copying operations as either the original type or as `MaybeUninit`s, so as far as I know there should be no potential abstract machine issues with reading padding bytes as integers.
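For reference, here is a minimal sketch (not the actual library code) of the element-wise approach: a plain loop over `T`-sized elements that LLVM is then free to vectorize with whatever chunk width suits the target.

```rust
use core::ptr;

/// Sketch only: swap `count` elements between two non-overlapping regions.
///
/// # Safety
/// `x` and `y` must each be valid for reads and writes of `count` elements,
/// and the two regions must not overlap.
pub unsafe fn swap_nonoverlapping_sketch<T>(x: *mut T, y: *mut T, count: usize) {
    for i in 0..count {
        let x = x.add(i);
        let y = y.add(i);
        // Plain element-wise swap; LLVM picks the vector width per target.
        let tmp = ptr::read(x);
        ptr::copy_nonoverlapping(y, x, 1);
        ptr::write(y, tmp);
    }
}
```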
<details>
<summary>Perf is essentially unchanged</summary>
Though perhaps with more target features enabled this would help more, since LLVM could pick bigger chunks.
## Before
```
running 10 tests
test slice::swap_with_slice_4x_usize_30 ... bench: 894 ns/iter (+/- 11)
test slice::swap_with_slice_4x_usize_3000 ... bench: 99,476 ns/iter (+/- 2,784)
test slice::swap_with_slice_5x_usize_30 ... bench: 1,257 ns/iter (+/- 7)
test slice::swap_with_slice_5x_usize_3000 ... bench: 139,922 ns/iter (+/- 959)
test slice::swap_with_slice_rgb_30 ... bench: 328 ns/iter (+/- 27)
test slice::swap_with_slice_rgb_3000 ... bench: 16,215 ns/iter (+/- 176)
test slice::swap_with_slice_u8_30 ... bench: 312 ns/iter (+/- 9)
test slice::swap_with_slice_u8_3000 ... bench: 5,401 ns/iter (+/- 123)
test slice::swap_with_slice_usize_30 ... bench: 368 ns/iter (+/- 3)
test slice::swap_with_slice_usize_3000 ... bench: 28,472 ns/iter (+/- 3,913)
```
## After
```
running 10 tests
test slice::swap_with_slice_4x_usize_30 ... bench: 868 ns/iter (+/- 36)
test slice::swap_with_slice_4x_usize_3000 ... bench: 99,642 ns/iter (+/- 1,507)
test slice::swap_with_slice_5x_usize_30 ... bench: 1,194 ns/iter (+/- 11)
test slice::swap_with_slice_5x_usize_3000 ... bench: 139,761 ns/iter (+/- 5,018)
test slice::swap_with_slice_rgb_30 ... bench: 324 ns/iter (+/- 6)
test slice::swap_with_slice_rgb_3000 ... bench: 15,962 ns/iter (+/- 287)
test slice::swap_with_slice_u8_30 ... bench: 281 ns/iter (+/- 5)
test slice::swap_with_slice_u8_3000 ... bench: 5,324 ns/iter (+/- 40)
test slice::swap_with_slice_usize_30 ... bench: 275 ns/iter (+/- 5)
test slice::swap_with_slice_usize_3000 ... bench: 28,277 ns/iter (+/- 277)
```
</details>
remove feature gate in control_flow examples
Stabilization was done in https://github.com/rust-lang/rust/pull/91091, but the two examples weren't updated accordingly.
Probably too late to put it into stable, but it should be in the next release :)
Fix a possible layout miscalculation in `alloc::RawVec`
A layout miscalculation could happen in `RawVec` when it is used with a type whose size isn't a multiple of its alignment. I don't know whether such a type can exist in Rust, but the `Layout` API provides ways to manipulate such types. In any case, it is better to calculate memory sizes consistently.
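As a hedged illustration (hypothetical values, not the actual `RawVec` code), the `Layout` API can already describe such a shape, and the padded size is what the allocation math needs to use:

```rust
use std::alloc::Layout;

fn main() {
    // A layout whose size (3) is not a multiple of its alignment (4).
    let layout = Layout::from_size_align(3, 4).unwrap();
    // Rounding the size up to the alignment gives the per-element stride:
    // 4 bytes, not 3. Sizing an array as `3 * n` would under-allocate.
    assert_eq!(layout.pad_to_align().size(), 4);
}
```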
Some improvements to the async docs
The goal here is to make the docs overall a little more comprehensive and to add more links between related items.
One thing that's not working yet is the links to the keywords. Somehow I couldn't get them to work.
r? `@GuillaumeGomez`, do you know how I could get the keyword links to work?
Fix miniz_oxide types showing up in std docs
Fixes #90526.
Thanks to `@camelid`, I rediscovered `doc(masked)`, allowing us to prevent `miniz_oxide` types from showing up in std docs.
r? `@notriddle`
removing architecture requirements for RustyHermit
RustyHermit and HermitCore are able to run on aarch64 and x86_64. In the future these operating systems will also support RISC-V. Consequently, the dependency on a specific target architecture should be removed.
The build process of `hermit-abi` fails if the architecture isn't supported.
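An illustrative sketch (hypothetical `cfg`, not the actual crate code) of the kind of gating being relaxed:

```rust
// Before: support code compiled only for one architecture.
// #[cfg(all(target_os = "hermit", target_arch = "x86_64"))]

// After: gate on the OS alone, so aarch64 (and future RISC-V) builds work too.
#[cfg(target_os = "hermit")]
mod hermit_abi_support {}
```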
core: Implement ASCII trim functions on byte slices
Hi `@rust-lang/libs`! This is a feature that I wished for when implementing serial protocols with microcontrollers. Often these protocols may contain leading or trailing whitespace, which needs to be removed. Because drivers will oftentimes operate on the byte level, decoding to Unicode and checking for Unicode whitespace is unnecessary overhead.
This PR adds three new methods to byte slices (see the usage sketch after the list):
- `trim_ascii_start`
- `trim_ascii_end`
- `trim_ascii`
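A brief usage sketch; the feature-gate name is taken from the tracking issue and assumed here:

```rust
#![feature(byte_slice_trim_ascii)] // assumed nightly feature gate

fn main() {
    assert_eq!(b" \thello \n".trim_ascii_start(), b"hello \n");
    assert_eq!(b" \thello \n".trim_ascii_end(), b" \thello");
    assert_eq!(b" \thello \n".trim_ascii(), b"hello");
}
```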
I did not find any pre-existing discussions about this, which surprises me a bit. Maybe I'm missing something, and this functionality is already possible through other means? There's https://github.com/rust-lang/rfcs/issues/2547 ("Trim methods on slices"), but that has a different purpose.
As per the [std dev guide](https://std-dev-guide.rust-lang.org/feature-lifecycle/new-unstable-features.html), this is a proposed implementation without any issue / RFC. If this is the wrong process, please let me know. However, I thought discussing code is easier than discussing a mere idea, and hacking on the stdlib was fun.
Tracking issue: https://github.com/rust-lang/rust/issues/94035
Guard against unwinding in cleanup code
Currently the only safeguard we have against double unwinds is the panic count (which is local to Rust). When double unwinds do happen (e.g. a C++ exception plus a Rust panic, or two C++ exceptions), the second unwind actually goes through and the first unwind is leaked. This can cause UB. cc rust-lang/project-ffi-unwind#6
E.g. given the following C++ code:
```c++
extern "C" void foo() {
throw "A";
}
extern "C" void execute(void (*fn)()) {
try {
fn();
} catch(...) {
}
}
```
This program is well-defined to terminate:
```c++
struct dtor {
    ~dtor() noexcept(false) {
        foo();
    }
};

void a() {
    dtor a;
    dtor b;
}

int main() {
    execute(a);
    return 0;
}
```
But this Rust code doesn't catch the double unwind:
```rust
extern "C-unwind" {
fn foo();
fn execute(f: unsafe extern "C-unwind" fn());
}
struct Dtor;
impl Drop for Dtor {
fn drop(&mut self) {
unsafe { foo(); }
}
}
extern "C-unwind" fn a() {
let _a = Dtor;
let _b = Dtor;
}
fn main() {
unsafe { execute(a) };
}
```
To address this issue, this PR adds an unwind edge to an abort block, so that the Rust example aborts. This is similar to how clang guards against double unwind (except clang calls terminate per C++ spec and we abort).
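Conceptually (this is a sketch of the guard, not the actual MIR transformation), cleanup code now behaves as if it were wrapped like this:

```rust
// Sketch: an "abort bomb" that fires only if the cleanup itself unwinds.
fn run_cleanup_guarded<F: FnOnce()>(cleanup: F) {
    struct AbortOnUnwind;
    impl Drop for AbortOnUnwind {
        fn drop(&mut self) {
            // Reached only if `cleanup` unwinds; this corresponds to the
            // new unwind edge out of a cleanup block.
            std::process::abort();
        }
    }
    let bomb = AbortOnUnwind;
    cleanup();
    std::mem::forget(bomb); // normal path: defuse the bomb
}
```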
The cost should be very small: an additional trap instruction (well, two for now, since we use TrapUnreachable, but that's a separate issue) for each function with landing pads; if LLVM gains support for encoding "abort/terminate" info directly in the LSDA like GCC does, it'll be free. It is an additional basic block, though, so compile time may be worse; I'd like a perf run.
r? `@ghost`
`@rustbot` label: F-c_unwind
Add debug assertions to validate NUL terminator in c strings
The `unchecked` variants in the stdlib usually perform the check anyway when debug assertions are on (for example, `unwrap_unchecked`). This PR does the same thing for `CStr` and `CString`, validating the correctness of the NUL byte in debug mode.
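For instance, an illustrative use of the existing API, with the new debug-mode behavior described above assumed:

```rust
use std::ffi::CStr;

fn main() {
    // Correct: the slice ends with exactly one NUL byte.
    let ok = unsafe { CStr::from_bytes_with_nul_unchecked(b"hello\0") };
    assert_eq!(ok.to_bytes(), b"hello");
    // With debug assertions on, passing e.g. `b"hello"` (no trailing NUL)
    // would now trip the new check instead of silently causing UB later.
}
```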
Destabilise entry_insert
See: https://github.com/rust-lang/rust/pull/90345
I didn't revert the rename that was done in that PR; I left it as `entry_insert`.
Additionally, before that PR `VacantEntry::insert_entry` seemingly had no stability attribute on it; I kept the attribute but made it an unstable one, the same as the one on `Entry`.
There didn't seem to be any mention of this in RELEASES.md, so I don't think there's anything for me to do other than this?
kmc-solid: Use the filesystem thread-safety wrapper
Fixes the thread unsafety of the `std::fs` implementation used by the [`*-kmc-solid_*`](https://doc.rust-lang.org/nightly/rustc/platform-support/kmc-solid.html) Tier 3 targets.
Neither the SOLID filesystem API nor built-in filesystem drivers guarantee thread safety by default. Although this may suffice in general embedded-system use cases, and in fact the API can be used from multiple threads without any problems in many cases, this has been a source of unsoundness in `std::sys::solid::fs`.
This commit updates the implementation to leverage the filesystem thread-safety wrapper (which uses a pluggable synchronization mechanism) to enforce thread safety. This is done by prefixing all paths passed to the filesystem API with `\TS`. (Note that relative paths aren't supported on this platform.)
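A hypothetical sketch of the prefixing scheme described above (the helper name is invented for illustration):

```rust
// Route an absolute SOLID path through the thread-safety wrapper by
// prepending the `\TS` prefix. Illustrative only.
fn ts_path(path: &str) -> String {
    format!("\\TS{}", path)
}

fn main() {
    assert_eq!(ts_path("\\sd0\\log.txt"), "\\TS\\sd0\\log.txt");
}
```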