Optimize `core::str::Chars::count`
I wrote this a while ago after seeing this function as a bottleneck in a profile, but never got around to contributing it. I saw it again, and so here it is. The implementation is fairly complex, but I tried to explain what's happening at both a high level (in the header comment for the file), and in line comments in the impl. Hopefully it's clear enough.
This implementation (`case00_cur_libcore` in the benchmarks below) is somewhat consistently around 4x to 5x faster than the old implementation (`case01_old_libcore` in the benchmarks below), for a wide variety of workloads, without regressing performance on any of the workload sizes I've tried.
I also improved the benchmarks for this code, so that they explicitly check text in different languages and of different sizes (err, the cross product of language x size). The results of the benchmarks are here:
<details>
<summary>Benchmark results</summary>
<pre>
test str::char_count::emoji_huge::case00_cur_libcore ... bench: 20,216 ns/iter (+/- 3,673) = 17931 MB/s
test str::char_count::emoji_huge::case01_old_libcore ... bench: 108,851 ns/iter (+/- 12,777) = 3330 MB/s
test str::char_count::emoji_huge::case02_iter_increment ... bench: 329,502 ns/iter (+/- 4,163) = 1100 MB/s
test str::char_count::emoji_huge::case03_manual_char_len ... bench: 223,333 ns/iter (+/- 14,167) = 1623 MB/s
test str::char_count::emoji_large::case00_cur_libcore ... bench: 293 ns/iter (+/- 6) = 19331 MB/s
test str::char_count::emoji_large::case01_old_libcore ... bench: 1,681 ns/iter (+/- 28) = 3369 MB/s
test str::char_count::emoji_large::case02_iter_increment ... bench: 5,166 ns/iter (+/- 85) = 1096 MB/s
test str::char_count::emoji_large::case03_manual_char_len ... bench: 3,476 ns/iter (+/- 62) = 1629 MB/s
test str::char_count::emoji_medium::case00_cur_libcore ... bench: 48 ns/iter (+/- 0) = 14750 MB/s
test str::char_count::emoji_medium::case01_old_libcore ... bench: 217 ns/iter (+/- 4) = 3262 MB/s
test str::char_count::emoji_medium::case02_iter_increment ... bench: 642 ns/iter (+/- 7) = 1102 MB/s
test str::char_count::emoji_medium::case03_manual_char_len ... bench: 445 ns/iter (+/- 3) = 1591 MB/s
test str::char_count::emoji_small::case00_cur_libcore ... bench: 18 ns/iter (+/- 0) = 3777 MB/s
test str::char_count::emoji_small::case01_old_libcore ... bench: 23 ns/iter (+/- 0) = 2956 MB/s
test str::char_count::emoji_small::case02_iter_increment ... bench: 66 ns/iter (+/- 2) = 1030 MB/s
test str::char_count::emoji_small::case03_manual_char_len ... bench: 29 ns/iter (+/- 1) = 2344 MB/s
test str::char_count::en_huge::case00_cur_libcore ... bench: 25,909 ns/iter (+/- 39,260) = 13299 MB/s
test str::char_count::en_huge::case01_old_libcore ... bench: 102,887 ns/iter (+/- 3,257) = 3349 MB/s
test str::char_count::en_huge::case02_iter_increment ... bench: 166,370 ns/iter (+/- 12,439) = 2071 MB/s
test str::char_count::en_huge::case03_manual_char_len ... bench: 166,332 ns/iter (+/- 4,262) = 2071 MB/s
test str::char_count::en_large::case00_cur_libcore ... bench: 281 ns/iter (+/- 6) = 19160 MB/s
test str::char_count::en_large::case01_old_libcore ... bench: 1,598 ns/iter (+/- 19) = 3369 MB/s
test str::char_count::en_large::case02_iter_increment ... bench: 2,598 ns/iter (+/- 167) = 2072 MB/s
test str::char_count::en_large::case03_manual_char_len ... bench: 2,578 ns/iter (+/- 55) = 2088 MB/s
test str::char_count::en_medium::case00_cur_libcore ... bench: 44 ns/iter (+/- 1) = 15295 MB/s
test str::char_count::en_medium::case01_old_libcore ... bench: 201 ns/iter (+/- 51) = 3348 MB/s
test str::char_count::en_medium::case02_iter_increment ... bench: 322 ns/iter (+/- 40) = 2090 MB/s
test str::char_count::en_medium::case03_manual_char_len ... bench: 319 ns/iter (+/- 5) = 2109 MB/s
test str::char_count::en_small::case00_cur_libcore ... bench: 15 ns/iter (+/- 0) = 2333 MB/s
test str::char_count::en_small::case01_old_libcore ... bench: 14 ns/iter (+/- 0) = 2500 MB/s
test str::char_count::en_small::case02_iter_increment ... bench: 30 ns/iter (+/- 1) = 1166 MB/s
test str::char_count::en_small::case03_manual_char_len ... bench: 30 ns/iter (+/- 1) = 1166 MB/s
test str::char_count::ru_huge::case00_cur_libcore ... bench: 16,439 ns/iter (+/- 3,105) = 19777 MB/s
test str::char_count::ru_huge::case01_old_libcore ... bench: 89,480 ns/iter (+/- 2,555) = 3633 MB/s
test str::char_count::ru_huge::case02_iter_increment ... bench: 217,703 ns/iter (+/- 22,185) = 1493 MB/s
test str::char_count::ru_huge::case03_manual_char_len ... bench: 157,330 ns/iter (+/- 19,188) = 2066 MB/s
test str::char_count::ru_large::case00_cur_libcore ... bench: 243 ns/iter (+/- 6) = 20905 MB/s
test str::char_count::ru_large::case01_old_libcore ... bench: 1,384 ns/iter (+/- 51) = 3670 MB/s
test str::char_count::ru_large::case02_iter_increment ... bench: 3,381 ns/iter (+/- 543) = 1502 MB/s
test str::char_count::ru_large::case03_manual_char_len ... bench: 2,423 ns/iter (+/- 429) = 2096 MB/s
test str::char_count::ru_medium::case00_cur_libcore ... bench: 42 ns/iter (+/- 1) = 15119 MB/s
test str::char_count::ru_medium::case01_old_libcore ... bench: 180 ns/iter (+/- 4) = 3527 MB/s
test str::char_count::ru_medium::case02_iter_increment ... bench: 402 ns/iter (+/- 45) = 1579 MB/s
test str::char_count::ru_medium::case03_manual_char_len ... bench: 280 ns/iter (+/- 29) = 2267 MB/s
test str::char_count::ru_small::case00_cur_libcore ... bench: 12 ns/iter (+/- 0) = 2666 MB/s
test str::char_count::ru_small::case01_old_libcore ... bench: 12 ns/iter (+/- 0) = 2666 MB/s
test str::char_count::ru_small::case02_iter_increment ... bench: 19 ns/iter (+/- 0) = 1684 MB/s
test str::char_count::ru_small::case03_manual_char_len ... bench: 14 ns/iter (+/- 1) = 2285 MB/s
test str::char_count::zh_huge::case00_cur_libcore ... bench: 15,053 ns/iter (+/- 2,640) = 20067 MB/s
test str::char_count::zh_huge::case01_old_libcore ... bench: 82,622 ns/iter (+/- 3,602) = 3656 MB/s
test str::char_count::zh_huge::case02_iter_increment ... bench: 230,456 ns/iter (+/- 7,246) = 1310 MB/s
test str::char_count::zh_huge::case03_manual_char_len ... bench: 220,595 ns/iter (+/- 11,624) = 1369 MB/s
test str::char_count::zh_large::case00_cur_libcore ... bench: 227 ns/iter (+/- 65) = 20792 MB/s
test str::char_count::zh_large::case01_old_libcore ... bench: 1,136 ns/iter (+/- 144) = 4154 MB/s
test str::char_count::zh_large::case02_iter_increment ... bench: 3,147 ns/iter (+/- 253) = 1499 MB/s
test str::char_count::zh_large::case03_manual_char_len ... bench: 2,993 ns/iter (+/- 400) = 1577 MB/s
test str::char_count::zh_medium::case00_cur_libcore ... bench: 36 ns/iter (+/- 5) = 16388 MB/s
test str::char_count::zh_medium::case01_old_libcore ... bench: 142 ns/iter (+/- 18) = 4154 MB/s
test str::char_count::zh_medium::case02_iter_increment ... bench: 379 ns/iter (+/- 37) = 1556 MB/s
test str::char_count::zh_medium::case03_manual_char_len ... bench: 364 ns/iter (+/- 51) = 1620 MB/s
test str::char_count::zh_small::case00_cur_libcore ... bench: 11 ns/iter (+/- 1) = 3000 MB/s
test str::char_count::zh_small::case01_old_libcore ... bench: 11 ns/iter (+/- 1) = 3000 MB/s
test str::char_count::zh_small::case02_iter_increment ... bench: 20 ns/iter (+/- 3) = 1650 MB/s
</pre>
</details>
I also added fairly thorough tests for different sizes and alignments. This completes on my machine in 0.02s, which is surprising given how thorough they are, but it seems to detect bugs in the implementation. (I haven't run the tests on a 32 bit machine yet since before I reworked the code a little though, so... hopefully I'm not about to embarrass myself).
This uses similar SWAR-style techniques to the `is_ascii` impl I contributed in https://github.com/rust-lang/rust/pull/74066, so I'm going to request review from the same person who reviewed that one. That said am not particularly picky, and might not have the correct syntax for requesting a review from someone (so it goes).
r? `@nagisa`
Replace iterator-based construction of collections by `Into<T>`
Just a few quality of life improvements in the doc examples. I also removed some `Vec`s in favor of arrays.
Stabilize arc_new_cyclic
This stabilizes feature `arc_new_cyclic` as the implementation has been merged for one year and there is no unresolved questions. The FCP is not started yet.
Closes#75861 .
``@rustbot`` label +T-libs-api
Improve `Arc` and `Rc` documentation
This makes two changes (I can split the PR if necessary, but the changes are pretty small):
1. A bunch of trait implementations claimed to be zero cost; however, they use the `Arc<T>: From<Box<T>>` impl which is definitely not free, especially for large dynamically sized `T`.
2. The code in deferred initialization examples unnecessarily used excessive amounts of `unsafe`. This has been reduced.
doc: guarantee call order for sort_by_cached_key
`slice::sort_by_cached_key` takes a caching function `f: impl FnMut(&T) -> K`, which means that the order that calls to the caching function are made is user-visible. This adds a clause to the documentation to promise the current behavior, which is that `f` is called on all elements of the slice from left to right, unless the slice has len < 2 in which case `f` is not called.
For example, this can be used to ensure that the following code is a correct way to involve the index of the element in the sort key:
```rust
let mut index = 0;
slice.sort_by_cached_key(|x| (my_key(index, x), index += 1).0);
```
Clarify explicitly that BTree{Map,Set} are ordered.
One of the main reasons one would want to use a BTree{Map,Set} rather than a Hash{Map,Set} is because they maintain their keys in sorted order; but this was never explicitly stated in the top-level docs (it was only indirectly alluded to there, and stated explicitly in the docs for `iter`, `values`, etc.)
This PR states the ordering guarantee more prominently.
Add diagnostic items for macros
For use in Clippy, it adds diagnostic items to all the stable public macros
Clippy has lints that look for almost all of these (currently by name or path), but there are a few that aren't currently part of any lint, I could remove those if it's preferred to add them as needed rather than ahead of time
Yield means something else in the context of generators, which are
sufficiently close to iterators that it's better to avoid the
terminology collision here.
Implement `panic::update_hook`
Add a new function `panic::update_hook` to allow creating panic hooks that forward the call to the previously set panic hook, without race conditions. It works by taking a closure that transforms the old panic hook into a new one, while ensuring that during the execution of the closure no other thread can modify the panic hook. This is a small function so I hope it can be discussed here without a formal RFC, however if you prefer I can write one.
Consider the following example:
```rust
let prev = panic::take_hook();
panic::set_hook(Box::new(move |info| {
println!("panic handler A");
prev(info);
}));
```
This is a common pattern in libraries that need to do something in case of panic: log panic to a file, record code coverage, send panic message to a monitoring service, print custom message with link to github to open a new issue, etc. However it is impossible to avoid race conditions with the current API, because two threads can execute in this order:
* Thread A calls `panic::take_hook()`
* Thread B calls `panic::take_hook()`
* Thread A calls `panic::set_hook()`
* Thread B calls `panic::set_hook()`
And the result is that the original panic hook has been lost, as well as the panic hook set by thread A. The resulting panic hook will be the one set by thread B, which forwards to the default panic hook. This is not considered a big issue because the panic handler setup is usually run during initialization code, probably before spawning any other threads.
Using the new `panic::update_hook` function, this race condition is impossible, and the result will be either `A, B, original` or `B, A, original`.
```rust
panic::update_hook(|prev| {
Box::new(move |info| {
println!("panic handler A");
prev(info);
})
});
```
I found one real world use case here: 988cf403e7/src/detection.rs (L32) the workaround is to detect the race condition and panic in that case.
The pattern of `take_hook` + `set_hook` is very common, you can see some examples in this pull request, so I think it's natural to have a function that combines them both. Also using `update_hook` instead of `take_hook` + `set_hook` reduces the number of calls to `HOOK_LOCK.write()` from 2 to 1, but I don't expect this to make any difference in performance.
### Unresolved questions:
* `panic::update_hook` takes a closure, if that closure panics the error message is "panicked while processing panic" which is not nice. This is a consequence of holding the `HOOK_LOCK` while executing the closure. Could be avoided using `catch_unwind`?
* Reimplement `panic::set_hook` as `panic::update_hook(|_prev| hook)`?
Partially stabilize `maybe_uninit_extra`
This covers:
```rust
impl<T> MaybeUninit<T> {
pub unsafe fn assume_init_read(&self) -> T { ... }
pub unsafe fn assume_init_drop(&mut self) { ... }
}
```
It does not cover the const-ness of `write` under `const_maybe_uninit_write` nor the const-ness of `assume_init_read` (this commit adds `const_maybe_uninit_assume_init_read` for that).
FCP: https://github.com/rust-lang/rust/issues/63567#issuecomment-958590287.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
This covers:
impl<T> MaybeUninit<T> {
pub unsafe fn assume_init_read(&self) -> T { ... }
pub unsafe fn assume_init_drop(&mut self) { ... }
}
It does not cover the const-ness of `write` under
`const_maybe_uninit_write` nor the const-ness of
`assume_init_read` (this commit adds
`const_maybe_uninit_assume_init_read` for that).
FCP: https://github.com/rust-lang/rust/issues/63567#issuecomment-958590287.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Replace usages of vec![].into_iter with [].into_iter
`[].into_iter` is idiomatic over `vec![].into_iter` because its simpler and faster (unless the vec is optimized away in which case it would be the same)
So we should change all the implementation, documentation and tests to use it.
I skipped:
* `src/tools` - Those are copied in from upstream
* `src/test/ui` - Hard to tell if `vec![].into_iter` was used intentionally or not here and not much benefit to changing it.
* any case where `vec![].into_iter` was used because we specifically needed a `Vec::IntoIter<T>`
* any case where it looked like we were intentionally using `vec![].into_iter` to test it.
Mak DefId to AccessLevel map in resolve for export
hir_id to accesslevel in resolve and applied in privacy
using local def id
removing tracing probes
making function not recursive and adding comments
Move most of Exported/Public res to rustc_resolve
moving public/export res to resolve
fix missing stability attributes in core, std and alloc
move code to access_levels.rs
return for some kinds instead of going through them
Export correctness, macro changes, comments
add comment for import binding
add comment for import binding
renmae to access level visitor, remove comments, move fn as closure, remove new_key
fmt
fix rebase
fix rebase
fmt
fmt
fix: move macro def to rustc_resolve
fix: reachable AccessLevel for enum variants
fmt
fix: missing stability attributes for other architectures
allow unreachable pub in rustfmt
fix: missing impl access level + renaming export to reexport
Missing impl access level was found thanks to a test in clippy
Fix a minor mistake in `String::try_reserve_exact` examples
The examples of `String::try_reserve_exact` didn't actually use `try_reserve_exact`, which was probably a minor mistake, and this PR fixed it.
Drop guards in slice sorting derive src pointers from &mut T, which is invalidated by interior mutation in comparison
I tried to run https://github.com/rust-lang/miri-test-libstd on `alloc` with `-Zmiri-track-raw-pointers`, and got a failure on the test `slice::panic_safe`. The test failure has nothing to do with panic safety, it's from how the test tests for panic safety.
I minimized the test failure into this very silly program:
```rust
use std::cell::Cell;
use std::cmp::Ordering;
#[derive(Clone)]
struct Evil(Cell<usize>);
fn main() {
let mut input = vec![Evil(Cell::new(0)); 3];
// Hits the bug pattern via CopyOnDrop in core
input.sort_unstable_by(|a, _b| {
a.0.set(0);
Ordering::Less
});
// Hits the bug pattern via InsertionHole in alloc
input.sort_by(|_a, b| {
b.0.set(0);
Ordering::Less
});
}
```
To fix this, I'm just removing the mutability/uniqueness where it wasn't required.
Implement split_at_spare_mut without Deref to a slice so that the spare slice is valid
~I'm not sure I understand what's going on here correctly. And I'm pretty sure this safety comment needs to be changed. I'm just referring to the same thing that `as_mut_ptr_range` does.~ (Thanks `@RalfJung` for the guidance and clearing things up)
I tried to run https://github.com/rust-lang/miri-test-libstd on alloc with -Zmiri-track-raw-pointers, and got a failure on the test `vec::test_extend_from_within`.
I minimized the test failure into this program:
```rust
#![feature(vec_split_at_spare)]
fn main() {
Vec::<i32>::with_capacity(1).split_at_spare_mut();
}
```
The problem is that the existing implementation is actually getting a pointer range where both pointers are derived from the initialized region of the Vec's allocation, but we need the second one to be valid for the region between len and capacity. (thanks Ralf for clearing this up)