Improve slice.binary_search_by()'s best-case performance to O(1)
This PR aims to improve the best-case performance of [slice.binary_search_by()](https://doc.rust-lang.org/std/primitive.slice.html#method.binary_search_by) to O(1).
# Noticed
The docs of `binary_search_by` say `"If there are multiple matches, then any one of the matches could be returned."`, but the implementation does not behave that way: it actually returns the **last one** when multiple matches are found. I don't know why the two differ.
That leaves two options:
## If returning the last one is the correct or desired result
Then I can rectify the docs and revert my changes.
## If the docs describe the correct or desired result
Then my changes can be merged after a full review.
However, if my PR gets merged, another issue arises: this could be a **breaking change**, since when multiple matches are found the returned index is no longer guaranteed to be the last one and could be any of them.
For example:
```rust
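let mut s = vec![0, 1, 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55];
let num = 1;
// `binary_search` returns a `Result`; `num` is present, so unwrap the index.
let idx = s.binary_search(&num).unwrap();
s.insert(idx, 2);
// Old implementation: the last match (index 4) is returned
// assert_eq!(s, [0, 1, 1, 1, 2, 1, 2, 3, 5, 8, 13, 21, 34, 55]);
// New implementation: any match may be returned, e.g. index 3
// assert_eq!(s, [0, 1, 1, 2, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]);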
```
# Benchmarking
**Old implementations**
```sh
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 4)
test slice::binary_search_l1_with_dups ... bench: 59 ns/iter (+/- 3)
test slice::binary_search_l2 ... bench: 76 ns/iter (+/- 5)
test slice::binary_search_l2_with_dups ... bench: 77 ns/iter (+/- 17)
test slice::binary_search_l3 ... bench: 183 ns/iter (+/- 23)
test slice::binary_search_l3_with_dups ... bench: 185 ns/iter (+/- 19)
```
**New implementations (1)**
Implemented by this PR.
```rust
if cmp == Equal {
return Ok(mid);
} else if cmp == Less {
base = mid
}
```
```sh
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 58 ns/iter (+/- 2)
test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 4)
test slice::binary_search_l2 ... bench: 76 ns/iter (+/- 3)
test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 6)
test slice::binary_search_l3 ... bench: 200 ns/iter (+/- 30)
test slice::binary_search_l3_with_dups ... bench: 157 ns/iter (+/- 6)
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 8)
test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 2)
test slice::binary_search_l2 ... bench: 77 ns/iter (+/- 2)
test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 2)
test slice::binary_search_l3 ... bench: 198 ns/iter (+/- 21)
test slice::binary_search_l3_with_dups ... bench: 158 ns/iter (+/- 11)
```
**New implementations (2)**
Suggested by `@nbdd0121` in [comment](https://github.com/rust-lang/rust/pull/74024#issuecomment-665430239).
```rust
base = if cmp == Greater { base } else { mid };
if cmp == Equal { break }
```
```sh
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 7)
test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 5)
test slice::binary_search_l2 ... bench: 75 ns/iter (+/- 3)
test slice::binary_search_l2_with_dups ... bench: 56 ns/iter (+/- 3)
test slice::binary_search_l3 ... bench: 195 ns/iter (+/- 15)
test slice::binary_search_l3_with_dups ... bench: 151 ns/iter (+/- 7)
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 57 ns/iter (+/- 2)
test slice::binary_search_l1_with_dups ... bench: 38 ns/iter (+/- 2)
test slice::binary_search_l2 ... bench: 77 ns/iter (+/- 11)
test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 4)
test slice::binary_search_l3 ... bench: 194 ns/iter (+/- 15)
test slice::binary_search_l3_with_dups ... bench: 151 ns/iter (+/- 18)
```
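I ran benchmarks against both implementations. The new implementations are considerably faster in the cases with duplicates (for example, `binary_search_l1_with_dups` drops from 59 ns to about 37 ns per iteration), while `binary_search_l3` is slightly slower than with the old implementation (183 ns versus roughly 194 to 200 ns).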
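To make the trade-off concrete, here is a minimal sketch of a binary search that returns as soon as it probes an equal element. It is not the code from this PR: the function name `binary_search_early_exit` and the `left`/`right` bookkeeping are assumptions chosen for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical early-exit binary search over a sorted slice.
/// Returns `Ok(index)` of *some* matching element, or `Err(insertion_point)`.
fn binary_search_early_exit<T: Ord>(s: &[T], target: &T) -> Result<usize, usize> {
    let mut left = 0;
    let mut right = s.len();
    while left < right {
        // Written this way to avoid overflow in `left + right`.
        let mid = left + (right - left) / 2;
        match s[mid].cmp(target) {
            // Stop at the first equal element we happen to probe.
            Ordering::Equal => return Ok(mid),
            Ordering::Less => left = mid + 1,
            Ordering::Greater => right = mid,
        }
    }
    // Not found: `left` is where `target` could be inserted to keep order.
    Err(left)
}
```

With the 13-element sample slice from the example above, searching for `3` succeeds on the very first probe (index 6), which is the O(1) best case. Searching for `1` returns some index in `1..=4` rather than a specific one, which is exactly the behaviour change discussed in this PR.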
This commit fixes an issue pointed out in #82758 where LTO changed the
behavior of a program. It turns out that LTO was not at fault here, it
simply uncovered an existing bug. The bindings to
`__wasilibc_find_relpath` assumed that the relative portion of the path
returned was always contained within the input `buf` we passed in. This
isn't actually the case, however, and sometimes the relative portion of
the path may reference a sub-portion of the input string itself.
The fix here is to use the relative path pointer coming out of
`__wasilibc_find_relpath` as the source of truth. The `buf` used for
local storage is discarded in this function and the relative path is
copied out unconditionally. We might be able to get away with some
`Cow`-like business or such to avoid the extra allocation, but for now
this is probably the easiest patch to fix the original issue.
Add diagnostic item to `Default` trait
This PR adds a diagnostic item to the `Default` trait, to be used by rust-lang/rust-clippy#6562.
Also fixes the obsolete path to the `symbols.rs` file in the comment.
Add assert_matches macro.
This adds `assert_matches!(expression, pattern)`.
Unlike the other asserts, this one ~~consumes the expression~~ may consume the expression, to be able to match the pattern. (It could add a `&` implicitly, but that's noticeable in the pattern, and will make a consuming guard impossible.)
See https://github.com/rust-lang/rust/issues/62633#issuecomment-790737853
This re-uses the same `left: .. right: ..` output as the `assert_eq` and `assert_ne` macros, but with the pattern as the right part:
assert_eq:
```
assertion failed: `(left == right)`
left: `Some("asdf")`,
right: `None`
```
assert_matches:
```
assertion failed: `(left matches right)`
left: `Ok("asdf")`,
right: `Err(_)`
```
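For illustration, a macro with roughly this behaviour can be sketched with `macro_rules!`. This is an assumption-laden sketch, not the macro added by this PR: the name `assert_matches_sketch!` is hypothetical, and it requires the expression's type to implement `Debug`.

```rust
/// Hypothetical stand-in for the macro described above.
/// Panics with a `left matches right` message when the pattern does not match.
macro_rules! assert_matches_sketch {
    ($left:expr, $pattern:pat) => {
        match $left {
            // The pattern matched: nothing to do.
            $pattern => {}
            // Bind by reference so the value can still be printed.
            ref unmatched => panic!(
                "assertion failed: `(left matches right)`\n  left: `{:?}`,\n right: `{}`",
                unmatched,
                stringify!($pattern)
            ),
        }
    };
}
```

Because the expression is the scrutinee of a `match`, a pattern that binds by value moves out of it, which is the "may consume the expression" behaviour mentioned above.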
cc `@cuviper`
Add {BTreeMap,HashMap}::try_insert
`{BTreeMap,HashMap}::insert(key, new_val)` returns `Some(old_val)` if the key was already in the map. It's often useful to assert no duplicate values are inserted.
We experimented with `map.insert(key, val).unwrap_none()` (https://github.com/rust-lang/rust/issues/62633), but decided that that's not the kind of method we'd like to have on `Option`s.
`insert` always succeeds because it replaces the old value if it exists. One could argue that `insert()` is never the right method for panicking on duplicates, since it already handles that case by replacing the value, only allowing you to panic after that has already happened.
This PR adds a `try_insert` method that instead returns a `Result::Err` when the key already exists. This error contains both the `OccupiedEntry` and the value that was supposed to be inserted. This means that unwrapping that result gives more context:
```rust
map.insert(10, "world").unwrap_none();
// thread 'main' panicked at 'called `Option::unwrap_none()` on a `Some` value: "hello"', src/main.rs:8:29
```
```rust
map.try_insert(10, "world").unwrap();
// thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value:
// OccupiedError { key: 10, old_value: "hello", new_value: "world" }', src/main.rs:6:33
```
It also allows handling the failure in any other way, as you have full access to the `OccupiedEntry` and the value.
`try_insert` returns a reference to the value in case of success, making it an alternative to `.entry(key).or_insert(value)`.
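The same idea can be approximated on top of the existing entry API. The sketch below is only an illustration: `try_insert_sketch` is a hypothetical free function, and unlike the method proposed here it returns just the rejected value on failure instead of an error carrying the `OccupiedEntry`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: insert only if `key` is absent.
/// On success, returns a reference to the freshly inserted value.
/// On failure, leaves the map untouched and hands `value` back.
fn try_insert_sketch<K: Eq + Hash, V>(
    map: &mut HashMap<K, V>,
    key: K,
    value: V,
) -> Result<&mut V, V> {
    match map.entry(key) {
        // The key already exists: do not overwrite the old value.
        Entry::Occupied(_) => Err(value),
        // The key is new: insert and return a reference to the value.
        Entry::Vacant(slot) => Ok(slot.insert(value)),
    }
}
```

Calling it twice with key `10` keeps the first value (`"hello"`) in the map and returns the second value (`"world"`) as the error, so the caller decides whether to panic, log, or retry.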
r? `@Amanieu`
Fixes https://github.com/rust-lang/rfcs/issues/3092
Avoid unnecessary Vec construction in BufReader
As mentioned in #80460, creating a `Vec` and calling `Vec::into_boxed_slice()` emits unnecessary calls to `realloc()` and `free()`. Updated the code to use `Box::new_uninit_slice()` to create a boxed slice directly. I think this also makes it more explicit that the initial contents of the buffer are uninitialized.
r? `@m-ou-se`
Add suggestion `.collect()` for iterators in iterators
Closes #81584
```
error[E0515]: cannot return value referencing function parameter `y`
--> main3.rs:4:38
|
4 | ... .map(|y| y.iter().map(|x| x + 1))
| -^^^^^^^^^^^^^^^^^^^^^^
| |
| returns a value referencing data owned by the current function
| `y` is borrowed here
| help: Maybe use `.collect()` to allocate the iterator
```
Added the suggestion: ``help: Maybe use `.collect()` to allocate the iterator``
Improved IO Bytes Size Hint
After trying to implement better `size_hint()` return values for `File` in [this PR](https://github.com/rust-lang/rust/pull/81044) and changing to implementing it for `BufReader` in [this PR](https://github.com/rust-lang/rust/pull/81052), I have arrived at this implementation that provides tighter bounds for the `Bytes` iterator of various readers including `BufReader`, `Empty`, and `Chain`.
Unfortunately, for `BufReader`, the `size_hint` only improves after calling `fill_buf`, because it uses the contents of the buffer for the hint. Nevertheless, the tighter bounds should result in better pre-allocation of space to handle the contents of the `Bytes` iterator.
Closes #81052
Implement NOOP_METHOD_CALL lint
Implements the beginnings of https://github.com/rust-lang/lang-team/issues/67: a lint for detecting noop method calls (e.g., calling `<&T as Clone>::clone()` when `T: !Clone`).
This PR does not fully realize the vision and has a few limitations that need to be addressed either before merging or in subsequent PRs:
* [ ] No UFCS support
* [ ] The warning message is pretty plain
* [ ] Doesn't work for `ToOwned`
The implementation uses [`Instance::resolve`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/instance/struct.Instance.html#method.resolve), which is normally called later in the compiler. It seems that there are some invariants that this function relies on that we try our best to respect. For instance, it expects substitutions to have happened, which haven't yet been performed, but we check first for `needs_subst` to ensure we're dealing with a monomorphic type.
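As a small illustration of the kind of call the lint targets, the sketch below clones a reference to a type that does not implement `Clone`. The type `NotClone` and the function name are hypothetical; the `allow` attribute is only there in case the toolchain already ships this lint.

```rust
/// A type that deliberately does not implement `Clone`.
struct NotClone(u32);

/// Returns `true` if "cloning" only copied the reference.
#[allow(noop_method_call)]
fn clone_copies_only_the_reference() -> bool {
    let value = NotClone(7);
    let reference: &NotClone = &value;
    // `NotClone: !Clone`, so this resolves to `<&NotClone as Clone>::clone`,
    // which copies the reference and leaves the pointee untouched.
    let copy: &NotClone = reference.clone();
    std::ptr::eq(reference, copy) && copy.0 == 7
}
```

The result has type `&NotClone`, not `NotClone`, so the call does no useful work, which is what makes it a candidate for a warning.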
Thank you to `@davidtwco`, `@Aaron1011`, and `@wesleywiser` for helping me at various points throughout this PR ❤️.
Upgrade to LLVM 12
This implements the necessary adjustments to make rustc work with LLVM 12. I didn't encounter any major issues so far.
r? `@cuviper`
Previously vec's len was updated only after full copy, making the method
leak if T::clone panic!s.
This commit makes `Vec::extend_from_within` (or, more accurately, its
`T: Clone` specialization) update vec's len on every iteration, fixing
the issue.
`T: Copy` specialization was not affected by the issue b/c it doesn't
call user specified code (as, e.g. `T::clone`), and instead calls
`ptr::copy_nonoverlapping`.
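
The per-iteration length update can be sketched as follows. This is an
illustrative stand-in, not the code from this commit: the free function
`extend_from_within_sketch` and its exact checks are assumptions.

```rust
use std::ops::Range;

/// Hypothetical sketch: append clones of `v[src]` to the end of `v`,
/// publishing each element through `set_len` as soon as it is written.
fn extend_from_within_sketch<T: Clone>(v: &mut Vec<T>, src: Range<usize>) {
    assert!(src.start <= src.end && src.end <= v.len(), "range out of bounds");
    // Reserve once so the writes below stay inside the allocation.
    v.reserve(src.len());
    for i in src {
        // `clone` is user code and may panic; at this point every element
        // written so far is already covered by `len`.
        let cloned = v[i].clone();
        let len = v.len();
        // SAFETY: `reserve` guaranteed capacity for every element of `src`,
        // so slot `len` is allocated; it is initialized before `set_len`
        // makes it part of the vector.
        unsafe {
            v.as_mut_ptr().add(len).write(cloned);
            v.set_len(len + 1);
        }
    }
}
```

If a `clone` call panics midway, the elements cloned so far are already
part of the vector and get dropped with it instead of leaking.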
If different unices have different bit patterns for WIFSTOPPED and
WIFCONTINUED then simply being glibc is probably not good enough for
this rather ad-hoc test to work. Do it on Linux only.
Signed-off-by: Ian Jackson <ijackson@chiark.greenend.org.uk>
Revert `Vec::spare_capacity_mut` impl to prevent pointers invalidation
The implementation was changed in #79015.
Later it was [pointed out](https://github.com/rust-lang/rust/issues/81944#issuecomment-782849785) that the implementation invalidates pointers to the buffer (initialized elements) by creating a unique reference to the buffer. This PR reverts the implementation.
r? `@RalfJung`
enable atomic_min/max tests in Miri
Thanks to `@henryboisdequin` and `@GregBowyer`, Miri now supports these intrinsics. :)
Also includes the necessary Miri update.
unix: Non-mutable bufs in send_vectored_with_ancillary_to
This is the same PR as [#79753](https://github.com/rust-lang/rust/pull/79753). It was closed because of inactivity, so I created a new one. `@lukaslihotzki`
Add is_enclave_range/is_user_range overflow checks
Fixes #76343.
This adds overflow checking to `is_enclave_range` and `is_user_range` in `sgx::os::fortanix_sgx::mem` in order to mitigate possible security issues with enclave code. It also accounts for an edge case where the memory range provided ends exactly at the end of the address space, where calculating `p + len` would overflow back to zero despite the range potentially being valid.
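A minimal sketch of this style of check is shown below. It is not the code from `sgx::os::fortanix_sgx::mem`: the function `is_range_within` and its explicit `base`/`size` parameters are hypothetical, chosen so the example is self-contained.

```rust
/// Hypothetical helper: returns `true` if `[p, p + len)` lies entirely
/// inside the region `[base, base + size)`, without ever overflowing.
fn is_range_within(p: usize, len: usize, base: usize, size: usize) -> bool {
    if size == 0 {
        return false; // an empty region contains nothing
    }
    // Inclusive last address of the region; fails if the region would wrap.
    let region_last = match base.checked_add(size - 1) {
        Some(last) => last,
        None => return false,
    };
    // Inclusive last address of the range. Comparing last addresses means
    // `p + len` is never computed, so a range ending exactly at the end of
    // the address space does not wrap to zero. An empty range is treated
    // like the single address `p`.
    let range_last = match p.checked_add(len.saturating_sub(1)) {
        Some(last) => last,
        None => return false,
    };
    p >= base && range_last <= region_last
}
```

With a region covering the last 8 bytes of the address space, a 4-byte range ending at `usize::MAX` is accepted, while a 5-byte range starting at the same address is rejected because its end would wrap around.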
Turn may_have_side_effect into an associated constant
The `may_have_side_effect` is an implementation detail of `TrustedRandomAccess`
trait. It describes if obtaining an iterator element may have side effects. It
is currently implemented as an associated function.
Turn `may_have_side_effect` into an associated constant. This makes the
value immediately available to the optimizer.
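
The shape of such a change can be sketched with a stand-alone trait. The
trait `SideEffectInfo` and the marker types below are hypothetical and
only mirror the idea; they are not the `TrustedRandomAccess` code itself.

```rust
use std::marker::PhantomData;

/// Hypothetical trait carrying the flag as an associated constant.
trait SideEffectInfo {
    const MAY_HAVE_SIDE_EFFECT: bool;
}

/// Stand-in for a plain slice iterator: reading an element has no effects.
#[allow(dead_code)]
struct PlainSlice;

/// Stand-in for an adapter that runs a user closure per element.
#[allow(dead_code)]
struct Mapped<I>(PhantomData<I>);

/// Stand-in for an adapter combining two inner iterators.
#[allow(dead_code)]
struct Zipped<A, B>(PhantomData<(A, B)>);

impl SideEffectInfo for PlainSlice {
    const MAY_HAVE_SIDE_EFFECT: bool = false;
}

impl<I: SideEffectInfo> SideEffectInfo for Mapped<I> {
    const MAY_HAVE_SIDE_EFFECT: bool = true;
}

impl<A: SideEffectInfo, B: SideEffectInfo> SideEffectInfo for Zipped<A, B> {
    // Constants compose in const context; no function call is involved.
    const MAY_HAVE_SIDE_EFFECT: bool = A::MAY_HAVE_SIDE_EFFECT || B::MAY_HAVE_SIDE_EFFECT;
}

/// Generic code can branch on the constant directly.
fn can_skip_unused_elements<I: SideEffectInfo>() -> bool {
    !I::MAY_HAVE_SIDE_EFFECT
}
```

Because the value is a compile-time constant of each implementation, a
branch on it inside generic code is on a known value after
monomorphization.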
Convert primitives in the standard library to intra-doc links
Blocked on https://github.com/rust-lang/rust/pull/80181. I forgot that this needs to wait for the beta bump so the standard library can be documented with `doc --stage 0`.
Notably I didn't convert `core::slice` because it's like 50 links and I got scared 😨
- Rename `broken_intra_doc_links` to `rustdoc::broken_intra_doc_links`
- Ensure that the old lint names still work and give deprecation errors
- Register lints even when running doctests
Otherwise, all `rustdoc::` lints would be ignored.
- Register all existing lints as removed
This unfortunately doesn't work with `register_renamed` because tool
lints have not yet been registered when rustc is running. For similar
reasons, `check_backwards_compat` doesn't work either. Call
`register_removed` directly instead.
- Fix fallout
+ Rustdoc lints for compiler/
+ Rustdoc lints for library/
Note that this does *not* suggest `rustdoc::broken_intra_doc_links` for
`rustdoc::intra_doc_link_resolution_failure`, since there was no time
when the latter was valid.
Change twice used large const table to static
This table is used twice in core::num::dec2flt::algorithm::power_of_ten. According to the semantics of const, a separate huge definition of the table is inlined at both places.
5233edcf1c/library/core/src/num/dec2flt/algorithm.rs (L16-L22)
Theoretically this gets cleaned up by optimization passes, but in practice I am experiencing a miscompile from LTO on this code. Making the table a static, which would only be defined a single time and not require attention from LTO, eliminates the miscompile and seems semantically more appropriate anyway. A separate bug report on the LTO bug is forthcoming.
Original addition of `const` is from #27307.
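The semantic difference can be illustrated with a tiny table. The names below are hypothetical and the table is far smaller than the real one; the sketch only shows that a `static` is a single item with one address, whereas each use of a `const` behaves as if the value were written out at that spot.

```rust
/// A `const` is substituted at every use site.
const POWERS_CONST: [u64; 4] = [1, 10, 100, 1000];

/// A `static` is one object in memory that every use refers to.
static POWERS_STATIC: [u64; 4] = [1, 10, 100, 1000];

fn lookup_const(i: usize) -> u64 {
    POWERS_CONST[i]
}

fn first_use() -> &'static [u64; 4] {
    &POWERS_STATIC
}

fn second_use() -> &'static [u64; 4] {
    &POWERS_STATIC
}
```

Both `first_use()` and `second_use()` return the same address, because there is exactly one `POWERS_STATIC`. No such guarantee exists for references taken to `POWERS_CONST` at different use sites.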
This table is used twice in core::num::dec2flt::algorithm::power_of_ten.
According to the semantics of const, a separate huge definition of the
table is inlined at both places.
    fn power_of_ten(e: i16) -> Fp {
        assert!(e >= table::MIN_E);
        let i = e - table::MIN_E;
        let sig = table::POWERS.0[i as usize];
        let exp = table::POWERS.1[i as usize];
        Fp { f: sig, e: exp }
    }
Whether for Rust's own `target_os`, LLVM's triples, or GNU config's, the
OS-related fields have been for code running *on* that OS, not code
that is *part* of the OS.
The difference is huge, as syscall interfaces are nothing like
freestanding interfaces. Kernels are (hypervisors and other more exotic
situations aside) freestanding programs that use the interfaces provided
by the hardware. It's *those* interfaces, the ones external to the
program being built and its software dependencies, that are the content
of the target.
For the Linux Kernel in particular, `target_env: "gnu"` is removed for
the same reason: that `-gnu` refers to glibc or GNU/Linux, neither of
which applies to the kernel itself.
Relates to #74247
Thanks @ojeda for catching some things.
Clarify that SyncOnceCell::set blocks.
Reading the discussion of this feature, I gained the mistaken impression that neither `set` nor `get` blocked, and thus calling `get` immediately after `set` was not guaranteed to succeed. It turns out that `set` *does* block, guaranteeing that the cell contains a value once `set` returns. This change updates the documentation to state that explicitly.
Happy to adjust the wording as desired.
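The guarantee can be exercised with a short sketch. It uses `std::sync::OnceLock`, the name under which `SyncOnceCell` was later stabilized, so it assumes a toolchain that ships that type; the function name is hypothetical.

```rust
use std::sync::{Arc, OnceLock};
use std::thread;

/// Several threads race to `set` the cell, then immediately `get` it.
/// Returns `true` if every thread observed a value after its own `set`.
fn every_get_after_set_sees_a_value() -> bool {
    let cell = Arc::new(OnceLock::new());
    let handles: Vec<_> = (0..4u32)
        .map(|i| {
            let cell = Arc::clone(&cell);
            thread::spawn(move || {
                // Only one thread wins; the others get `Err(i)` back, but
                // only after the winner has finished initializing the cell.
                let _ = cell.set(i);
                cell.get().is_some()
            })
        })
        .collect();
    handles.into_iter().all(|h| h.join().unwrap())
}
```

Whichever thread wins the race, each thread's `get` runs after its own `set` has returned, so it always observes `Some(_)`.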
Rollup of 10 pull requests
Successful merges:
- #82309 (Propagate RUSTDOCFLAGS in the environment when documenting)
- #82403 (rustbuild: print out env vars on verbose rustc invocations)
- #82507 (Rename the `tidy` binary to `rust-tidy`)
- #82531 (Add GUI tests)
- #82532 (Add `build.print_step_rusage` to config.toml)
- #82543 (fix env var name in CI)
- #82622 (Propagate `--test-args` for `x.py test src/tools/cargo`)
- #82628 (Try to clarify GlobalAlloc::realloc documentation comment.)
- #82630 (Fix a typo in the `find_anon_type` doc)
- #82643 (Add more proc-macro attribute tests)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Try to clarify GlobalAlloc::realloc documentation comment.
This PR tries to improve the documentation of [GlobalAlloc::realloc](https://doc.rust-lang.org/alloc/alloc/trait.GlobalAlloc.html#method.realloc) with two aspects:
1. Explicitly mention that `realloc` preserves the contents of the original memory block.
2. Explicitly mention which layout should be used to deallocate the reallocated block.
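Both points can be seen in a small sketch that uses the free functions in `std::alloc`, which forward to the global allocator. The function name and the sizes are hypothetical and chosen only for illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};

/// Grows a 4-byte block to 16 bytes and checks that the first 4 bytes survive.
fn realloc_preserves_prefix() -> bool {
    let old_layout = Layout::array::<u8>(4).expect("layout for 4 bytes");
    let new_size = 16;
    // The reallocated block has the new size and the *old* alignment;
    // this is the layout to pass to `dealloc` afterwards.
    let new_layout =
        Layout::from_size_align(new_size, old_layout.align()).expect("valid layout");

    // SAFETY: both layouts have non-zero size, every pointer is checked for
    // null before use, and each block is freed exactly once with the layout
    // that currently describes it.
    unsafe {
        let ptr = alloc(old_layout);
        if ptr.is_null() {
            handle_alloc_error(old_layout);
        }
        for i in 0..4u8 {
            ptr.add(i as usize).write(i + 1);
        }

        let grown = realloc(ptr, old_layout, new_size);
        if grown.is_null() {
            // On failure the original block is still valid.
            dealloc(ptr, old_layout);
            handle_alloc_error(new_layout);
        }

        // Contents up to the smaller of the two sizes are preserved.
        let preserved = (0..4u8).all(|i| grown.add(i as usize).read() == i + 1);
        dealloc(grown, new_layout);
        preserved
    }
}
```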
BTree: no longer define impossible casts
Casts from leaf to internal only make sense when the original has a chance of being the thing it's cast to.
r? `@Mark-Simulacrum`
BTreeMap: split up range_search into two stages
`range_search` expects the caller to pass the same root twice and starts searching a node for both bounds of a range. It's not very clear that in the early iterations, it searches twice in the same node. This PR splits that search up into an initial `find_leaf_edges_spanning_range` that postpones aliasing until the last second, and a second phase that continues the search for the range in each subtree independently (`find_lower_bound_edge` & `find_upper_bound_edge`), which greatly helps for use in #81075. It also moves those functions over to the search module.
r? `@Mark-Simulacrum`
Add a chapter on the test harness.
There isn't really any online documentation on the test harness, so this adds a chapter to the rustc book which provides information on how the harness works and details on the command-line options.
Remove the x86_64-rumprun-netbsd target
Herein we remove the target from the compiler and the code from libstd intended to support the now-defunct rumprun project.
Closes #81514
clarify RW lock's priority gotcha
In particular, the following program works on Linux, but deadlocks on
mac:
```rust
use std::{
    sync::{Arc, RwLock},
    thread,
    time::Duration,
};

fn main() {
    let lock = Arc::new(RwLock::new(()));

    let r1 = thread::spawn({
        let lock = Arc::clone(&lock);
        move || {
            let _rg = lock.read();
            eprintln!("r1/1");
            sleep(1000);
            let _rg = lock.read();
            eprintln!("r1/2");
            sleep(5000);
        }
    });
    sleep(100);

    let w = thread::spawn({
        let lock = Arc::clone(&lock);
        move || {
            let _wg = lock.write();
            eprintln!("w");
        }
    });
    sleep(100);

    let r2 = thread::spawn({
        let lock = Arc::clone(&lock);
        move || {
            let _rg = lock.read();
            eprintln!("r2");
            sleep(2000);
        }
    });

    r1.join().unwrap();
    r2.join().unwrap();
    w.join().unwrap();
}

fn sleep(ms: u64) {
    std::thread::sleep(Duration::from_millis(ms))
}
```
Context: I was completely mystified by my CI deadlocking on mac ([here](https://github.com/matklad/xshell/pull/7)), until `@azdavis` debugged the issue. See a stand-alone reproduction here: https://github.com/matklad/xshell/pull/15
Add missing "see its documentation for more" stdio
`StdoutLock` and `StderrLock` do not have examples, so it would be better
to add "see its documentation for more", as the iter docs do.
Specialize slice::fill with Copy type and u8/i8/bool
I don't expect rustperf to measure any perf improvement from this change,
since `slice::fill` is newly added.
Godbolt link for this change: <https://rust.godbolt.org/z/r3fzee>.
r? `@matthewjasper` since this patch added new specialization.
Use libc::accept4 on Android instead of raw syscall.
This PR replaces the use of a raw `accept4` syscall with `libc::accept4`. This was originally added (by me) because `std` couldn't update to the latest `libc` with `accept4` support for android. By now, libc is already on 0.2.85, so the workaround can be removed.
`@rustbot` label +O-android +T-libs-impl
Add a `size()` function to WASI's `MetadataExt`.
WASI's `filestat` type includes a size field, so expose it in
`MetadataExt` via a `size()` function, similar to the corresponding Unix
function.
r? `@alexcrichton`
Enable API documentation for `std::os::wasi`.
This adds API documentation support for `std::os::wasi` modeled after
how `std::os::unix` works, so that WASI can be documented [here] along
with the other platforms.
[here]: https://doc.rust-lang.org/stable/std/os/index.html
Two changes of particular interest:
- This changes the `AsRawFd` for `io::Stdin` for WASI to return
`libc::STDIN_FILENO` instead of `sys::stdio::Stdin.as_raw_fd()` (and
similar for `Stdout` and `Stderr`), which matches how the `unix`
version works. `STDIN_FILENO` etc. may not always be explicitly
reserved at the WASI level, but as long as we have Rust's `std` and
`libc`, I think it's reasonable to guarantee that we'll always use
`libc::STDIN_FILENO` for stdin.
- This duplicates the `osstr2str` utility function, rather than
trying to share it across all the configurations that need it.
r? `@alexcrichton`