Add #[must_use] to expensive computations
The unifying theme for this commit is weak, admittedly. I put together a list of "expensive" functions when I originally proposed this whole effort, but nobody's cared about that criterion. Still, it's a decent way to bite off a not-too-big chunk of work.
Given the grab bag nature of this commit, the messages I used vary quite a bit. I'm open to wording changes.
For some reason clippy flagged four `BTreeSet` methods but didn't say boo about the equivalent ones on `HashSet`. I stared at them for a while but couldn't figure out the difference, so I added the `HashSet` ones in.
```rust
// Flagged by clippy.
alloc::collections::btree_set::BTreeSet<T> fn difference<'a>(&'a self, other: &'a BTreeSet<T>) -> Difference<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn symmetric_difference<'a>(&'a self, other: &'a BTreeSet<T>) -> SymmetricDifference<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn intersection<'a>(&'a self, other: &'a BTreeSet<T>) -> Intersection<'a, T>;
alloc::collections::btree_set::BTreeSet<T> fn union<'a>(&'a self, other: &'a BTreeSet<T>) -> Union<'a, T>;
// Ignored by clippy, but not by me.
std::collections::HashSet<T, S> fn difference<'a>(&'a self, other: &'a HashSet<T, S>) -> Difference<'a, T, S>;
std::collections::HashSet<T, S> fn symmetric_difference<'a>(&'a self, other: &'a HashSet<T, S>) -> SymmetricDifference<'a, T, S>;
std::collections::HashSet<T, S> fn intersection<'a>(&'a self, other: &'a HashSet<T, S>) -> Intersection<'a, T, S>;
std::collections::HashSet<T, S> fn union<'a>(&'a self, other: &'a HashSet<T, S>) -> Union<'a, T, S>;
```
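For illustration (this is not the std source, and the attribute's message text is my own wording), the kind of warning these annotations enable looks like this:
```rust
use std::collections::HashSet;

fn main() {
    let a: HashSet<u32> = [1, 2, 3].iter().copied().collect();
    let b: HashSet<u32> = [3, 4].iter().copied().collect();

    // With #[must_use] on these methods, silently dropping the returned
    // iterator now triggers an `unused_must_use` warning:
    a.union(&b);

    // Actually consuming the result stays warning-free.
    let both: HashSet<u32> = a.union(&b).copied().collect();
    assert_eq!(both.len(), 4);
}
```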
Parent issue: #89692
r? `@joshtriplett`
Implement split_array and split_array_mut
This implements `[T]::split_array::<const N>() -> (&[T; N], &[T])` and `[T; N]::split_array::<const M>() -> (&[T; M], &[T])` and their mutable equivalents. These are another few “missing” array implementations now that const generics are a thing, similar to #74373, #75026, etc. Fixes #74674.
This implements `[T; N]::split_array` returning an array and a slice. Ultimately, this is probably not what we want, we would want the second return value to be an array of length N-M, which will likely be possible with future const generics enhancements. We need to implement the array method now though, to immediately shadow the slice method. This way, when the slice methods get stabilized, calling them on an array will not be automatic through coercion, so we won't have trouble stabilizing the array methods later (cf. into_iter debacle).
An unchecked version of `[T]::split_array` could also be added as in #76014. This would not be needed for `[T; N]::split_array` as that can be compile-time checked. Edit: actually, since split_at_unchecked is internal-only it could be changed to be split_array-only.
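For anyone unfamiliar with the proposed shape, the effect can be approximated on stable today; this is only an illustration, not the implementation:
```rust
use std::convert::TryInto;

// Roughly what `[T]::split_array::<2>()` gives you, written against stable APIs.
fn split_array_2(s: &[i32]) -> (&[i32; 2], &[i32]) {
    let (head, tail) = s.split_at(2);
    (head.try_into().expect("head has length 2"), tail)
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    let (head, tail) = split_array_2(&data);
    assert_eq!(head, &[1, 2]);
    assert_eq!(tail, &[3, 4, 5]);
}
```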
Make more `From` impls `const` (libcore)
Adding `const` to `From` implementations in core. The `rustc_const_unstable` attribute is not added to unstable implementations.
Tracking issue: #88674
<details>
<summary>Done</summary><div>
- `T` from `T`
- `T` from `!`
- `Option<T>` from `T`
- `Option<&T>` from `&Option<T>`
- `Option<&mut T>` from `&mut Option<T>`
- `Cell<T>` from `T`
- `RefCell<T>` from `T`
- `UnsafeCell<T>` from `T`
- `OnceCell<T>` from `T`
- `Poll<T>` from `T`
- `u32` from `char`
- `u64` from `char`
- `u128` from `char`
- `char` from `u8`
- `AtomicBool` from `bool`
- `AtomicPtr<T>` from `*mut T`
- `AtomicI(bits)` from `i(bits)`
- `AtomicU(bits)` from `u(bits)`
- `i(bits)` from `NonZeroI(bits)`
- `u(bits)` from `NonZeroU(bits)`
- `NonNull<T>` from `Unique<T>`
- `NonNull<T>` from `&T`
- `NonNull<T>` from `&mut T`
- `Unique<T>` from `&mut T`
- `Infallible` from `!`
- `TryFromIntError` from `!`
- `TryFromIntError` from `Infallible`
- `TryFromSliceError` from `Infallible`
- `FromResidual for Option<T>`
</div></details>
<details>
<summary>Remaining</summary><div>
- `NonZero` from `NonZero`
These can't be made const at this time because these use Into::into.
https://github.com/rust-lang/rust/blob/master/library/core/src/convert/num.rs#L393
- `std`, `alloc`
There may still be many implementations that can be made `const`.
</div></details>
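As a rough sketch of what this unlocks for callers (nightly-only; the feature gates named here are my assumption, not necessarily what this PR requires):
```rust
// Hypothetical nightly usage once these impls are const:
// #![feature(const_convert, const_trait_impl)]
//
// const OPT: Option<u8> = Option::from(7);                       // `Option<T>` from `T`
// const CODE: u32 = u32::from('A');                              // `u32` from `char`
// const CELL: core::cell::Cell<i32> = core::cell::Cell::from(1); // `Cell<T>` from `T`
```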
Automatic exponential formatting in Debug
Context: See [this comment from the libs team](https://github.com/rust-lang/rfcs/pull/2729#issuecomment-853454204)
---
Makes `"{:?}"` switch to exponential for floats based on magnitude. The libs team suggested exploring this idea in the discussion thread for RFC rust-lang/rfcs#2729. (**note:** this is **not** an implementation of the RFC; it is an implementation of one of the alternatives)
Thresholds chosen were 1e-4 and 1e16. Justification described [here](https://github.com/rust-lang/rfcs/pull/2729#issuecomment-864482954).
**This will require a crater run.**
---
As mentioned in the commit message of 8731d4dfb4, this behavior will not apply when a precision is supplied, because I wanted to preserve the following existing and useful behavior of `{:.PREC?}` (which recursively applies `{:.PREC}` to floats in a struct):
```rust
assert_eq!(
    format!("{:.2?}", [100.0, 0.000004]),
    "[100.00, 0.00]",
);
```
I looked around, though, and am not sure whether there are any tests in the test suite that actually exercise this behavior.
All things considered, I'm surprised that this change did not seem to break even a single existing test in `x.py test --stage 2`. (even when I tried a smaller threshold of 1e6)
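Concretely, the intended behavior with the thresholds above looks like this (illustrative; the exact digit rendering is whatever the final implementation produces):
```rust
// In range: unchanged decimal Debug output.
assert_eq!(format!("{:?}", 1234.5_f64), "1234.5");
// Below 1e-4 or at/above 1e16: switches to exponential.
assert_eq!(format!("{:?}", 0.00001_f64), "1e-5");
assert_eq!(format!("{:?}", 1e16_f64), "1e16");
// With an explicit precision, the existing behavior is preserved.
assert_eq!(format!("{:.2?}", 0.00001_f64), "0.00");
```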
Speedup int log10 branchless
This is achieved with a branchless bit-twiddling implementation of the case `x < 100_000`, which is then used as a building block for the wider integer types.
Benchmark on an Intel i7-8700K (Coffee Lake):
```
name old ns/iter new ns/iter diff ns/iter diff % speedup
num::int_log::u8_log10_predictable 165 169 4 2.42% x 0.98
num::int_log::u8_log10_random 438 423 -15 -3.42% x 1.04
num::int_log::u8_log10_random_small 438 423 -15 -3.42% x 1.04
num::int_log::u16_log10_predictable 633 417 -216 -34.12% x 1.52
num::int_log::u16_log10_random 908 471 -437 -48.13% x 1.93
num::int_log::u16_log10_random_small 945 471 -474 -50.16% x 2.01
num::int_log::u32_log10_predictable 1,496 1,340 -156 -10.43% x 1.12
num::int_log::u32_log10_random 1,076 873 -203 -18.87% x 1.23
num::int_log::u32_log10_random_small 1,145 874 -271 -23.67% x 1.31
num::int_log::u64_log10_predictable 4,005 3,171 -834 -20.82% x 1.26
num::int_log::u64_log10_random 1,247 1,021 -226 -18.12% x 1.22
num::int_log::u64_log10_random_small 1,265 921 -344 -27.19% x 1.37
num::int_log::u128_log10_predictable 39,667 39,579 -88 -0.22% x 1.00
num::int_log::u128_log10_random 6,456 6,696 240 3.72% x 0.96
num::int_log::u128_log10_random_small 4,108 3,903 -205 -4.99% x 1.05
```
Benchmark on an M1 Mac Mini:
```
name old ns/iter new ns/iter diff ns/iter diff % speedup
num::int_log::u8_log10_predictable 143 130 -13 -9.09% x 1.10
num::int_log::u8_log10_random 375 325 -50 -13.33% x 1.15
num::int_log::u8_log10_random_small 376 325 -51 -13.56% x 1.16
num::int_log::u16_log10_predictable 500 322 -178 -35.60% x 1.55
num::int_log::u16_log10_random 794 405 -389 -48.99% x 1.96
num::int_log::u16_log10_random_small 1,035 405 -630 -60.87% x 2.56
num::int_log::u32_log10_predictable 1,144 894 -250 -21.85% x 1.28
num::int_log::u32_log10_random 832 786 -46 -5.53% x 1.06
num::int_log::u32_log10_random_small 832 787 -45 -5.41% x 1.06
num::int_log::u64_log10_predictable 2,681 2,057 -624 -23.27% x 1.30
num::int_log::u64_log10_random 1,015 806 -209 -20.59% x 1.26
num::int_log::u64_log10_random_small 1,004 795 -209 -20.82% x 1.26
num::int_log::u128_log10_predictable 56,825 56,526 -299 -0.53% x 1.01
num::int_log::u128_log10_random 9,056 8,861 -195 -2.15% x 1.02
num::int_log::u128_log10_random_small 1,528 1,527 -1 -0.07% x 1.00
```
The 128-bit case remains ridiculously slow because LLVM fails to optimize division by a constant 128-bit value into multiplications. This could be worked around, but it seems preferable to fix this in LLVM.
From u32 up, a table lookup (as suggested [here](https://github.com/rust-lang/rust/issues/70887#issuecomment-881099813)) is still faster, but it requires a hardware `leading_zeros` to be viable and might clog up the cache.
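For intuition only (this is not the PR's bit-twiddling code), a branch-free way to get `log10` for `x < 100_000` is to sum the threshold comparisons, each of which compiles to a flag-setting instruction rather than a jump:
```rust
fn log10_lt_100_000(x: u32) -> u32 {
    debug_assert!(0 < x && x < 100_000);
    (x >= 10) as u32 + (x >= 100) as u32 + (x >= 1_000) as u32 + (x >= 10_000) as u32
}

fn main() {
    assert_eq!(log10_lt_100_000(1), 0);
    assert_eq!(log10_lt_100_000(99_999), 4);
}
```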
Add `core::array::from_fn` and `core::array::try_from_fn`
These auxiliary methods fill uninitialized arrays in a safe way and are particularly useful for elements that don't implement `Default`.
```rust
// Foo doesn't implement Default
struct Foo(usize);
let _array = core::array::from_fn::<_, _, 2>(|idx| Foo(idx));
```
Unlike `FromIterator`, it is guaranteed that the array will be fully filled and no error regarding uninitialized state will be thrown. In certain scenarios, however, the creation of an **element** can fail, which is why the `try_from_fn` function is also provided.
```rust
#[derive(Debug, PartialEq)]
enum SomeError {
    Foo,
}
let array = core::array::try_from_fn(|i| Ok::<_, SomeError>(i));
assert_eq!(array, Ok([0, 1, 2, 3, 4]));
let another_array = core::array::try_from_fn(|_| Err(SomeError::Foo));
assert_eq!(another_array, Err(SomeError::Foo));
```
Fix Lower/UpperExp formatting for integers and precision zero
Fixes the integer part of #89493 (I daren't touch the floating-point formatting code). The issue is that the "subtracted" precision essentially behaves like extra trailing zeros, but this is not currently reflected in the code properly.
implement advance_(back_)by on more iterators
Add more efficient, non-default implementations for `feature(iter_advance_by)` (#77404) on more iterators and adapters.
This PR only contains implementations where skipping over items doesn't elide any observable side-effects such as user-provided closures or `clone()` functions. I'll put those in a separate PR.
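For context, the unstable method being specialized behaves like this (nightly-only, so shown as comments; the `Err` payload convention follows the tracking issue as of this PR):
```rust
// #![feature(iter_advance_by)]
//
// let mut iter = [1, 2, 3, 4, 5].iter();
// assert_eq!(iter.advance_by(2), Ok(()));   // skips 1 and 2
// assert_eq!(iter.next(), Some(&3));
// assert_eq!(iter.advance_by(100), Err(2)); // only 4 and 5 were left
//
// A specialized impl (e.g. on a slice iterator) can satisfy this by bumping a
// pointer once instead of calling next() n times.
```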
const fn for option copied, take & replace
Tracking issue: [#67441](https://github.com/rust-lang/rust/issues/67441)
Adding const fn for the copied, take and replace methods of Option. Also adding the necessary unit tests.
It's my first contribution so I am pretty sure I don't know what I'm doing but there's a first for everything!
Constify ?-operator for Result and Option
Try to make the `?`-operator usable in `const fn` with `Result` and `Option`, see #74935. Note that the try-operator itself was constified in #87237.
TODO
* [x] Add tests for const T -> T conversions
* [x] cleanup commits
* [x] Remove `#![allow(incomplete_features)]`
* [?] Await decision in #86808 - I'm not sure
* [x] Await support for parsing `~const` in bootstrapping compiler
* [x] Tracking issue(s)? - #88674
Make `Duration` respect `width` when formatting using `Debug`
When printing or writing a `std::time::Duration` using `Debug` formatting, it previously completely ignored any specified `width`. This is unlike types like integers and floats, which do pad to `width`, for both `Display` and `Debug`, though not all types consider `width` in their `Debug` output (see e.g. #30164). Curiously, `Duration`'s `Debug` formatting *did* consider `precision`.
This PR makes `Duration` pad to `width` just like integers and floats, so that
```rust
format!("|{:8?}|", Duration::from_millis(1234))
```
returns
```
|1.234s  |
```
Before you ask "who formats `Debug` output?", note that `Duration` doesn't actually implement `Display`, so `Debug` is currently the only way to format `Duration`s. I think that's wrong, and `Duration` should get a `Display` implementation, but in the meantime there's no harm in making the `Debug` formatting respect `width` rather than ignore it.
I chose the default alignment to be left-aligned. The general rule Rust uses is: numeric types are right-aligned by default, non-numeric types left-aligned. It wasn't clear to me whether `Duration` is a numeric type or not. The fact that a formatted `Duration` can end with suffixes of variable length (`"s"`, `"ms"`, `"µs"`, etc.) made me lean towards left-alignment, but it would be trivial to change it.
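With the default left alignment, overriding to right alignment then works the same way it does for other padded types (behavior as introduced here):
```rust
use std::time::Duration;

fn main() {
    let d = Duration::from_millis(1234);
    assert_eq!(format!("|{:8?}|", d), "|1.234s  |");  // left-aligned by default
    assert_eq!(format!("|{:>8?}|", d), "|  1.234s|"); // explicit right alignment
}
```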
Fixes issue #88059.
These functions are unstable, but because they're inherent they still
introduce conflicts with stable trait functions in crates. Temporarily
rename them to fix these conflicts, until we can resolve those conflicts
in a better way.
Stabilize `UnsafeCell::raw_get()`
This PR stabilizes the associated function `UnsafeCell::raw_get()`. The FCP has [already completed](https://github.com/rust-lang/rust/issues/66358#issuecomment-899095068). While there was some discussion about the naming after the close of the FCP, it looks like people have agreed on this name. Still, it would probably be best if a `libs-api` member had a look at this and stated whether more discussion is needed.
While I was at it, I added some tests for `UnsafeCell`, because there were barely any.
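For reference, the stabilized function takes a raw pointer to the cell and hands back a raw pointer to its contents, so no intermediate reference is created:
```rust
use std::cell::UnsafeCell;

fn main() {
    let cell = UnsafeCell::new(5_i32);
    let ptr: *const UnsafeCell<i32> = &cell;
    // SAFETY: `ptr` is valid and nothing else is accessing the cell right now.
    unsafe { *UnsafeCell::raw_get(ptr) = 6 };
    assert_eq!(unsafe { *cell.get() }, 6);
}
```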
Closes #66358.
fix: move test that requires mut to another
Adding TODOs for Option::take and Option::copied
TODO to FIXME + moving const stability under normal
Moving const stability attr under normal stab attr
move more rustc stability attributes
Added the `Option::unzip()` method
* Adds the `Option::unzip()` method to turn an `Option<(T, U)>` into `(Option<T>, Option<U>)` under the `unzip_option` feature (usage sketch after this list)
* Adds tests for both `Option::unzip()` and `Option::zip()`, I noticed that `.zip()` didn't have any
* Adds `#[inline]` to a few of `Option`'s methods that were missing it
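A usage sketch of the added method (unstable at this point, so shown as comments):
```rust
// #![feature(unzip_option)]
//
// let paired: Option<(u32, &str)> = Some((1, "one"));
// assert_eq!(paired.unzip(), (Some(1), Some("one")));
//
// let empty: Option<(u32, &str)> = None;
// assert_eq!(empty.unzip(), (None, None));
```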
implement TrustedLen for Flatten/FlatMap when the inner `IntoIterator` is `[T; N]`
This only works if arrays are passed directly instead of array iterators
because we need to be sure that they have not been advanced before
Flatten does its size calculation.
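The shape in question looks like this; `TrustedLen` itself is an internal marker, so its effect (e.g. better `collect` size hints) is not directly observable, but this is the kind of iterator the impl covers:
```rust
fn main() {
    // Items are arrays passed by value, so Flatten can know the exact total length.
    let flat: Vec<i32> = vec![[1, 2], [3, 4], [5, 6]].into_iter().flatten().collect();
    assert_eq!(flat, [1, 2, 3, 4, 5, 6]);
}
```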
resolves #87094
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.
# Summary
Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).
Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](0e2b5d163d/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.
In addition, [edge-cases](https://github.com/rust-lang/rust/issues/85234) in Rust's [dec2flt](868c702d0c/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellerophon](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.
Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Previously, denormal floats with a large number of digits could not be parsed, due to the use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.
```rust
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>(); // Err(ParseFloatError { kind: Invalid })
```
# Solution
This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.
**Documentation**
Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:
```rust
// Round-to-even only happens for negative values of q
// when q ≥ −4 in the 64-bit case and when q ≥ −17 in
// the 32-bit case.
//
// When q ≥ 0, we have that 5^q ≤ 2m+1. In the 64-bit case, we
// have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case, we have
// 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
//
// When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
// so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
// or 2m+1 > 2^24 (32-bit case). Hence, we must have 2^53×5^−q < 2^64
// (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
// or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bit case).
//
// Thus we have that we only need to round ties to even when
// we have that q ∈ [−4, 23] (in the 64-bit case) or q ∈ [−17, 10]
// (in the 32-bit case). In both cases, the power of five (5^|q|)
// fits in a 64-bit word.
const MIN_EXPONENT_ROUND_TO_EVEN: i32;
const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```
This ensures maintainability of the code base.
**Improvements for Disguised Fast-Path Cases**
The fast path in float parsing algorithms attempts to use native machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53 bits or fewer and the exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.
However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in https://github.com/rust-lang/rust/issues/85198.
**Digit Parsing Improvements**
Typically, integers are parsed from strings one digit at a time, requiring unnecessary multiplications which slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described at length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big- and little-endian systems.
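The trick in question is a SWAR reduction; a sketch (constants per the linked write-up, not necessarily byte-for-byte what lands here):
```rust
/// Parse 8 ASCII digits, read as a little-endian u64, using three multiplications.
/// Assumes all 8 bytes are in b'0'..=b'9'.
fn parse_8_digits_le(digits: [u8; 8]) -> u64 {
    const MASK: u64 = 0x0000_00FF_0000_00FF;
    const MUL1: u64 = 0x000F_4240_0000_0064; // 100 + (1_000_000 << 32)
    const MUL2: u64 = 0x0000_2710_0000_0001; // 1 + (10_000 << 32)
    let mut v = u64::from_le_bytes(digits);
    v -= 0x3030_3030_3030_3030; // strip the ASCII '0' bias from every byte
    v = (v * 10) + (v >> 8); // combine adjacent digits into two-digit groups
    let v1 = (v & MASK).wrapping_mul(MUL1);
    let v2 = ((v >> 16) & MASK).wrapping_mul(MUL2);
    (v1.wrapping_add(v2) >> 32) as u32 as u64
}

fn main() {
    assert_eq!(parse_8_digits_le(*b"12345678"), 12_345_678);
}
```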
**Unsafe Changes**
Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods that were undesirably marked as safe. The original code looked something like this, with unsafe functionality deceptively marked as safe.
```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```
The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.
```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: safe as long as `n` is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```
This allows us to trivially demonstrate the new implementation of dec2flt is safe.
**Inline Annotations Have Been Removed**
In the previous implementation of dec2flt, inline annotations were used practically nowhere in the entire module, so the annotations carried over from fast-float-rust have been removed, which mostly does not impact [performance](https://github.com/aldanor/fast-float-rust/issues/15#issuecomment-864485157).
**Fixed Correctness Tests**
Numerous compile errors in `src/etc/test-float-parse` were present, due to the deprecation of `time.clock()` as well as issues with the `rand` crate dependency. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.
**Undefined Behavior**
An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:
```rust
#[inline]
pub fn check_len(&self, n: usize) -> bool {
    unsafe { self.ptr.add(n) <= self.end }
}
```
And the new implementation is as follows:
```rust
/// Check if the slice has at least `n` elements left.
fn check_len(&self, n: usize) -> bool {
    n <= self.as_ref().len()
}
```
Note that this has since been fixed in [fast-float-rust](https://github.com/aldanor/fast-float-rust/pull/29).
**Inferring Binary Exponents**
Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](868c702d0c/library/core/src/num/dec2flt/table.rs (L8)).
# Code Size
The code size, at all optimization levels, does not change considerably for stripped builds relative to before; however, it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.
**new**
Using rustc version 1.55.0-dev.
| opt-level | size | size (stripped) |
|:-:|:-:|:-:|
| 0 | 400K | 300K |
| 1 | 396K | 292K |
| 2 | 392K | 292K |
| 3 | 392K | 296K |
| s | 396K | 292K |
| z | 396K | 292K |
**old**
Using rustc version 1.53.0-nightly.
| opt-level | size | size (stripped) |
|:-:|:-:|:-:|
| 0 | 3.2M | 304K |
| 1 | 3.2M | 292K |
| 2 | 3.1M | 284K |
| 3 | 3.1M | 284K |
| s | 3.1M | 284K |
| z | 3.1M | 284K |
# Correctness
The dec2flt implementation passes all of Rust's unit tests and comprehensive float parsing tests, along with numerous other tests such as Nigel Tao's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham's [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.
# Issues Addressed
This will fix and close the following issues:
- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
Implementation is based off fast-float-rust, with a few notable changes.
- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines have been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid missing rand dependency.
- Added regression tests for #31109 and #31407 in core tests.
- Added regression tests for #31109 and #31407 in ui tests.
- Use the existing slice primitive to simplify shared dec2flt methods
- Remove Miri ignores from dec2flt, due to faster parsing times.
Due to #20400 the corresponding TrustedLen impls need a helper trait
instead of directly adding `Item = &[T;N]` bounds.
Since TrustedLen is a public trait this in turn means
the helper trait needs to be public. Since it's just a workaround
for a compiler deficiency it's marked hidden, unstable and unsafe.
Add Integer::log variants
_This is another attempt at landing https://github.com/rust-lang/rust/pull/70835, which was approved by the libs team but failed on Android tests through Bors. The text copied here is from the original issue. The only change made so far is the addition of non-`checked_` variants of the log methods._
_Tracking issue: #70887_
---
This implements `{log,log2,log10}` methods for all integer types. The implementation was provided by `@substack` for use in the stdlib.
_Note: I'm not big on math, so this PR is a best effort written with limited knowledge. It's likely I'll be getting things wrong, but I'm happy to learn and correct. Please bear with me._
## Motivation
Calculating the logarithm of a number is a generally useful operation. Currently the stdlib only provides implementations for floats, which means that if we want to calculate the logarithm for an integer we have to cast it to a float and then back to an int.
> would be nice if there was an integer log2 instead of having to either use the f32 version or leading_zeros() which i have to verify the results of every time to be sure
_— [`@substack,` 2020-03-08](https://twitter.com/substack/status/1236445105197727744)_
At higher numbers, converting from an integer to a float also risks overflow. This means that Rust currently only provides log operations for a limited set of integers.
The process of doing log operations by converting between floats and integers is also prone to rounding errors. In the following example we're trying to calculate `base10` for an integer. We might try and calculate the `base2` for the values, and attempt [a base swap](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules) to arrive at `base10`. However because we're performing intermediate rounding we arrive at the wrong result:
```rust
// log10(900) = ~2.95 = 2
dbg!(900f32.log10() as u64);
// log base change rule: logb(x) = logc(x) / logc(b)
// log2(900) / log2(10) = 9/3 = 3
dbg!((900f32.log2() as u64) / (10f32.log2() as u64));
```
_[playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)_
This is somewhat nuanced as a lot of the time it'll work well, but in real world code this could lead to some hard to track bugs. By providing correct log implementations directly on integers we can help prevent errors around this.
## Implementation notes
I checked whether LLVM intrinsics existed before implementing this, and none exist yet. ~~Also I couldn't really find a better way to write the `ilog` function. One option would be to make it a private method on the number, but I didn't see any precedent for that. I also didn't know where to best place the tests, so I added them to the bottom of the file. Even though they might seem like quite a lot they take no time to execute.~~
## References
- [Log rules](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules)
- [Rounding error playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)
- [substack's tweet asking about integer log2 in the stdlib](https://twitter.com/substack/status/1236445105197727744)
- [Integer Logarithm, A. Jaffer 2008](https://people.csail.mit.edu/jaffer/III/ilog.pdf)
core: add unstable no_fp_fmt_parse to disable float formatting code
In some projects (e.g. kernels), floating point is forbidden. They can disable
hardware floating point support and use `+soft-float` to prevent fp instructions
from being generated, but since libcore contains the formatting code for `f32`
and `f64`, some fp intrinsics are still depended upon. One could define stubs for these
intrinsics that just panic [1], but that means that if any formatting functions
are accidentally used, the mistake can only be caught at runtime rather
than at compile-time or link-time, and the stubs consume a lot of space without
LTO.
This patch provides an unstable cfg `no_fp_fmt_parse` to disable these.
A panicking stub is still provided for the `Debug` implementation (unfortunately)
because there are some SIMD types that use `#[derive(Debug)]`.
[1]: https://lkml.org/lkml/2021/4/14/1028
* {:.PREC?} already had legitimately useful behavior (recursive formatting of structs using
fixed precision for floats) and I suspect that changes to the output there would be unwelcome.
(besides, precision introduces sinister edge cases where a number can be rounded up to one
of the thresholds)
Thus, the new behavior of Debug is, "dynamically switch to exponential, but only if there's
no precision."
* This could not be implemented in terms of float_to_decimal_common without repeating the branch
on precision, so 'float_to_general_debug' is a new function. The name is '_debug' instead of
'_common' because the considerations in the previous bullet make this logic pretty specific
to Debug.
* 'float_to_decimal_common' is now only used by Display, so I inlined the min_precision argument
and renamed the function accordingly.
This was unsound since a panic in a.next_back() would result in the
length not being updated which would then lead to the same element
being revisited in the side-effect preserving code.
to_digit simplification (less jumps)
I just realised we might be able to make use of the fact that changing case in ASCII is easy, in order to help simplify `to_digit` some more.
It looks a bit cleaner, and there appear to be fewer jumps and fewer instructions in the generated assembly:
https://godbolt.org/z/84Erh5dhz
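The idea, roughly (a sketch, not the exact std code): OR-ing `0b10_0000` into an ASCII letter lowercases it, so one subtraction covers both `'a'..='z'` and `'A'..='Z'`:
```rust
fn to_digit_sketch(c: char, radix: u32) -> Option<u32> {
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    let mut digit = (c as u32).wrapping_sub('0' as u32);
    if digit >= 10 && radix > 10 {
        // Setting bit 5 folds uppercase ASCII to lowercase, then map 'a'..='z' to 10..=35.
        digit = (c as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    }
    if digit < radix { Some(digit) } else { None }
}

fn main() {
    assert_eq!(to_digit_sketch('F', 16), Some(15));
    assert_eq!(to_digit_sketch('f', 16), Some(15));
    assert_eq!(to_digit_sketch('7', 10), Some(7));
    assert_eq!(to_digit_sketch('g', 16), None);
}
```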
The benchmarks don't really tell me much. Maybe a slight improvement on the var radix.
Before:
```
test char::methods::bench_to_digit_radix_10 ... bench: 53,819 ns/iter (+/- 8,314)
test char::methods::bench_to_digit_radix_16 ... bench: 57,265 ns/iter (+/- 10,730)
test char::methods::bench_to_digit_radix_2 ... bench: 55,077 ns/iter (+/- 5,431)
test char::methods::bench_to_digit_radix_36 ... bench: 56,549 ns/iter (+/- 3,248)
test char::methods::bench_to_digit_radix_var ... bench: 43,848 ns/iter (+/- 3,189)
test char::methods::bench_to_digit_radix_10 ... bench: 51,707 ns/iter (+/- 10,946)
test char::methods::bench_to_digit_radix_16 ... bench: 52,835 ns/iter (+/- 2,689)
test char::methods::bench_to_digit_radix_2 ... bench: 51,012 ns/iter (+/- 2,746)
test char::methods::bench_to_digit_radix_36 ... bench: 53,210 ns/iter (+/- 8,645)
test char::methods::bench_to_digit_radix_var ... bench: 40,386 ns/iter (+/- 4,711)
test char::methods::bench_to_digit_radix_10 ... bench: 54,088 ns/iter (+/- 5,677)
test char::methods::bench_to_digit_radix_16 ... bench: 55,972 ns/iter (+/- 17,229)
test char::methods::bench_to_digit_radix_2 ... bench: 52,083 ns/iter (+/- 2,425)
test char::methods::bench_to_digit_radix_36 ... bench: 54,132 ns/iter (+/- 1,548)
test char::methods::bench_to_digit_radix_var ... bench: 41,250 ns/iter (+/- 5,299)
```
After:
```
test char::methods::bench_to_digit_radix_10 ... bench: 48,907 ns/iter (+/- 19,449)
test char::methods::bench_to_digit_radix_16 ... bench: 52,673 ns/iter (+/- 8,122)
test char::methods::bench_to_digit_radix_2 ... bench: 48,509 ns/iter (+/- 2,885)
test char::methods::bench_to_digit_radix_36 ... bench: 50,526 ns/iter (+/- 4,610)
test char::methods::bench_to_digit_radix_var ... bench: 38,618 ns/iter (+/- 3,180)
test char::methods::bench_to_digit_radix_10 ... bench: 54,202 ns/iter (+/- 6,994)
test char::methods::bench_to_digit_radix_16 ... bench: 56,585 ns/iter (+/- 8,448)
test char::methods::bench_to_digit_radix_2 ... bench: 50,548 ns/iter (+/- 1,674)
test char::methods::bench_to_digit_radix_36 ... bench: 52,749 ns/iter (+/- 2,576)
test char::methods::bench_to_digit_radix_var ... bench: 40,215 ns/iter (+/- 3,327)
test char::methods::bench_to_digit_radix_10 ... bench: 50,233 ns/iter (+/- 22,272)
test char::methods::bench_to_digit_radix_16 ... bench: 50,841 ns/iter (+/- 19,981)
test char::methods::bench_to_digit_radix_2 ... bench: 50,386 ns/iter (+/- 4,555)
test char::methods::bench_to_digit_radix_36 ... bench: 52,369 ns/iter (+/- 2,737)
test char::methods::bench_to_digit_radix_var ... bench: 40,417 ns/iter (+/- 2,766)
```
I removed the `likely` as it resulted in a few fewer instructions. (It hasn't been in there long; I added it in the last `to_digit` iteration.)
Make copy/copy_nonoverlapping fn's again
Make copy/copy_nonoverlapping fn's again, rather than intrinsics.
This a short-term change to address issue #84297.
It effectively reverts PRs #81167, #81238 (and part of #82967), #83091, and parts of #79684.
Update standard library for IntoIterator implementation of arrays
This PR partially resolves issue #84513 of updating the standard library part.
I haven't found any remaining doctest examples in the standard library which use iterators over e.g. `&i32` instead of just `i32`. Can anyone point me to them if any remain?
Thanks!
r? `@m-ou-se`
While stdlib implementations of the unchecked methods require unchecked
math, there is no reason to gate it behind this for external users. The
reasoning for a separate `step_trait_ext` feature is unclear, and as
such has been merged as well.
Implement indexing slices with pairs of core::ops::Bound<usize>
Closes #49976.
I am not sure about the code duplication between `check_range` and `into_maybe_range`. Should the former be implemented in terms of the latter? Also, this PR doesn't address the code duplication between the `impl SliceIndex for Range*` impls.
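A usage sketch of what the new impls allow (assuming they land with slice indexing as shown; `Excluded(1)` to `Included(3)` selects indices `2..=3`):
```rust
use std::ops::Bound;

fn main() {
    let v = [1, 2, 3, 4, 5];
    // A (Bound<usize>, Bound<usize>) pair used directly as a slice index.
    assert_eq!(&v[(Bound::Excluded(1), Bound::Included(3))], &[3, 4]);
    assert_eq!(&v[(Bound::Unbounded, Bound::Excluded(2))], &[1, 2]);
}
```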
Format `Struct { .. }` on one line even with `{:#?}`.
The result of `debug_struct("A").finish_non_exhaustive()` before this change:
```
A {
..
}
```
And after this change:
```
A { .. }
```
If there are any fields, the result stays unchanged:
```
A {
field: value,
..
}
```
Stabilize `peekable_peek_mut`
Resolves #78302. Also adds some documentation on `std::iter::Iterator::peekable()` regarding the new method.
The feature was added in #77491 in Nov '20, which is fairly recent, but the feature seems reasonably small. I've never done a stabilization PR before, so excuse my ignorance if there is a protocol I'm not aware of.
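For reference, the method being stabilized:
```rust
fn main() {
    let mut iter = vec![1, 2, 3].into_iter().peekable();
    if let Some(next) = iter.peek_mut() {
        *next = 10; // mutate the next element before it is yielded
    }
    assert_eq!(iter.collect::<Vec<_>>(), [10, 2, 3]);
}
```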
Stabilize cmp_min_max_by
I would like to propose cmp::{min_by, min_by_key, max_by, max_by_key}
for stabilization.
These are relatively simple and seemingly uncontroversial functions and
have been unchanged in unstable for a while now.
Closes: #64460
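For reference, the functions proposed for stabilization in action:
```rust
use std::cmp;

fn main() {
    // *_by_key compare by a derived key; *_by take a comparator directly.
    assert_eq!(cmp::max_by_key(-2_i32, 1, |x| x.abs()), -2);
    assert_eq!(cmp::min_by_key(-2_i32, 1, |x| x.abs()), 1);
    assert_eq!(
        cmp::max_by(3.0_f64, -4.0, |a, b| a.abs().partial_cmp(&b.abs()).unwrap()),
        -4.0
    );
}
```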
Add IEEE 754 compliant fmt/parse of -0, infinity, NaN
This pull request improves the Rust float formatting/parsing libraries to comply with IEEE 754's formatting expectations around certain special values, namely signed zero, the infinities, and NaN. It also adds IEEE 754 compliance tests that, while less stringent in certain places than many of the existing flt2dec/dec2flt capability tests, are intended to serve as the beginning of a roadmap to future compliance with the standard. Some relevant documentation is also adjusted with clarifying remarks.
This PR follows from discussion in https://github.com/rust-lang/rfcs/issues/1074, and closes #24623.
The most controversial change here is likely to be that -0 is now printed as -0. Allow me to explain: While there appears to be community support for an opt-in toggle of printing floats as if they exist in the naively expected domain of numbers, i.e. not the extended reals (where floats live), IEEE 754-2019 is clear that a float converted to a string should be capable of being transformed into the original floating point bit-pattern when it satisfies certain conditions (namely, when it is an actual numeric value i.e. not a NaN and the original and destination float width are the same). -0 is given special attention here as a value that should have its sign preserved. In addition, the vast majority of other programming languages not only output `-0` but output `-0.0` here.
While IEEE 754 offers a broad leeway in how to handle producing what it calls a "decimal character sequence", it is clear that the operations a language provides should be capable of round tripping, and it is confusing to advertise the f32 and f64 types as binary32 and binary64 yet have the most basic way of producing a string and then reading it back into a floating point number be non-conformant with the standard. Further, existing documentation suggested that e.g. -0 would be printed with -0 regardless of the presence of the `+` fmt character, but it prints "+0" instead if given such (which was what led to the opening of #24623).
There are other parsing and formatting issues for floating point numbers which prevent Rust from complying with the standard, as well as other well-documented challenges on the arithmetic level, but I hope that this can be the beginning of motion towards solving those challenges.
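In short, the user-visible effect described above (as this PR specifies it):
```rust
fn main() {
    assert_eq!(format!("{}", -0.0_f64), "-0");   // the sign of zero is preserved
    assert_eq!(format!("{:?}", -0.0_f64), "-0.0");
    assert_eq!(format!("{}", f64::INFINITY), "inf");
    assert_eq!(format!("{:?}", f64::NAN), "NaN");
    assert!("-0".parse::<f64>().unwrap().is_sign_negative()); // round-trips the sign
}
```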
Add Result::into_err where the Ok variant is the never type
Equivalent of #66045 but for the inverse situation where `T: Into<!>` rather than `E: Into<!>`.
I'm using the same feature gate name. I can't see why one of these methods would be OK to stabilize but not the other.
Tracking issue: #61695
Remove Option::{unwrap_none, expect_none}.
This removes `Option::unwrap_none` and `Option::expect_none` since we're not going to stabilize them, see https://github.com/rust-lang/rust/issues/62633.
Closes #62633
This commit removes the previous mechanism of differentiating
between "Debug" and "Display" formattings for the sign of -0 so as
to comply with the IEEE 754 standard's requirements on external
character sequences preserving various attributes of a floating
point representation.
In addition, numerous tests are fixed.
Stabilize `unsafe_op_in_unsafe_fn` lint
This makes it possible to override the level of the `unsafe_op_in_unsafe_fn`, as proposed in https://github.com/rust-lang/rust/issues/71668#issuecomment-729770896.
Tracking issue: #71668
r? `@nikomatsakis` cc `@SimonSapin` `@RalfJung`
# Stabilization report
This is a stabilization report for `#![feature(unsafe_block_in_unsafe_fn)]`.
## Summary
Currently, the body of unsafe functions is an unsafe block, i.e. you can perform unsafe operations inside.
The `unsafe_op_in_unsafe_fn` lint, stabilized here, can be used to change this behavior, so performing unsafe operations in unsafe functions requires an unsafe block.
For now, the lint is allow-by-default, which means that this PR does not change anything without overriding the lint level.
For more information, see [RFC 2585](https://github.com/rust-lang/rfcs/blob/master/text/2585-unsafe-block-in-unsafe-fn.md).
### Example
```rust
// An `unsafe fn` for demonstration purposes.
// Calling this is an unsafe operation.
unsafe fn unsf() {}

// #[allow(unsafe_op_in_unsafe_fn)] by default,
// the behavior of `unsafe fn` is unchanged
unsafe fn allowed() {
    // Here, no `unsafe` block is needed to
    // perform unsafe operations...
    unsf();

    // ...and any `unsafe` block is considered
    // unused and is warned on by the compiler.
    unsafe {
        unsf();
    }
}

#[warn(unsafe_op_in_unsafe_fn)]
unsafe fn warned() {
    // Removing this `unsafe` block will
    // cause the compiler to emit a warning.
    // (Also, no "unused unsafe" warning will be emitted here.)
    unsafe {
        unsf();
    }
}

#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn denied() {
    // Removing this `unsafe` block will
    // cause a compilation error.
    // (Also, no "unused unsafe" warning will be emitted here.)
    unsafe {
        unsf();
    }
}
```
Prevent specialized ZipImpl from calling `__iterator_get_unchecked` twice with the same index
Fixes #82291
It's open for review, but it conflicts with #82289, so wait before merging. The conflict involves only the new test, so it should be rather trivial to fix.
Improve slice.binary_search_by()'s best-case performance to O(1)
This PR aims to improve [slice.binary_search_by()](https://doc.rust-lang.org/std/primitive.slice.html#method.binary_search_by)'s best-case performance to O(1).
# Noticed
I don't know why the docs of `binary_search_by` say `"If there are multiple matches, then any one of the matches could be returned."` when the implementation doesn't do that; actually, it returns the **last one** if multiple matches are found.
Then we have two options:
## If returning the last one is the correct or desired result
Then I can rectify the docs and revert my changes.
## If the docs describe the correct or desired result
Then my changes can be merged after being fully reviewed.
However, if my PR gets merged, another issue arises: this could be a **breaking change**, since when multiple matches are found, the returned index is no longer the last one; it could be any of them.
For example:
```rust
let mut s = vec![0, 1, 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55];
let num = 1;
let idx = s.binary_search(&num).unwrap();
s.insert(idx, 2);
// Old implementation
assert_eq!(s, [0, 1, 1, 1, 1, 2, 2, 3, 5, 8, 13, 21, 34, 55]);
// New implementation
assert_eq!(s, [0, 1, 1, 1, 2, 1, 2, 3, 5, 8, 13, 21, 34, 55]);
```
# Benchmarking
**Old implementations**
```sh
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 4)
test slice::binary_search_l1_with_dups ... bench: 59 ns/iter (+/- 3)
test slice::binary_search_l2 ... bench: 76 ns/iter (+/- 5)
test slice::binary_search_l2_with_dups ... bench: 77 ns/iter (+/- 17)
test slice::binary_search_l3 ... bench: 183 ns/iter (+/- 23)
test slice::binary_search_l3_with_dups ... bench: 185 ns/iter (+/- 19)
```
**New implementations (1)**
Implemented by this PR.
```rust
if cmp == Equal {
    return Ok(mid);
} else if cmp == Less {
    base = mid
}
```
```sh
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 58 ns/iter (+/- 2)
test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 4)
test slice::binary_search_l2 ... bench: 76 ns/iter (+/- 3)
test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 6)
test slice::binary_search_l3 ... bench: 200 ns/iter (+/- 30)
test slice::binary_search_l3_with_dups ... bench: 157 ns/iter (+/- 6)
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 8)
test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 2)
test slice::binary_search_l2 ... bench: 77 ns/iter (+/- 2)
test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 2)
test slice::binary_search_l3 ... bench: 198 ns/iter (+/- 21)
test slice::binary_search_l3_with_dups ... bench: 158 ns/iter (+/- 11)
```
**New implementations (2)**
Suggested by `@nbdd0121` in [comment](https://github.com/rust-lang/rust/pull/74024#issuecomment-665430239).
```rust
base = if cmp == Greater { base } else { mid };
if cmp == Equal { break }
```
```sh
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 7)
test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 5)
test slice::binary_search_l2 ... bench: 75 ns/iter (+/- 3)
test slice::binary_search_l2_with_dups ... bench: 56 ns/iter (+/- 3)
test slice::binary_search_l3 ... bench: 195 ns/iter (+/- 15)
test slice::binary_search_l3_with_dups ... bench: 151 ns/iter (+/- 7)
$ ./x.py bench --stage 1 library/libcore
test slice::binary_search_l1 ... bench: 57 ns/iter (+/- 2)
test slice::binary_search_l1_with_dups ... bench: 38 ns/iter (+/- 2)
test slice::binary_search_l2 ... bench: 77 ns/iter (+/- 11)
test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 4)
test slice::binary_search_l3 ... bench: 194 ns/iter (+/- 15)
test slice::binary_search_l3_with_dups ... bench: 151 ns/iter (+/- 18)
```
I ran some benchmarks against the two implementations. The new implementation shows a big improvement in the duplicate cases, while in the `binary_search_l3` case it's a little bit slower than the old one.
Implement NOOP_METHOD_CALL lint
Implements the beginnings of https://github.com/rust-lang/lang-team/issues/67 - a lint for detecting no-op method calls (e.g., calling `<&T as Clone>::clone()` when `T: !Clone`).
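The canonical example the lint is after (the wording of the real diagnostic may differ):
```rust
#[derive(Debug)]
struct NotClone;

fn main() {
    let x = &NotClone;
    // `NotClone` is not `Clone`, so this resolves to `<&NotClone as Clone>::clone`
    // and merely copies the reference; a no-op call that the new lint flags.
    let y = x.clone();
    println!("{:?} {:?}", x, y);
}
```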
This PR does not fully realize the vision and has a few limitations that need to be addressed either before merging or in subsequent PRs:
* [ ] No UFCS support
* [ ] The warning message is pretty plain
* [ ] Doesn't work for `ToOwned`
The implementation uses [`Instance::resolve`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/instance/struct.Instance.html#method.resolve), which is normally used later in the compiler. It seems that there are some invariants that this function relies on that we try our best to respect. For instance, it expects substitutions to have happened, which haven't yet been performed, but we check first for `needs_subst` to ensure we're dealing with a monomorphic type.
Thank you to `@davidtwco`, `@Aaron1011`, and `@wesleywiser` for helping me at various points throughout this PR ❤️.
Make ptr::write const
~~The code in this PR as of right now is not much more than an experiment.~~
~~This should, if I am not mistaken, in theory compile and pass the tests once the bootstraping compiler is updated. Thus the PR is blocked on that which should happen some time after the February the 9th. Also we might want to wait for #79989 to avoid regressing performance due to using `mem::forget` over `intrinsics::forget`~~.
Add tests for Atomic*::fetch_{min,max}
This ensures that all atomic operations except for fences are tested. This has been useful to test my work on using atomic instructions for atomic operations in cg_clif instead of a global lock.
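For reference, the operations now covered by the tests behave like this:
```rust
use std::sync::atomic::{AtomicU32, Ordering};

fn main() {
    let a = AtomicU32::new(5);
    assert_eq!(a.fetch_max(7, Ordering::SeqCst), 5); // returns the previous value
    assert_eq!(a.load(Ordering::SeqCst), 7);
    assert_eq!(a.fetch_min(3, Ordering::SeqCst), 7);
    assert_eq!(a.load(Ordering::SeqCst), 3);
}
```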