Commit Graph

41 Commits

Author SHA1 Message Date
Josh Triplett
3ece63b64e Temporarily rename int_roundings functions to avoid conflicts
These functions are unstable, but because they're inherent they still
introduce conflicts with stable trait functions in crates. Temporarily
rename them to fix these conflicts, until we can resolve those conflicts
in a better way.
2021-09-22 13:56:01 -07:00
Falk Hüffner
d760c33183 Change return type for T::{log,log2,log10} to u32. The value is at
most 128, and this is consistent with using u32 for small values
elsewhere (e.g. BITS, count_ones, leading_zeros).
2021-09-05 17:09:21 +02:00
Jacob Pratt
727a4fc7e3
Implement #88581 2021-09-02 01:53:54 -04:00
Albin Hedman
c8bf5ed628
Add test for int to float 2021-08-07 19:03:34 +02:00
Albin Hedman
09928a9a20
Add tests 2021-08-07 19:03:33 +02:00
bors
f502bd3abd Auto merge of #86761 - Alexhuszagh:master, r=estebank
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.

# Summary

Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).

Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](0e2b5d163d/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.

In addition, [edge-cases](https://github.com/rust-lang/rust/issues/85234) in Rust's [dec2flt](868c702d0c/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

```rust
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();   // Err(ParseFloatError { kind: Invalid })
```

# Solution

This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.

**Documentation**

Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:

```rust
    // Round-to-even only happens for negative values of q
    // when q ≥ −4 in the 64-bit case and when q ≥ −17 in
    // the 32-bitcase.
    //
    // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we
    // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have
    // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
    //
    // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
    // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
    // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64
    // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
    // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase).
    //
    // Thus we have that we only need to round ties to even when
    // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10]
    // (in the 32-bit case). In both cases,the power of five(5^|q|)
    // fits in a 64-bit word.
    const MIN_EXPONENT_ROUND_TO_EVEN: i32;
    const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```

This ensures maintainability of the code base.

**Improvements for Disguised Fast-Path Cases**

The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.

However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in https://github.com/rust-lang/rust/issues/85198.

**Digit Parsing Improvements**

Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems.

**Unsafe Changes**

Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality.

```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```

The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.

```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```

This allows us to trivially demonstrate the new implementation of dec2flt is safe.

**Inline Annotations Have Been Removed**

In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](https://github.com/aldanor/fast-float-rust/issues/15#issuecomment-864485157).

**Fixed Correctness Tests**

Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.

**Undefined Behavior**

An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:

```rust
    #[inline]
    pub fn check_len(&self, n: usize) -> bool {
        unsafe { self.ptr.add(n) <= self.end }
    }
```

And the new implementation is as follows:

```rust
    /// Check if the slice at least `n` length.
    fn check_len(&self, n: usize) -> bool {
        n <= self.as_ref().len()
    }
```

Note that this has since been fixed in [fast-float-rust](https://github.com/aldanor/fast-float-rust/pull/29).

**Inferring Binary Exponents**

Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](868c702d0c/library/core/src/num/dec2flt/table.rs (L8)).

# Code Size

The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.

**new**

Using rustc version 1.55.0-dev.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|400k|300K
1|396k|292K
2|392k|292K
3|392k|296K
s|396k|292K
z|396k|292K

**old**

Using rustc version 1.53.0-nightly.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|3.2M|304K
1|3.2M|292K
2|3.1M|284K
3|3.1M|284K
s|3.1M|284K
z|3.1M|284K

# Correctness

The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham  [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.

# Issues Addressed

This will fix and close the following issues:

- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
2021-07-17 12:56:22 +00:00
Alex Huszagh
8752b40369 Changed dec2flt to use the Eisel-Lemire algorithm.
Implementation is based off fast-float-rust, with a few notable changes.

- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines has been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid missing rand dependency.
- Added regression tests for #31109 and #31407 in core tests.
- Added regression tests for #31109 and #31407 in ui tests.
- Use the existing slice primitive to simplify shared dec2flt methods
- Remove Miri ignores from dec2flt, due to faster parsing times.

- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
2021-07-17 00:30:34 -05:00
Trevor Spiteri
b0f98c60a6 test integer log10 values close to all powers of 10 2021-07-07 14:07:32 +02:00
Yuki Okushi
9bbc470e97
Rollup merge of #80918 - yoshuawuyts:int-log2, r=m-ou-se
Add Integer::log variants

_This is another attempt at landing https://github.com/rust-lang/rust/pull/70835, which was approved by the libs team but failed on Android tests through Bors. The text copied here is from the original issue. The only change made so far is the addition of non-`checked_` variants of the log methods._

_Tracking issue: #70887_

---

This implements `{log,log2,log10}` methods for all integer types. The implementation was provided by `@substack` for use in the stdlib.

_Note: I'm not big on math, so this PR is a best effort written with limited knowledge. It's likely I'll be getting things wrong, but happy to learn and correct. Please bare with me._

## Motivation
Calculating the logarithm of a number is a generally useful operation. Currently the stdlib only provides implementations for floats, which means that if we want to calculate the logarithm for an integer we have to cast it to a float and then back to an int.

> would be nice if there was an integer log2 instead of having to either use the f32 version or leading_zeros() which i have to verify the results of every time to be sure

_— [`@substack,` 2020-03-08](https://twitter.com/substack/status/1236445105197727744)_

At higher numbers converting from an integer to a float we also risk overflows. This means that Rust currently only provides log operations for a limited set of integers.

The process of doing log operations by converting between floats and integers is also prone to rounding errors. In the following example we're trying to calculate `base10` for an integer. We might try and calculate the `base2` for the values, and attempt [a base swap](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules) to arrive at `base10`. However because we're performing intermediate rounding we arrive at the wrong result:

```rust
// log10(900) = ~2.95 = 2
dbg!(900f32.log10() as u64);

// log base change rule: logb(x) = logc(x) / logc(b)
// log2(900) / log2(10) = 9/3 = 3
dbg!((900f32.log2() as u64) / (10f32.log2() as u64));
```
_[playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)_

This is somewhat nuanced as a lot of the time it'll work well, but in real world code this could lead to some hard to track bugs. By providing correct log implementations directly on integers we can help prevent errors around this.

## Implementation notes

I checked whether LLVM intrinsics existed before implementing this, and none exist yet. ~~Also I couldn't really find a better way to write the `ilog` function. One option would be to make it a private method on the number, but I didn't see any precedent for that. I also didn't know where to best place the tests, so I added them to the bottom of the file. Even though they might seem like quite a lot they take no time to execute.~~

## References

- [Log rules](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules)
- [Rounding error playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)
- [substack's tweet asking about integer log2 in the stdlib](https://twitter.com/substack/status/1236445105197727744)
- [Integer Logarithm, A. Jaffer 2008](https://people.csail.mit.edu/jaffer/III/ilog.pdf)
2021-07-07 12:17:32 +09:00
Yoshua Wuyts
9f579968cd Add Integer::{log,log2,log10} variants 2021-06-25 18:52:46 +02:00
Gary Guo
37647d1733 Move flt2dec::{Formatted, Part} to dedicated module
They are used by integer formatting as well and is not exclusive to float.
2021-06-06 02:54:51 +01:00
Jubilee Young
74db93ed2d Preserve signed zero on roundtrip
This commit removes the previous mechanism of differentiating
between "Debug" and "Display" formattings for the sign of -0 so as
to comply with the IEEE 754 standard's requirements on external
character sequences preserving various attributes of a floating
point representation.

In addition, numerous tests are fixed.
2021-03-22 17:02:09 -07:00
Jubilee Young
fc9b234928 Add IEEE754 tests 2021-03-22 17:02:06 -07:00
Smitty
e10555c658 Re-enable all num tests on WASM
This was partially done by #47365, but a few tests
were missed in that PR.
2021-01-15 16:58:44 -05:00
Philippe Laflamme
64d695b753
Adds tests to ensure some base op traits exist.
These tests invoke the various op traits using all accepted types they
are implemented for as well as for references to those types.

This fixes #49660 and ensures the following implementations exist:

* `Add`, `Sub`, `Mul`, `Div`, `Rem`
  * `T op T`, `T op &T`, `&T op T` and `&T op &T`
  * for all integer and floating point types
* `AddAssign`, `SubAssign`, `MulAssign`, `DivAssign`, `RemAssign`
  * `&mut T op T` and `&mut T op &T`
  * for all integer and floating point types
* `Neg`
  * `op T` and `op &T`
  * for all signed integer and floating point types
* `Not`
  * `op T` and `op &T`
  * for `bool`
* `BitAnd`, `BitOr`, `BitXor`
  * `T op T`, `T op &T`, `&T op T` and `&T op &T`
  * for all integer types and bool
* `BitAndAssign`, `BitOrAssign`, `BitXorAssign`
  * `&mut T op T` and `&mut T op &T`
  * for all integer types and bool
* `Shl`, `Shr`
  * `L op R`, `L op &R`, `&L op R` and `&L op &R`
  * for all pairs of integer types
* `ShlAssign`, `ShrAssign`
  * `&mut L op R`, `&mut L op &R`
  * for all pairs of integer types
2021-01-13 23:14:00 -05:00
Philippe Laflamme
8ddad18283
Avoid ident concatenation in macro.
AFAIK it isn't currently possible to do this. It is also more in line with other tests in the surrounding modules.
2021-01-13 23:13:55 -05:00
Philippe Laflamme
872dc60ed2
Fix missing mod declaration for Wrapping tests. 2021-01-13 23:13:49 -05:00
Christiaan Dirkx
be554c4101 Make ui test that are run-pass and do not test the compiler itself library tests 2020-11-30 02:47:32 +01:00
bstrie
90a2e5e3fe Update tests to remove old numeric constants
Part of #68490.

Care has been taken to leave the old consts where appropriate, for testing backcompat regressions, module shadowing, etc. The intrinsics docs were accidentally referring to some methods on f64 as std::f64, which I changed due to being contrary with how we normally disambiguate the shadow module from the primitive. In one other place I changed std::u8 to std::ops since it was just testing path handling in macros.

For places which have legitimate uses of the old consts, deprecated attributes have been optimistically inserted. Although currently unnecessary, they exist to emphasize to any future deprecation effort the necessity of these specific symbols and prevent them from being accidentally removed.
2020-11-29 00:55:55 -05:00
Christiaan Dirkx
6554086526 Add u128 and i128 integer tests 2020-11-14 20:27:08 +01:00
Dylan DPC
d69ee57f97
Rollup merge of #77640 - ethanboxx:int_error_matching_attempt_2, r=KodrAus
Refactor IntErrorKind to avoid "underflow" terminology

This PR is a continuation of #76455

# Changes

- `Overflow` renamed to `PosOverflow` and `Underflow` renamed to `NegOverflow` after discussion in #76455
- Changed some of the parsing code to return `InvalidDigit` rather than `Empty` for strings "+" and "-". https://users.rust-lang.org/t/misleading-error-in-str-parse-for-int-types/49178
- Carry the problem `char` with the `InvalidDigit` variant.
- Necessary changes were made to the compiler as it depends on `int_error_matching`.
- Redid tests to match on specific errors.

r? ```@KodrAus```
2020-11-09 01:13:25 +01:00
chansuke
f9b139f9c4 Add mod nan for test 2020-11-05 12:57:18 +09:00
chansuke
97d5a1be3f Fix format 2020-11-05 08:40:04 +09:00
chansuke
5855fb7b79 Move f64::NAN ui tests into library 2020-11-05 08:32:07 +09:00
Ethan Brierley
ad2d93da1f Apply suggested changes 2020-10-26 18:14:12 +00:00
est31
a0fc455d30 Replace absolute paths with relative ones
Modern compilers allow reaching external crates
like std or core via relative paths in modules
outside of lib.rs and main.rs.
2020-10-13 14:16:45 +02:00
Ethan Brierley
f233abb909 Add comment to helper function 2020-10-07 08:02:36 +01:00
Ethan Brierley
1e7e2e40e4 remove OnlySign in favour of InvalidDigit 2020-10-06 22:42:33 +01:00
Ethan Brierley
83d294f06a Bring char along with InvalidDigit 2020-10-06 19:05:25 +01:00
Ethan Brierley
c027844795 Fill in things needed to stabilize int_error_matching 2020-10-06 14:06:25 +01:00
Jonas Schievink
389f7cf7d6
Rollup merge of #76745 - workingjubilee:move-wrapping-tests, r=matklad
Move Wrapping<T> ui tests into library

Part of #76268
r? @matklad
2020-10-03 00:31:08 +02:00
Jubilee Young
4e973966b9 Remove unnecessary mod-cfg 2020-10-02 11:40:57 -07:00
Mara Bos
74952b9f21 Fix FIXME in core::num test: Check sign of zero in min/max tests. 2020-09-24 22:29:32 +02:00
Mara Bos
1e2dba1e7c Use T::BITS instead of size_of::<T> * 8. 2020-09-19 06:54:42 +02:00
Jubilee Young
797cb9526a Fix to libstd test 2020-09-15 08:47:20 -07:00
Jubilee Young
247b73939a Move Wrapping<T> ui tests into library 2020-09-15 07:15:59 -07:00
Ayush Kumar Mishra
dc37b553ac Minor refactoring 2020-09-05 17:07:53 +05:30
Ayush Kumar Mishra
941dca8ed2 Add Arith Tests in Library 2020-09-05 16:52:52 +05:30
Ralf Jung
56129d39c0 flt2dec: properly handle uninitialized memory 2020-09-02 12:41:38 +02:00
Ralf Jung
7468f632ff also reduce some libcore test iteration counts 2020-07-31 11:56:08 +02:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00