Add a dedicated length-prefixing method to `Hasher`
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
Fixes#94026
r? rust-lang/libs
---
The core of this change is the following two new methods on `Hasher`:
```rust
pub trait Hasher {
/// Writes a length prefix into this hasher, as part of being prefix-free.
///
/// If you're implementing [`Hash`] for a custom collection, call this before
/// writing its contents to this `Hasher`. That way
/// `(collection![1, 2, 3], collection![4, 5])` and
/// `(collection![1, 2], collection![3, 4, 5])` will provide different
/// sequences of values to the `Hasher`
///
/// The `impl<T> Hash for [T]` includes a call to this method, so if you're
/// hashing a slice (or array or vector) via its `Hash::hash` method,
/// you should **not** call this yourself.
///
/// This method is only for providing domain separation. If you want to
/// hash a `usize` that represents part of the *data*, then it's important
/// that you pass it to [`Hasher::write_usize`] instead of to this method.
///
/// # Examples
///
/// ```
/// #![feature(hasher_prefixfree_extras)]
/// # // Stubs to make the `impl` below pass the compiler
/// # struct MyCollection<T>(Option<T>);
/// # impl<T> MyCollection<T> {
/// # fn len(&self) -> usize { todo!() }
/// # }
/// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
/// # type Item = T;
/// # type IntoIter = std::iter::Empty<T>;
/// # fn into_iter(self) -> Self::IntoIter { todo!() }
/// # }
///
/// use std:#️⃣:{Hash, Hasher};
/// impl<T: Hash> Hash for MyCollection<T> {
/// fn hash<H: Hasher>(&self, state: &mut H) {
/// state.write_length_prefix(self.len());
/// for elt in self {
/// elt.hash(state);
/// }
/// }
/// }
/// ```
///
/// # Note to Implementers
///
/// If you've decided that your `Hasher` is willing to be susceptible to
/// Hash-DoS attacks, then you might consider skipping hashing some or all
/// of the `len` provided in the name of increased performance.
#[inline]
#[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
fn write_length_prefix(&mut self, len: usize) {
self.write_usize(len);
}
/// Writes a single `str` into this hasher.
///
/// If you're implementing [`Hash`], you generally do not need to call this,
/// as the `impl Hash for str` does, so you can just use that.
///
/// This includes the domain separator for prefix-freedom, so you should
/// **not** call `Self::write_length_prefix` before calling this.
///
/// # Note to Implementers
///
/// The default implementation of this method includes a call to
/// [`Self::write_length_prefix`], so if your implementation of `Hasher`
/// doesn't care about prefix-freedom and you've thus overridden
/// that method to do nothing, there's no need to override this one.
///
/// This method is available to be overridden separately from the others
/// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
/// can be used to provide prefix-freedom cheaper than hashing a length.
///
/// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
/// them into a buffer), then you can hash the bytes of the `str` followed
/// by a single `0xFF` byte.
///
/// If your `Hasher` works in chunks, you can also do this by being careful
/// about how you pad partial chunks. If the chunks are padded with `0x00`
/// bytes then just hashing an extra `0xFF` byte doesn't necessarily
/// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
/// the same sequence of chunks. But if you pad with `0xFF` bytes instead,
/// ensuring at least one padding byte, then it can often provide
/// prefix-freedom cheaper than hashing the length would.
#[inline]
#[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
fn write_str(&mut self, s: &str) {
self.write_length_prefix(s.len());
self.write(s.as_bytes());
}
}
```
With updates to the `Hash` implementations for slices and containers to call `write_length_prefix` instead of `write_usize`.
`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing. But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available) it overrides `write_str` to continue to use the add-non-UTF-8-byte approach.
---
Compatibility:
Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
Faster parsing for lower numbers for radix up to 16 (cont.)
( Continuation of https://github.com/rust-lang/rust/pull/83371 )
With LingMan's change I think this is potentially ready.
Implement provenance preserving methods on NonNull
### Description
Add the `addr`, `with_addr`, `map_addr` methods to the `NonNull` type, and map the address type to `NonZeroUsize`.
### Motivation
The `NonNull` type is useful for implementing pointer types which have the 0-niche. It is currently possible to implement these provenance preserving functions by calling `NonNull::as_ptr` and `new_unchecked`. The adding these methods makes it more ergonomic.
### Testing
Added a unit test of a non-null tagged pointer type. This is based on some real code I have elsewhere, that currently routes the pointer through a `NonZeroUsize` and back out to produce a usable pointer. I wanted to produce an ideal version of the same tagged pointer struct that preserved pointer provenance.
### Related
Extension of APIs proposed in #95228 . I can also split this out into a separate tracking issue if that is better (though I may need some pointers on how to do that).
**Description**
Add the `addr`, `with_addr, `map_addr` methods to the `NonNull` type,
and map the address type to `NonZeroUsize`.
**Motiviation**
The `NonNull` type is useful for implementing pointer types which have
the 0-niche. It is currently possible to implement these provenance
preserving functions by calling `NonNull::as_ptr` and `new_unchecked`.
The addition of these methods simply make it more ergonomic to use.
**Testing**
Added a unit test of a nonnull tagged pointer type. This is based on
some real code I have elsewhere, that currently routes the pointer
through a `NonZeroUsize` and back out to produce a usable pointer.
Add Iterator::collect_into
This PR adds `Iterator::collect_into` as proposed by ``@cormacrelf`` in #48597 (see https://github.com/rust-lang/rust/pull/48597#issuecomment-842083688).
Followup of #92982.
This adds the following method to the Iterator trait:
```rust
fn collect_into<E: Extend<Self::Item>>(self, collection: &mut E) -> &mut E
```
Implement `RawWaker` and `Waker` getters for underlying pointers
implement #87021
New APIs:
- `RawWaker::data(&self) -> *const ()`
- `RawWaker::vtable(&self) -> &'static RawWakerVTable`
- ~`Waker::as_raw_waker(&self) -> &RawWaker`~ `Waker::as_raw(&self) -> &RawWaker`
This third one is an auxiliary function to make the two APIs above more useful. Since we can only get `&Waker` in `Future::poll`, without this, we need to `transmute` it into `&RawWaker` (relying on `repr(transparent)`) in order to access its data/vtable pointers.
~Not sure if it should be named `as_raw` or `as_raw_waker`. Seems we always use `as_<something-raw>` instead of just `as_raw`. But `as_raw_waker` seems not quite consistent with `Waker::from_raw`.~ As suggested in https://github.com/rust-lang/rust/pull/91828#discussion_r770729837, use `as_raw`.
This covers:
impl<T> MaybeUninit<T> {
pub unsafe fn assume_init_read(&self) -> T { ... }
pub unsafe fn assume_init_drop(&mut self) { ... }
}
It does not cover the const-ness of `write` under
`const_maybe_uninit_write` nor the const-ness of
`assume_init_read` (this commit adds
`const_maybe_uninit_assume_init_read` for that).
FCP: https://github.com/rust-lang/rust/issues/63567#issuecomment-958590287.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Methods that were only blocked on `const_panic` have been stabilized.
The remaining methods of `duration_consts_2` are all related to floats,
and as such have been placed behind the `duration_consts_float` feature
gate.
Stabilize `const_raw_ptr_deref` for `*const T`
This stabilizes dereferencing immutable raw pointers in const contexts.
It does not stabilize `*mut T` dereferencing. This is behind the
same feature gate as mutable references.
closes https://github.com/rust-lang/rust/issues/51911
pub use core::simd;
A portable abstraction over SIMD has been a major pursuit in recent years for several programming languages. In Rust, `std::arch` offers explicit SIMD acceleration via compiler intrinsics, but it does so at the cost of having to individually maintain each and every single such API, and is almost completely `unsafe` to use. `core::simd` offers safe abstractions that are resolved to the appropriate SIMD instructions by LLVM during compilation, including scalar instructions if that is all that is available.
`core::simd` is enabled by the `#![portable_simd]` nightly feature tracked in https://github.com/rust-lang/rust/issues/86656 and is introduced here by pulling in the https://github.com/rust-lang/portable-simd repository as a subtree. We built the repository out-of-tree to allow faster compilation and a stochastic test suite backed by the proptest crate to verify that different targets, features, and optimizations produce the same result, so that using this library does not introduce any surprises. As these tests are technically non-deterministic, and thus can introduce overly interesting Heisenbugs if included in the rustc CI, they are visible in the commit history of the subtree but do nothing here. Some tests **are** introduced via the documentation, but these use deterministic asserts.
There are multiple unsolved problems with the library at the current moment, including a want for better documentation, technical issues with LLVM scalarizing and lowering to libm, room for improvement for the APIs, and so far I have not added the necessary plumbing for allowing the more experimental or libm-dependent APIs to be used. However, I thought it would be prudent to open this for review in its current condition, as it is both usable and it is likely I am going to learn something else needs to be fixed when bors tries this out.
The major types are
- `core::simd::Simd<T, N>`
- `core::simd::Mask<T, N>`
There is also the `LaneCount` struct, which, together with the SimdElement and SupportedLaneCount traits, limit the implementation's maximum support to vectors we know will actually compile and provide supporting logic for bitmasks. I'm hoping to simplify at least some of these out of the way as the compiler and library evolve.
These tests just verify some basic APIs of core::simd function, and
guarantees that attempting to access the wrong things doesn't work.
The majority of tests are stochastic, and so remain upstream, but
a few deterministic tests arrive in the subtree as doc tests.
This stabilizes dereferencing immutable raw pointers in const contexts.
It does not stabilize `*mut T` dereferencing. This is placed behind the
`const_raw_mut_ptr_deref` feature gate.
Add 'core::array::from_fn' and 'core::array::try_from_fn'
These auxiliary methods fill uninitialized arrays in a safe way and are particularly useful for elements that don't implement `Default`.
```rust
// Foo doesn't implement Default
struct Foo(usize);
let _array = core::array::from_fn::<_, _, 2>(|idx| Foo(idx));
```
Different from `FromIterator`, it is guaranteed that the array will be fully filled and no error regarding uninitialized state will be throw. In certain scenarios, however, the creation of an **element** can fail and that is why the `try_from_fn` function is also provided.
```rust
#[derive(Debug, PartialEq)]
enum SomeError {
Foo,
}
let array = core::array::try_from_fn(|i| Ok::<_, SomeError>(i));
assert_eq!(array, Ok([0, 1, 2, 3, 4]));
let another_array = core::array::try_from_fn(|_| Err(SomeError::Foo));
assert_eq!(another_array, Err(SomeError::Foo));
```
Constify ?-operator for Result and Option
Try to make `?`-operator usable in `const fn` with `Result` and `Option`, see #74935 . Note that the try-operator itself was constified in #87237.
TODO
* [x] Add tests for const T -> T conversions
* [x] cleanup commits
* [x] Remove `#![allow(incomplete_features)]`
* [?] Await decision in #86808 - I'm not sure
* [x] Await support for parsing `~const` in bootstrapping compiler
* [x] Tracking issue(s)? - #88674
Added the `Option::unzip()` method
* Adds the `Option::unzip()` method to turn an `Option<(T, U)>` into `(Option<T>, Option<U>)` under the `unzip_option` feature
* Adds tests for both `Option::unzip()` and `Option::zip()`, I noticed that `.zip()` didn't have any
* Adds `#[inline]` to a few of `Option`'s methods that were missing it
Add Integer::log variants
_This is another attempt at landing https://github.com/rust-lang/rust/pull/70835, which was approved by the libs team but failed on Android tests through Bors. The text copied here is from the original issue. The only change made so far is the addition of non-`checked_` variants of the log methods._
_Tracking issue: #70887_
---
This implements `{log,log2,log10}` methods for all integer types. The implementation was provided by `@substack` for use in the stdlib.
_Note: I'm not big on math, so this PR is a best effort written with limited knowledge. It's likely I'll be getting things wrong, but happy to learn and correct. Please bare with me._
## Motivation
Calculating the logarithm of a number is a generally useful operation. Currently the stdlib only provides implementations for floats, which means that if we want to calculate the logarithm for an integer we have to cast it to a float and then back to an int.
> would be nice if there was an integer log2 instead of having to either use the f32 version or leading_zeros() which i have to verify the results of every time to be sure
_— [`@substack,` 2020-03-08](https://twitter.com/substack/status/1236445105197727744)_
At higher numbers converting from an integer to a float we also risk overflows. This means that Rust currently only provides log operations for a limited set of integers.
The process of doing log operations by converting between floats and integers is also prone to rounding errors. In the following example we're trying to calculate `base10` for an integer. We might try and calculate the `base2` for the values, and attempt [a base swap](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules) to arrive at `base10`. However because we're performing intermediate rounding we arrive at the wrong result:
```rust
// log10(900) = ~2.95 = 2
dbg!(900f32.log10() as u64);
// log base change rule: logb(x) = logc(x) / logc(b)
// log2(900) / log2(10) = 9/3 = 3
dbg!((900f32.log2() as u64) / (10f32.log2() as u64));
```
_[playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)_
This is somewhat nuanced as a lot of the time it'll work well, but in real world code this could lead to some hard to track bugs. By providing correct log implementations directly on integers we can help prevent errors around this.
## Implementation notes
I checked whether LLVM intrinsics existed before implementing this, and none exist yet. ~~Also I couldn't really find a better way to write the `ilog` function. One option would be to make it a private method on the number, but I didn't see any precedent for that. I also didn't know where to best place the tests, so I added them to the bottom of the file. Even though they might seem like quite a lot they take no time to execute.~~
## References
- [Log rules](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules)
- [Rounding error playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)
- [substack's tweet asking about integer log2 in the stdlib](https://twitter.com/substack/status/1236445105197727744)
- [Integer Logarithm, A. Jaffer 2008](https://people.csail.mit.edu/jaffer/III/ilog.pdf)
core: add unstable no_fp_fmt_parse to disable float formatting code
In some projects (e.g. kernel), floating point is forbidden. They can disable
hardware floating point support and use `+soft-float` to avoid fp instructions
from being generated, but as libcore contains the formatting code for `f32`
and `f64`, some fp intrinsics are depended. One could define stubs for these
intrinsics that just panic [1], but it means that if any formatting functions
are accidentally used, mistake can only be caught during the runtime rather
than during compile-time or link-time, and they consume a lot of space without
LTO.
This patch provides an unstable cfg `no_fp_fmt_parse` to disable these.
A panicking stub is still provided for the `Debug` implementation (unfortunately)
because there are some SIMD types that use `#[derive(Debug)]`.
[1]: https://lkml.org/lkml/2021/4/14/1028
While stdlib implementations of the unchecked methods require unchecked
math, there is no reason to gate it behind this for external users. The
reasoning for a separate `step_trait_ext` feature is unclear, and as
such has been merged as well.
Stabilize `peekable_peek_mut`
Resolves#78302. Also adds some documentation on `std::iter::Iterator::peekable()` regarding the new method.
The feature was added in #77491 in Nov' 20, which is recently, but the feature seems reasonably small. Never did a stabilization-pr, excuse my ignorance if there is a protocol I'm not aware of.
Stabilize cmp_min_max_by
I would like to propose cmp::{min_by, min_by_key, max_by, max_by_key}
for stabilization.
These are relatively simple and seemingly uncontroversial functions and
have been unchanged in unstable for a while now.
Closes: #64460
I would like to propose cmp::{min_by, min_by_key, max_by, max_by_key}
for stabilization.
These are relatively simple and seemingly uncontroversial functions and
have been unchanged in unstable for a while now.