rust/tests at 4d1076c9f918297c97300a7ecf769dd7e6780be6 - rust

bors 8c4fc9d9a4 Auto merge of #94598 - scottmcm:prefix-free-hasher-methods, r=Amanieu

Add a dedicated length-prefixing method to `Hasher`

This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that doesn't care about Hash-DoS resistance to get better performance by not hashing lengths

This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.

Fixes #94026

r? rust-lang/libs

---

The core of this change is the following two new methods on `Hasher`:

```rust
pub trait Hasher {
    /// Writes a length prefix into this hasher, as part of being prefix-free.
    ///
    /// If you're implementing [`Hash`] for a custom collection, call this before
    /// writing its contents to this `Hasher`.  That way
    /// `(collection![1, 2, 3], collection![4, 5])` and
    /// `(collection![1, 2], collection![3, 4, 5])` will provide different
    /// sequences of values to the `Hasher`
    ///
    /// The `impl<T> Hash for [T]` includes a call to this method, so if you're
    /// hashing a slice (or array or vector) via its `Hash::hash` method,
    /// you should not call this yourself.
    ///
    /// This method is only for providing domain separation.  If you want to
    /// hash a `usize` that represents part of the data, then it's important
    /// that you pass it to [`Hasher::write_usize`] instead of to this method.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(hasher_prefixfree_extras)]
    /// # // Stubs to make the `impl` below pass the compiler
    /// # struct MyCollection<T>(Option<T>);
    /// # impl<T> MyCollection<T> {
    /// #     fn len(&self) -> usize { todo!() }
    /// # }
    /// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
    /// #     type Item = T;
    /// #     type IntoIter = std::iter::Empty<T>;
    /// #     fn into_iter(self) -> Self::IntoIter { todo!() }
    /// # }
    ///
    /// use std::hash::{Hash, Hasher};
    /// impl<T: Hash> Hash for MyCollection<T> {
    ///     fn hash<H: Hasher>(&self, state: &mut H) {
    ///         state.write_length_prefix(self.len());
    ///         for elt in self {
    ///             elt.hash(state);
    ///         }
    ///     }
    /// }
    /// ```
    ///
    /// # Note to Implementers
    ///
    /// If you've decided that your `Hasher` is willing to be susceptible to
    /// Hash-DoS attacks, then you might consider skipping hashing some or all
    /// of the `len` provided in the name of increased performance.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_length_prefix(&mut self, len: usize) {
        self.write_usize(len);
    }

    /// Writes a single `str` into this hasher.
    ///
    /// If you're implementing [`Hash`], you generally do not need to call this,
    /// as the `impl Hash for str` does, so you can just use that.
    ///
    /// This includes the domain separator for prefix-freedom, so you should
    /// not call `Self::write_length_prefix` before calling this.
    ///
    /// # Note to Implementers
    ///
    /// The default implementation of this method includes a call to
    /// [`Self::write_length_prefix`], so if your implementation of `Hasher`
    /// doesn't care about prefix-freedom and you've thus overridden
    /// that method to do nothing, there's no need to override this one.
    ///
    /// This method is available to be overridden separately from the others
    /// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
    /// can be used to provide prefix-freedom cheaper than hashing a length.
    ///
    /// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
    /// them into a buffer), then you can hash the bytes of the `str` followed
    /// by a single `0xFF` byte.
    ///
    /// If your `Hasher` works in chunks, you can also do this by being careful
    /// about how you pad partial chunks.  If the chunks are padded with `0x00`
    /// bytes then just hashing an extra `0xFF` byte doesn't necessarily
    /// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
    /// the same sequence of chunks.  But if you pad with `0xFF` bytes instead,
    /// ensuring at least one padding byte, then it can often provide
    /// prefix-freedom cheaper than hashing the length would.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_str(&mut self, s: &str) {
        self.write_length_prefix(s.len());
        self.write(s.as_bytes());
    }
}
```

With updates to the `Hash` implementations for slices and containers to call `write_length_prefix` instead of `write_usize`.

`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing.  But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available) it overrides `write_str` to continue to use the add-non-UTF-8-byte approach.

---

Compatibility:

Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.

2022-05-06 09:43:57 +00:00
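The prefix-freedom problem described in the commit message can be reproduced on stable Rust with a small sketch. Everything named here (`RecordingHasher`, `NoPrefix`, `WithPrefix`, `recorded`) is hypothetical and exists only for illustration. Because `write_length_prefix` is unstable, the sketch calls `write_usize` directly, which is what the default implementation shown above forwards to.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte it is fed, so two values
/// can be compared by the exact byte stream they send to a `Hasher`.
#[derive(Default)]
struct RecordingHasher {
    bytes: Vec<u8>,
}

impl Hasher for RecordingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // Not used in this sketch; only the recorded bytes matter.
    }
}

/// Hypothetical collection whose `Hash` impl omits the length prefix.
struct NoPrefix(Vec<u8>);

impl Hash for NoPrefix {
    fn hash<H: Hasher>(&self, state: &mut H) {
        for elt in &self.0 {
            elt.hash(state);
        }
    }
}

/// The same collection, but it writes its length before its contents.
/// Assumption: `write_usize` stands in for the unstable `write_length_prefix`.
struct WithPrefix(Vec<u8>);

impl Hash for WithPrefix {
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_usize(self.0.len());
        for elt in &self.0 {
            elt.hash(state);
        }
    }
}

/// Returns the bytes that hashing `value` feeds into the hasher.
fn recorded<T: Hash>(value: &T) -> Vec<u8> {
    let mut hasher = RecordingHasher::default();
    value.hash(&mut hasher);
    hasher.bytes
}

fn main() {
    // Without a prefix, both tuples feed the same bytes 1, 2, 3, 4, 5.
    let a = (NoPrefix(vec![1, 2, 3]), NoPrefix(vec![4, 5]));
    let b = (NoPrefix(vec![1, 2]), NoPrefix(vec![3, 4, 5]));
    assert_eq!(recorded(&a), recorded(&b));

    // With a length prefix, the two byte streams differ.
    let c = (WithPrefix(vec![1, 2, 3]), WithPrefix(vec![4, 5]));
    let d = (WithPrefix(vec![1, 2]), WithPrefix(vec![3, 4, 5]));
    assert_ne!(recorded(&c), recorded(&d));
}
```

A hasher that deliberately gives up Hash-DoS resistance could override the prefix method to do nothing, which would make the `WithPrefix` case behave like `NoPrefix` in exchange for less work per collection.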
..
fmt	Use implicit capture syntax in format_args	2022-03-10 10:23:40 -05:00
hash	Add a dedicated length-prefixing method to `Hasher`	2022-05-06 00:03:38 -07:00
iter	Rollup merge of #94115 - scottmcm:iter-process-by-ref, r=yaahc	2022-03-18 21:50:44 +01:00
num	Update `int_roundings` methods from feedback	2022-05-04 23:20:29 -04:00
ops
alloc.rs	fix Layout struct member naming style	2022-04-11 13:35:18 +08:00
any.rs	Use implicit capture syntax in format_args	2022-03-10 10:23:40 -05:00
array.rs	add cfg_panic bootstrap	2022-02-10 22:10:08 +00:00
ascii.rs
atomic.rs
bool.rs	Constify `bool::then{,_some}`	2021-12-15 00:11:23 +08:00
cell.rs	Use implicit capture syntax in format_args	2022-03-10 10:23:40 -05:00
char.rs	Debug print char 0 as '\0' rather than '\u{0}'	2022-03-27 04:49:10 -07:00
clone.rs
cmp.rs	Add test for StructuralEq for std::cmp::Ordering.	2022-03-16 14:01:48 -05:00
const_ptr.rs
convert.rs	Revert "Auto merge of #89450 - usbalbin:const_try_revert, r=oli-obk"	2021-12-12 12:34:59 +08:00
future.rs	add tests	2022-02-02 23:07:02 +09:00
intrinsics.rs	Switch bootstrap cfgs	2022-02-25 08:00:52 -05:00
lazy.rs	Use implicit capture syntax in format_args	2022-03-10 10:23:40 -05:00
lib.rs	Auto merge of #94598 - scottmcm:prefix-free-hasher-methods, r=Amanieu	2022-05-06 09:43:57 +00:00
macros.rs
manually_drop.rs
mem.rs	add cfg_panic bootstrap	2022-02-10 22:10:08 +00:00
nonzero.rs
ops.rs
option.rs	Constify (most) `Option` methods	2021-12-17 20:46:47 +08:00
pattern.rs
pin_macro.rs	Write {ui,} tests for `pin_macro` and `pin!`	2022-02-14 16:56:37 +01:00
pin.rs
ptr.rs	Rollup merge of #95556 - declanvk:nonnull-provenance, r=dtolnay	2022-04-02 03:34:24 +02:00
result.rs	Use implicit capture syntax in format_args	2022-03-10 10:23:40 -05:00
simd.rs	Miri can run this test now	2022-03-03 14:54:18 -05:00
slice.rs	Add slice::remainder	2022-04-17 17:19:45 +00:00
str_lossy.rs
str.rs
task.rs
time.rs
tuple.rs
unicode.rs
waker.rs	Implement data and vtable getters for `RawWaker`	2021-12-17 04:30:13 +08:00