Smarter HashMap/HashSet pre-allocation for extend/from_iter
HashMap/HashSet from_iter and extend are making totally different assumptions.
A more balanced decision may allocate half the lower hint (rounding up). For "well defined" iterators this effectively limits the worst case to two resizes (the initial reserve + one resize).
cc #36579
cc @bluss
hashmap: Store hashes as usize internally
We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.
Fixes#36567
This is overdue, even if range and RangeArgument is still unstable.
The stability attributes are the same ones as the other unstable item
(Bound) here, they don't seem to matter.
We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.
std: Stabilize and deprecate APIs for 1.13
This commit is intended to be backported to the 1.13 branch, and works with the
following APIs:
Stabilized
* `i32::checked_abs`
* `i32::wrapping_abs`
* `i32::overflowing_abs`
* `RefCell::try_borrow`
* `RefCell::try_borrow_mut`
Deprecated
* `BinaryHeap::push_pop`
* `BinaryHeap::replace`
* `SipHash13`
* `SipHash24`
* `SipHasher` - use `DefaultHasher` instead in the `std::collections::hash_map`
module
Closes#28147Closes#34767Closes#35057Closes#35070
This commit is intended to be backported to the 1.13 branch, and works with the
following APIs:
Stabilized
* `i32::checked_abs`
* `i32::wrapping_abs`
* `i32::overflowing_abs`
* `RefCell::try_borrow`
* `RefCell::try_borrow_mut`
* `DefaultHasher`
* `DefaultHasher::new`
* `DefaultHasher::default`
Deprecated
* `BinaryHeap::push_pop`
* `BinaryHeap::replace`
* `SipHash13`
* `SipHash24`
* `SipHasher` - use `DefaultHasher` instead in the `std::collections::hash_map`
module
Closes#28147Closes#34767Closes#35057Closes#35070
Clarify HashMap's capacity handling.
HashMap has two notions of "capacity":
- "Usable capacity": the number of elements a hash map can hold without
resizing. This is the meaning of "capacity" used in HashMap's API,
e.g. the `with_capacity()` function.
- "Internal capacity": the number of allocated slots. Except for the
zero case, it is always larger than the usable capacity (because some
slots must be left empty) and is always a power of two.
HashMap's code is confusing because it does a poor job of
distinguishing these two meanings. I propose using two different terms
for these two concepts. Because "capacity" is already used in HashMap's
API to mean "usable capacity", I will use a different word for "internal
capacity". I propose "span", though I'm happy to consider other names.
Clean up hasher discussion on HashMap
* We never want to make guarantees about protecting against attacks.
* "True randomness" is not the right terminology to be using in this
context.
* There is significantly more nuance to the performance of SipHash than
"somewhat slow".
r? @steveklabnik
Follow up to discussion on #35371
This commit does the following.
- Changes the terminology for capacities used within HashMap's code.
"Internal capacity" is now consistently "raw capacity", and "usable
capacity" is now consistently just "capacity". This makes the code
easier to understand.
- Reworks capacity and raw capacity computations. Raw capacity
computations are now handled in a single place:
`DefaultResizePolicy::raw_capacity()`. This function correctly returns
zero when given zero, which means that the following cases now result
in a capacity of zero when they previously did not.
* `Hash{Map,Set}::with_capacity(0)`
* `Hash{Map,Set}::with_capacity_and_hasher(0)`
* `Hash{Map,Set}::shrink_to_fit()`, when used with a hash map/set whose
elements have all been removed
- Strengthens the language used in the comments describing the above
functions, to make it clearer when they will result in a map/set with
a capacity of zero. The new language is based on the language used for
the corresponding functions in `Vec`.
- Adds tests for the above zero-capacity cases.
- Removes `test_resize_policy` because it is no longer useful.
The following `HashMap` creation functions don't allocate heap storage for elements.
```
HashMap::new()
HashMap::default()
HashMap::with_hasher()
```
This is good, because it's surprisingly common to create a HashMap and never
use it. So that case should be cheap.
However, `HashSet` does not have the same behaviour. The corresponding creation
functions *do* allocate heap storage for the default number of non-zero
elements (which is 32 slots for 29 elements).
```
HashMap::new()
HashMap::default()
HashMap::with_hasher()
```
This commit gives `HashSet` the same behaviour as `HashMap`, by simply calling
the corresponding `HashMap` functions (something `HashSet` already does for
`with_capacity` and `with_capacity_and_hasher`). It also reformats one existing
`HashSet` construction to use a consistent single-line format.
This speeds up rustc itself by 1.01--1.04x on most of the non-tiny
rustc-benchmarks.