rust/src/libstd/collections/hash/map.rs

2325 lines
71 KiB
Rust
Raw Normal View History

// Copyright 2014-2015 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
use self::Entry::*;
use self::SearchResult::*;
use self::VacantEntryState::*;
use borrow::Borrow;
std: Recreate a `collections` module As with the previous commit with `librand`, this commit shuffles around some `collections` code. The new state of the world is similar to that of librand: * The libcollections crate now only depends on libcore and liballoc. * The standard library has a new module, `std::collections`. All functionality of libcollections is reexported through this module. I would like to stress that this change is purely cosmetic. There are very few alterations to these primitives. There are a number of notable points about the new organization: * std::{str, slice, string, vec} all moved to libcollections. There is no reason that these primitives shouldn't be necessarily usable in a freestanding context that has allocation. These are all reexported in their usual places in the standard library. * The `hashmap`, and transitively the `lru_cache`, modules no longer reside in `libcollections`, but rather in libstd. The reason for this is because the `HashMap::new` contructor requires access to the OSRng for initially seeding the hash map. Beyond this requirement, there is no reason that the hashmap could not move to libcollections. I do, however, have a plan to move the hash map to the collections module. The `HashMap::new` function could be altered to require that the `H` hasher parameter ascribe to the `Default` trait, allowing the entire `hashmap` module to live in libcollections. The key idea would be that the default hasher would be different in libstd. Something along the lines of: // src/libstd/collections/mod.rs pub type HashMap<K, V, H = RandomizedSipHasher> = core_collections::HashMap<K, V, H>; This is not possible today because you cannot invoke static methods through type aliases. If we modified the compiler, however, to allow invocation of static methods through type aliases, then this type definition would essentially be switching the default hasher from `SipHasher` in libcollections to a libstd-defined `RandomizedSipHasher` type. This type's `Default` implementation would randomly seed the `SipHasher` instance, and otherwise perform the same as `SipHasher`. This future state doesn't seem incredibly far off, but until that time comes, the hashmap module will live in libstd to not compromise on functionality. * In preparation for the hashmap moving to libcollections, the `hash` module has moved from libstd to libcollections. A previously snapshotted commit enables a distinct `Writer` trait to live in the `hash` module which `Hash` implementations are now parameterized over. Due to using a custom trait, the `SipHasher` implementation has lost its specialized methods for writing integers. These can be re-added backwards-compatibly in the future via default methods if necessary, but the FNV hashing should satisfy much of the need for speedier hashing. A list of breaking changes: * HashMap::{get, get_mut} no longer fails with the key formatted into the error message with `{:?}`, instead, a generic message is printed. With backtraces, it should still be not-too-hard to track down errors. * The HashMap, HashSet, and LruCache types are now available through std::collections instead of the collections crate. * Manual implementations of hash should be parameterized over `hash::Writer` instead of just `Writer`. [breaking-change]
2014-05-29 20:50:12 -05:00
use clone::Clone;
use cmp::{max, Eq, PartialEq};
std: Recreate a `collections` module As with the previous commit with `librand`, this commit shuffles around some `collections` code. The new state of the world is similar to that of librand: * The libcollections crate now only depends on libcore and liballoc. * The standard library has a new module, `std::collections`. All functionality of libcollections is reexported through this module. I would like to stress that this change is purely cosmetic. There are very few alterations to these primitives. There are a number of notable points about the new organization: * std::{str, slice, string, vec} all moved to libcollections. There is no reason that these primitives shouldn't be necessarily usable in a freestanding context that has allocation. These are all reexported in their usual places in the standard library. * The `hashmap`, and transitively the `lru_cache`, modules no longer reside in `libcollections`, but rather in libstd. The reason for this is because the `HashMap::new` contructor requires access to the OSRng for initially seeding the hash map. Beyond this requirement, there is no reason that the hashmap could not move to libcollections. I do, however, have a plan to move the hash map to the collections module. The `HashMap::new` function could be altered to require that the `H` hasher parameter ascribe to the `Default` trait, allowing the entire `hashmap` module to live in libcollections. The key idea would be that the default hasher would be different in libstd. Something along the lines of: // src/libstd/collections/mod.rs pub type HashMap<K, V, H = RandomizedSipHasher> = core_collections::HashMap<K, V, H>; This is not possible today because you cannot invoke static methods through type aliases. If we modified the compiler, however, to allow invocation of static methods through type aliases, then this type definition would essentially be switching the default hasher from `SipHasher` in libcollections to a libstd-defined `RandomizedSipHasher` type. This type's `Default` implementation would randomly seed the `SipHasher` instance, and otherwise perform the same as `SipHasher`. This future state doesn't seem incredibly far off, but until that time comes, the hashmap module will live in libstd to not compromise on functionality. * In preparation for the hashmap moving to libcollections, the `hash` module has moved from libstd to libcollections. A previously snapshotted commit enables a distinct `Writer` trait to live in the `hash` module which `Hash` implementations are now parameterized over. Due to using a custom trait, the `SipHasher` implementation has lost its specialized methods for writing integers. These can be re-added backwards-compatibly in the future via default methods if necessary, but the FNV hashing should satisfy much of the need for speedier hashing. A list of breaking changes: * HashMap::{get, get_mut} no longer fails with the key formatted into the error message with `{:?}`, instead, a generic message is printed. With backtraces, it should still be not-too-hard to track down errors. * The HashMap, HashSet, and LruCache types are now available through std::collections instead of the collections crate. * Manual implementations of hash should be parameterized over `hash::Writer` instead of just `Writer`. [breaking-change]
2014-05-29 20:50:12 -05:00
use default::Default;
std: Rename Show/String to Debug/Display This commit is an implementation of [RFC 565][rfc] which is a stabilization of the `std::fmt` module and the implementations of various formatting traits. Specifically, the following changes were performed: [rfc]: https://github.com/rust-lang/rfcs/blob/master/text/0565-show-string-guidelines.md * The `Show` trait is now deprecated, it was renamed to `Debug` * The `String` trait is now deprecated, it was renamed to `Display` * Many `Debug` and `Display` implementations were audited in accordance with the RFC and audited implementations now have the `#[stable]` attribute * Integers and floats no longer print a suffix * Smart pointers no longer print details that they are a smart pointer * Paths with `Debug` are now quoted and escape characters * The `unwrap` methods on `Result` now require `Display` instead of `Debug` * The `Error` trait no longer has a `detail` method and now requires that `Display` must be implemented. With the loss of `String`, this has moved into libcore. * `impl<E: Error> FromError<E> for Box<Error>` now exists * `derive(Show)` has been renamed to `derive(Debug)`. This is not currently warned about due to warnings being emitted on stage1+ While backwards compatibility is attempted to be maintained with a blanket implementation of `Display` for the old `String` trait (and the same for `Show`/`Debug`) this is still a breaking change due to primitives no longer implementing `String` as well as modifications such as `unwrap` and the `Error` trait. Most code is fairly straightforward to update with a rename or tweaks of method calls. [breaking-change] Closes #21436
2015-01-20 17:45:07 -06:00
use fmt::{self, Debug};
use hash::{Hash, SipHasher};
use iter::{self, Iterator, ExactSizeIterator, IntoIterator, FromIterator, Extend, Map};
2015-01-06 16:33:42 -06:00
use marker::Sized;
2015-01-03 21:42:21 -06:00
use mem::{self, replace};
use ops::{Deref, FnMut, FnOnce, Index};
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
use option::Option::{self, Some, None};
use rand::{self, Rng};
use result::Result::{self, Ok, Err};
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
use super::table::{
2015-01-03 21:42:21 -06:00
self,
Bucket,
EmptyBucket,
FullBucket,
FullBucketImm,
FullBucketMut,
RawTable,
SafeHash
};
use super::table::BucketState::{
Empty,
Full,
};
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
use super::state::HashState;
2015-02-04 20:17:19 -06:00
const INITIAL_LOG2_CAP: usize = 5;
2015-01-29 06:41:48 -06:00
#[unstable(feature = "std_misc")]
2015-02-04 20:17:19 -06:00
pub const INITIAL_CAPACITY: usize = 1 << INITIAL_LOG2_CAP; // 2^5
2014-07-15 18:39:32 -05:00
/// The default behavior of HashMap implements a load factor of 90.9%.
/// This behavior is characterized by the following condition:
2014-07-15 18:39:32 -05:00
///
/// - if size > 0.909 * capacity: grow the map
#[derive(Clone)]
struct DefaultResizePolicy;
2013-04-03 07:45:14 -05:00
2014-07-15 18:39:32 -05:00
impl DefaultResizePolicy {
fn new() -> DefaultResizePolicy {
DefaultResizePolicy
}
2014-07-15 18:39:32 -05:00
#[inline]
2015-02-04 20:17:19 -06:00
fn min_capacity(&self, usable_size: usize) -> usize {
// Here, we are rephrasing the logic by specifying the lower limit
// on capacity:
//
// - if `cap < size * 1.1`: grow the map
usable_size * 11 / 10
}
/// An inverse of `min_capacity`, approximately.
2014-07-15 18:39:32 -05:00
#[inline]
2015-02-04 20:17:19 -06:00
fn usable_capacity(&self, cap: usize) -> usize {
// As the number of entries approaches usable capacity,
// min_capacity(size) must be smaller than the internal capacity,
// so that the map is not resized:
// `min_capacity(usable_capacity(x)) <= x`.
// The left-hand side can only be smaller due to flooring by integer
// division.
//
// This doesn't have to be checked for overflow since allocation size
// in bytes will overflow earlier than multiplication by 10.
cap * 10 / 11
}
}
#[test]
fn test_resize_policy() {
let rp = DefaultResizePolicy;
for n in 0..1000 {
assert!(rp.min_capacity(rp.usable_capacity(n)) <= n);
assert!(rp.usable_capacity(rp.min_capacity(n)) <= n);
}
2014-07-15 18:39:32 -05:00
}
2014-07-15 18:39:32 -05:00
// The main performance trick in this hashmap is called Robin Hood Hashing.
// It gains its excellent performance from one essential operation:
2014-07-15 18:39:32 -05:00
//
// If an insertion collides with an existing element, and that element's
2014-07-15 18:39:32 -05:00
// "probe distance" (how far away the element is from its ideal location)
// is higher than how far we've already probed, swap the elements.
//
// This massively lowers variance in probe distance, and allows us to get very
// high load factors with good performance. The 90% load factor I use is rather
// conservative.
//
// > Why a load factor of approximately 90%?
//
// In general, all the distances to initial buckets will converge on the mean.
// At a load factor of α, the odds of finding the target bucket after k
// probes is approximately 1-α^k. If we set this equal to 50% (since we converge
// on the mean) and set k=8 (64-byte cache line / 8-byte hash), α=0.92. I round
// this down to make the math easier on the CPU and avoid its FPU.
// Since on average we start the probing in the middle of a cache line, this
// strategy pulls in two cache lines of hashes on every lookup. I think that's
// pretty good, but if you want to trade off some space, it could go down to one
// cache line on average with an α of 0.84.
//
// > Wait, what? Where did you get 1-α^k from?
//
// On the first probe, your odds of a collision with an existing element is α.
// The odds of doing this twice in a row is approximately α^2. For three times,
// α^3, etc. Therefore, the odds of colliding k times is α^k. The odds of NOT
// colliding after k tries is 1-α^k.
//
// The paper from 1986 cited below mentions an implementation which keeps track
// of the distance-to-initial-bucket histogram. This approach is not suitable
// for modern architectures because it requires maintaining an internal data
// structure. This allows very good first guesses, but we are most concerned
// with guessing entire cache lines, not individual indexes. Furthermore, array
// accesses are no longer linear and in one direction, as we have now. There
// is also memory and cache pressure that this would entail that would be very
// difficult to properly see in a microbenchmark.
//
// ## Future Improvements (FIXME!)
2014-07-15 18:39:32 -05:00
//
// Allow the load factor to be changed dynamically and/or at initialization.
//
// Also, would it be possible for us to reuse storage when growing the
// underlying table? This is exactly the use case for 'realloc', and may
// be worth exploring.
//
// ## Future Optimizations (FIXME!)
2014-07-15 18:39:32 -05:00
//
// Another possible design choice that I made without any real reason is
// parameterizing the raw table over keys and values. Technically, all we need
// is the size and alignment of keys and values, and the code should be just as
// efficient (well, we might need one for power-of-two size and one for not...).
// This has the potential to reduce code bloat in rust executables, without
// really losing anything except 4 words (key size, key alignment, val size,
// val alignment) which can be passed in to every call of a `RawTable` function.
// This would definitely be an avenue worth exploring if people start complaining
// about the size of rust executables.
//
// Annotate exceedingly likely branches in `table::make_hash`
// and `search_hashed` to reduce instruction cache pressure
// and mispredictions once it becomes possible (blocked on issue #11092).
//
// Shrinking the table could simply reallocate in place after moving buckets
// to the first half.
//
// The growth algorithm (fragment of the Proof of Correctness)
// --------------------
//
// The growth algorithm is basically a fast path of the naive reinsertion-
// during-resize algorithm. Other paths should never be taken.
//
// Consider growing a robin hood hashtable of capacity n. Normally, we do this
// by allocating a new table of capacity `2n`, and then individually reinsert
// each element in the old table into the new one. This guarantees that the
// new table is a valid robin hood hashtable with all the desired statistical
// properties. Remark that the order we reinsert the elements in should not
// matter. For simplicity and efficiency, we will consider only linear
// reinsertions, which consist of reinserting all elements in the old table
// into the new one by increasing order of index. However we will not be
// starting our reinsertions from index 0 in general. If we start from index
// i, for the purpose of reinsertion we will consider all elements with real
// index j < i to have virtual index n + j.
//
// Our hash generation scheme consists of generating a 64-bit hash and
// truncating the most significant bits. When moving to the new table, we
// simply introduce a new bit to the front of the hash. Therefore, if an
// elements has ideal index i in the old table, it can have one of two ideal
// locations in the new table. If the new bit is 0, then the new ideal index
// is i. If the new bit is 1, then the new ideal index is n + i. Intuitively,
// we are producing two independent tables of size n, and for each element we
// independently choose which table to insert it into with equal probability.
// However the rather than wrapping around themselves on overflowing their
// indexes, the first table overflows into the first, and the first into the
// second. Visually, our new table will look something like:
//
// [yy_xxx_xxxx_xxx|xx_yyy_yyyy_yyy]
//
// Where x's are elements inserted into the first table, y's are elements
// inserted into the second, and _'s are empty sections. We now define a few
// key concepts that we will use later. Note that this is a very abstract
// perspective of the table. A real resized table would be at least half
// empty.
//
// Theorem: A linear robin hood reinsertion from the first ideal element
// produces identical results to a linear naive reinsertion from the same
// element.
//
// FIXME(Gankro, pczarn): review the proof and put it all in a separate README.md
2014-07-15 18:39:32 -05:00
/// A hash map implementation which uses linear probing with Robin
/// Hood bucket stealing.
///
/// The hashes are all keyed by the thread-local random number generator
2014-07-15 18:39:32 -05:00
/// on creation by default. This means that the ordering of the keys is
/// randomized, but makes the tables more resistant to
/// denial-of-service attacks (Hash DoS). This behaviour can be
/// overridden with one of the constructors.
///
/// It is required that the keys implement the `Eq` and `Hash` traits, although
2015-05-03 14:50:37 -05:00
/// this can frequently be achieved by using `#[derive(PartialEq, Eq, Hash)]`.
/// If you implement these yourself, it is important that the following
/// property holds:
///
/// ```text
/// k1 == k2 -> hash(k1) == hash(k2)
/// ```
///
/// In other words, if two keys are equal, their hashes must be equal.
2014-07-15 18:39:32 -05:00
///
/// It is a logic error for a key to be modified in such a way that the key's
/// hash, as determined by the `Hash` trait, or its equality, as determined by
/// the `Eq` trait, changes while it is in the map. This is normally only
/// possible through `Cell`, `RefCell`, global state, I/O, or unsafe code.
///
2014-07-15 18:39:32 -05:00
/// Relevant papers/articles:
///
/// 1. Pedro Celis. ["Robin Hood Hashing"](https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf)
/// 2. Emmanuel Goossaert. ["Robin Hood
/// hashing"](http://codecapsule.com/2013/11/11/robin-hood-hashing/)
/// 3. Emmanuel Goossaert. ["Robin Hood hashing: backward shift
/// deletion"](http://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/)
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
/// use std::collections::HashMap;
///
/// // type inference lets us omit an explicit type signature (which
/// // would be `HashMap<&str, &str>` in this example).
/// let mut book_reviews = HashMap::new();
///
/// // review some books.
/// book_reviews.insert("Adventures of Huckleberry Finn", "My favorite book.");
/// book_reviews.insert("Grimms' Fairy Tales", "Masterpiece.");
/// book_reviews.insert("Pride and Prejudice", "Very enjoyable.");
/// book_reviews.insert("The Adventures of Sherlock Holmes", "Eye lyked it alot.");
///
/// // check for a specific one.
2015-05-03 10:44:24 -05:00
/// if !book_reviews.contains_key("Les Misérables") {
2014-07-15 18:39:32 -05:00
/// println!("We've got {} reviews, but Les Misérables ain't one.",
/// book_reviews.len());
/// }
///
/// // oops, this review has a lot of spelling mistakes, let's delete it.
2015-05-03 10:44:24 -05:00
/// book_reviews.remove("The Adventures of Sherlock Holmes");
2014-07-15 18:39:32 -05:00
///
/// // look up the values associated with some keys.
/// let to_find = ["Pride and Prejudice", "Alice's Adventure in Wonderland"];
2015-05-03 10:44:24 -05:00
/// for book in &to_find {
/// match book_reviews.get(book) {
2015-05-03 10:44:24 -05:00
/// Some(review) => println!("{}: {}", book, review),
/// None => println!("{} is unreviewed.", book)
2014-07-15 18:39:32 -05:00
/// }
/// }
///
/// // iterate over everything.
2015-05-03 10:44:24 -05:00
/// for (book, review) in &book_reviews {
/// println!("{}: \"{}\"", book, review);
2014-07-15 18:39:32 -05:00
/// }
/// ```
///
/// The easiest way to use `HashMap` with a custom type as key is to derive `Eq` and `Hash`.
2014-07-15 18:39:32 -05:00
/// We must also derive `PartialEq`.
///
/// ```
/// use std::collections::HashMap;
///
std: Rename Show/String to Debug/Display This commit is an implementation of [RFC 565][rfc] which is a stabilization of the `std::fmt` module and the implementations of various formatting traits. Specifically, the following changes were performed: [rfc]: https://github.com/rust-lang/rfcs/blob/master/text/0565-show-string-guidelines.md * The `Show` trait is now deprecated, it was renamed to `Debug` * The `String` trait is now deprecated, it was renamed to `Display` * Many `Debug` and `Display` implementations were audited in accordance with the RFC and audited implementations now have the `#[stable]` attribute * Integers and floats no longer print a suffix * Smart pointers no longer print details that they are a smart pointer * Paths with `Debug` are now quoted and escape characters * The `unwrap` methods on `Result` now require `Display` instead of `Debug` * The `Error` trait no longer has a `detail` method and now requires that `Display` must be implemented. With the loss of `String`, this has moved into libcore. * `impl<E: Error> FromError<E> for Box<Error>` now exists * `derive(Show)` has been renamed to `derive(Debug)`. This is not currently warned about due to warnings being emitted on stage1+ While backwards compatibility is attempted to be maintained with a blanket implementation of `Display` for the old `String` trait (and the same for `Show`/`Debug`) this is still a breaking change due to primitives no longer implementing `String` as well as modifications such as `unwrap` and the `Error` trait. Most code is fairly straightforward to update with a rename or tweaks of method calls. [breaking-change] Closes #21436
2015-01-20 17:45:07 -06:00
/// #[derive(Hash, Eq, PartialEq, Debug)]
/// struct Viking {
/// name: String,
/// country: String,
2014-07-15 18:39:32 -05:00
/// }
///
/// impl Viking {
/// /// Create a new Viking.
2014-12-29 18:38:07 -06:00
/// fn new(name: &str, country: &str) -> Viking {
/// Viking { name: name.to_string(), country: country.to_string() }
/// }
/// }
///
/// // Use a HashMap to store the vikings' health points.
2014-07-15 18:39:32 -05:00
/// let mut vikings = HashMap::new();
///
/// vikings.insert(Viking::new("Einar", "Norway"), 25);
/// vikings.insert(Viking::new("Olaf", "Denmark"), 24);
/// vikings.insert(Viking::new("Harald", "Iceland"), 12);
2014-07-15 18:39:32 -05:00
///
/// // Use derived implementation to print the status of the vikings.
2015-05-03 10:44:24 -05:00
/// for (viking, health) in &vikings {
/// println!("{:?} has {} hp", viking, health);
2014-07-15 18:39:32 -05:00
/// }
/// ```
#[derive(Clone)]
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
pub struct HashMap<K, V, S = RandomState> {
2014-07-15 18:39:32 -05:00
// All hashes are keyed on these values, to prevent hash collision attacks.
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
hash_state: S,
table: RawTable<K, V>,
2014-07-15 18:39:32 -05:00
resize_policy: DefaultResizePolicy,
}
2014-07-15 18:39:32 -05:00
/// Search for a pre-hashed key.
2014-12-07 13:15:25 -06:00
fn search_hashed<K, V, M, F>(table: M,
hash: SafeHash,
2014-12-07 13:15:25 -06:00
mut is_match: F)
-> SearchResult<K, V, M> where
2015-01-01 13:53:20 -06:00
M: Deref<Target=RawTable<K, V>>,
2014-12-07 13:15:25 -06:00
F: FnMut(&K) -> bool,
{
Add `core::num::wrapping` and fix overflow errors. Many of the core rust libraries have places that rely on integer wrapping behaviour. These places have been altered to use the wrapping_* methods: * core::hash::sip - A number of macros * core::str - The `maximal_suffix` method in `TwoWaySearcher` * rustc::util::nodemap - Implementation of FnvHash * rustc_back::sha2 - A number of macros and other places * rand::isaac - Isaac64Rng, changed to use the Wrapping helper type Some places had "benign" underflow. This is when underflow or overflow occurs, but the unspecified value is not used due to other conditions. * collections::bit::Bitv - underflow when `self.nbits` is zero. * collections::hash::{map,table} - Underflow when searching an empty table. Did cause undefined behaviour in this case due to an out-of-bounds ptr::offset based on the underflowed index. However the resulting pointers would never be read from. * syntax::ext::deriving::encodable - Underflow when calculating the index of the last field in a variant with no fields. These cases were altered to avoid the underflow, often by moving the underflowing operation to a place where underflow could not happen. There was one case that relied on the fact that unsigned arithmetic and two's complement arithmetic are identical with wrapping semantics. This was changed to use the wrapping_* methods. Finally, the calculation of variant discriminants could overflow if the preceeding discriminant was `U64_MAX`. The logic in `rustc::middle::ty` for this was altered to avoid the overflow completely, while the remaining places were changed to use wrapping methods. This is because `rustc::middle::ty::enum_variants` now throws an error when the calculated discriminant value overflows a `u64`. This behaviour can be triggered by the following code: ``` enum Foo { A = U64_MAX, B } ``` This commit also implements the remaining integer operators for Wrapped<T>.
2015-01-08 21:10:57 -06:00
// This is the only function where capacity can be zero. To avoid
// undefined behaviour when Bucket::new gets the raw bucket in this
// case, immediately return the appropriate search result.
if table.capacity() == 0 {
return TableRef(table);
}
2014-07-15 18:39:32 -05:00
let size = table.size();
let mut probe = Bucket::new(table, hash);
let ib = probe.index();
2014-07-15 18:39:32 -05:00
while probe.index() != ib + size {
let full = match probe.peek() {
Empty(b) => return TableRef(b.into_table()), // hit an empty bucket
Full(b) => b
2014-07-15 18:39:32 -05:00
};
2014-07-15 18:39:32 -05:00
if full.distance() + ib < full.index() {
// We can finish the search early if we hit any bucket
// with a lower distance to initial bucket than we've probed.
return TableRef(full.into_table());
}
2014-07-15 18:39:32 -05:00
// If the hash doesn't match, it can't be this one..
if hash == full.hash() {
2014-07-15 18:39:32 -05:00
// If the key doesn't match, it can't be this one..
if is_match(full.read().0) {
return FoundExisting(full);
}
}
2014-07-15 18:39:32 -05:00
probe = full.next();
}
TableRef(probe.into_table())
2014-07-15 18:39:32 -05:00
}
fn pop_internal<K, V>(starting_bucket: FullBucketMut<K, V>) -> (K, V) {
let (empty, retkey, retval) = starting_bucket.take();
2014-07-15 18:39:32 -05:00
let mut gap = match empty.gap_peek() {
Some(b) => b,
None => return (retkey, retval)
2014-07-15 18:39:32 -05:00
};
while gap.full().distance() != 0 {
gap = match gap.shift() {
Some(b) => b,
None => break
};
}
// Now we've done all our shifting. Return the value we grabbed earlier.
(retkey, retval)
}
/// Perform robin hood bucket stealing at the given `bucket`. You must
/// also pass the position of that bucket's initial bucket so we don't have
/// to recalculate it.
///
/// `hash`, `k`, and `v` are the elements to "robin hood" into the hashtable.
fn robin_hood<'a, K: 'a, V: 'a>(mut bucket: FullBucketMut<'a, K, V>,
2015-02-04 20:17:19 -06:00
mut ib: usize,
mut hash: SafeHash,
mut k: K,
mut v: V)
-> &'a mut V {
let starting_index = bucket.index();
let size = {
let table = bucket.table(); // FIXME "lifetime too short".
table.size()
};
// There can be at most `size - dib` buckets to displace, because
// in the worst case, there are `size` elements and we already are
// `distance` buckets away from the initial one.
let idx_end = starting_index + size - bucket.distance();
loop {
let (old_hash, old_key, old_val) = bucket.replace(hash, k, v);
loop {
let probe = bucket.next();
assert!(probe.index() != idx_end);
let full_bucket = match probe.peek() {
Empty(bucket) => {
// Found a hole!
let b = bucket.put(old_hash, old_key, old_val);
// Now that it's stolen, just read the value's pointer
// right out of the table!
return Bucket::at_index(b.into_table(), starting_index)
.peek()
.expect_full()
.into_mut_refs()
.1;
},
Full(bucket) => bucket
2014-07-15 18:39:32 -05:00
};
let probe_ib = full_bucket.index() - full_bucket.distance();
bucket = full_bucket;
// Robin hood! Steal the spot.
if ib < probe_ib {
ib = probe_ib;
hash = old_hash;
k = old_key;
v = old_val;
break;
}
}
}
}
/// A result that works like Option<FullBucket<..>> but preserves
/// the reference that grants us access to the table in any case.
enum SearchResult<K, V, M> {
// This is an entry that holds the given key:
FoundExisting(FullBucket<K, V, M>),
// There was no such entry. The reference is given back:
TableRef(M)
}
impl<K, V, M> SearchResult<K, V, M> {
fn into_option(self) -> Option<FullBucket<K, V, M>> {
match self {
FoundExisting(bucket) => Some(bucket),
TableRef(_) => None
}
}
}
impl<K, V, S> HashMap<K, V, S>
where K: Eq + Hash, S: HashState
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
fn make_hash<X: ?Sized>(&self, x: &X) -> SafeHash where X: Hash {
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
table::make_hash(&self.hash_state, x)
2014-07-15 18:39:32 -05:00
}
2014-07-15 18:39:32 -05:00
/// Search for a key, yielding the index if it's found in the hashtable.
/// If you already have the hash for the key lying around, use
/// search_hashed.
2015-01-05 15:16:49 -06:00
fn search<'a, Q: ?Sized>(&'a self, q: &Q) -> Option<FullBucketImm<'a, K, V>>
where K: Borrow<Q>, Q: Eq + Hash
{
let hash = self.make_hash(q);
search_hashed(&self.table, hash, |k| q.eq(k.borrow()))
.into_option()
}
2015-01-05 15:16:49 -06:00
fn search_mut<'a, Q: ?Sized>(&'a mut self, q: &Q) -> Option<FullBucketMut<'a, K, V>>
where K: Borrow<Q>, Q: Eq + Hash
{
let hash = self.make_hash(q);
search_hashed(&mut self.table, hash, |k| q.eq(k.borrow()))
.into_option()
2014-07-15 18:39:32 -05:00
}
// The caller should ensure that invariants by Robin Hood Hashing hold.
fn insert_hashed_ordered(&mut self, hash: SafeHash, k: K, v: V) {
2014-07-15 18:39:32 -05:00
let cap = self.table.capacity();
let mut buckets = Bucket::new(&mut self.table, hash);
2014-07-15 18:39:32 -05:00
let ib = buckets.index();
2014-07-15 18:39:32 -05:00
while buckets.index() != ib + cap {
// We don't need to compare hashes for value swap.
// Not even DIBs for Robin Hood.
2014-07-15 18:39:32 -05:00
buckets = match buckets.peek() {
Empty(empty) => {
2014-07-15 18:39:32 -05:00
empty.put(hash, k, v);
return;
}
Full(b) => b.into_bucket()
2014-07-15 18:39:32 -05:00
};
buckets.next();
}
panic!("Internal HashMap error: Out of space.");
2014-07-15 18:39:32 -05:00
}
}
impl<K: Hash + Eq, V> HashMap<K, V, RandomState> {
/// Creates an empty HashMap.
2014-07-15 18:39:32 -05:00
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
/// use std::collections::HashMap;
/// let mut map: HashMap<&str, isize> = HashMap::new();
2014-07-15 18:39:32 -05:00
/// ```
#[inline]
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
pub fn new() -> HashMap<K, V, RandomState> {
Default::default()
}
2014-07-15 18:39:32 -05:00
/// Creates an empty hash map with the given initial capacity.
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
/// use std::collections::HashMap;
/// let mut map: HashMap<&str, isize> = HashMap::with_capacity(10);
2014-07-15 18:39:32 -05:00
/// ```
#[inline]
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-02-04 20:17:19 -06:00
pub fn with_capacity(capacity: usize) -> HashMap<K, V, RandomState> {
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
HashMap::with_capacity_and_hash_state(capacity, Default::default())
2013-04-03 07:45:14 -05:00
}
2014-07-15 18:39:32 -05:00
}
impl<K, V, S> HashMap<K, V, S>
where K: Eq + Hash, S: HashState
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
2014-07-15 18:39:32 -05:00
/// Creates an empty hashmap which will use the given hasher to hash keys.
///
/// The created map has the default initial capacity.
2014-07-15 18:39:32 -05:00
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
2015-03-13 17:28:35 -05:00
/// # #![feature(std_misc)]
2014-07-15 18:39:32 -05:00
/// use std::collections::HashMap;
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
/// use std::collections::hash_map::RandomState;
2014-07-15 18:39:32 -05:00
///
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
/// let s = RandomState::new();
/// let mut map = HashMap::with_hash_state(s);
/// map.insert(1, 2);
2014-07-15 18:39:32 -05:00
/// ```
#[inline]
#[unstable(feature = "std_misc", reason = "hasher stuff is unclear")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
pub fn with_hash_state(hash_state: S) -> HashMap<K, V, S> {
2014-07-15 18:39:32 -05:00
HashMap {
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
hash_state: hash_state,
resize_policy: DefaultResizePolicy::new(),
table: RawTable::new(0),
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
}
/// Creates an empty HashMap with space for at least `capacity`
2014-07-15 18:39:32 -05:00
/// elements, using `hasher` to hash the keys.
///
/// Warning: `hasher` is normally randomly generated, and
/// is designed to allow HashMaps to be resistant to attacks that
/// cause many collisions and very poor performance. Setting it
/// manually using this function can expose a DoS attack vector.
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
2015-03-13 17:28:35 -05:00
/// # #![feature(std_misc)]
2014-07-15 18:39:32 -05:00
/// use std::collections::HashMap;
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
/// use std::collections::hash_map::RandomState;
2014-07-15 18:39:32 -05:00
///
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
/// let s = RandomState::new();
/// let mut map = HashMap::with_capacity_and_hash_state(10, s);
/// map.insert(1, 2);
2014-07-15 18:39:32 -05:00
/// ```
#[inline]
#[unstable(feature = "std_misc", reason = "hasher stuff is unclear")]
2015-02-04 20:17:19 -06:00
pub fn with_capacity_and_hash_state(capacity: usize, hash_state: S)
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
-> HashMap<K, V, S> {
let resize_policy = DefaultResizePolicy::new();
let min_cap = max(INITIAL_CAPACITY, resize_policy.min_capacity(capacity));
let internal_cap = min_cap.checked_next_power_of_two().expect("capacity overflow");
assert!(internal_cap >= capacity, "capacity overflow");
2014-07-15 18:39:32 -05:00
HashMap {
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
hash_state: hash_state,
resize_policy: resize_policy,
table: RawTable::new(internal_cap),
2014-07-15 18:39:32 -05:00
}
2014-03-26 20:58:08 -05:00
}
/// Returns the number of elements the map can hold without reallocating.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
/// let map: HashMap<isize, isize> = HashMap::with_capacity(100);
/// assert!(map.capacity() >= 100);
/// ```
#[inline]
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-02-04 20:17:19 -06:00
pub fn capacity(&self) -> usize {
self.resize_policy.usable_capacity(self.table.capacity())
}
/// Reserves capacity for at least `additional` more elements to be inserted
/// in the `HashMap`. The collection may reserve more space to avoid
/// frequent reallocations.
2014-07-15 18:39:32 -05:00
///
/// # Panics
///
2015-02-04 20:17:19 -06:00
/// Panics if the new allocation size overflows `usize`.
2014-07-15 18:39:32 -05:00
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
/// use std::collections::HashMap;
/// let mut map: HashMap<&str, isize> = HashMap::new();
2014-07-15 18:39:32 -05:00
/// map.reserve(10);
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-02-04 20:17:19 -06:00
pub fn reserve(&mut self, additional: usize) {
let new_size = self.len().checked_add(additional).expect("capacity overflow");
let min_cap = self.resize_policy.min_capacity(new_size);
// An invalid value shouldn't make us run out of space. This includes
// an overflow check.
assert!(new_size <= min_cap);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
if self.table.capacity() < min_cap {
let new_capacity = max(min_cap.next_power_of_two(), INITIAL_CAPACITY);
self.resize(new_capacity);
}
2014-07-15 18:39:32 -05:00
}
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2014-07-15 18:39:32 -05:00
/// Resizes the internal vectors to a new capacity. It's your responsibility to:
/// 1) Make sure the new capacity is enough for all the elements, accounting
/// for the load factor.
/// 2) Ensure new_capacity is a power of two or zero.
2015-02-04 20:17:19 -06:00
fn resize(&mut self, new_capacity: usize) {
2014-07-15 18:39:32 -05:00
assert!(self.table.size() <= new_capacity);
assert!(new_capacity.is_power_of_two() || new_capacity == 0);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
let mut old_table = replace(&mut self.table, RawTable::new(new_capacity));
2014-07-15 18:39:32 -05:00
let old_size = old_table.size();
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
if old_table.capacity() == 0 || old_table.size() == 0 {
2014-07-15 18:39:32 -05:00
return;
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
// Grow the table.
// Specialization of the other branch.
let mut bucket = Bucket::first(&mut old_table);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
// "So a few of the first shall be last: for many be called,
// but few chosen."
//
// We'll most likely encounter a few buckets at the beginning that
// have their initial buckets near the end of the table. They were
// placed at the beginning as the probe wrapped around the table
// during insertion. We must skip forward to a bucket that won't
// get reinserted too early and won't unfairly steal others spot.
// This eliminates the need for robin hood.
loop {
bucket = match bucket.peek() {
Full(full) => {
if full.distance() == 0 {
// This bucket occupies its ideal spot.
// It indicates the start of another "cluster".
bucket = full.into_bucket();
break;
}
// Leaving this bucket in the last cluster for later.
full.into_bucket()
}
Empty(b) => {
// Encountered a hole between clusters.
b.into_bucket()
}
};
bucket.next();
}
// This is how the buckets might be laid out in memory:
// ($ marks an initialized bucket)
// ________________
// |$$$_$$$$$$_$$$$$|
//
// But we've skipped the entire initial cluster of buckets
// and will continue iteration in this order:
// ________________
// |$$$$$$_$$$$$
// ^ wrap around once end is reached
// ________________
// $$$_____________|
// ^ exit once table.size == 0
loop {
bucket = match bucket.peek() {
Full(bucket) => {
let h = bucket.hash();
let (b, k, v) = bucket.take();
self.insert_hashed_ordered(h, k, v);
{
let t = b.table(); // FIXME "lifetime too short".
if t.size() == 0 { break }
};
b.into_bucket()
}
Empty(b) => b.into_bucket()
};
bucket.next();
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
2014-07-15 18:39:32 -05:00
assert_eq!(self.table.size(), old_size);
}
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
/// Shrinks the capacity of the map as much as possible. It will drop
/// down as much as possible while maintaining the internal rules
/// and possibly leaving some space in accordance with the resize policy.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map: HashMap<isize, isize> = HashMap::with_capacity(100);
/// map.insert(1, 2);
/// map.insert(3, 4);
/// assert!(map.capacity() >= 100);
/// map.shrink_to_fit();
/// assert!(map.capacity() >= 2);
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn shrink_to_fit(&mut self) {
let min_capacity = self.resize_policy.min_capacity(self.len());
let min_capacity = max(min_capacity.next_power_of_two(), INITIAL_CAPACITY);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2014-07-15 18:39:32 -05:00
// An invalid value shouldn't make us run out of space.
debug_assert!(self.len() <= min_capacity);
if self.table.capacity() != min_capacity {
let old_table = replace(&mut self.table, RawTable::new(min_capacity));
let old_size = old_table.size();
// Shrink the table. Naive algorithm for resizing:
for (h, k, v) in old_table.into_iter() {
self.insert_hashed_nocheck(h, k, v);
}
debug_assert_eq!(self.table.size(), old_size);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
2013-04-03 07:45:14 -05:00
}
2014-07-15 18:39:32 -05:00
/// Insert a pre-hashed key-value pair, without first checking
/// that there's enough room in the buckets. Returns a reference to the
/// newly insert value.
///
/// If the key already exists, the hashtable will be returned untouched
/// and a reference to the existing element will be returned.
fn insert_hashed_nocheck(&mut self, hash: SafeHash, k: K, v: V) -> &mut V {
self.insert_or_replace_with(hash, k, v, |_, _, _| ())
}
2014-12-07 13:15:25 -06:00
fn insert_or_replace_with<'a, F>(&'a mut self,
hash: SafeHash,
k: K,
v: V,
mut found_existing: F)
-> &'a mut V where
F: FnMut(&mut K, &mut V, V),
{
2014-07-15 18:39:32 -05:00
// Worst case, we'll find one empty bucket among `size + 1` buckets.
let size = self.table.size();
let mut probe = Bucket::new(&mut self.table, hash);
let ib = probe.index();
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2014-07-15 18:39:32 -05:00
loop {
let mut bucket = match probe.peek() {
Empty(bucket) => {
2014-07-15 18:39:32 -05:00
// Found a hole!
return bucket.put(hash, k, v).into_mut_refs().1;
}
Full(bucket) => bucket
2014-07-15 18:39:32 -05:00
};
// hash matches?
2014-07-15 18:39:32 -05:00
if bucket.hash() == hash {
// key matches?
if k == *bucket.read_mut().0 {
let (bucket_k, bucket_v) = bucket.into_mut_refs();
debug_assert!(k == *bucket_k);
2014-07-15 18:39:32 -05:00
// Key already exists. Get its reference.
found_existing(bucket_k, bucket_v, v);
return bucket_v;
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
2014-07-15 18:39:32 -05:00
}
let robin_ib = bucket.index() as isize - bucket.distance() as isize;
2013-04-10 15:11:35 -05:00
if (ib as isize) < robin_ib {
2014-07-15 18:39:32 -05:00
// Found a luckier bucket than me. Better steal his spot.
2015-02-04 20:17:19 -06:00
return robin_hood(bucket, robin_ib as usize, hash, k, v);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
probe = bucket.next();
assert!(probe.index() != ib + size + 1);
}
}
2014-07-15 18:39:32 -05:00
/// An iterator visiting all keys in arbitrary order.
/// Iterator element type is `&'a K`.
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
2015-01-25 15:05:03 -06:00
/// map.insert("a", 1);
2014-07-15 18:39:32 -05:00
/// map.insert("b", 2);
/// map.insert("c", 3);
///
/// for key in map.keys() {
/// println!("{}", key);
/// }
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn keys<'a>(&'a self) -> Keys<'a, K, V> {
2014-12-02 13:28:35 -06:00
fn first<A, B>((a, _): (A, B)) -> A { a }
let first: fn((&'a K,&'a V)) -> &'a K = first; // coerce to fn ptr
2014-12-02 13:28:35 -06:00
Keys { inner: self.iter().map(first) }
2014-07-15 18:39:32 -05:00
}
/// An iterator visiting all values in arbitrary order.
/// Iterator element type is `&'a V`.
///
/// # Examples
2014-07-15 18:39:32 -05:00
///
/// ```
/// use std::collections::HashMap;
///
2014-07-15 18:39:32 -05:00
/// let mut map = HashMap::new();
2015-01-25 15:05:03 -06:00
/// map.insert("a", 1);
2014-07-15 18:39:32 -05:00
/// map.insert("b", 2);
/// map.insert("c", 3);
///
/// for val in map.values() {
/// println!("{}", val);
2014-07-15 18:39:32 -05:00
/// }
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn values<'a>(&'a self) -> Values<'a, K, V> {
2014-12-02 13:28:35 -06:00
fn second<A, B>((_, b): (A, B)) -> B { b }
let second: fn((&'a K,&'a V)) -> &'a V = second; // coerce to fn ptr
2014-12-02 13:28:35 -06:00
Values { inner: self.iter().map(second) }
}
2014-07-15 18:39:32 -05:00
/// An iterator visiting all key-value pairs in arbitrary order.
/// Iterator element type is `(&'a K, &'a V)`.
///
/// # Examples
///
/// ```
2014-07-15 18:39:32 -05:00
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
2015-01-25 15:05:03 -06:00
/// map.insert("a", 1);
2014-07-15 18:39:32 -05:00
/// map.insert("b", 2);
/// map.insert("c", 3);
///
/// for (key, val) in map.iter() {
/// println!("key: {} val: {}", key, val);
/// }
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn iter(&self) -> Iter<K, V> {
Iter { inner: self.table.iter() }
}
2014-07-15 18:39:32 -05:00
/// An iterator visiting all key-value pairs in arbitrary order,
/// with mutable references to the values.
/// Iterator element type is `(&'a K, &'a mut V)`.
///
/// # Examples
///
/// ```
2014-07-15 18:39:32 -05:00
/// use std::collections::HashMap;
///
2014-07-15 18:39:32 -05:00
/// let mut map = HashMap::new();
2015-01-25 15:05:03 -06:00
/// map.insert("a", 1);
2014-07-15 18:39:32 -05:00
/// map.insert("b", 2);
/// map.insert("c", 3);
///
/// // Update all values
2014-09-14 22:27:36 -05:00
/// for (_, val) in map.iter_mut() {
2014-07-15 18:39:32 -05:00
/// *val *= 2;
/// }
///
2014-07-15 18:39:32 -05:00
/// for (key, val) in map.iter() {
/// println!("key: {} val: {}", key, val);
/// }
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn iter_mut(&mut self) -> IterMut<K, V> {
IterMut { inner: self.table.iter_mut() }
}
/// Gets the given key's corresponding entry in the map for in-place manipulation.
2015-05-09 06:27:23 -05:00
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut letters = HashMap::new();
///
/// for ch in "a short treatise on fungi".chars() {
/// let counter = letters.entry(ch).or_insert(0);
/// *counter += 1;
/// }
///
/// assert_eq!(letters[&'s'], 2);
/// assert_eq!(letters[&'t'], 3);
/// assert_eq!(letters[&'u'], 1);
/// assert_eq!(letters.get(&'y'), None);
/// ```
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn entry(&mut self, key: K) -> Entry<K, V> {
// Gotta resize now.
self.reserve(1);
let hash = self.make_hash(&key);
search_entry_hashed(&mut self.table, hash, key)
}
/// Returns the number of elements in the map.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut a = HashMap::new();
/// assert_eq!(a.len(), 0);
/// a.insert(1, "a");
/// assert_eq!(a.len(), 1);
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-02-04 20:17:19 -06:00
pub fn len(&self) -> usize { self.table.size() }
/// Returns true if the map contains no elements.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut a = HashMap::new();
/// assert!(a.is_empty());
/// a.insert(1, "a");
/// assert!(!a.is_empty());
/// ```
#[inline]
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn is_empty(&self) -> bool { self.len() == 0 }
/// Clears the map, returning all key-value pairs as an iterator. Keeps the
/// allocated memory for reuse.
///
/// # Examples
///
/// ```
2015-03-13 17:28:35 -05:00
/// # #![feature(std_misc)]
/// use std::collections::HashMap;
///
/// let mut a = HashMap::new();
/// a.insert(1, "a");
/// a.insert(2, "b");
///
/// for (k, v) in a.drain().take(1) {
/// assert!(k == 1 || k == 2);
/// assert!(v == "a" || v == "b");
/// }
///
/// assert!(a.is_empty());
/// ```
#[inline]
#[unstable(feature = "std_misc",
reason = "matches collection reform specification, waiting for dust to settle")]
pub fn drain(&mut self) -> Drain<K, V> {
fn last_two<A, B, C>((_, b, c): (A, B, C)) -> (B, C) { (b, c) }
let last_two: fn((SafeHash, K, V)) -> (K, V) = last_two; // coerce to fn pointer
Drain {
inner: self.table.drain().map(last_two),
}
}
/// Clears the map, removing all key-value pairs. Keeps the allocated memory
/// for reuse.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut a = HashMap::new();
/// a.insert(1, "a");
/// a.clear();
/// assert!(a.is_empty());
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
#[inline]
pub fn clear(&mut self) {
self.drain();
}
/// Returns a reference to the value corresponding to the key.
///
/// The key may be any borrowed form of the map's key type, but
/// `Hash` and `Eq` on the borrowed form *must* match those for
/// the key type.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// map.insert(1, "a");
/// assert_eq!(map.get(&1), Some(&"a"));
/// assert_eq!(map.get(&2), None);
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-05 15:16:49 -06:00
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
where K: Borrow<Q>, Q: Hash + Eq
{
self.search(k).map(|bucket| bucket.into_refs().1)
}
/// Returns true if the map contains a value for the specified key.
///
/// The key may be any borrowed form of the map's key type, but
/// `Hash` and `Eq` on the borrowed form *must* match those for
/// the key type.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// map.insert(1, "a");
/// assert_eq!(map.contains_key(&1), true);
/// assert_eq!(map.contains_key(&2), false);
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-05 15:16:49 -06:00
pub fn contains_key<Q: ?Sized>(&self, k: &Q) -> bool
where K: Borrow<Q>, Q: Hash + Eq
{
self.search(k).is_some()
}
/// Returns a mutable reference to the value corresponding to the key.
///
/// The key may be any borrowed form of the map's key type, but
/// `Hash` and `Eq` on the borrowed form *must* match those for
/// the key type.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// map.insert(1, "a");
/// if let Some(x) = map.get_mut(&1) {
/// *x = "b";
/// }
/// assert_eq!(map[&1], "b");
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-05 15:16:49 -06:00
pub fn get_mut<Q: ?Sized>(&mut self, k: &Q) -> Option<&mut V>
where K: Borrow<Q>, Q: Hash + Eq
{
self.search_mut(k).map(|bucket| bucket.into_mut_refs().1)
}
/// Inserts a key-value pair into the map. If the key already had a value
/// present in the map, that value is returned. Otherwise, `None` is returned.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// assert_eq!(map.insert(37, "a"), None);
/// assert_eq!(map.is_empty(), false);
///
/// map.insert(37, "b");
/// assert_eq!(map.insert(37, "c"), Some("b"));
/// assert_eq!(map[&37], "c");
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn insert(&mut self, k: K, v: V) -> Option<V> {
let hash = self.make_hash(&k);
self.reserve(1);
let mut retval = None;
self.insert_or_replace_with(hash, k, v, |_, val_ref, val| {
retval = Some(replace(val_ref, val));
});
retval
}
/// Removes a key from the map, returning the value at the key if the key
/// was previously in the map.
///
/// The key may be any borrowed form of the map's key type, but
/// `Hash` and `Eq` on the borrowed form *must* match those for
/// the key type.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// map.insert(1, "a");
/// assert_eq!(map.remove(&1), Some("a"));
/// assert_eq!(map.remove(&1), None);
/// ```
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-05 15:16:49 -06:00
pub fn remove<Q: ?Sized>(&mut self, k: &Q) -> Option<V>
where K: Borrow<Q>, Q: Hash + Eq
{
if self.table.size() == 0 {
return None
}
self.search_mut(k).map(|bucket| pop_internal(bucket).1)
}
}
fn search_entry_hashed<'a, K: Eq, V>(table: &'a mut RawTable<K,V>, hash: SafeHash, k: K)
-> Entry<'a, K, V>
{
// Worst case, we'll find one empty bucket among `size + 1` buckets.
let size = table.size();
let mut probe = Bucket::new(table, hash);
let ib = probe.index();
loop {
let bucket = match probe.peek() {
Empty(bucket) => {
// Found a hole!
return Vacant(VacantEntry {
hash: hash,
key: k,
elem: NoElem(bucket),
});
},
Full(bucket) => bucket
};
// hash matches?
if bucket.hash() == hash {
// key matches?
if k == *bucket.read().0 {
return Occupied(OccupiedEntry{
elem: bucket,
});
}
}
let robin_ib = bucket.index() as isize - bucket.distance() as isize;
if (ib as isize) < robin_ib {
// Found a luckier bucket than me. Better steal his spot.
return Vacant(VacantEntry {
hash: hash,
key: k,
2015-02-04 20:17:19 -06:00
elem: NeqElem(bucket, robin_ib as usize),
});
}
probe = bucket.next();
assert!(probe.index() != ib + size + 1);
}
2014-07-15 18:39:32 -05:00
}
impl<K, V, S> PartialEq for HashMap<K, V, S>
where K: Eq + Hash, V: PartialEq, S: HashState
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
fn eq(&self, other: &HashMap<K, V, S>) -> bool {
if self.len() != other.len() { return false; }
self.iter().all(|(key, value)|
other.get(key).map_or(false, |v| *value == *v)
)
}
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V, S> Eq for HashMap<K, V, S>
where K: Eq + Hash, V: Eq, S: HashState
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V, S> Debug for HashMap<K, V, S>
where K: Eq + Hash + Debug, V: Debug, S: HashState
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
2014-07-15 18:39:32 -05:00
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.debug_map().entries(self.iter()).finish()
2014-07-15 18:39:32 -05:00
}
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V, S> Default for HashMap<K, V, S>
where K: Eq + Hash,
S: HashState + Default,
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
fn default() -> HashMap<K, V, S> {
HashMap::with_hash_state(Default::default())
}
2014-07-15 18:39:32 -05:00
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, Q: ?Sized, V, S> Index<&'a Q> for HashMap<K, V, S>
where K: Eq + Hash + Borrow<Q>,
Q: Eq + Hash,
S: HashState,
{
type Output = V;
#[inline]
fn index(&self, index: &Q) -> &V {
self.get(index).expect("no entry found for key")
}
}
/// HashMap iterator.
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub struct Iter<'a, K: 'a, V: 'a> {
inner: table::Iter<'a, K, V>
}
// FIXME(#19839) Remove in favor of `#[derive(Clone)]`
2014-12-30 18:29:27 -06:00
impl<'a, K, V> Clone for Iter<'a, K, V> {
fn clone(&self) -> Iter<'a, K, V> {
Iter {
inner: self.inner.clone()
}
}
}
/// HashMap mutable values iterator.
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub struct IterMut<'a, K: 'a, V: 'a> {
inner: table::IterMut<'a, K, V>
}
/// HashMap move iterator.
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub struct IntoIter<K, V> {
inner: iter::Map<table::IntoIter<K, V>, fn((SafeHash, K, V)) -> (K, V)>
}
/// HashMap keys iterator.
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub struct Keys<'a, K: 'a, V: 'a> {
inner: Map<Iter<'a, K, V>, fn((&'a K, &'a V)) -> &'a K>
}
// FIXME(#19839) Remove in favor of `#[derive(Clone)]`
impl<'a, K, V> Clone for Keys<'a, K, V> {
fn clone(&self) -> Keys<'a, K, V> {
Keys {
inner: self.inner.clone()
}
}
}
/// HashMap values iterator.
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub struct Values<'a, K: 'a, V: 'a> {
inner: Map<Iter<'a, K, V>, fn((&'a K, &'a V)) -> &'a V>
}
// FIXME(#19839) Remove in favor of `#[derive(Clone)]`
impl<'a, K, V> Clone for Values<'a, K, V> {
fn clone(&self) -> Values<'a, K, V> {
Values {
inner: self.inner.clone()
}
}
}
/// HashMap drain iterator.
#[unstable(feature = "std_misc",
reason = "matches collection reform specification, waiting for dust to settle")]
pub struct Drain<'a, K: 'a, V: 'a> {
inner: iter::Map<table::Drain<'a, K, V>, fn((SafeHash, K, V)) -> (K, V)>
}
/// A view into a single occupied location in a HashMap.
#[stable(feature = "rust1", since = "1.0.0")]
pub struct OccupiedEntry<'a, K: 'a, V: 'a> {
elem: FullBucket<K, V, &'a mut RawTable<K, V>>,
}
/// A view into a single empty location in a HashMap.
#[stable(feature = "rust1", since = "1.0.0")]
pub struct VacantEntry<'a, K: 'a, V: 'a> {
hash: SafeHash,
key: K,
elem: VacantEntryState<K, V, &'a mut RawTable<K, V>>,
}
/// A view into a single location in a map, which may be vacant or occupied.
#[stable(feature = "rust1", since = "1.0.0")]
pub enum Entry<'a, K: 'a, V: 'a> {
/// An occupied Entry.
#[stable(feature = "rust1", since = "1.0.0")]
Occupied(OccupiedEntry<'a, K, V>),
/// A vacant Entry.
#[stable(feature = "rust1", since = "1.0.0")]
Vacant(VacantEntry<'a, K, V>),
}
/// Possible states of a VacantEntry.
enum VacantEntryState<K, V, M> {
/// The index is occupied, but the key to insert has precedence,
/// and will kick the current one out on insertion.
2015-02-04 20:17:19 -06:00
NeqElem(FullBucket<K, V, M>, usize),
/// The index is genuinely vacant.
NoElem(EmptyBucket<K, V, M>),
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V, S> IntoIterator for &'a HashMap<K, V, S>
where K: Eq + Hash, S: HashState
{
type Item = (&'a K, &'a V);
type IntoIter = Iter<'a, K, V>;
fn into_iter(self) -> Iter<'a, K, V> {
self.iter()
}
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V, S> IntoIterator for &'a mut HashMap<K, V, S>
where K: Eq + Hash, S: HashState
{
type Item = (&'a K, &'a mut V);
type IntoIter = IterMut<'a, K, V>;
fn into_iter(mut self) -> IterMut<'a, K, V> {
self.iter_mut()
}
}
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V, S> IntoIterator for HashMap<K, V, S>
where K: Eq + Hash, S: HashState
2015-01-31 10:41:32 -06:00
{
type Item = (K, V);
type IntoIter = IntoIter<K, V>;
2015-01-31 10:41:32 -06:00
/// Creates a consuming iterator, that is, one that moves each key-value
/// pair out of the map in arbitrary order. The map cannot be used after
/// calling this.
///
/// # Examples
///
/// ```
/// use std::collections::HashMap;
///
/// let mut map = HashMap::new();
/// map.insert("a", 1);
/// map.insert("b", 2);
/// map.insert("c", 3);
///
/// // Not possible with .iter()
/// let vec: Vec<(&str, isize)> = map.into_iter().collect();
/// ```
2015-01-31 08:17:50 -06:00
fn into_iter(self) -> IntoIter<K, V> {
fn last_two<A, B, C>((_, b, c): (A, B, C)) -> (B, C) { (b, c) }
let last_two: fn((SafeHash, K, V)) -> (K, V) = last_two;
IntoIter {
inner: self.table.into_iter().map(last_two)
}
2015-01-31 10:41:32 -06:00
}
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-01 21:33:39 -06:00
impl<'a, K, V> Iterator for Iter<'a, K, V> {
type Item = (&'a K, &'a V);
#[inline] fn next(&mut self) -> Option<(&'a K, &'a V)> { self.inner.next() }
#[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V> ExactSizeIterator for Iter<'a, K, V> {
#[inline] fn len(&self) -> usize { self.inner.len() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-01 21:33:39 -06:00
impl<'a, K, V> Iterator for IterMut<'a, K, V> {
type Item = (&'a K, &'a mut V);
#[inline] fn next(&mut self) -> Option<(&'a K, &'a mut V)> { self.inner.next() }
#[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V> ExactSizeIterator for IterMut<'a, K, V> {
#[inline] fn len(&self) -> usize { self.inner.len() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-01 21:33:39 -06:00
impl<K, V> Iterator for IntoIter<K, V> {
type Item = (K, V);
#[inline] fn next(&mut self) -> Option<(K, V)> { self.inner.next() }
#[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V> ExactSizeIterator for IntoIter<K, V> {
#[inline] fn len(&self) -> usize { self.inner.len() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-01 21:33:39 -06:00
impl<'a, K, V> Iterator for Keys<'a, K, V> {
type Item = &'a K;
#[inline] fn next(&mut self) -> Option<(&'a K)> { self.inner.next() }
#[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V> ExactSizeIterator for Keys<'a, K, V> {
#[inline] fn len(&self) -> usize { self.inner.len() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
2015-01-01 21:33:39 -06:00
impl<'a, K, V> Iterator for Values<'a, K, V> {
type Item = &'a V;
#[inline] fn next(&mut self) -> Option<(&'a V)> { self.inner.next() }
#[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V> ExactSizeIterator for Values<'a, K, V> {
#[inline] fn len(&self) -> usize { self.inner.len() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V> Iterator for Drain<'a, K, V> {
2015-01-01 21:33:39 -06:00
type Item = (K, V);
#[inline] fn next(&mut self) -> Option<(K, V)> { self.inner.next() }
#[inline] fn size_hint(&self) -> (usize, Option<usize>) { self.inner.size_hint() }
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<'a, K, V> ExactSizeIterator for Drain<'a, K, V> {
#[inline] fn len(&self) -> usize { self.inner.len() }
}
impl<'a, K, V> Entry<'a, K, V> {
#[unstable(feature = "std_misc",
reason = "will soon be replaced by or_insert")]
#[deprecated(since = "1.0",
2015-03-20 12:43:01 -05:00
reason = "replaced with more ergonomic `or_insert` and `or_insert_with`")]
/// Returns a mutable reference to the entry if occupied, or the VacantEntry if vacant
pub fn get(self) -> Result<&'a mut V, VacantEntry<'a, K, V>> {
match self {
Occupied(entry) => Ok(entry.into_mut()),
Vacant(entry) => Err(entry),
}
}
#[stable(feature = "rust1", since = "1.0.0")]
/// Ensures a value is in the entry by inserting the default if empty, and returns
/// a mutable reference to the value in the entry.
2015-03-20 12:43:01 -05:00
pub fn or_insert(self, default: V) -> &'a mut V {
match self {
Occupied(entry) => entry.into_mut(),
Vacant(entry) => entry.insert(default),
}
}
#[stable(feature = "rust1", since = "1.0.0")]
/// Ensures a value is in the entry by inserting the result of the default function if empty,
/// and returns a mutable reference to the value in the entry.
2015-03-20 12:43:01 -05:00
pub fn or_insert_with<F: FnOnce() -> V>(self, default: F) -> &'a mut V {
match self {
Occupied(entry) => entry.into_mut(),
Vacant(entry) => entry.insert(default()),
}
}
}
impl<'a, K, V> OccupiedEntry<'a, K, V> {
/// Gets a reference to the value in the entry.
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn get(&self) -> &V {
self.elem.read().1
}
/// Gets a mutable reference to the value in the entry.
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn get_mut(&mut self) -> &mut V {
self.elem.read_mut().1
}
/// Converts the OccupiedEntry into a mutable reference to the value in the entry
/// with a lifetime bound to the map itself
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn into_mut(self) -> &'a mut V {
self.elem.into_mut_refs().1
}
/// Sets the value of the entry, and returns the entry's old value
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn insert(&mut self, mut value: V) -> V {
let old_value = self.get_mut();
mem::swap(&mut value, old_value);
value
}
/// Takes the value out of the entry, and returns it
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn remove(self) -> V {
pop_internal(self.elem).1
}
}
impl<'a, K: 'a, V: 'a> VacantEntry<'a, K, V> {
/// Sets the value of the entry with the VacantEntry's key,
/// and returns a mutable reference to it
2015-02-04 18:36:02 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub fn insert(self, value: V) -> &'a mut V {
match self.elem {
NeqElem(bucket, ib) => {
robin_hood(bucket, ib, self.hash, self.key, value)
}
NoElem(bucket) => {
bucket.put(self.hash, self.key, value).into_mut_refs().1
}
}
}
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V, S> FromIterator<(K, V)> for HashMap<K, V, S>
where K: Eq + Hash, S: HashState + Default
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
fn from_iter<T: IntoIterator<Item=(K, V)>>(iterable: T) -> HashMap<K, V, S> {
let iter = iterable.into_iter();
let lower = iter.size_hint().0;
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
let mut map = HashMap::with_capacity_and_hash_state(lower,
Default::default());
2014-07-15 18:39:32 -05:00
map.extend(iter);
map
}
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
impl<K, V, S> Extend<(K, V)> for HashMap<K, V, S>
where K: Eq + Hash, S: HashState
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
{
fn extend<T: IntoIterator<Item=(K, V)>>(&mut self, iter: T) {
2014-07-15 18:39:32 -05:00
for (k, v) in iter {
self.insert(k, v);
}
}
}
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
/// `RandomState` is the default state for `HashMap` types.
///
/// A particular instance `RandomState` will create the same instances of
/// `Hasher`, but the hashers created by two different `RandomState`
/// instances are unlikely to produce the same result for the same values.
#[derive(Clone)]
#[unstable(feature = "std_misc",
reason = "hashing an hash maps may be altered")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
pub struct RandomState {
k0: u64,
k1: u64,
}
#[unstable(feature = "std_misc",
reason = "hashing an hash maps may be altered")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
impl RandomState {
/// Constructs a new `RandomState` that is initialized with random keys.
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
#[inline]
#[allow(deprecated)]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
pub fn new() -> RandomState {
let mut r = rand::thread_rng();
RandomState { k0: r.gen(), k1: r.gen() }
}
}
#[unstable(feature = "std_misc",
reason = "hashing an hash maps may be altered")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
impl HashState for RandomState {
type Hasher = SipHasher;
#[inline]
fn hasher(&self) -> SipHasher {
SipHasher::new_with_keys(self.k0, self.k1)
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
}
}
#[stable(feature = "rust1", since = "1.0.0")]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
impl Default for RandomState {
#[inline]
fn default() -> RandomState {
RandomState::new()
}
}
#[cfg(test)]
2013-04-03 07:45:14 -05:00
mod test_map {
use prelude::v1::*;
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
use super::HashMap;
use super::Entry::{Occupied, Vacant};
use iter::{range_inclusive, repeat};
use cell::RefCell;
2015-04-10 13:39:53 -05:00
use rand::{thread_rng, Rng};
2013-05-30 12:03:11 -05:00
#[test]
fn test_create_capacity_zero() {
let mut m = HashMap::with_capacity(0);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 1).is_none());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert!(m.contains_key(&1));
assert!(!m.contains_key(&0));
2013-05-30 12:03:11 -05:00
}
#[test]
fn test_insert() {
let mut m = HashMap::new();
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert_eq!(m.len(), 0);
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 2).is_none());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert_eq!(m.len(), 1);
2015-01-25 15:05:03 -06:00
assert!(m.insert(2, 4).is_none());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert_eq!(m.len(), 2);
assert_eq!(*m.get(&1).unwrap(), 2);
assert_eq!(*m.get(&2).unwrap(), 4);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
thread_local! { static DROP_VECTOR: RefCell<Vec<isize>> = RefCell::new(Vec::new()) }
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
#[derive(Hash, PartialEq, Eq)]
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
struct Dropable {
2015-02-04 20:17:19 -06:00
k: usize
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
impl Dropable {
2015-02-04 20:17:19 -06:00
fn new(k: usize) -> Dropable {
DROP_VECTOR.with(|slot| {
slot.borrow_mut()[k] += 1;
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
Dropable { k: k }
}
}
impl Drop for Dropable {
fn drop(&mut self) {
DROP_VECTOR.with(|slot| {
slot.borrow_mut()[self.k] -= 1;
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
}
impl Clone for Dropable {
fn clone(&self) -> Dropable {
Dropable::new(self.k)
}
}
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
#[test]
fn test_drops() {
DROP_VECTOR.with(|slot| {
2015-01-25 15:05:03 -06:00
*slot.borrow_mut() = repeat(0).take(200).collect();
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
{
let mut m = HashMap::new();
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 0);
}
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2015-02-04 20:17:19 -06:00
for i in 0..100 {
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
let d1 = Dropable::new(i);
let d2 = Dropable::new(i+100);
m.insert(d1, d2);
}
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 1);
}
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2015-02-04 20:17:19 -06:00
for i in 0..50 {
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
let k = Dropable::new(i);
let v = m.remove(&k);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert!(v.is_some());
DROP_VECTOR.with(|v| {
assert_eq!(v.borrow()[i], 1);
assert_eq!(v.borrow()[i+100], 1);
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..50 {
assert_eq!(v.borrow()[i], 0);
assert_eq!(v.borrow()[i+100], 0);
}
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2015-02-04 20:17:19 -06:00
for i in 50..100 {
assert_eq!(v.borrow()[i], 1);
assert_eq!(v.borrow()[i+100], 1);
}
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 0);
}
});
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
#[test]
fn test_move_iter_drops() {
DROP_VECTOR.with(|v| {
*v.borrow_mut() = repeat(0).take(200).collect();
});
let hm = {
let mut hm = HashMap::new();
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 0);
}
});
2015-02-04 20:17:19 -06:00
for i in 0..100 {
let d1 = Dropable::new(i);
let d2 = Dropable::new(i+100);
hm.insert(d1, d2);
}
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 1);
}
});
hm
};
// By the way, ensure that cloning doesn't screw up the dropping.
drop(hm.clone());
{
2014-09-14 22:27:36 -05:00
let mut half = hm.into_iter().take(50);
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 1);
}
});
2015-01-23 12:16:03 -06:00
for _ in half.by_ref() {}
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
let nk = (0..100).filter(|&i| {
v.borrow()[i] == 1
}).count();
2015-02-04 20:17:19 -06:00
let nv = (0..100).filter(|&i| {
v.borrow()[i+100] == 1
}).count();
assert_eq!(nk, 50);
assert_eq!(nv, 50);
});
};
DROP_VECTOR.with(|v| {
2015-02-04 20:17:19 -06:00
for i in 0..200 {
assert_eq!(v.borrow()[i], 0);
}
});
}
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
#[test]
fn test_empty_pop() {
let mut m: HashMap<isize, bool> = HashMap::new();
assert_eq!(m.remove(&0), None);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
#[test]
fn test_lots_of_insertions() {
let mut m = HashMap::new();
// Try this a few times to make sure we never screw up the hashmap's
// internal state.
2015-01-25 15:05:03 -06:00
for _ in 0..10 {
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert!(m.is_empty());
2015-01-25 15:05:03 -06:00
for i in range_inclusive(1, 1000) {
assert!(m.insert(i, i).is_none());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
for j in range_inclusive(1, i) {
let r = m.get(&j);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert_eq!(r, Some(&j));
}
for j in range_inclusive(i+1, 1000) {
let r = m.get(&j);
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert_eq!(r, None);
}
}
2015-01-25 15:05:03 -06:00
for i in range_inclusive(1001, 2000) {
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert!(!m.contains_key(&i));
}
// remove forwards
2015-01-25 15:05:03 -06:00
for i in range_inclusive(1, 1000) {
assert!(m.remove(&i).is_some());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
for j in range_inclusive(1, i) {
assert!(!m.contains_key(&j));
}
for j in range_inclusive(i+1, 1000) {
assert!(m.contains_key(&j));
}
}
2015-01-25 15:05:03 -06:00
for i in range_inclusive(1, 1000) {
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert!(!m.contains_key(&i));
}
2015-01-25 15:05:03 -06:00
for i in range_inclusive(1, 1000) {
assert!(m.insert(i, i).is_none());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
}
// remove backwards
for i in (1..1001).rev() {
assert!(m.remove(&i).is_some());
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
for j in range_inclusive(i, 1000) {
assert!(!m.contains_key(&j));
}
for j in range_inclusive(1, i-1) {
assert!(m.contains_key(&j));
}
}
}
2013-04-03 07:45:14 -05:00
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_find_mut() {
let mut m = HashMap::new();
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 12).is_none());
assert!(m.insert(2, 8).is_none());
assert!(m.insert(5, 14).is_none());
2013-04-03 07:45:14 -05:00
let new = 100;
match m.get_mut(&5) {
None => panic!(), Some(x) => *x = new
2013-04-03 07:45:14 -05:00
}
assert_eq!(m.get(&5), Some(&new));
2013-04-03 07:45:14 -05:00
}
2013-03-24 18:07:36 -05:00
2013-04-03 07:45:14 -05:00
#[test]
fn test_insert_overwrite() {
let mut m = HashMap::new();
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 2).is_none());
assert_eq!(*m.get(&1).unwrap(), 2);
2015-01-25 15:05:03 -06:00
assert!(!m.insert(1, 3).is_none());
assert_eq!(*m.get(&1).unwrap(), 3);
2013-04-03 07:45:14 -05:00
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_insert_conflicts() {
let mut m = HashMap::with_capacity(4);
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 2).is_none());
assert!(m.insert(5, 3).is_none());
assert!(m.insert(9, 4).is_none());
assert_eq!(*m.get(&9).unwrap(), 4);
assert_eq!(*m.get(&5).unwrap(), 3);
assert_eq!(*m.get(&1).unwrap(), 2);
2013-04-03 07:45:14 -05:00
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_conflict_remove() {
let mut m = HashMap::with_capacity(4);
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 2).is_none());
assert_eq!(*m.get(&1).unwrap(), 2);
assert!(m.insert(5, 3).is_none());
assert_eq!(*m.get(&1).unwrap(), 2);
assert_eq!(*m.get(&5).unwrap(), 3);
assert!(m.insert(9, 4).is_none());
assert_eq!(*m.get(&1).unwrap(), 2);
assert_eq!(*m.get(&5).unwrap(), 3);
assert_eq!(*m.get(&9).unwrap(), 4);
assert!(m.remove(&1).is_some());
assert_eq!(*m.get(&9).unwrap(), 4);
assert_eq!(*m.get(&5).unwrap(), 3);
2013-04-03 07:45:14 -05:00
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_is_empty() {
let mut m = HashMap::with_capacity(4);
2015-01-25 15:05:03 -06:00
assert!(m.insert(1, 2).is_none());
2013-04-03 07:45:14 -05:00
assert!(!m.is_empty());
assert!(m.remove(&1).is_some());
2013-04-03 07:45:14 -05:00
assert!(m.is_empty());
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_pop() {
let mut m = HashMap::new();
2015-01-25 15:05:03 -06:00
m.insert(1, 2);
assert_eq!(m.remove(&1), Some(2));
assert_eq!(m.remove(&1), None);
2013-04-03 07:45:14 -05:00
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_iterate() {
let mut m = HashMap::with_capacity(4);
2015-02-04 20:17:19 -06:00
for i in 0..32 {
assert!(m.insert(i, i*2).is_none());
}
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
assert_eq!(m.len(), 32);
let mut observed: u32 = 0;
Performance-oriented hashtable. Previously, rust's hashtable was totally unoptimized. It used an Option per key-value pair, and used very naive open allocation. The old hashtable had very high variance in lookup time. For an example, see the 'find_nonexisting' benchmark below. This is fixed by keys in 'lucky' spots with a low probe sequence length getting their good spots stolen by keys with long probe sequence lengths. This reduces hashtable probe length variance, while maintaining the same mean. Also, other optimization liberties were taken. Everything is as cache aware as possible, and this hashtable should perform extremely well for both large and small keys and values. Benchmarks: comprehensive_old_hashmap 378 ns/iter (+/- 8) comprehensive_new_hashmap 206 ns/iter (+/- 4) 1.8x faster old_hashmap_as_queue 238 ns/iter (+/- 8) new_hashmap_as_queue 119 ns/iter (+/- 2) 2x faster old_hashmap_insert 172 ns/iter (+/- 8) new_hashmap_insert 146 ns/iter (+/- 11) 1.17x faster old_hashmap_find_existing 50 ns/iter (+/- 12) new_hashmap_find_existing 35 ns/iter (+/- 6) 1.43x faster old_hashmap_find_notexisting 49 ns/iter (+/- 49) new_hashmap_find_notexisting 34 ns/iter (+/- 4) 1.44x faster Memory usage of old hashtable (64-bit assumed): aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words Memory usage of new hashtable: (aligned(sizeof(K)) + aligned(sizeof(V)) + 8)/0.9 + 6.5 words BUT accesses are much more cache friendly. In fact, if the probe sequence length is below 8, only two cache lines worth of hashes will be pulled into cache. This is unlike the old version which would have to stride over the stoerd keys and values, and would be more cache unfriendly the bigger the stored values got. And did you notice the higher load factor? We can now reasonably get a load factor of 0.9 with very good performance.
2014-02-28 21:23:53 -06:00
2015-01-31 11:20:46 -06:00
for (k, v) in &m {
assert_eq!(*v, *k * 2);
observed |= 1 << *k;
}
assert_eq!(observed, 0xFFFF_FFFF);
2013-04-03 07:45:14 -05:00
}
2012-12-10 11:00:52 -06:00
#[test]
fn test_keys() {
2015-01-25 15:05:03 -06:00
let vec = vec![(1, 'a'), (2, 'b'), (3, 'c')];
2015-02-04 20:17:19 -06:00
let map: HashMap<_, _> = vec.into_iter().collect();
let keys: Vec<_> = map.keys().cloned().collect();
assert_eq!(keys.len(), 3);
assert!(keys.contains(&1));
assert!(keys.contains(&2));
assert!(keys.contains(&3));
}
#[test]
fn test_values() {
2015-01-25 15:05:03 -06:00
let vec = vec![(1, 'a'), (2, 'b'), (3, 'c')];
2015-02-04 20:17:19 -06:00
let map: HashMap<_, _> = vec.into_iter().collect();
let values: Vec<_> = map.values().cloned().collect();
assert_eq!(values.len(), 3);
assert!(values.contains(&'a'));
assert!(values.contains(&'b'));
assert!(values.contains(&'c'));
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_find() {
let mut m = HashMap::new();
2015-01-25 15:05:03 -06:00
assert!(m.get(&1).is_none());
m.insert(1, 2);
match m.get(&1) {
None => panic!(),
Some(v) => assert_eq!(*v, 2)
2013-04-03 07:45:14 -05:00
}
}
2012-12-10 11:00:52 -06:00
2013-04-03 07:45:14 -05:00
#[test]
fn test_eq() {
let mut m1 = HashMap::new();
2015-01-25 15:05:03 -06:00
m1.insert(1, 2);
m1.insert(2, 3);
m1.insert(3, 4);
2012-12-10 11:00:52 -06:00
let mut m2 = HashMap::new();
2015-01-25 15:05:03 -06:00
m2.insert(1, 2);
m2.insert(2, 3);
2012-12-10 11:00:52 -06:00
2013-04-03 07:45:14 -05:00
assert!(m1 != m2);
2012-12-10 11:00:52 -06:00
2015-01-25 15:05:03 -06:00
m2.insert(3, 4);
assert_eq!(m1, m2);
2013-04-03 07:45:14 -05:00
}
#[test]
fn test_show() {
2015-02-04 20:17:19 -06:00
let mut map = HashMap::new();
let empty: HashMap<i32, i32> = HashMap::new();
2015-01-25 15:05:03 -06:00
map.insert(1, 2);
map.insert(3, 4);
let map_str = format!("{:?}", map);
assert!(map_str == "{1: 2, 3: 4}" ||
map_str == "{3: 4, 1: 2}");
assert_eq!(format!("{:?}", empty), "{}");
}
2013-04-03 07:45:14 -05:00
#[test]
fn test_expand() {
let mut m = HashMap::new();
assert_eq!(m.len(), 0);
2013-04-03 07:45:14 -05:00
assert!(m.is_empty());
2015-02-04 20:17:19 -06:00
let mut i = 0;
let old_cap = m.table.capacity();
while old_cap == m.table.capacity() {
2013-04-03 07:45:14 -05:00
m.insert(i, i);
i += 1;
}
2013-04-03 07:45:14 -05:00
assert_eq!(m.len(), i);
2013-04-03 07:45:14 -05:00
assert!(!m.is_empty());
}
#[test]
fn test_behavior_resize_policy() {
let mut m = HashMap::new();
assert_eq!(m.len(), 0);
assert_eq!(m.table.capacity(), 0);
assert!(m.is_empty());
m.insert(0, 0);
m.remove(&0);
assert!(m.is_empty());
let initial_cap = m.table.capacity();
m.reserve(initial_cap);
let cap = m.table.capacity();
assert_eq!(cap, initial_cap * 2);
2015-02-04 20:17:19 -06:00
let mut i = 0;
for _ in 0..cap * 3 / 4 {
m.insert(i, i);
i += 1;
}
// three quarters full
assert_eq!(m.len(), i);
assert_eq!(m.table.capacity(), cap);
for _ in 0..cap / 4 {
m.insert(i, i);
i += 1;
}
// half full
let new_cap = m.table.capacity();
assert_eq!(new_cap, cap * 2);
for _ in 0..cap / 2 - 1 {
i -= 1;
m.remove(&i);
assert_eq!(m.table.capacity(), new_cap);
}
// A little more than one quarter full.
m.shrink_to_fit();
assert_eq!(m.table.capacity(), cap);
// again, a little more than half full
for _ in 0..cap / 2 - 1 {
i -= 1;
m.remove(&i);
}
m.shrink_to_fit();
assert_eq!(m.len(), i);
assert!(!m.is_empty());
assert_eq!(m.table.capacity(), initial_cap);
}
#[test]
fn test_reserve_shrink_to_fit() {
let mut m = HashMap::new();
2015-02-04 20:17:19 -06:00
m.insert(0, 0);
m.remove(&0);
assert!(m.capacity() >= m.len());
2015-02-04 20:17:19 -06:00
for i in 0..128 {
m.insert(i, i);
}
m.reserve(256);
let usable_cap = m.capacity();
2015-02-04 20:17:19 -06:00
for i in 128..(128 + 256) {
m.insert(i, i);
assert_eq!(m.capacity(), usable_cap);
}
2015-02-04 20:17:19 -06:00
for i in 100..(128 + 256) {
assert_eq!(m.remove(&i), Some(i));
}
m.shrink_to_fit();
assert_eq!(m.len(), 100);
assert!(!m.is_empty());
assert!(m.capacity() >= m.len());
2015-02-04 20:17:19 -06:00
for i in 0..100 {
assert_eq!(m.remove(&i), Some(i));
}
m.shrink_to_fit();
m.insert(0, 0);
assert_eq!(m.len(), 1);
assert!(m.capacity() >= m.len());
assert_eq!(m.remove(&0), Some(0));
}
#[test]
fn test_from_iter() {
2015-01-25 15:05:03 -06:00
let xs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)];
2015-02-04 20:17:19 -06:00
let map: HashMap<_, _> = xs.iter().cloned().collect();
2015-01-31 11:20:46 -06:00
for &(k, v) in &xs {
assert_eq!(map.get(&k), Some(&v));
}
}
#[test]
fn test_size_hint() {
2015-01-25 15:05:03 -06:00
let xs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)];
2015-02-04 20:17:19 -06:00
let map: HashMap<_, _> = xs.iter().cloned().collect();
let mut iter = map.iter();
for _ in iter.by_ref().take(3) {}
assert_eq!(iter.size_hint(), (3, Some(3)));
}
#[test]
fn test_iter_len() {
2015-01-25 15:05:03 -06:00
let xs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)];
2015-02-04 20:17:19 -06:00
let map: HashMap<_, _> = xs.iter().cloned().collect();
let mut iter = map.iter();
for _ in iter.by_ref().take(3) {}
assert_eq!(iter.len(), 3);
}
#[test]
fn test_mut_size_hint() {
2015-01-25 15:05:03 -06:00
let xs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)];
2015-02-04 20:17:19 -06:00
let mut map: HashMap<_, _> = xs.iter().cloned().collect();
2014-09-14 22:27:36 -05:00
let mut iter = map.iter_mut();
for _ in iter.by_ref().take(3) {}
assert_eq!(iter.size_hint(), (3, Some(3)));
}
#[test]
fn test_iter_mut_len() {
2015-01-25 15:05:03 -06:00
let xs = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)];
2015-02-04 20:17:19 -06:00
let mut map: HashMap<_, _> = xs.iter().cloned().collect();
let mut iter = map.iter_mut();
for _ in iter.by_ref().take(3) {}
assert_eq!(iter.len(), 3);
}
#[test]
fn test_index() {
2015-02-04 20:17:19 -06:00
let mut map = HashMap::new();
map.insert(1, 2);
map.insert(2, 1);
map.insert(3, 4);
assert_eq!(map[&2], 1);
}
#[test]
#[should_panic]
fn test_index_nonexistent() {
2015-02-04 20:17:19 -06:00
let mut map = HashMap::new();
map.insert(1, 2);
map.insert(2, 1);
map.insert(3, 4);
map[&4];
}
#[test]
fn test_entry(){
2015-01-25 15:05:03 -06:00
let xs = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 60)];
2015-02-04 20:17:19 -06:00
let mut map: HashMap<_, _> = xs.iter().cloned().collect();
// Existing key (insert)
match map.entry(1) {
Vacant(_) => unreachable!(),
Occupied(mut view) => {
assert_eq!(view.get(), &10);
assert_eq!(view.insert(100), 10);
}
}
assert_eq!(map.get(&1).unwrap(), &100);
assert_eq!(map.len(), 6);
// Existing key (update)
match map.entry(2) {
Vacant(_) => unreachable!(),
Occupied(mut view) => {
let v = view.get_mut();
let new_v = (*v) * 10;
*v = new_v;
}
}
assert_eq!(map.get(&2).unwrap(), &200);
assert_eq!(map.len(), 6);
// Existing key (take)
match map.entry(3) {
Vacant(_) => unreachable!(),
Occupied(view) => {
assert_eq!(view.remove(), 30);
}
}
assert_eq!(map.get(&3), None);
assert_eq!(map.len(), 5);
// Inexistent key (insert)
match map.entry(10) {
Occupied(_) => unreachable!(),
Vacant(view) => {
assert_eq!(*view.insert(1000), 1000);
}
}
assert_eq!(map.get(&10).unwrap(), &1000);
assert_eq!(map.len(), 6);
}
#[test]
fn test_entry_take_doesnt_corrupt() {
2015-02-19 11:57:25 -06:00
#![allow(deprecated)] //rand
// Test for #19292
2015-02-04 20:17:19 -06:00
fn check(m: &HashMap<isize, ()>) {
for k in m.keys() {
assert!(m.contains_key(k),
"{} is in keys() but not in the map?", k);
}
}
let mut m = HashMap::new();
2015-04-10 13:39:53 -05:00
let mut rng = thread_rng();
// Populate the map with some items.
2015-02-04 20:17:19 -06:00
for _ in 0..50 {
let x = rng.gen_range(-10, 10);
m.insert(x, ());
}
2015-02-04 20:17:19 -06:00
for i in 0..1000 {
let x = rng.gen_range(-10, 10);
match m.entry(x) {
Vacant(_) => {},
Occupied(e) => {
println!("{}: remove {}", i, x);
e.remove();
},
}
check(&m);
}
}
2013-04-03 07:45:14 -05:00
}