rust/src/libstd/collections/mod.rs

452 lines
19 KiB
Rust
Raw Normal View History

std: Recreate a `collections` module As with the previous commit with `librand`, this commit shuffles around some `collections` code. The new state of the world is similar to that of librand: * The libcollections crate now only depends on libcore and liballoc. * The standard library has a new module, `std::collections`. All functionality of libcollections is reexported through this module. I would like to stress that this change is purely cosmetic. There are very few alterations to these primitives. There are a number of notable points about the new organization: * std::{str, slice, string, vec} all moved to libcollections. There is no reason that these primitives shouldn't be necessarily usable in a freestanding context that has allocation. These are all reexported in their usual places in the standard library. * The `hashmap`, and transitively the `lru_cache`, modules no longer reside in `libcollections`, but rather in libstd. The reason for this is because the `HashMap::new` contructor requires access to the OSRng for initially seeding the hash map. Beyond this requirement, there is no reason that the hashmap could not move to libcollections. I do, however, have a plan to move the hash map to the collections module. The `HashMap::new` function could be altered to require that the `H` hasher parameter ascribe to the `Default` trait, allowing the entire `hashmap` module to live in libcollections. The key idea would be that the default hasher would be different in libstd. Something along the lines of: // src/libstd/collections/mod.rs pub type HashMap<K, V, H = RandomizedSipHasher> = core_collections::HashMap<K, V, H>; This is not possible today because you cannot invoke static methods through type aliases. If we modified the compiler, however, to allow invocation of static methods through type aliases, then this type definition would essentially be switching the default hasher from `SipHasher` in libcollections to a libstd-defined `RandomizedSipHasher` type. This type's `Default` implementation would randomly seed the `SipHasher` instance, and otherwise perform the same as `SipHasher`. This future state doesn't seem incredibly far off, but until that time comes, the hashmap module will live in libstd to not compromise on functionality. * In preparation for the hashmap moving to libcollections, the `hash` module has moved from libstd to libcollections. A previously snapshotted commit enables a distinct `Writer` trait to live in the `hash` module which `Hash` implementations are now parameterized over. Due to using a custom trait, the `SipHasher` implementation has lost its specialized methods for writing integers. These can be re-added backwards-compatibly in the future via default methods if necessary, but the FNV hashing should satisfy much of the need for speedier hashing. A list of breaking changes: * HashMap::{get, get_mut} no longer fails with the key formatted into the error message with `{:?}`, instead, a generic message is printed. With backtraces, it should still be not-too-hard to track down errors. * The HashMap, HashSet, and LruCache types are now available through std::collections instead of the collections crate. * Manual implementations of hash should be parameterized over `hash::Writer` instead of just `Writer`. [breaking-change]
2014-05-29 20:50:12 -05:00
// Copyright 2013-2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
2014-10-05 12:32:18 -05:00
//! Collection types.
//!
//! Rust's standard collection library provides efficient implementations of the
//! most common general purpose programming data structures. By using the
//! standard implementations, it should be possible for two libraries to
//! communicate without significant data conversion.
//!
//! To get this out of the way: you should probably just use `Vec` or `HashMap`.
//! These two collections cover most use cases for generic data storage and
//! processing. They are exceptionally good at doing what they do. All the other
//! collections in the standard library have specific use cases where they are
//! the optimal choice, but these cases are borderline *niche* in comparison.
//! Even when `Vec` and `HashMap` are technically suboptimal, they're probably a
//! good enough choice to get started.
2014-10-05 12:32:18 -05:00
//!
//! Rust's collections can be grouped into four major categories:
//!
//! * Sequences: `Vec`, `VecDeque`, `LinkedList`
//! * Maps: `HashMap`, `BTreeMap`
//! * Sets: `HashSet`, `BTreeSet`
//! * Misc: `BinaryHeap`
2014-10-05 12:32:18 -05:00
//!
//! # When Should You Use Which Collection?
//!
//! These are fairly high-level and quick break-downs of when each collection
//! should be considered. Detailed discussions of strengths and weaknesses of
//! individual collections can be found on their own documentation pages.
2014-10-05 12:32:18 -05:00
//!
//! ### Use a `Vec` when:
//! * You want to collect items up to be processed or sent elsewhere later, and
//! don't care about any properties of the actual values being stored.
//! * You want a sequence of elements in a particular order, and will only be
//! appending to (or near) the end.
2014-10-05 12:32:18 -05:00
//! * You want a stack.
//! * You want a resizable array.
//! * You want a heap-allocated array.
//!
//! ### Use a `VecDeque` when:
//! * You want a `Vec` that supports efficient insertion at both ends of the
//! sequence.
2014-10-05 12:32:18 -05:00
//! * You want a queue.
//! * You want a double-ended queue (deque).
//!
//! ### Use a `LinkedList` when:
//! * You want a `Vec` or `VecDeque` of unknown size, and can't tolerate
//! amortization.
//! * You want to efficiently split and append lists.
//! * You are *absolutely* certain you *really*, *truly*, want a doubly linked
//! list.
2014-10-05 12:32:18 -05:00
//!
//! ### Use a `HashMap` when:
//! * You want to associate arbitrary keys with an arbitrary value.
//! * You want a cache.
//! * You want a map, with no extra functionality.
//!
//! ### Use a `BTreeMap` when:
//! * You're interested in what the smallest or largest key-value pair is.
//! * You want to find the largest or smallest key that is smaller or larger
2015-06-17 08:09:18 -05:00
//! than something.
2014-10-05 12:32:18 -05:00
//! * You want to be able to get all of the entries in order on-demand.
//! * You want a sorted map.
//!
//! ### Use the `Set` variant of any of these `Map`s when:
//! * You just want to remember which keys you've seen.
//! * There is no meaningful value to associate with your keys.
//! * You just want a set.
//!
//! ### Use a `BinaryHeap` when:
//!
//! * You want to store a bunch of elements, but only ever want to process the
//! "biggest" or "most important" one at any given time.
2014-10-05 12:32:18 -05:00
//! * You want a priority queue.
//!
//! # Performance
//!
//! Choosing the right collection for the job requires an understanding of what
//! each collection is good at. Here we briefly summarize the performance of
//! different collections for certain important operations. For further details,
//! see each type's documentation, and note that the names of actual methods may
//! differ from the tables below on certain collections.
//!
//! Throughout the documentation, we will follow a few conventions. For all
//! operations, the collection's size is denoted by n. If another collection is
//! involved in the operation, it contains m elements. Operations which have an
//! *amortized* cost are suffixed with a `*`. Operations with an *expected*
//! cost are suffixed with a `~`.
//!
//! All amortized costs are for the potential need to resize when capacity is
//! exhausted. If a resize occurs it will take O(n) time. Our collections never
//! automatically shrink, so removal operations aren't amortized. Over a
//! sufficiently large series of operations, the average cost per operation will
//! deterministically equal the given cost.
//!
//! Only HashMap has expected costs, due to the probabilistic nature of hashing.
//! It is theoretically possible, though very unlikely, for HashMap to
//! experience worse performance.
//!
//! ## Sequences
//!
//! | | get(i) | insert(i) | remove(i) | append | split_off(i) |
//! |--------------|----------------|-----------------|----------------|--------|----------------|
//! | Vec | O(1) | O(n-i)* | O(n-i) | O(m)* | O(n-i) |
//! | VecDeque | O(1) | O(min(i, n-i))* | O(min(i, n-i)) | O(m)* | O(min(i, n-i)) |
//! | LinkedList | O(min(i, n-i)) | O(min(i, n-i)) | O(min(i, n-i)) | O(1) | O(min(i, n-i)) |
//!
//! Note that where ties occur, Vec is generally going to be faster than VecDeque, and VecDeque
//! is generally going to be faster than LinkedList.
//!
//! ## Maps
//!
//! For Sets, all operations have the cost of the equivalent Map operation.
//!
//! | | get | insert | remove | predecessor |
//! |----------|-----------|----------|----------|-------------|
//! | HashMap | O(1)~ | O(1)~* | O(1)~ | N/A |
//! | BTreeMap | O(log n) | O(log n) | O(log n) | O(log n) |
//!
//! Note that BTreeMap's precise performance depends on the value of B.
//!
2014-10-05 12:32:18 -05:00
//! # Correct and Efficient Usage of Collections
//!
//! Of course, knowing which collection is the right one for the job doesn't
//! instantly permit you to use it correctly. Here are some quick tips for
//! efficient and correct usage of the standard collections in general. If
//! you're interested in how to use a specific collection in particular, consult
//! its documentation for detailed discussion and code examples.
2014-10-05 12:32:18 -05:00
//!
//! ## Capacity Management
//!
//! Many collections provide several constructors and methods that refer to
//! "capacity". These collections are generally built on top of an array.
//! Optimally, this array would be exactly the right size to fit only the
//! elements stored in the collection, but for the collection to do this would
//! be very inefficient. If the backing array was exactly the right size at all
//! times, then every time an element is inserted, the collection would have to
//! grow the array to fit it. Due to the way memory is allocated and managed on
//! most computers, this would almost surely require allocating an entirely new
//! array and copying every single element from the old one into the new one.
//! Hopefully you can see that this wouldn't be very efficient to do on every
//! operation.
//!
//! Most collections therefore use an *amortized* allocation strategy. They
//! generally let themselves have a fair amount of unoccupied space so that they
//! only have to grow on occasion. When they do grow, they allocate a
//! substantially larger array to move the elements into so that it will take a
//! while for another grow to be required. While this strategy is great in
//! general, it would be even better if the collection *never* had to resize its
//! backing array. Unfortunately, the collection itself doesn't have enough
//! information to do this itself. Therefore, it is up to us programmers to give
//! it hints.
//!
//! Any `with_capacity` constructor will instruct the collection to allocate
//! enough space for the specified number of elements. Ideally this will be for
//! exactly that many elements, but some implementation details may prevent
//! this. `Vec` and `VecDeque` can be relied on to allocate exactly the
//! requested amount, though. Use `with_capacity` when you know exactly how many
//! elements will be inserted, or at least have a reasonable upper-bound on that
//! number.
//!
//! When anticipating a large influx of elements, the `reserve` family of
//! methods can be used to hint to the collection how much room it should make
//! for the coming items. As with `with_capacity`, the precise behavior of
//! these methods will be specific to the collection of interest.
//!
//! For optimal performance, collections will generally avoid shrinking
//! themselves. If you believe that a collection will not soon contain any more
//! elements, or just really need the memory, the `shrink_to_fit` method prompts
//! the collection to shrink the backing array to the minimum size capable of
//! holding its elements.
//!
//! Finally, if ever you're interested in what the actual capacity of the
//! collection is, most collections provide a `capacity` method to query this
//! information on demand. This can be useful for debugging purposes, or for
//! use with the `reserve` methods.
2014-10-05 12:32:18 -05:00
//!
//! ## Iterators
//!
//! Iterators are a powerful and robust mechanism used throughout Rust's
//! standard libraries. Iterators provide a sequence of values in a generic,
//! safe, efficient and convenient way. The contents of an iterator are usually
//! *lazily* evaluated, so that only the values that are actually needed are
//! ever actually produced, and no allocation need be done to temporarily store
//! them. Iterators are primarily consumed using a `for` loop, although many
//! functions also take iterators where a collection or sequence of values is
//! desired.
//!
//! All of the standard collections provide several iterators for performing
//! bulk manipulation of their contents. The three primary iterators almost
//! every collection should provide are `iter`, `iter_mut`, and `into_iter`.
//! Some of these are not provided on collections where it would be unsound or
//! unreasonable to provide them.
2014-10-05 12:32:18 -05:00
//!
//! `iter` provides an iterator of immutable references to all the contents of a
//! collection in the most "natural" order. For sequence collections like `Vec`,
//! this means the items will be yielded in increasing order of index starting
//! at 0. For ordered collections like `BTreeMap`, this means that the items
//! will be yielded in sorted order. For unordered collections like `HashMap`,
//! the items will be yielded in whatever order the internal representation made
//! most convenient. This is great for reading through all the contents of the
//! collection.
2014-10-05 12:32:18 -05:00
//!
//! ```
//! let vec = vec![1, 2, 3, 4];
2014-10-05 12:32:18 -05:00
//! for x in vec.iter() {
//! println!("vec contained {}", x);
//! }
//! ```
//!
//! `iter_mut` provides an iterator of *mutable* references in the same order as
//! `iter`. This is great for mutating all the contents of the collection.
2014-10-05 12:32:18 -05:00
//!
//! ```
//! let mut vec = vec![1, 2, 3, 4];
2014-10-05 12:32:18 -05:00
//! for x in vec.iter_mut() {
//! *x += 1;
//! }
//! ```
//!
//! `into_iter` transforms the actual collection into an iterator over its
//! contents by-value. This is great when the collection itself is no longer
//! needed, and the values are needed elsewhere. Using `extend` with `into_iter`
//! is the main way that contents of one collection are moved into another.
//! `extend` automatically calls `into_iter`, and takes any `T: IntoIterator`.
//! Calling `collect` on an iterator itself is also a great way to convert one
//! collection into another. Both of these methods should internally use the
//! capacity management tools discussed in the previous section to do this as
//! efficiently as possible.
2014-10-05 12:32:18 -05:00
//!
//! ```
//! let mut vec1 = vec![1, 2, 3, 4];
//! let vec2 = vec![10, 20, 30, 40];
//! vec1.extend(vec2);
2014-10-05 12:32:18 -05:00
//! ```
//!
//! ```
//! use std::collections::VecDeque;
2014-10-05 12:32:18 -05:00
//!
//! let vec = vec![1, 2, 3, 4];
//! let buf: VecDeque<_> = vec.into_iter().collect();
2014-10-05 12:32:18 -05:00
//! ```
//!
//! Iterators also provide a series of *adapter* methods for performing common
//! threads to sequences. Among the adapters are functional favorites like `map`,
//! `fold`, `skip`, and `take`. Of particular interest to collections is the
//! `rev` adapter, that reverses any iterator that supports this operation. Most
//! collections provide reversible iterators as the way to iterate over them in
//! reverse order.
2014-10-05 12:32:18 -05:00
//!
//! ```
//! let vec = vec![1, 2, 3, 4];
2014-10-05 12:32:18 -05:00
//! for x in vec.iter().rev() {
//! println!("vec contained {}", x);
//! }
//! ```
//!
//! Several other collection methods also return iterators to yield a sequence
//! of results but avoid allocating an entire collection to store the result in.
//! This provides maximum flexibility as `collect` or `extend` can be called to
//! "pipe" the sequence into any collection if desired. Otherwise, the sequence
//! can be looped over with a `for` loop. The iterator can also be discarded
//! after partial use, preventing the computation of the unused items.
2014-10-05 12:32:18 -05:00
//!
//! ## Entries
//!
//! The `entry` API is intended to provide an efficient mechanism for
//! manipulating the contents of a map conditionally on the presence of a key or
//! not. The primary motivating use case for this is to provide efficient
//! accumulator maps. For instance, if one wishes to maintain a count of the
//! number of times each key has been seen, they will have to perform some
//! conditional logic on whether this is the first time the key has been seen or
//! not. Normally, this would require a `find` followed by an `insert`,
//! effectively duplicating the search effort on each insertion.
//!
//! When a user calls `map.entry(&key)`, the map will search for the key and
//! then yield a variant of the `Entry` enum.
//!
//! If a `Vacant(entry)` is yielded, then the key *was not* found. In this case
//! the only valid operation is to `insert` a value into the entry. When this is
//! done, the vacant entry is consumed and converted into a mutable reference to
2015-06-17 08:09:18 -05:00
//! the value that was inserted. This allows for further manipulation of the
//! value beyond the lifetime of the search itself. This is useful if complex
//! logic needs to be performed on the value regardless of whether the value was
//! just inserted.
//!
//! If an `Occupied(entry)` is yielded, then the key *was* found. In this case,
//! the user has several options: they can `get`, `insert`, or `remove` the
//! value of the occupied entry. Additionally, they can convert the occupied
//! entry into a mutable reference to its value, providing symmetry to the
//! vacant `insert` case.
2014-10-05 12:32:18 -05:00
//!
//! ### Examples
//!
//! Here are the two primary ways in which `entry` is used. First, a simple
//! example where the logic performed on the values is trivial.
2014-10-05 12:32:18 -05:00
//!
//! #### Counting the number of times each character in a string occurs
//!
//! ```
//! use std::collections::btree_map::BTreeMap;
2014-10-05 12:32:18 -05:00
//!
//! let mut count = BTreeMap::new();
//! let message = "she sells sea shells by the sea shore";
//!
//! for c in message.chars() {
2015-03-20 12:43:01 -05:00
//! *count.entry(c).or_insert(0) += 1;
2014-10-05 12:32:18 -05:00
//! }
//!
//! assert_eq!(count.get(&'s'), Some(&8));
2014-10-05 12:32:18 -05:00
//!
//! println!("Number of occurrences of each character");
//! for (char, count) in &count {
2014-10-05 12:32:18 -05:00
//! println!("{}: {}", char, count);
//! }
//! ```
//!
//! When the logic to be performed on the value is more complex, we may simply
//! use the `entry` API to ensure that the value is initialized, and perform the
//! logic afterwards.
2014-10-05 12:32:18 -05:00
//!
//! #### Tracking the inebriation of customers at a bar
//!
//! ```
//! use std::collections::btree_map::BTreeMap;
2014-10-05 12:32:18 -05:00
//!
//! // A client of the bar. They have a blood alcohol level.
//! struct Person { blood_alcohol: f32 }
2014-10-05 12:32:18 -05:00
//!
//! // All the orders made to the bar, by client id.
//! let orders = vec![1,2,1,2,3,4,1,2,2,3,4,1,1,1];
//!
//! // Our clients.
//! let mut blood_alcohol = BTreeMap::new();
//!
//! for id in orders {
2014-10-05 12:32:18 -05:00
//! // If this is the first time we've seen this customer, initialize them
//! // with no blood alcohol. Otherwise, just retrieve them.
//! let person = blood_alcohol.entry(id).or_insert(Person { blood_alcohol: 0.0 });
2014-10-05 12:32:18 -05:00
//!
//! // Reduce their blood alcohol level. It takes time to order and drink a beer!
//! person.blood_alcohol *= 0.9;
//!
//! // Check if they're sober enough to have another beer.
//! if person.blood_alcohol > 0.3 {
//! // Too drunk... for now.
//! println!("Sorry {}, I have to cut you off", id);
2014-10-05 12:32:18 -05:00
//! } else {
//! // Have another!
//! person.blood_alcohol += 0.1;
//! }
//! }
//! ```
//!
//! # Insert and complex keys
//!
//! If we have a more complex key, calls to `insert()` will
//! not update the value of the key. For example:
//!
//! ```
//! use std::cmp::Ordering;
//! use std::collections::BTreeMap;
//! use std::hash::{Hash, Hasher};
//!
//! #[derive(Debug)]
//! struct Foo {
//! a: u32,
//! b: &'static str,
//! }
//!
//! // we will compare `Foo`s by their `a` value only.
//! impl PartialEq for Foo {
//! fn eq(&self, other: &Self) -> bool { self.a == other.a }
//! }
//!
//! impl Eq for Foo {}
//!
//! // we will hash `Foo`s by their `a` value only.
//! impl Hash for Foo {
//! fn hash<H: Hasher>(&self, h: &mut H) { self.a.hash(h); }
//! }
//!
//! impl PartialOrd for Foo {
//! fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.a.partial_cmp(&other.a) }
//! }
//!
//! impl Ord for Foo {
//! fn cmp(&self, other: &Self) -> Ordering { self.a.cmp(&other.a) }
//! }
//!
//! let mut map = BTreeMap::new();
//! map.insert(Foo { a: 1, b: "baz" }, ());
//!
//! // We already have a Foo with an a of 1, so this will be updating the value.
//! map.insert(Foo { a: 1, b: "xyz" }, ());
//!
//! // ... but the key hasn't changed. b is still "baz", not "xyz"
//! assert_eq!(map.keys().next().unwrap().b, "baz");
//! ```
std: Recreate a `collections` module As with the previous commit with `librand`, this commit shuffles around some `collections` code. The new state of the world is similar to that of librand: * The libcollections crate now only depends on libcore and liballoc. * The standard library has a new module, `std::collections`. All functionality of libcollections is reexported through this module. I would like to stress that this change is purely cosmetic. There are very few alterations to these primitives. There are a number of notable points about the new organization: * std::{str, slice, string, vec} all moved to libcollections. There is no reason that these primitives shouldn't be necessarily usable in a freestanding context that has allocation. These are all reexported in their usual places in the standard library. * The `hashmap`, and transitively the `lru_cache`, modules no longer reside in `libcollections`, but rather in libstd. The reason for this is because the `HashMap::new` contructor requires access to the OSRng for initially seeding the hash map. Beyond this requirement, there is no reason that the hashmap could not move to libcollections. I do, however, have a plan to move the hash map to the collections module. The `HashMap::new` function could be altered to require that the `H` hasher parameter ascribe to the `Default` trait, allowing the entire `hashmap` module to live in libcollections. The key idea would be that the default hasher would be different in libstd. Something along the lines of: // src/libstd/collections/mod.rs pub type HashMap<K, V, H = RandomizedSipHasher> = core_collections::HashMap<K, V, H>; This is not possible today because you cannot invoke static methods through type aliases. If we modified the compiler, however, to allow invocation of static methods through type aliases, then this type definition would essentially be switching the default hasher from `SipHasher` in libcollections to a libstd-defined `RandomizedSipHasher` type. This type's `Default` implementation would randomly seed the `SipHasher` instance, and otherwise perform the same as `SipHasher`. This future state doesn't seem incredibly far off, but until that time comes, the hashmap module will live in libstd to not compromise on functionality. * In preparation for the hashmap moving to libcollections, the `hash` module has moved from libstd to libcollections. A previously snapshotted commit enables a distinct `Writer` trait to live in the `hash` module which `Hash` implementations are now parameterized over. Due to using a custom trait, the `SipHasher` implementation has lost its specialized methods for writing integers. These can be re-added backwards-compatibly in the future via default methods if necessary, but the FNV hashing should satisfy much of the need for speedier hashing. A list of breaking changes: * HashMap::{get, get_mut} no longer fails with the key formatted into the error message with `{:?}`, instead, a generic message is printed. With backtraces, it should still be not-too-hard to track down errors. * The HashMap, HashSet, and LruCache types are now available through std::collections instead of the collections crate. * Manual implementations of hash should be parameterized over `hash::Writer` instead of just `Writer`. [breaking-change]
2014-05-29 20:50:12 -05:00
2015-01-23 23:48:20 -06:00
#![stable(feature = "rust1", since = "1.0.0")]
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use core_collections::Bound;
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use core_collections::{BinaryHeap, BTreeMap, BTreeSet};
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use core_collections::{LinkedList, VecDeque};
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use core_collections::{binary_heap, btree_map, btree_set};
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use core_collections::{linked_list, vec_deque};
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use self::hash_map::HashMap;
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use self::hash_set::HashSet;
std: Recreate a `collections` module As with the previous commit with `librand`, this commit shuffles around some `collections` code. The new state of the world is similar to that of librand: * The libcollections crate now only depends on libcore and liballoc. * The standard library has a new module, `std::collections`. All functionality of libcollections is reexported through this module. I would like to stress that this change is purely cosmetic. There are very few alterations to these primitives. There are a number of notable points about the new organization: * std::{str, slice, string, vec} all moved to libcollections. There is no reason that these primitives shouldn't be necessarily usable in a freestanding context that has allocation. These are all reexported in their usual places in the standard library. * The `hashmap`, and transitively the `lru_cache`, modules no longer reside in `libcollections`, but rather in libstd. The reason for this is because the `HashMap::new` contructor requires access to the OSRng for initially seeding the hash map. Beyond this requirement, there is no reason that the hashmap could not move to libcollections. I do, however, have a plan to move the hash map to the collections module. The `HashMap::new` function could be altered to require that the `H` hasher parameter ascribe to the `Default` trait, allowing the entire `hashmap` module to live in libcollections. The key idea would be that the default hasher would be different in libstd. Something along the lines of: // src/libstd/collections/mod.rs pub type HashMap<K, V, H = RandomizedSipHasher> = core_collections::HashMap<K, V, H>; This is not possible today because you cannot invoke static methods through type aliases. If we modified the compiler, however, to allow invocation of static methods through type aliases, then this type definition would essentially be switching the default hasher from `SipHasher` in libcollections to a libstd-defined `RandomizedSipHasher` type. This type's `Default` implementation would randomly seed the `SipHasher` instance, and otherwise perform the same as `SipHasher`. This future state doesn't seem incredibly far off, but until that time comes, the hashmap module will live in libstd to not compromise on functionality. * In preparation for the hashmap moving to libcollections, the `hash` module has moved from libstd to libcollections. A previously snapshotted commit enables a distinct `Writer` trait to live in the `hash` module which `Hash` implementations are now parameterized over. Due to using a custom trait, the `SipHasher` implementation has lost its specialized methods for writing integers. These can be re-added backwards-compatibly in the future via default methods if necessary, but the FNV hashing should satisfy much of the need for speedier hashing. A list of breaking changes: * HashMap::{get, get_mut} no longer fails with the key formatted into the error message with `{:?}`, instead, a generic message is printed. With backtraces, it should still be not-too-hard to track down errors. * The HashMap, HashSet, and LruCache types are now available through std::collections instead of the collections crate. * Manual implementations of hash should be parameterized over `hash::Writer` instead of just `Writer`. [breaking-change]
2014-05-29 20:50:12 -05:00
mod hash;
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub mod hash_map {
//! A hashmap
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use super::hash::map::*;
}
2015-01-23 23:48:20 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub mod hash_set {
//! A hashset
2015-11-16 10:54:28 -06:00
#[stable(feature = "rust1", since = "1.0.0")]
pub use super::hash::set::*;
}
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
/// Experimental support for providing custom hash algorithms to a HashMap and
/// HashSet.
#[unstable(feature = "hashmap_hasher", reason = "module was recently added",
issue = "27713")]
#[rustc_deprecated(since = "1.7.0", reason = "support moved to std::hash")]
#[allow(deprecated)]
std: Stabilize the std::hash module This commit aims to prepare the `std::hash` module for alpha by formalizing its current interface whileholding off on adding `#[stable]` to the new APIs. The current usage with the `HashMap` and `HashSet` types is also reconciled by separating out composable parts of the design. The primary goal of this slight redesign is to separate the concepts of a hasher's state from a hashing algorithm itself. The primary change of this commit is to separate the `Hasher` trait into a `Hasher` and a `HashState` trait. Conceptually the old `Hasher` trait was actually just a factory for various states, but hashing had very little control over how these states were used. Additionally the old `Hasher` trait was actually fairly unrelated to hashing. This commit redesigns the existing `Hasher` trait to match what the notion of a `Hasher` normally implies with the following definition: trait Hasher { type Output; fn reset(&mut self); fn finish(&self) -> Output; } This `Hasher` trait emphasizes that hashing algorithms may produce outputs other than a `u64`, so the output type is made generic. Other than that, however, very little is assumed about a particular hasher. It is left up to implementors to provide specific methods or trait implementations to feed data into a hasher. The corresponding `Hash` trait becomes: trait Hash<H: Hasher> { fn hash(&self, &mut H); } The old default of `SipState` was removed from this trait as it's not something that we're willing to stabilize until the end of time, but the type parameter is always required to implement `Hasher`. Note that the type parameter `H` remains on the trait to enable multidispatch for specialization of hashing for particular hashers. Note that `Writer` is not mentioned in either of `Hash` or `Hasher`, it is simply used as part `derive` and the implementations for all primitive types. With these definitions, the old `Hasher` trait is realized as a new `HashState` trait in the `collections::hash_state` module as an unstable addition for now. The current definition looks like: trait HashState { type Hasher: Hasher; fn hasher(&self) -> Hasher; } The purpose of this trait is to emphasize that the one piece of functionality for implementors is that new instances of `Hasher` can be created. This conceptually represents the two keys from which more instances of a `SipHasher` can be created, and a `HashState` is what's stored in a `HashMap`, not a `Hasher`. Implementors of custom hash algorithms should implement the `Hasher` trait, and only hash algorithms intended for use in hash maps need to implement or worry about the `HashState` trait. The entire module and `HashState` infrastructure remains `#[unstable]` due to it being recently redesigned, but some other stability decision made for the `std::hash` module are: * The `Writer` trait remains `#[experimental]` as it's intended to be replaced with an `io::Writer` (more details soon). * The top-level `hash` function is `#[unstable]` as it is intended to be generic over the hashing algorithm instead of hardwired to `SipHasher` * The inner `sip` module is now private as its one export, `SipHasher` is reexported in the `hash` module. And finally, a few changes were made to the default parameters on `HashMap`. * The `RandomSipHasher` default type parameter was renamed to `RandomState`. This renaming emphasizes that it is not a hasher, but rather just state to generate hashers. It also moves away from the name "sip" as it may not always be implemented as `SipHasher`. This type lives in the `std::collections::hash_map` module as `#[unstable]` * The associated `Hasher` type of `RandomState` is creatively called... `Hasher`! This concrete structure lives next to `RandomState` as an implemenation of the "default hashing algorithm" used for a `HashMap`. Under the hood this is currently implemented as `SipHasher`, but it draws an explicit interface for now and allows us to modify the implementation over time if necessary. There are many breaking changes outlined above, and as a result this commit is a: [breaking-change]
2014-12-09 14:37:23 -06:00
pub mod hash_state {
pub use super::hash::state::*;
}