Commit Graph

649 Commits

Author SHA1 Message Date
bors
2eebeb8137 auto merge of #12081 : cgaebel/rust/robinhood-hashing, r=alexcrichton
Partially addresses #11783.

Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

```
comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(Option)+sizeof(K)+sizeof(V))/0.75 + 48ish bytes

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 112ish bytes

Timing of building librustc:

compile_and_link: x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.457 s   parsing
time: 0.028 s   gated feature checking
time: 0.000 s   crate injection
time: 0.108 s   configuration 1
time: 1.049 s   expansion
time: 0.219 s   configuration 2
time: 0.222 s   maybe building test harness
time: 0.223 s   prelude injection
time: 0.268 s   assinging node ids and indexing ast
time: 0.075 s   external crate/lib resolution
time: 0.026 s   language item collection
time: 1.016 s   resolution
time: 0.038 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.030 s   looking for macro registrar
time: 0.061 s   freevar finding
time: 0.138 s   region resolution
time: 0.110 s   type collecting
time: 0.072 s   variance inference
time: 0.126 s   coherence checking
time: 9.110 s   type checking
time: 0.186 s   const marking
time: 0.049 s   const checking
time: 0.418 s   privacy checking
time: 0.057 s   effect checking
time: 0.033 s   loop checking
time: 1.293 s   compute moves
time: 0.182 s   match checking
time: 0.242 s   liveness checking
time: 0.866 s   borrow checking
time: 0.150 s   kind checking
time: 0.013 s   reachability checking
time: 0.175 s   death checking
time: 0.461 s   lint checking
time: 13.112 s  translation
  time: 4.352 s llvm function passes
  time: 96.702 s    llvm module passes
  time: 50.574 s    codegen passes
time: 154.611 s LLVM passes
  time: 2.821 s running linker
time: 15.750 s  linking


compile_and_link: x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.422 s   parsing
time: 0.031 s   gated feature checking
time: 0.000 s   crate injection
time: 0.126 s   configuration 1
time: 1.014 s   expansion
time: 0.251 s   configuration 2
time: 0.249 s   maybe building test harness
time: 0.273 s   prelude injection
time: 0.279 s   assinging node ids and indexing ast
time: 0.076 s   external crate/lib resolution
time: 0.033 s   language item collection
time: 1.028 s   resolution
time: 0.036 s   lifetime resolution
time: 0.000 s   looking for entry point
time: 0.029 s   looking for macro registrar
time: 0.063 s   freevar finding
time: 0.133 s   region resolution
time: 0.111 s   type collecting
time: 0.077 s   variance inference
time: 0.565 s   coherence checking
time: 8.953 s   type checking
time: 0.176 s   const marking
time: 0.050 s   const checking
time: 0.401 s   privacy checking
time: 0.063 s   effect checking
time: 0.032 s   loop checking
time: 1.291 s   compute moves
time: 0.172 s   match checking
time: 0.249 s   liveness checking
time: 0.831 s   borrow checking
time: 0.121 s   kind checking
time: 0.013 s   reachability checking
time: 0.179 s   death checking
time: 0.503 s   lint checking
time: 14.385 s  translation
  time: 4.495 s llvm function passes
  time: 92.234 s    llvm module passes
  time: 51.172 s    codegen passes
time: 150.809 s LLVM passes
  time: 2.542 s running linker
time: 15.109 s  linking
```

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.

Please review this very closely. This is my first major contribution to Rust. Sorry for the ugly diff!
2014-03-12 18:06:47 -07:00
Clark Gaebel
5bdbd21009 Performance-oriented hashtable.
Previously, rust's hashtable was totally unoptimized. It used an Option
per key-value pair, and used very naive open allocation.

The old hashtable had very high variance in lookup time. For an example,
see the 'find_nonexisting' benchmark below. This is fixed by keys in
'lucky' spots with a low probe sequence length getting their good spots
stolen by keys with long probe sequence lengths. This reduces hashtable
probe length variance, while maintaining the same mean.

Also, other optimization liberties were taken. Everything is as cache
aware as possible, and this hashtable should perform extremely well for
both large and small keys and values.

Benchmarks:

comprehensive_old_hashmap         378 ns/iter (+/- 8)
comprehensive_new_hashmap         206 ns/iter (+/- 4)
1.8x faster

old_hashmap_as_queue              238 ns/iter (+/- 8)
new_hashmap_as_queue              119 ns/iter (+/- 2)
2x faster

old_hashmap_insert                172 ns/iter (+/- 8)
new_hashmap_insert                146 ns/iter (+/- 11)
1.17x faster

old_hashmap_find_existing         50 ns/iter (+/- 12)
new_hashmap_find_existing         35 ns/iter (+/- 6)
1.43x faster

old_hashmap_find_notexisting      49 ns/iter (+/- 49)
new_hashmap_find_notexisting      34 ns/iter (+/- 4)
1.44x faster

Memory usage of old hashtable (64-bit assumed):

aligned(8+sizeof(K)+sizeof(V))/0.75 + 6 words

Memory usage of new hashtable:

(aligned(sizeof(K))
+ aligned(sizeof(V))
+ 8)/0.9 + 6.5 words

BUT accesses are much more cache friendly. In fact, if the probe
sequence length is below 8, only two cache lines worth of hashes will be
pulled into cache. This is unlike the old version which would have to
stride over the stoerd keys and values, and would be more cache
unfriendly the bigger the stored values got.

And did you notice the higher load factor? We can now reasonably get a
load factor of 0.9 with very good performance.
2014-03-12 18:30:11 -04:00
Erick Tryzelaar
9959188d0e Use generic impls for Hash 2014-03-12 13:39:47 -07:00
Huon Wilson
198caa87cd Update users for the std::rand -> librand move. 2014-03-12 11:31:43 +11:00
Daniel Micay
4d7d101a76 create a sensible comparison trait hierarchy
* `Ord` inherits from `Eq`
* `TotalOrd` inherits from `TotalEq`
* `TotalOrd` inherits from `Ord`
* `TotalEq` inherits from `Eq`

This is a partial implementation of #12517.
2014-03-07 22:45:22 -05:00
Alex Crichton
42389b7069 collections: Correct with_capacity_and_hasher
The arguments were accidentally swapped in the wrong order.

Closes #12743
2014-03-06 18:07:52 -08:00
Kang Seonghoon
1c52c81846 fix typos with with repeated words, just like this sentence. 2014-03-06 20:19:14 +09:00
Alex Crichton
02882fbd7e std: Change assert_eq!() to use {} instead of {:?}
Formatting via reflection has been a little questionable for some time now, and
it's a little unfortunate that one of the standard macros will silently use
reflection when you weren't expecting it. This adds small bits of code bloat to
libraries, as well as not always being necessary. In light of this information,
this commit switches assert_eq!() to using {} in the error message instead of
{:?}.

In updating existing code, there were a few error cases that I encountered:

* It's impossible to define Show for [T, ..N]. I think DST will alleviate this
  because we can define Show for [T].
* A few types here and there just needed a #[deriving(Show)]
* Type parameters needed a Show bound, I often moved this to `assert!(a == b)`
* `Path` doesn't implement `Show`, so assert_eq!() cannot be used on two paths.
  I don't think this is much of a regression though because {:?} on paths looks
  awful (it's a byte array).

Concretely speaking, this shaved 10K off a 656K binary. Not a lot, but sometime
significant for smaller binaries.
2014-02-28 23:01:54 -08:00
bors
31e9c947a3 auto merge of #12544 : erickt/rust/hash, r=acrichto
This PR allows `HashMap`s to work with custom hashers. Also with this patch are:

* a couple generic implementations of `Hash` for a variety of types.
* added `Default`, `Clone` impls to the hashers.
* added a `HashMap::with_hasher()` constructor.
2014-02-28 00:26:34 -08:00
Erick Tryzelaar
72b5e30f6c collections: allow HashMap to work with generic hashers 2014-02-27 19:02:52 -08:00
Bruno de Oliveira Abinader
4da6d041c2 Replaced list::push() with unshift() based on List<T> 2014-02-27 08:36:07 -04:00
Bruno de Oliveira Abinader
1c27c90119 Refactored list::append() to be based on List<T> 2014-02-27 08:36:06 -04:00
Bruno de Oliveira Abinader
65f1993230 Refactored list::tail() to be based on List<T> 2014-02-27 08:35:47 -04:00
Bruno de Oliveira Abinader
fed034c402 Refactored list::head() to be based on List<T> 2014-02-27 08:35:47 -04:00
Bruno de Oliveira Abinader
45fd63a8b7 Replaced list::has() with list::contains() 2014-02-27 08:35:47 -04:00
Bruno de Oliveira Abinader
223f309fc3 Removed deprecated list::{iter,each}() functions 2014-02-27 08:35:47 -04:00
Bruno de Oliveira Abinader
a14d72d49e Implemented list::is_empty() based on Container trait 2014-02-27 08:35:47 -04:00
Bruno de Oliveira Abinader
e589fcffcc Implemented list::len() based on Container trait 2014-02-27 08:35:46 -04:00
Bruno de Oliveira Abinader
197116d7ce Removed list::any() in favor of iter().any() 2014-02-27 08:35:46 -04:00
Bruno de Oliveira Abinader
d681907064 Removed list::find() in favor of iter().find() 2014-02-27 08:35:46 -04:00
Bruno de Oliveira Abinader
a09a4b882d Removed list::foldl() in favor of iter().fold() 2014-02-27 08:35:45 -04:00
Bruno de Oliveira Abinader
52524bfe88 Implemented Items<'a, T> for List<T> 2014-02-27 08:35:18 -04:00
Bruno de Oliveira Abinader
2b362768ff Modified list::from_vec() to return List<T> 2014-02-27 08:35:17 -04:00
Bruno de Oliveira Abinader
1e6151a770 Renamed variables 2014-02-27 08:35:16 -04:00
bors
25d68366b7 auto merge of #12522 : erickt/rust/hash, r=alexcrichton
This patch series does a couple things:

* replaces manual `Hash` implementations with `#[deriving(Hash)]`
* adds `Hash` back to `std::prelude`
* minor cleanup of whitespace and variable names.
2014-02-25 06:41:36 -08:00
Brendan Zabarauskas
3cc95314c3 Remove std::default::Default from the prelude 2014-02-24 21:22:26 -08:00
Erick Tryzelaar
f12ff1964b std: minor whitespace cleanup 2014-02-24 19:52:29 -08:00
Alex Crichton
6485917d7c Move extra::json to libserialize
This also inverts the dependency between libserialize and libcollections.

cc #8784
2014-02-24 09:51:39 -08:00
Alex Crichton
8761f79485 Remove deriving(ToStr)
This has been superseded by deriving(Show).

cc #9806
2014-02-24 00:15:17 -08:00
Alex Crichton
b78b749810 Remove all ToStr impls, add Show impls
This commit changes the ToStr trait to:

    impl<T: fmt::Show> ToStr for T {
        fn to_str(&self) -> ~str { format!("{}", *self) }
    }

The ToStr trait has been on the chopping block for quite awhile now, and this is
the final nail in its coffin. The trait and the corresponding method are not
being removed as part of this commit, but rather any implementations of the
`ToStr` trait are being forbidden because of the generic impl. The new way to
get the `to_str()` method to work is to implement `fmt::Show`.

Formatting into a `&mut Writer` (as `format!` does) is much more efficient than
`ToStr` when building up large strings. The `ToStr` trait forces many
intermediate allocations to be made while the `fmt::Show` trait allows
incremental buildup in the same heap allocated buffer. Additionally, the
`fmt::Show` trait is much more extensible in terms of interoperation with other
`Writer` instances and in more situations. By design the `ToStr` trait requires
at least one allocation whereas the `fmt::Show` trait does not require any
allocations.

Closes #8242
Closes #9806
2014-02-23 20:51:56 -08:00
Huon Wilson
efaf4db24c Transition to new Hash, removing IterBytes and std::to_bytes. 2014-02-24 07:44:10 +11:00
Alex Crichton
2a14e084cf Move std::{trie, hashmap} to libcollections
These two containers are indeed collections, so their place is in
libcollections, not in libstd. There will always be a hash map as part of the
standard distribution of Rust, but by moving it out of the standard library it
makes libstd that much more portable to more platforms and environments.

This conveniently also removes the stuttering of 'std::hashmap::HashMap',
although 'collections::HashMap' is only one character shorter.
2014-02-23 00:35:11 -08:00
bors
2fa7d6b44f auto merge of #12415 : HeroesGrave/rust/move-enum-set, r=alexcrichton
Part of #8784

Also removed the one glob import.
2014-02-21 05:26:58 -08:00
Liigo Zhuang
53b9d1a324 move extra::test to libtest 2014-02-20 16:03:58 +08:00
HeroesGrave
5bf8d3289f move enum_set to libcollections. #8784 2014-02-20 19:38:01 +13:00
Brendan Zabarauskas
cf0654c47c Improve naming of tuple getters, and add mutable tuple getter
Renames the `n*` and `n*_ref` tuple getters to `val*` and `ref*` respectively, and adds `mut*` getters.
2014-02-17 00:57:56 +11:00
bors
6b025c803c auto merge of #12272 : alexcrichton/rust/snapshot, r=kballard
This notably contains the `extern mod` => `extern crate` change.

Closes #9880
2014-02-15 14:06:26 -08:00
Corey Richardson
49e11630fa std: clean up ptr a bit 2014-02-15 12:11:41 -05:00
Alex Crichton
a41b0c2529 extern mod => extern crate
This was previously implemented, and it just needed a snapshot to go through
2014-02-14 22:55:21 -08:00
lpy
665555d58f return value/use extra::test::black_box in benchmarks 2014-02-14 07:45:34 -08:00
Michael Darakananda
bf1464c413 Removed num::Orderable 2014-02-13 20:12:59 -05:00
Nif Ward
184367093f Includes new add method that uses .clone() for support.
Added new tests for bsearch methods and changed "add" to "insert"

Fixed failure on div_floor.
2014-02-11 15:59:33 -05:00
bors
38ed4674e8 auto merge of #11956 : edwardw/rust/issue-7556, r=cmr
Closes #7556.

Also move ``std::util::Void`` to ``std::any::Void``. It makes more sense to me.
2014-02-10 14:56:47 -08:00
Edward Wang
e9ff91e9be Move replace and swap to std::mem. Get rid of std::util
Also move Void to std::any, move drop to std::mem and reexport in
prelude.
2014-02-11 05:21:35 +08:00
Bruno de Oliveira Abinader
cb1fad3b28 Implement List's any() function
This is needed for cases where we only need to know if a list item
matches the given predicate (eg. in Servo, we need to know if attributes
from different DOM elements are equal).
2014-02-10 08:36:48 -04:00
bors
d0affa5c8d auto merge of #12131 : brunoabinader/rust/list-find-doc-typo, r=alexcrichton
Replace ```v``` with ```ls```.
2014-02-09 18:46:28 -08:00
Bruno de Oliveira Abinader
66c036c293 Fixed a typo in list's find() documentation. 2014-02-09 12:45:54 -04:00
Brian Anderson
c7710cdf45 std: Add move_val_init to mem. Replace direct intrinsic usage 2014-02-09 00:17:41 -08:00
HeroesGrave
d81bb441da moved collections from libextra into libcollections 2014-02-07 19:49:26 +13:00