Some modest running-time improvements to `std::collections::BitSet` on bit-sets of varying set-membership densities. This is work originally from [here](https://github.com/rayglover/alt_collections). (Benchmarks copied below)
```
std::collections::BitSet / alt_collections::BitSet
copy_dense ... 3.08x
copy_sparse ... 4.22x
count_dense ... 11.01x
count_sparse ... 8.11x
from_bytes ... 1.47x
intersect_dense ... 6.54x
intersect_sparse ... 4.37x
union_dense ... 5.53x
union_sparse ... 5.60x
```
The exception is `from_bytes`, which I've left unaltered since the optimization is rather obscure.
Compiling with the cpu feature `popcnt` gave a further ~10% improvement on my machine, but this wasn't factored in to the benchmarks above.
Similar improvements could be made to `BitVec`, although that would probably require more substantial changes.
criticism welcome!
Using regular pointer arithmetic to iterate collections of zero-sized types
doesn't work, because we'd get the same pointer all the time. Our
current solution is to convert the pointer to an integer, add an offset
and then convert back, but this inhibits certain optimizations.
What we should do instead is to convert the pointer to one that points
to an i8\*, and then use a LLVM GEP instructions without the inbounds
flag to perform the pointer arithmetic. This allows to generate pointers
that point outside allocated objects without causing UB (as long as you
don't dereference them), and it wraps around using two's complement,
i.e. it behaves exactly like the wrapping_* operations we're currently
using, with the added benefit of LLVM being able to better optimize the
resulting IR.
Using regular pointer arithmetic to iterate collections of zero-sized types
doesn't work, because we'd get the same pointer all the time. Our
current solution is to convert the pointer to an integer, add an offset
and then convert back, but this inhibits certain optimizations.
What we should do instead is to convert the pointer to one that points
to an i8*, and then use a LLVM GEP instructions without the inbounds
flag to perform the pointer arithmetic. This allows to generate pointers
that point outside allocated objects without causing UB (as long as you
don't dereference them), and it wraps around using two's complement,
i.e. it behaves exactly like the wrapping_* operations we're currently
using, with the added benefit of LLVM being able to better optimize the
resulting IR.
The functions BitSet::{iter,union,symmetric_difference} each had docs that claimed u32s were output when their actual output each end up being usizes.
r? @steveklabnik
We don't need to copy any elements if `at` is behind the last element
in the map. The last element is at index `self.v.len() - 1`, so we
should not copy if `at` is greater **or equals** `self.v.len()`.
r? @Gankro
Several Minor API / Reference Documentation Fixes
- Fix a few small errors in the reference.
- Fix paper cuts in the API docs.
Fixes#24882Fixes#25233Fixes#25250
We don't need to copy any elements if `at` is behind the last element
in the map. The last element is at index `self.v.len() - 1`, so we
should not copy if `at` is greater or equals `self.v.len()`.
I was profiling my code again and this time AsRef<str> for String
was eating up a considerable chunk of my runtime; adding the inline
annotation made the program run almost twice as fast!
While I was at it I also added the annotation to other implementations
of AsRef as well as AsMut.
Ideally this trait implementation would be unstable, requiring crates to opt-in
if they would like the functionality, but that's not currently how stability
works so the implementation needs to be removed entirely.
This may come back at a future date, but for now the conservative option is to
remove it.
[breaking-change]
Ideally this trait implementation would be unstable, requiring crates to opt-in
if they would like the functionality, but that's not currently how stability
works so the implementation needs to be removed entirely.
This may come back at a future date, but for now the conservative option is to
remove it.
[breaking-change]
collections: Convert SliceConcatExt to use associated types
Coherence now allows this, we have `SliceConcatExt<T> for [V] where T: Sized + Clone` and` SliceConcatExt<str> for [S]`, these don't conflict because
str is never Sized.
According to rust-lang/rfcs#235, `VecDeque` should have this method (`VecDeque` was called `RingBuf` at the time) but it was never implemented.
I marked this stable since "1.0.0" because it's stable in `Vec`.
Coherence now allows this, we have SliceConcatExt<T> for [V] where T: Sized
+ Clone and SliceConcatExt<str> for [S], these don't conflict because
str is never Sized.
This commit does two things: it adds an example for indexing vectors, and it changes the \"Examples\" section to use full sentences.
This change was spurred by someone in the #rust IRC channel asking if there was a `.set()` method for changing the `i`-th value of a vector (they had missed that `Vec` implements `IndexMut`, which is easy to do if you're not aware of that trait).
This makes the `bit::vec::bench::bench_bit_vec_big_union` benchmark go
from `774 ns/iter (+/- 190)` to `602 ns/iter (+/- 5)`.
(There's room for more work here too: if one can guarantee 128-bit
alignment for the vector, the compiler actually optimises `union`,
`intersection` etc. to SIMD instructions, which end up being ~5x faster
that the original version, and 4x faster than the optimised version in
this patch.)
This makes the `bit::vec::bench::bench_bit_vec_big_union` benchmark go
from `774 ns/iter (+/- 190)` to `602 ns/iter (+/- 5)`.
(There's room for more work here too: if one can guarantee 128-bit
alignment for the vector, the compiler actually optimises `union`,
`intersection` etc. to SIMD instructions, which end up being ~5x faster
that the original version, and 4x faster than the optimised version in
this patch.)
collections: Implement String::drain(range) according to RFC 574
`.drain(range)` is unstable and under feature(collections_drain).
This adds a safe way to remove any range of a String as efficiently as
possible.
As noted in the code, this drain iterator has none of the memory safety
issues of the vector version.
RFC tracking issue is #23055
These implementations were intended to be unstable, but currently the stability
attributes cannot handle a stable trait with an unstable `impl` block. This
commit also audits the rest of the standard library for explicitly-`#[unstable]`
impl blocks. No others were removed but some annotations were changed to
`#[stable]` as they're defacto stable anyway.
One particularly interesting `impl` marked `#[stable]` as part of this commit
is the `Add<&[T]>` impl for `Vec<T>`, which uses `push_all` and implicitly
clones all elements of the vector provided.
Closes#24791
[breaking-change]